The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.
Article originally published on ASoundEffect blog (source: https://www.asoundeffect.com/build-audio-vr-games)
Many of my fellow game audio professionals ask me how working in Virtual Reality differs from working on flat-screen games, aside from binaural positioning. My short answer is that to create an alternate reality that provides a natural listening experience in (gaming) virtual environments, our creative approach to game audio needs to be adapted, and binaural positioning plays a major role in revisiting it. The long answer is below…
Because Virtual Reality games do not engage all of the senses (only vision, hearing, and touch through haptics are integrated), the human brain can adapt its focus on, or its perception of, the senses that are integrated. Don’t get me wrong, you won’t become a superhero with sharpened senses by playing VR games, but the brain can rewire itself through training and learning to make better use of the integrated senses, with the goal of using the information at its disposal to analyze and interact with its surroundings.
Your senses of smell and taste are not turned off while you play VR games, so they provide information that contradicts what you see and hear in the game: what you smell, and possibly taste, comes from your physical surroundings. Touch and proprioception (proprioceptors are sensory receptors that receive stimuli from within the body, especially those that respond to position and movement) also send mixed information from the virtual world and the real world.
The senses that are used need to be reproduced faithfully to immerse the player mentally in the virtual world, combining well-designed, controlled, and simultaneous visual, auditory, and haptic cues to create a believable Virtual Reality experience. The brain lets us react when subtle sensory signals that might not seem important on their own are triggered simultaneously, and that’s the power of multisensory integration (MSI).
“A basic tenet of multisensory integration is the ability of one sensory modality to enhance or to suppress information from another sensory modality.” (Calvert et al., 2004)
A good example would be the “Virtual Barber Shop” demo released about 10 years ago on YouTube.
Most people experiencing this binaural audio demo for the first time, if mentally immersed (take a seat and close your eyes while listening so you are not distracted by your surroundings), would feel that someone is literally behind them, touching their scalp, talking to them, and cutting their hair.
Human Evolution & Game Audio
The modern world we live in, along with the audio language that interactive media have developed over decades, constantly brings us modern sounds and audio cues, and we analyze them without necessarily being conscious of it. Our ears and brain can distinguish thousands of sounds at the same time. Because sounds are highly informative, they give us the ability to analyze situations or events and react to them instinctively.
Sounds carry physical and spatial information about objects and environments that our brain can analyze, so we can learn about those sound sources and better understand a situation involving objects and events.
Our auditory system provides a lot of information about the world surrounding us. In real life, vision, hearing, and our sense of smell provide information that helps us identify objects and situations and navigate our environment. But in VR, hearing is the only sense able to provide full spatial information beyond our field of view, including elevation, 360 degrees of direction, and depth, which guides our decisions and behaviors and helps us understand our virtual surroundings.
In other words, audio is the only medium in VR able to make you turn your head, drawing your attention and orientation toward possible scripted events or predefined directions and paths.
Binaural or Hard Panning Stereo?
To create a sense of presence in new media (the feeling of being physically present in the virtual scene), spatial immersion has been added to the rules of immersive design, alongside the usual narrative, tactical, and strategic immersion. Does that mean we need to position every single sound with HRTF processing? Well, not necessarily.
In VR, you can choose between binaural positioning and stereo positioning (and we now have that choice for flat-screen games as well, which will likely open the door to new possibilities and discussions for first-person games).
Binaural positioning allows us to create a more realistic experience, as it accounts for real-world physics in its calculations. But because games have a long history of using only stereo positioning, players are accustomed to it, and stereo positioning may offer certain advantages over binaural positioning. Though less realistic, stereo positioning creates a stronger sense of directionality: because the sound is hard panned, it is more obvious where the source is positioned relative to the listener.
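To make the contrast more tangible, here is a minimal Python sketch that is not taken from any particular engine or middleware. It places constant-power stereo panning, one common way to implement level-only positioning, next to Woodworth’s spherical-head approximation of the interaural time difference, which is one of the cues a binaural renderer adds on top of level differences. The function names, the 8.75 cm head radius, and the 343 m/s speed of sound are illustrative assumptions.

```python
import math


def stereo_pan_gains(azimuth_deg):
    """Constant-power pan gains as (left, right).

    -90 degrees is fully left, 0 is centre, +90 is fully right.
    Sources outside that range are clamped, so they end up hard panned.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    # Map [-90, 90] degrees onto a pan angle in [0, pi/2].
    pan = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(pan), math.sin(pan)


def woodworth_itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference for a rigid spherical head.

    Uses Woodworth's formula ITD = (r / c) * (theta + sin(theta)),
    valid for azimuths between -90 and +90 degrees. The sign tells
    which side the source is on (positive = right).
    """
    theta = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (-90, -45, 0, 45, 90):
        left, right = stereo_pan_gains(azimuth)
        itd_ms = woodworth_itd_seconds(azimuth) * 1000.0
        print(f"{azimuth:+4d} deg  L={left:.3f} R={right:.3f}  ITD={itd_ms:+.3f} ms")
```

Stereo panning only changes the two channel levels, which is why a source at the side reads so clearly. A binaural renderer would also apply timing and spectral cues through a full HRTF, which this sketch does not attempt.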
Depending on your game, you might want to use different positioning systems for different components.
If your purpose is to completely immerse the player in the world and create a sense of connection with it, you could use binaural positioning to establish a sense of realness. For gameplay feedback, you could use hard panning, as players are more familiar with it, so they can focus their attention on the gameplay, creating immersion with the gameplay rather than with the environment. The stronger sense of directionality offered by stereo positioning provides more obvious cues for the player, giving them stronger audio feedback.
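One way to express such a per-component split is a simple routing table that an audio system consults when it spawns a sound. The following Python sketch is only an illustration: the category names, the `Positioning` enum, and the `positioning_for` helper are hypothetical and not part of any specific engine or middleware.

```python
from enum import Enum


class Positioning(Enum):
    BINAURAL = "binaural"  # HRTF-based, favours realism and presence
    STEREO = "stereo"      # hard panning, favours obvious direction


# Hypothetical sound categories; a real project would define its own.
POSITIONING_BY_CATEGORY = {
    "ambience": Positioning.BINAURAL,
    "world_object": Positioning.BINAURAL,
    "gameplay_feedback": Positioning.STEREO,
}


def positioning_for(category, default=Positioning.BINAURAL):
    """Return the positioning mode for a category, or the default if unlisted."""
    return POSITIONING_BY_CATEGORY.get(category, default)


if __name__ == "__main__":
    for name in ("ambience", "gameplay_feedback", "voice_over"):
        print(name, "->", positioning_for(name).value)
```

Keeping the mapping in data rather than in code makes it cheap to experiment, which matters when the choice is a creative one.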
The decision to use stereo positioning, binaural positioning, or both should be a creative one rather than a technical one, and it depends on your game’s needs per component and situation. It’s all about experimentation and choices.
Preferably, you should make these choices before starting the audio production of your game, as your positioning system(s) will affect your technical pipeline on the implementation front, as well as your creative approach to building and implementing assets.
Creative Audio Language
Your creative audio approach when starting work on a VR game is therefore affected by (at least) all of the above. You should ask yourself: When should you use audio for multisensory integration? How do human evolution (the realistic approach) and game audio language (gameplay feedback and player expectations) affect your creative decisions, your technical choices, and your pipeline?
But then, what do you do with extra-diegetic sources? Let’s take music as a concrete example. From an end-user perspective, people need to be guided to understand what to think of a scene, what the context is, what to feel, and what the setting is. Those are some of the primary purposes of music. It really depends on your game’s creative approach, so aim for what works best for your game. VR needs to be interactive, and players’ actions need to feel meaningful in the game. Adaptive music needs to be thought through carefully and creatively to build something that works both for the VR medium and for the game itself.
On the other hand, players expect realism when playing VR games or experiences, and they will notice anything wrong in the soundscape, anything that could take them out of the experience. Content production value, audio repetition, and the degree of direct or indirect interactivity to be built into virtual worlds can be extremely challenging.
Interactivity is a big topic in itself for audio in VR. You need to decide on your intimate zone: the zone where you can interact with objects, and where you need to add a lot of detail to your sounds, both for positioning and for dynamic layering. Audio can also be used as an input in VR, which once again reinforces multisensory integration with haptics and visuals, and could possibly suggest taste or smell if the MSI moment is totally mastered. As a concrete example, say you could grab a cigarette in the game and bring the controller to your mouth. Inhaling would trigger the subtle, visceral sound of the cigarette burning (plus its visual feedback), and exhaling would trigger the smoke VFX along with the sound of your character exhaling, or the direct sound of your own breath (from the VR microphone) passed through the acoustic rendering of the virtual environment.
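The intimate zone idea can be sketched as a distance-driven gain for a close-up detail layer. The Python below is only an illustration under assumed values: the function name, the 1 m intimate radius, and the 0.5 m fade band are hypothetical and would need tuning per game and per object.

```python
def detail_layer_gain(distance_m, intimate_radius_m=1.0, fade_m=0.5):
    """Linear gain (0.0 to 1.0) for a close-up detail layer.

    Inside the intimate radius the detail layer plays at full level.
    Beyond it, the layer fades out linearly over `fade_m` metres so the
    extra detail does not pop in or out as the player moves.
    """
    if distance_m <= intimate_radius_m:
        return 1.0
    if distance_m >= intimate_radius_m + fade_m:
        return 0.0
    return 1.0 - (distance_m - intimate_radius_m) / fade_m


if __name__ == "__main__":
    for d in (0.3, 1.0, 1.25, 1.5, 3.0):
        print(f"{d:.2f} m -> detail gain {detail_layer_gain(d):.2f}")
```

The same distance test could also decide which objects receive the more expensive positioning or extra interaction layers, keeping the detail budget where the player can actually reach.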
To sum up
The main rule when doing audio for VR is that…there are no rules! The creative language is currently being developed by pioneers in the industry, both for the VR medium itself and for the types of games that work well on it.
Rome wasn’t built in a day, and as with all great things, VR and VR audio need time to reach their full potential. We are at the beginning of VR in terms of technology and development, and there is no perfect recipe when working on VR audio.
Audio is no longer limited to what’s on screen and should play an even bigger role than in flat-screen games, serving gameplay feedback, emotional states, engagement, storytelling, spatial immersion, multisensory integration, and the player’s position relative to the game action (and, to some extent, body balance).