If you've ever played Marco Polo, or had someone call your phone to help you locate it, you are already familiar with the importance of spatial sound. We use sound cues in our daily lives to locate objects, get someone's attention, or get a better understanding of our environment. The more closely your app's sounds behave as they do in the real world, the more convincing and engaging your holograms will be. Spatial sound plays several key roles in holographic development, from making holograms feel grounded in the world to directing the user's attention.
Below are some general concepts and best practices to keep in mind when using spatial sound:
Spatial sound is a simulation. The most frequent use of spatial sound is making a sound seem as though it is emanating from a real or virtual object in the augmented world. Thus, spatialized sounds may make the most sense coming from such objects.
Note that because spatial sound is perceived accurately, a sound shouldn't necessarily emit from the center of an object; the difference will be noticeable depending on the size of the object and its distance from the user. With small objects, the center point of the object is usually sufficient. For larger objects, you may want a sound emitter, or multiple emitters, at the part of the object that is supposed to be producing the sound.
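As an illustrative heuristic (the angular-size threshold and listener geometry below are assumptions for this sketch, not part of any HoloLens API), the choice between a single centered emitter and offset emitters might be framed in terms of how large the object appears to the listener:

```python
import math

def needs_offset_emitter(object_size_m, distance_m, threshold_deg=10.0):
    """Rough sketch: if an object subtends more than a few degrees of
    the listener's view, a single emitter at its center may sound
    noticeably wrong, so place emitters at the sounding parts instead.
    The 10-degree threshold is an arbitrary illustrative value."""
    angular_size_deg = math.degrees(2 * math.atan((object_size_m / 2) / distance_m))
    return angular_size_deg > threshold_deg

# A small speaker 2 m away: one centered emitter is fine.
print(needs_offset_emitter(0.1, 2.0))   # False
# A 2 m wide machine 2 m away: emit from the sounding part instead.
print(needs_offset_emitter(2.0, 2.0))   # True
```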
Normalize all sounds. Distance attenuation happens quickly within the first meter from the device user, just as it does in the real world. All audio files should be normalized, and most sounds should be played at unity gain. The spatial audio engine applies the attenuation necessary for a sound to "feel" like it's at a certain distance (we call this "distance cues"), and applying any attenuation on top of that could reduce the effect. Outside of simulating a real object, the initial distance decay of a spatialized sound will likely be more than enough for a proper mix of your audio. If you feel you need to attenuate a sound, the source is likely too close to the user, and its position should be adjusted rather than the volume of the audio file or the emitter.
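The engine's distance cues can be approximated by a simple inverse-distance rolloff. The sketch below (the 1/d curve and 1 m reference distance are illustrative assumptions, not the engine's actual model) shows why extra file-level attenuation on top of the engine's is redundant:

```python
def distance_gain(distance_m, reference_m=1.0):
    """Approximate inverse-distance attenuation: unity gain at the
    reference distance, falling off as 1/d beyond it."""
    return min(1.0, reference_m / max(distance_m, 1e-6))

# Gain falls off quickly within the first meters from the listener,
# so a normalized file played at unity gain already "feels" distant.
for d in (0.5, 1.0, 2.0, 4.0):
    print(f"{d:.1f} m -> gain {distance_gain(d):.2f}")
```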
Spatial sound emitter movement. Because spatial sound is tied to the movement of the user's head, no sound emitter movement is needed for an accurate positional effect - the user's own head movement (even very slight) will provide the necessary cues for a sound's position.
If sound emitter motion is desired (e.g. a bird in flight), left/right movement is most effective for spatial sound and should be incorporated into emitter motion whenever appropriate. For instance, if a sound moves from in front of the user to behind them, passing along one side of the user will produce the best effect. Elevation changes are less perceptible, so emitters should stay close to eye level unless a simulated object is meant to be above or below the user.
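To see why a side offset helps, consider the azimuth the listener perceives as an emitter flies past. A minimal sketch (the coordinate convention, offset, and speed are illustrative assumptions):

```python
import math

def flyby_position(t_s, side_offset_m=1.5, speed_mps=2.0):
    """Emitter travels front-to-back along a line offset to the
    listener's right. Listener at origin; +z forward, +x right."""
    return (side_offset_m, 4.0 - speed_mps * t_s)

# The azimuth sweeps smoothly past the listener's side, giving
# strong left/right cues; a path straight through or directly
# over the listener's head would not.
for t in range(5):
    x, z = flyby_position(t)
    azimuth_deg = math.degrees(math.atan2(x, z))  # 0 = straight ahead
    print(f"t={t}s azimuth={azimuth_deg:.0f} deg")
```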
Distance attenuation/dynamic compression. It can be tempting to reduce (or even nullify) the amount of distance attenuation for important sounds. However, distance attenuation is important for positionality, and the Min Gain should be kept to a low value (below -20 dB). This is mostly because the Min Gain property for distance attenuation only applies to the direct path; all reflections decay naturally regardless of this setting. With no attenuation, a distant sound's direct path will dominate its reflections, throwing the positional simulation off and making everything sound very close to the user's head. It is best to keep distance attenuation as natural as possible and add dynamic compression when needed for important sounds.
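The direct-path-only behavior of the Min Gain floor can be sketched numerically (the inverse-distance law and parameter names below are assumptions for illustration, not the actual engine implementation):

```python
import math

def direct_path_db(distance_m, min_gain_db, reference_m=1.0):
    """Direct-path level under inverse-distance attenuation, clamped
    at the Min Gain floor. Reflections are assumed to keep decaying
    naturally, unaffected by the floor."""
    attenuation_db = -20.0 * math.log10(max(distance_m, reference_m) / reference_m)
    return max(attenuation_db, min_gain_db)

# At 10 m, a natural direct path sits around -20 dB...
print(direct_path_db(10.0, min_gain_db=-96.0))  # -20.0
# ...but a high Min Gain pins it near full level while reflections
# still fade, so the sound reads as very close to the head.
print(direct_path_db(10.0, min_gain_db=-6.0))   # -6.0
```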
Sounds should be spatialized. On the HoloLens, as in the real world, experiences exist in 3D space. Sound emitting objects, including user interface elements, should be locatable using sound as well as sight. If a portion of an experience occurs outside of the user's view, for example music or a voice-over during a scene transition, there can be benefits to spatializing this audio as well. Using spatial sound on these objects provides a natural way for users to identify where their attention should be focused.
Object discovery and user interfaces. When using audio cues to direct the user's attention beyond their current view, the sound should be clearly audible in the mix. For sounds and music that are associated with an element of the user interface (e.g. a menu), the sound emitter should be attached to that element. Stereo and other non-positional audio playback can make spatialized elements difficult for users to locate.
Use spatial sound over standard 3D sound as much as possible. On the HoloLens, for the best user experience, 3D audio should be achieved using spatial sound rather than legacy 3D audio technologies. In general, the improved spatialization is worth the small CPU cost over standard 3D sound.
Stream music, voice-overs, and long ambience tracks. To preserve system resources, longer sounds and sounds that don't always need to be loaded in memory for instant access should be streamed rather than preloaded. Voice-overs are a great example, as they are often played only once, for example during a cut scene.