Spatial sound in DirectX

Add spatial sound to your HoloLens apps based on DirectX by using the XAudio2 and xAPO audio libraries.

This topic uses sample code from the HolographicHRTFAudioSample.

Overview of head-relative spatial sound

Spatial sound is implemented as an audio processing object (APO) that uses a head related transfer function (HRTF) filter to spatialize an ordinary audio stream.

Include these header files in pch.h to access the audio APIs:

  • XAudio2.h
  • xapo.h
  • hrtfapoapi.h

To set up spatial sound:

  1. Call CreateHrtfApo to initialize a new APO for HRTF audio.
  2. Assign the HRTF parameters and HRTF environment to define the acoustic characteristics of the spatial sound APO.
  3. Set up the XAudio2 engine for HRTF processing.
  4. Create an IXAudio2SourceVoice object and call Start.

Implementing HRTF and spatial sound in your DirectX app

You can achieve a variety of effects by configuring the HRTF APO with different parameters and environments. Use the following code to explore the possibilities. Download the Universal Windows Platform code sample: Spatial sound sample

Helper types are available in OmnidirectionalSound.cpp, CardioidSound.cpp, CustomDecay.cpp, and AudioFileReader.cpp.

Add spatial sound for an omnidirectional source

Some holograms in the user's surroundings emit sound equally in all directions. The following code shows how to initialize an APO to emit omnidirectional sound. In this example, we apply this concept to the spinning cube from the Windows Holographic app template for Visual Studio 2015. For the complete code listing, see OmnidirectionalSound.cpp.

The Windows spatial sound engine only supports a 48-kHz sample rate for playback. Most middleware, such as Unity, automatically converts sound files into the required format, but if you work at lower levels of the audio system or build your own pipeline, it is important to remember this requirement to prevent crashes or undesired behavior, such as HRTF system failure.
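Before submitting audio data, a simple guard can enforce this requirement. The sketch below is illustrative only: `WaveFormat` mirrors the relevant WAVEFORMATEX fields and `IsHrtfCompatible` is a hypothetical helper, assuming the HRTF APO consumes a mono, 48-kHz source.

```cpp
#include <cstdint>

// Field layout mirrors the relevant parts of the Windows WAVEFORMATEX header;
// redeclared here for illustration so the snippet stays self-contained.
struct WaveFormat
{
    uint16_t channels;
    uint32_t samplesPerSec;
    uint16_t bitsPerSample;
};

// Hypothetical guard: returns true when the source data can be handed to the
// HRTF xAPO as-is (mono source at a 48-kHz sample rate).
bool IsHrtfCompatible(const WaveFormat& fmt)
{
    return fmt.channels == 1 && fmt.samplesPerSec == 48000;
}
```

Sources that fail this check would need to be resampled (and downmixed) before playback.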

First, we need to initialize the APO. In our holographic sample app, we choose to do this once we have the HolographicSpace.

From HolographicHrtfAudioSampleMain::SetHolographicSpace():

// Spatial sound
   auto hr = m_omnidirectionalSound.Initialize(L"assets//MonoSound.wav");

The implementation of Initialize, from OmnidirectionalSound.cpp:

// Initializes an APO that emits sound equally in all directions.
   HRESULT OmnidirectionalSound::Initialize( LPCWSTR filename )
   {
       // _audioFile is of type AudioFileReader, which is defined in AudioFileReader.cpp.
       auto hr = _audioFile.Initialize( filename );

       ComPtr<IXAPO> xapo;
       if ( SUCCEEDED( hr ) )
       {
           // Passing in nullptr as the first arg for HrtfApoInit initializes the APO with defaults of
           // omnidirectional sound with natural distance decay behavior.
           // CreateHrtfApo fails with E_NOTIMPL on unsupported platforms.
           hr = CreateHrtfApo( nullptr, &xapo );
       }

       if ( SUCCEEDED( hr ) )
       {
           // _hrtfParams is of type ComPtr<IXAPOHrtfParameters>.
           hr = xapo.As( &_hrtfParams );
       }

       // Set the default environment.
       if ( SUCCEEDED( hr ) )
       {
           hr = _hrtfParams->SetEnvironment( HrtfEnvironment::Outdoors );
       }

       // Initialize an XAudio2 graph that hosts the HRTF xAPO.
       // The source voice is used to submit audio data and control playback.
       if ( SUCCEEDED( hr ) )
       {
           hr = SetupXAudio2( _audioFile.GetFormat(), xapo.Get(), &_xaudio2, &_sourceVoice );
       }

       // Submit audio data to the source voice.
       if ( SUCCEEDED( hr ) )
       {
           XAUDIO2_BUFFER buffer{ };
           buffer.AudioBytes = static_cast<UINT32>( _audioFile.GetSize() );
           buffer.pAudioData = _audioFile.GetData();
           buffer.LoopCount = XAUDIO2_LOOP_INFINITE;

           // _sourceVoice is of type IXAudio2SourceVoice*.
           hr = _sourceVoice->SubmitSourceBuffer( &buffer );
       }

       return hr;
   }

After the APO is configured for HRTF, you call Start on the source voice to play the audio. In our sample app, we choose to put it on a loop so that you can continue to hear the sound coming from the cube.

From HolographicHrtfAudioSampleMain::SetHolographicSpace():

if (SUCCEEDED(hr))
   {
       m_omnidirectionalSound.SetEnvironment(HrtfEnvironment::Small);
       m_omnidirectionalSound.OnUpdate(m_spinningCubeRenderer->GetPosition());
       m_omnidirectionalSound.Start();
   }

From OmnidirectionalSound.cpp:

HRESULT OmnidirectionalSound::Start()
   {
       _lastTick = GetTickCount64();
       return _sourceVoice->Start();
   }

Now, whenever we update the frame, we need to update the hologram's position relative to the device itself. This is because HRTF positions are always expressed relative to the user's head, including the head position and orientation.

To do this in a HolographicSpace, we need to construct a transform matrix from our SpatialStationaryFrameOfReference coordinate system to a coordinate system that is fixed to the device itself.
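Conceptually, this transform expresses each world-space point in an orthonormal basis built from the head's forward and up vectors, then measures the offset from the head position. Here is a minimal, self-contained sketch of that idea, using a hypothetical Vec3 type instead of the Windows::Foundation::Numerics types; it assumes the right-handed convention in which -Z points forward from the head.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3  sub(Vec3 a, Vec3 b)  { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(Vec3 a, Vec3 b)  { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}
Vec3 normalize(Vec3 v)
{
    const float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Express a world-space point in a head-relative frame. Assumes the
// right-handed convention used by Windows.Perception: +X right, +Y up,
// -Z forward, so the +Z basis vector is the negated forward direction.
Vec3 WorldToHeadRelative(Vec3 point, Vec3 headPos, Vec3 headForward, Vec3 headUp)
{
    const Vec3 zAxis = normalize({ -headForward.x, -headForward.y, -headForward.z });
    const Vec3 xAxis = normalize(cross(normalize(headUp), zAxis));
    const Vec3 yAxis = cross(zAxis, xAxis); // orthogonal unit vectors: already normalized

    // Project the head-to-point offset onto each basis vector.
    const Vec3 d = sub(point, headPos);
    return { dot(d, xAxis), dot(d, yAxis), dot(d, zAxis) };
}
```

A point two meters straight ahead of the head comes out as (0, 0, -2) regardless of which way the head is turned, which is exactly the head-relative position HRTF needs.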

From HolographicHrtfAudioSampleMain::Update():

m_spinningCubeRenderer->Update(m_timer);

   SpatialPointerPose^ currentPose = SpatialPointerPose::TryGetAtTimestamp(currentCoordinateSystem, prediction->Timestamp);
   if (currentPose != nullptr)
   {
        // Use a coordinate system built from the pointer pose we already queried for this frame.
        SpatialPointerPose^ pose = currentPose;
       if (pose != nullptr)
       {
           float3 headPosition = pose->Head->Position;
           float3 headUp = pose->Head->UpDirection;
           float3 headDirection = pose->Head->ForwardDirection;

           // To construct a rotation matrix, we need three vectors that are mutually orthogonal.
           // The first vector is the gaze vector.
           float3 negativeZAxis = normalize(headDirection);

           // The second vector should end up pointing away from the horizontal plane of the device.
           // We first guess by using the head "up" direction.
           float3 positiveYAxisGuess = normalize(headUp);

           // The third vector completes the set by being orthogonal to the other two.
           float3 positiveXAxis = normalize(cross(negativeZAxis, positiveYAxisGuess));

           // Now, we can correct our "up" vector guess by redetermining orthogonality.
           float3 positiveYAxis = normalize(cross(negativeZAxis, positiveXAxis));

           // The rotation matrix is formed as a standard basis rotation.
           float4x4 rotationTransform =
               {
               positiveXAxis.x, positiveYAxis.x, negativeZAxis.x, 0.f,
               positiveXAxis.y, positiveYAxis.y, negativeZAxis.y, 0.f,
               positiveXAxis.z, positiveYAxis.z, negativeZAxis.z, 0.f,
               0.f, 0.f, 0.f, 1.f,
               };

           // The translate transform can be constructed using the Windows::Foundation::Numerics API.
           float4x4 translationTransform = make_float4x4_translation(-headPosition);

           // Now, we have a basis transform from our spatial coordinate system to a device-relative
           // coordinate system.
           float4x4 coordinateSystemTransform = translationTransform * rotationTransform;

           // Reinterpret the cube position in the device's coordinate system.
           float3 cubeRelativeToHead = transform(m_spinningCubeRenderer->GetPosition(), coordinateSystemTransform);

            // Note that at (0, 0, 0) exactly, HRTF simply passes the audio through unprocessed.
            // When the hologram position is exactly at the device origin, we substitute a minimal
            // offset so that HRTF processing continues to function in this edge case.
           float distanceFromHologramToHead = length(cubeRelativeToHead);
           static const float distanceMin = 0.00001f;
           if (distanceFromHologramToHead < distanceMin)
           {
               cubeRelativeToHead = float3(0.f, distanceMin, 0.f);
           }

           // Position the spatial sound source on the hologram.
           m_omnidirectionalSound.OnUpdate(cubeRelativeToHead);

           // For debugging, it can be interesting to observe the distance in the debugger.
           /*
           std::wstring distanceString = L"Distance from hologram to head: ";
           distanceString += std::to_wstring(distanceFromHologramToHead);
           distanceString += L"\n";
           OutputDebugStringW(distanceString.c_str());
           */
       }
   }
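The zero-distance guard in the listing above can be factored into a small reusable helper. This hypothetical `ClampToMinimumHrtfDistance` function reproduces the same logic with a plain Float3 struct:

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

// HRTF passes audio through unprocessed when the source sits exactly at the
// head origin. Nudge such positions by a tiny offset so spatialization keeps
// working. (Illustrative helper mirroring the clamp in the Update loop.)
Float3 ClampToMinimumHrtfDistance(Float3 relativePosition, float minDistance = 0.00001f)
{
    const float distance = std::sqrt(relativePosition.x * relativePosition.x +
                                     relativePosition.y * relativePosition.y +
                                     relativePosition.z * relativePosition.z);
    if (distance < minDistance)
    {
        return { 0.f, minDistance, 0.f };
    }
    return relativePosition;
}
```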

The HRTF position is applied directly to the sound APO by the OmnidirectionalSound helper class.

From OmnidirectionalSound::OnUpdate:

HRESULT OmnidirectionalSound::OnUpdate(_In_ Numerics::float3 position)
   {
       auto hrtfPosition = HrtfPosition{ position.x, position.y, position.z };
       return _hrtfParams->SetSourcePosition(&hrtfPosition);
   }

That's it! Continue reading to learn more about what you can do with HRTF audio and Windows Holographic.

Initialize spatial sound for a directional source

Some holograms in the user's surroundings emit sound mostly in one direction. This sound pattern is named cardioid because it looks like a cartoon heart. The following code shows how to initialize an APO to emit directional sound. For the complete code listing, see CardioidSound.cpp.
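To build intuition for the scaling and order parameters used below, the classic cardioid-family polar pattern from microphone theory can be sketched as follows. This is an illustration only; the HRTF engine's internal directivity curve is not documented here and may differ.

```cpp
#include <algorithm>
#include <cmath>

// Classic cardioid-family polar pattern:
//   gain(theta) = ((1 - scaling) + scaling * cos(theta)) ^ order
// clamped to zero outside the positive lobe. scaling = 0 is omnidirectional;
// scaling = 1 fully attenuates the direct path behind the source; raising
// order narrows the directivity region.
float CardioidGain(float scaling, float order, float thetaRadians)
{
    const float base = std::max(0.0f, (1.0f - scaling) + scaling * std::cos(thetaRadians));
    return std::pow(base, order);
}
```

For example, with scaling 0.5 and order 4 a listener 90 degrees off-axis hears the direct path at a much lower gain than with order 1, matching the note above that higher order narrows the directivity region.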

After the APO is configured for HRTF, call Start on the source voice to play the audio.

// Initializes an APO that emits directional sound.
   HRESULT CardioidSound::Initialize( LPCWSTR filename )
   {
       // _audioFile is of type AudioFileReader, which is defined in AudioFileReader.cpp.
       auto hr = _audioFile.Initialize( filename );
       if ( SUCCEEDED( hr ) )
       {
           // Initialize with "Scaling" fully directional and "Order" with broad radiation pattern.
           // As the order goes higher, the cardioid directivity region becomes narrower.
           // Any direct path signal outside of the directivity region will be attenuated based on the scaling factor.
           // For example, if scaling is set to 1 (fully directional) the direct path signal outside of the directivity
           // region will be fully attenuated and only the reflections from the environment will be audible.
           hr = ConfigureApo( 1.0f, 4.0f );
       }
       return hr;
   }

   HRESULT CardioidSound::ConfigureApo( float scaling, float order )
   {
       // Cardioid directivity configuration:
       // Directivity is specified at xAPO instance initialization and can't be changed per frame.
       // To change directivity, stop audio processing and reinitialize another APO instance with the new directivity.
       HrtfDirectivityCardioid cardioid;
       cardioid.directivity.type = HrtfDirectivityType::Cardioid;
       cardioid.directivity.scaling = scaling;
       cardioid.order = order;

        // APO initialization
       HrtfApoInit apoInit;
       apoInit.directivity = &cardioid.directivity;
       apoInit.distanceDecay = nullptr; // nullptr specifies natural distance decay behavior (simulates real world)

       // CreateHrtfApo fails with E_NOTIMPL on unsupported platforms.
       ComPtr<IXAPO> xapo;
       auto hr = CreateHrtfApo( &apoInit, &xapo );

       if ( SUCCEEDED( hr ) )
       {
           hr = xapo.As( &_hrtfParams );
       }

       // Set the initial environment.
       // Environment settings configure the "distance cues" used to compute the early and late reverberations.
       if ( SUCCEEDED( hr ) )
       {
            hr = _hrtfParams->SetEnvironment( HrtfEnvironment::Outdoors );
       }

       // Initialize an XAudio2 graph that hosts the HRTF xAPO.
       // The source voice is used to submit audio data and control playback.
       if ( SUCCEEDED( hr ) )
       {
           hr = SetupXAudio2( _audioFile.GetFormat(), xapo.Get(), &_xaudio2, &_sourceVoice );
       }

       // Submit audio data to the source voice
       if ( SUCCEEDED( hr ) )
       {
           XAUDIO2_BUFFER buffer{ };
           buffer.AudioBytes = static_cast<UINT32>( _audioFile.GetSize() );
           buffer.pAudioData = _audioFile.GetData();
           buffer.LoopCount = XAUDIO2_LOOP_INFINITE;
           hr = _sourceVoice->SubmitSourceBuffer( &buffer );
       }

       return hr;
   }

Implement custom decay

You can override the rate at which a spatial sound falls off with distance and/or at what distance it cuts off completely. To implement custom decay behavior on a spatial sound, populate an HrtfDistanceDecay struct and assign it to the distanceDecay field in an HrtfApoInit struct before passing it to the CreateHrtfApo function.

Add the following code to the Initialize method shown previously to specify custom decay behavior. For the complete code listing, see CustomDecay.cpp.

HRESULT CustomDecaySound::Initialize( LPCWSTR filename )
   {
       auto hr = _audioFile.Initialize( filename );

       ComPtr<IXAPO> xapo;
       if ( SUCCEEDED( hr ) )
       {
           HrtfDistanceDecay customDecay;
           customDecay.type = HrtfDistanceDecayType::CustomDecay;               // Custom decay behavior, we'll pass in the gain value on every frame.
           customDecay.maxGain = 0;                                             // 0dB max gain
           customDecay.minGain = -96.0f;                                        // -96dB min gain
           customDecay.unityGainDistance = HRTF_DEFAULT_UNITY_GAIN_DISTANCE;    // Default unity gain distance
           customDecay.cutoffDistance = HRTF_DEFAULT_CUTOFF_DISTANCE;           // Default cutoff distance

           // Setting the directivity to nullptr specifies omnidirectional sound.
           HrtfApoInit init;
           init.directivity = nullptr;
           init.distanceDecay = &customDecay;

           // CreateHrtfApo will fail with E_NOTIMPL on unsupported platforms.
           hr = CreateHrtfApo( &init, &xapo );
       }
   ...
   }
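With HrtfDistanceDecayType::CustomDecay, the comment in the listing notes that the app passes in the gain value on every frame. One possible curve is a 1/r falloff expressed in dB and clamped to the configured limits; `ComputeCustomDecayDb` below is a hypothetical sketch of such a curve, not the engine's own formula, and the default unity-gain distance of 1 meter is an assumption for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Sketch of a per-frame gain curve for a custom-decay spatial sound.
// Uses inverse-distance (1/r) attenuation expressed in dB: 0 dB at the
// unity-gain distance, about -6 dB per doubling of distance, clamped to
// the same limits configured in HrtfDistanceDecay. The exact curve is
// entirely up to the app.
float ComputeCustomDecayDb(float distanceMeters,
                           float unityGainDistance = 1.0f,
                           float minGainDb = -96.0f,
                           float maxGainDb = 0.0f)
{
    const float d = std::max(distanceMeters, 1e-6f); // avoid division by zero
    const float gainDb = 20.0f * std::log10(unityGainDistance / d);
    return std::min(maxGainDb, std::max(minGainDb, gainDb));
}
```

The resulting value would then be applied to the HRTF parameters each frame alongside the source position update.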