Gaze, gestures, and motion controllers in DirectX

If you're going to build directly on top of the platform, you will have to handle input coming from the user - such as where the user is looking via gaze and what the user has selected with gestures or motion controllers. Combining these forms of input, you can enable a user to place a hologram in your app. The holographic app template has an easy-to-use example.

Gaze input

To access the user's gaze, you use the SpatialPointerPose type. The holographic app template includes basic code for understanding gaze. This code provides a vector pointing forward from between the user's eyes, taking into account the device's position and orientation in a given coordinate system.

void SpinningCubeRenderer::PositionHologram(SpatialPointerPose^ pointerPose)
{
    if (pointerPose != nullptr)
    {
        // Get the gaze direction relative to the given coordinate system.
        const float3 headPosition    = pointerPose->Head->Position;
        const float3 headDirection   = pointerPose->Head->ForwardDirection;

        // The hologram is positioned two meters along the user's gaze direction.
        static const float distanceFromUser = 2.0f; // meters
        const float3 gazeAtTwoMeters        = headPosition + (distanceFromUser * headDirection);

        // This will be used as the translation component of the hologram's
        // model transform.
        SetPosition(gazeAtTwoMeters);
    }
}

You may find yourself asking: "But where does the coordinate system come from?"

Let's answer that question. In our AppMain's Update function, we processed a spatial input event by acquiring it relative to the coordinate system for our StationaryFrameOfReference. Recall that the StationaryFrameOfReference was created when we set up the HolographicSpace, and the coordinate system was acquired at the start of Update.

// Check for new input state since the last frame.
SpatialInteractionSourceState^ pointerState = m_spatialInputHandler->CheckForInput();
if (pointerState != nullptr)
{
    // When a Pressed gesture is detected, the sample hologram will be repositioned
    // two meters in front of the user. The coordinate system was acquired from our
    // StationaryFrameOfReference at the start of Update.
    m_spinningCubeRenderer->PositionHologram(
        pointerState->TryGetPointerPose(currentCoordinateSystem)
        );
}

Note that the gaze data is tied to a pointer state, which we get from a spatial input event. The event data object includes a coordinate system, so that you can always relate the gaze direction at the time of the event to whatever spatial coordinate system you need. In fact, you must supply a coordinate system in order to get the pointer pose at all.

Gesture and motion controller input

In Windows Mixed Reality, both hand gestures and motion controllers are handled through the same spatial input APIs, found in the Windows.UI.Input.Spatial namespace. This enables you to easily handle common actions like Select presses the same way across both hands and motion controllers.

There are two levels of API you can target when handling gestures or motion controllers in mixed reality:

Select/Menu/Grasp/Touchpad/Thumbstick interactions: SpatialInteractionManager

To detect low-level presses, releases and updates across hands and input devices in Windows Mixed Reality, you start from a SpatialInteractionManager. The SpatialInteractionManager has an event that informs the app when a hand or motion controller has detected input.

There are three key kinds of SpatialInteractionSource, each represented by a different SpatialInteractionSourceKind value:

  • Hand represents a user's detected hand. Hand sources are available only on HoloLens.
  • Controller represents a paired motion controller. Motion controllers can offer a variety of capabilities, for example: Select triggers, Menu buttons, Grasp buttons, touchpads and thumbsticks.
  • Voice represents the user's voice speaking system-detected keywords. This will inject a Select press and release whenever the user says "Select".

To detect presses across any of these interaction sources, you can handle the SourcePressed event on SpatialInteractionManager:
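The template's SpatialInputHandler registers for this event in its constructor, roughly as follows (a sketch; the member and handler names follow the template's conventions):

m_interactionManager = SpatialInteractionManager::GetForCurrentView();

// Bind a handler to the SourcePressed event.
m_sourcePressedEventToken =
    m_interactionManager->SourcePressed +=
        ref new TypedEventHandler<SpatialInteractionManager^, SpatialInteractionSourceEventArgs^>(
            std::bind(&SpatialInputHandler::OnSourcePressed, this, std::placeholders::_1, std::placeholders::_2)
            );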


This pressed event is sent to your app asynchronously. Your app or game engine may want to perform some processing right away, or to queue up the event data in its input processing routine.

The template includes a helper class to get you started. For simplicity of design, this helper forgoes any immediate processing; it simply keeps track of whether one or more Pressed events occurred since the last Update call:
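A sketch of that handler (m_sourceState is the helper's storage for the latest event state):

void SpatialInputHandler::OnSourcePressed(SpatialInteractionManager^ sender, SpatialInteractionSourceEventArgs^ args)
{
    // Store the most recent state. A real app or game engine may instead want
    // to queue the event data for its own input processing routine.
    m_sourceState = args->State;
}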


If so, it returns the SpatialInteractionSourceState for the most recent input event during the next Update:
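A sketch of that method:

SpatialInteractionSourceState^ SpatialInputHandler::CheckForInput()
{
    // Return the stored state, if any, and clear it so that each
    // input event is reported at most once.
    SpatialInteractionSourceState^ sourceState = m_sourceState;
    m_sourceState = nullptr;
    return sourceState;
}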


Note that the code above will treat all presses the same way, whether the user is performing a primary Select press or a secondary Menu or Grasp press. The Select press is a primary form of interaction supported across hands, motion controllers and voice, triggered either by a hand performing an air-tap, a motion controller with its primary trigger/button pressed, or the user's voice saying "Select". Select presses represent the user's intention to activate the hologram they are targeting.

To reason about which kind of press is occurring, we will modify the SpatialInteractionManager::SourceUpdated event handler. Our code will detect Grasp presses (where available) and use them to reposition the cube. If Grasp is not available, we will use Select presses instead.

Add the following private member declarations to SpatialInputHandler.h:
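Something along these lines (a sketch, with namespaces written out for clarity):

// Handles the SourceUpdated event, so we can inspect Grasp and Select
// state each time an input source updates.
void OnSourceUpdated(
    Windows::UI::Input::Spatial::SpatialInteractionManager^ sender,
    Windows::UI::Input::Spatial::SpatialInteractionSourceEventArgs^ args);

// Event registration token, used to unregister the handler in the destructor.
Windows::Foundation::EventRegistrationToken m_sourceUpdatedEventToken;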


Open SpatialInputHandler.cpp. Add the following event registration to the constructor:
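A sketch of the registration, mirroring the SourcePressed registration shown earlier:

m_sourceUpdatedEventToken =
    m_interactionManager->SourceUpdated +=
        ref new TypedEventHandler<SpatialInteractionManager^, SpatialInteractionSourceEventArgs^>(
            std::bind(&SpatialInputHandler::OnSourceUpdated, this, std::placeholders::_1, std::placeholders::_2)
            );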


This is the event handler code. If the input source is experiencing a Grasp, the pointer pose will be stored for the next update loop. Otherwise, it will check for a Select press instead.
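A sketch of that handler:

void SpatialInputHandler::OnSourceUpdated(SpatialInteractionManager^ sender, SpatialInteractionSourceEventArgs^ args)
{
    if (args->State->Source->IsGraspSupported)
    {
        // This source supports Grasp (for example, a motion controller);
        // store the state while the Grasp button is pressed.
        if (args->State->IsGrasped)
        {
            m_sourceState = args->State;
        }
    }
    else
    {
        // Grasp is not available; fall back to the primary Select press.
        if (args->State->IsSelectPressed)
        {
            m_sourceState = args->State;
        }
    }
}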


Make sure to unregister the event handler in the destructor:
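A sketch of the destructor:

SpatialInputHandler::~SpatialInputHandler()
{
    // Unregister our handler for the SourceUpdated event.
    m_interactionManager->SourceUpdated -= m_sourceUpdatedEventToken;
}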


Recompile, and then redeploy. Your template project should now be able to recognize Grasp interactions to reposition the spinning cube.

The SpatialInteractionSource API supports controllers with a wide range of capabilities. In the example shown above, we check to see if Grasp is supported before trying to use it. The SpatialInteractionSource supports the following optional features beyond the common Select press:

  • Menu button: Check support by inspecting the IsMenuSupported property. When supported, check the SpatialInteractionSourceState::IsMenuPressed property to find out if the menu button is pressed.
  • Grasp button: Check support by inspecting the IsGraspSupported property. When supported, check the SpatialInteractionSourceState::IsGrasped property to find out if the user is grasping.

For controllers, the SpatialInteractionSource has a Controller property with additional capabilities.

  • HasThumbstick: If true, the controller has a thumbstick. Inspect the ControllerProperties property of the SpatialInteractionSourceState to acquire the thumbstick x and y values (ThumbstickX and ThumbstickY), as well as its pressed state (IsThumbstickPressed).
  • HasTouchpad: If true, the controller has a touchpad. Inspect the ControllerProperties property of the SpatialInteractionSourceState to acquire the touchpad x and y values (TouchpadX and TouchpadY), and to know if the user is touching the pad (IsTouchpadTouched) and if they are pressing the touchpad down (IsTouchpadPressed).
  • SimpleHapticsController: The SimpleHapticsController API for the controller lets you inspect the controller's haptics capabilities and control haptic feedback.

Note that the touchpad and thumbstick values range from -1 to 1 on both axes (from bottom to top, and from left to right). The analog trigger value, accessed through the SpatialInteractionSourceState::SelectPressedValue property, ranges from 0 to 1. A value of 1 correlates with IsSelectPressed being equal to true; any other value correlates with IsSelectPressed being equal to false.
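A sketch of reading these values, assuming a pointerState obtained from CheckForInput as shown earlier:

SpatialInteractionSource^ source = pointerState->Source;
if (source->Kind == SpatialInteractionSourceKind::Controller && source->Controller->HasThumbstick)
{
    // Thumbstick axes range from -1 to 1 (left to right, and bottom to top).
    const double thumbstickX     = pointerState->ControllerProperties->ThumbstickX;
    const double thumbstickY     = pointerState->ControllerProperties->ThumbstickY;
    const bool thumbstickPressed = pointerState->ControllerProperties->IsThumbstickPressed;
}

// The analog trigger ranges from 0 (released) to 1 (fully pressed).
const double selectPressedValue = pointerState->SelectPressedValue;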

Note that for both a hand and a controller, using SpatialInteractionSourceState::Properties::TryGetLocation will provide the user's hand position - this is distinct from the pointer pose representing the controller's pointing ray. If you want to draw something at the hand location, use TryGetLocation. If you want to raycast from the tip of the controller, get the pointing ray from TryGetInteractionSourcePose on the SpatialPointerPose.
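For example, a sketch distinguishing the two (pointerPose is a SpatialPointerPose, and coordinateSystem is whichever SpatialCoordinateSystem you are rendering in):

// Hand/grip location: use this to draw something at the hand.
SpatialInteractionSourceLocation^ location =
    pointerState->Properties->TryGetLocation(coordinateSystem);
if (location != nullptr && location->Position != nullptr)
{
    const float3 gripPosition = location->Position->Value;
}

// Pointing ray: use this to raycast from the tip of the controller.
SpatialPointerInteractionSourcePose^ sourcePose =
    pointerPose->TryGetInteractionSourcePose(pointerState->Source);
if (sourcePose != nullptr)
{
    const float3 rayOrigin    = sourcePose->Position;
    const float3 rayDirection = sourcePose->ForwardDirection;
}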

You can also use the other events on SpatialInteractionManager, such as SourceDetected and SourceLost, to react when hands enter or leave the device's view or when they move in or out of the ready position (index finger raised with palm forward), or when motion controllers are turned on/off or are paired/unpaired.

Grip pose vs. pointing pose

Windows Mixed Reality supports motion controllers in a variety of form factors, with each controller's design differing in its relationship between the user's hand position and the natural "forward" direction that apps should use for pointing when rendering the controller.

To better represent these controllers, there are two kinds of poses you can investigate for each interaction source:

  • The grip pose, representing the location of either the palm of a hand detected by a HoloLens, or the palm holding a motion controller.
    • On immersive headsets, this pose is best used to render the user's hand or an object held in the user's hand, such as a sword or gun.
    • The grip position: The palm centroid when holding the controller naturally, adjusted left or right to center the position within the grip.
    • The grip orientation's Right axis: When you completely open your hand to form a flat 5-finger pose, the ray that is normal to your palm (forward from left palm, backward from right palm)
    • The grip orientation's Forward axis: When you close your hand partially (as if holding the controller), the ray that points "forward" through the tube formed by your non-thumb fingers.
    • The grip orientation's Up axis: The Up axis implied by the Right and Forward definitions.
    • You can access the grip pose through SpatialInteractionSourceState.Properties.TryGetLocation(...).
  • The pointer pose, representing the tip of the controller pointing forward.
    • This pose is best used for raycasting when pointing at UI, for example while rendering the controller model itself.
    • You can access the pointer pose through SpatialPointerPose::TryGetInteractionSourcePose(...), as described above.

Composite gestures: SpatialGestureRecognizer

A SpatialGestureRecognizer interprets user interactions from hands, motion controllers, and the "Select" voice command to surface spatial gesture events, which users target using their gaze.

Spatial gestures are a key form of input for Windows Mixed Reality apps. By routing interactions from the SpatialInteractionManager to a hologram's SpatialGestureRecognizer, apps can detect Tap, Hold, Manipulation, and Navigation events uniformly across hands, voice, and spatial input devices, without having to handle presses and releases manually.

SpatialGestureRecognizer performs only the minimal disambiguation between the set of gestures that you request. For example, if you request just Tap, the user may hold their finger down as long as they like and a Tap will still occur. If you request both Tap and Hold, after about a second of holding down their finger, the gesture will promote to a Hold and a Tap will no longer occur.

To use SpatialGestureRecognizer, handle the SpatialInteractionManager's InteractionDetected event and grab the SpatialPointerPose exposed there. Use the user's gaze ray from this pose to intersect with the holograms and surface meshes in the user's surroundings, in order to determine what the user is intending to interact with. Then, route the SpatialInteraction in the event arguments to the target hologram's SpatialGestureRecognizer, using its CaptureInteraction method. This starts interpreting that interaction according to the SpatialGestureSettings set on that recognizer at creation time - or by TrySetGestureSettings.
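A sketch of that flow (the Hologram type, the FindHologramHitByGaze hit-test helper, and the per-hologram GestureRecognizer property are hypothetical stand-ins for your app's own scene structures):

void AppMain::OnInteractionDetected(
    SpatialInteractionManager^ sender, SpatialInteractionDetectedEventArgs^ args)
{
    // Use the gaze ray at the time of the interaction to pick a target hologram.
    SpatialPointerPose^ pointerPose =
        args->TryGetPointerPose(m_stationaryReferenceFrame->CoordinateSystem);
    Hologram^ target = FindHologramHitByGaze(pointerPose); // hypothetical hit test

    if (target != nullptr)
    {
        // Route the interaction to that hologram's recognizer. From here on, the
        // recognizer interprets the interaction according to its
        // SpatialGestureSettings and raises events such as Tapped.
        target->GestureRecognizer->CaptureInteraction(args->Interaction);
    }
}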

On HoloLens, interactions and gestures should generally derive their targeting from the user's gaze, rather than trying to render or interact at the hand's location directly. Once an interaction has started, relative motions of the hand may be used to control the gesture, as with the Manipulation or Navigation gesture.

See also