Gaze and gestures in DirectX

If you're going to build directly on top of the platform, you'll need to handle input coming from the user, such as where the user is looking (gaze) and what the user has selected (gestures). Combining these two forms of input lets a user place a hologram in your app. As a concrete example, we'll review the holographic app template.

Gaze input

To access the user's gaze, you use the SpatialPointerPose type. The holographic app template includes basic code for understanding gaze. This code provides a vector pointing forward from between the user's eyes, taking into account the device's position and orientation in a given coordinate system.

void SpinningCubeRenderer::PositionHologram(SpatialPointerPose^ pointerPose)
{
    if (pointerPose != nullptr)
    {
        // Get the gaze direction relative to the given coordinate system.
        const float3 headPosition    = pointerPose->Head->Position;
        const float3 headDirection   = pointerPose->Head->ForwardDirection;

        // The hologram is positioned two meters along the user's gaze direction.
        static const float distanceFromUser = 2.0f; // meters
        const float3 gazeAtTwoMeters        = headPosition + (distanceFromUser * headDirection);

        // This will be used as the translation component of the hologram's
        // model transform.
        SetPosition(gazeAtTwoMeters);
    }
}

You may find yourself asking: "But where does the coordinate system come from?"

Let's answer that question. In our AppMain's Update function, we process the spatial input event by acquiring its pointer pose relative to the coordinate system of our StationaryReferenceFrame. Recall that the StationaryReferenceFrame was created when we set up the HolographicSpace, and that its coordinate system is acquired at the start of Update.

// Check for new input state since the last frame.
SpatialInteractionSourceState^ pointerState = m_spatialInputHandler->CheckForInput();
if (pointerState != nullptr)
{
    // When a Pressed gesture is detected, the sample hologram will be repositioned
    // two meters in front of the user.
    m_spinningCubeRenderer->PositionHologram(
        pointerState->TryGetPointerPose(currentCoordinateSystem)
        );
}

Note that the gaze data here is tied to a pointer state, which we get from a spatial input event. The pointer state accepts a spatial coordinate system, so you can always relate the gaze direction at the time of the event to whatever coordinate system you need. In fact, you must provide one in order to get the pointer pose at all.
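For example, here's a minimal sketch of resolving a pose against the stationary frame's coordinate system (assuming an m_stationaryReferenceFrame member, as in the template):

// Relate the input event's pose to the stationary frame of reference.
SpatialPointerPose^ pose = pointerState->TryGetPointerPose(
    m_stationaryReferenceFrame->CoordinateSystem);
if (pose != nullptr)
{
    // Head position and gaze direction, expressed in that coordinate system.
    const float3 headPosition  = pose->Head->Position;
    const float3 gazeDirection = pose->Head->ForwardDirection;
}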

Gesture input

There are two levels of gesture input that you can access on HoloLens: low-level interactions, via SpatialInteractionManager, and composite gestures, via SpatialGestureRecognizer.

Interactions: SpatialInteractionManager

To detect low-level presses, releases, and updates across hands and clickers on Windows Holographic, you start from a SpatialInteractionManager. The SpatialInteractionManager has events that inform the app when hand or clicker input is detected. Note that the "Select" voice command is injected here as a press and release.
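For example, here is a sketch of the kind of setup the template's SpatialInputHandler performs, subscribing to the SourcePressed event (assuming using-directives for Windows::UI::Input::Spatial and Windows::Foundation):

m_interactionManager = SpatialInteractionManager::GetForCurrentView();

// Bind a handler to the SourcePressed event, which fires for presses
// from hands, clickers, and the "Select" voice command.
m_sourcePressedEventToken =
    m_interactionManager->SourcePressed +=
        ref new TypedEventHandler<SpatialInteractionManager^, SpatialInteractionSourceEventArgs^>(
            std::bind(&SpatialInputHandler::OnSourcePressed, this,
                std::placeholders::_1, std::placeholders::_2)
            );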


This pressed event is sent to your app asynchronously. Your app or game engine may want to perform some processing right away, or you may want to queue up the event data in your input processing routine.

The template includes a helper class, SpatialInputHandler, to get you started. For simplicity of design, this helper forgoes any immediate processing; it simply keeps track of whether one or more Pressed events occurred since the last Update call:
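A minimal sketch of such a handler, assuming an m_sourceState member that latches the most recent event's state:

void SpatialInputHandler::OnSourcePressed(
    SpatialInteractionManager^ sender,
    SpatialInteractionSourceEventArgs^ args)
{
    // Latch the state of the most recent Pressed event. A real app or
    // game engine might instead queue the event data for its input
    // processing routine.
    m_sourceState = args->State;
}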


If so, it returns the SpatialInteractionSourceState for the most recent input event during the next Update:
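A sketch of that check, which returns the latched state (if any) and resets it so each event is reported only once:

// Allows the main update loop to poll for asynchronous changes to the
// user input state.
SpatialInteractionSourceState^ SpatialInputHandler::CheckForInput()
{
    SpatialInteractionSourceState^ sourceState = m_sourceState;
    m_sourceState = nullptr;
    return sourceState;
}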


You can also use other events on SpatialInteractionManager, such as SourceDetected and SourceLost, to react when hands enter or leave the device's view, or when they move in or out of the ready position (index finger raised with palm forward). Subscribing to these looks much the same as SourcePressed, as sketched below.
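A sketch of those subscriptions, with OnSourceDetected and OnSourceLost as hypothetical handlers:

m_interactionManager->SourceDetected +=
    ref new TypedEventHandler<SpatialInteractionManager^, SpatialInteractionSourceEventArgs^>(
        std::bind(&SpatialInputHandler::OnSourceDetected, this,
            std::placeholders::_1, std::placeholders::_2));

m_interactionManager->SourceLost +=
    ref new TypedEventHandler<SpatialInteractionManager^, SpatialInteractionSourceEventArgs^>(
        std::bind(&SpatialInputHandler::OnSourceLost, this,
            std::placeholders::_1, std::placeholders::_2));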

Gestures: SpatialGestureRecognizer

A SpatialGestureRecognizer interprets user interactions from hands, clickers, and the "Select" voice command to surface spatial gesture events, which users target using their gaze.

Spatial gestures are a key form of input for HoloLens. By routing interactions from the SpatialInteractionManager to a hologram's SpatialGestureRecognizer, apps can detect Tap, Hold, Manipulation, and Navigation events uniformly across hands, voice, and clickers.

SpatialGestureRecognizer performs only the minimal disambiguation between the set of gestures that you request. For example, if you request just Tap, the user may hold their finger down as long as they like and a Tap will still occur. If you request both Tap and Hold, after about a second of holding down their finger, the gesture will promote to a Hold and a Tap will no longer occur.
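For example, here is a sketch of creating a recognizer for just Tap and Hold and subscribing to the resulting events (HologramView and its handler methods are hypothetical names):

m_gestureRecognizer = ref new SpatialGestureRecognizer(
    SpatialGestureSettings::Tap | SpatialGestureSettings::Hold);

m_gestureRecognizer->Tapped +=
    ref new TypedEventHandler<SpatialGestureRecognizer^, SpatialTappedEventArgs^>(
        std::bind(&HologramView::OnTapped, this,
            std::placeholders::_1, std::placeholders::_2));

m_gestureRecognizer->HoldCompleted +=
    ref new TypedEventHandler<SpatialGestureRecognizer^, SpatialHoldCompletedEventArgs^>(
        std::bind(&HologramView::OnHoldCompleted, this,
            std::placeholders::_1, std::placeholders::_2));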

To use SpatialGestureRecognizer, handle the SpatialInteractionManager's InteractionDetected event and grab the SpatialPointerPose exposed there. Use the user's gaze ray from this pose to intersect with the holograms and surface meshes in the user's surroundings, in order to determine what the user is intending to interact with. Then, route the SpatialInteraction in the event arguments to the target hologram's SpatialGestureRecognizer, using its CaptureInteraction method. This starts interpreting that interaction according to the SpatialGestureSettings set on that recognizer at creation time or by TrySetGestureSettings.
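A sketch of that flow, assuming a hypothetical GetTargetedHologram helper that intersects the gaze ray with your scene, and a Hologram type that owns a recognizer:

void AppMain::OnInteractionDetected(
    SpatialInteractionManager^ sender,
    SpatialInteractionDetectedEventArgs^ args)
{
    // Relate the interaction's pointer pose to our coordinate system.
    SpatialPointerPose^ pose = args->TryGetPointerPose(
        m_stationaryReferenceFrame->CoordinateSystem);
    if (pose == nullptr)
    {
        return;
    }

    // Hypothetical helper: cast the gaze ray against holograms and
    // surface meshes to find the intended target.
    Hologram^ target = GetTargetedHologram(
        pose->Head->Position, pose->Head->ForwardDirection);

    if (target != nullptr)
    {
        // Route the interaction to the target's recognizer, which will
        // interpret it according to its SpatialGestureSettings.
        target->GestureRecognizer->CaptureInteraction(args->Interaction);
    }
}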

On HoloLens, interactions and gestures should generally derive their targeting from the user's gaze, rather than trying to render or interact at the hand's location directly. Once an interaction has started, relative motions of the hand may be used to control the gesture, as with the Manipulation or Navigation gesture.