Gestures and motion controllers in Unity

There are two key ways to take action on your gaze in Unity: hand gestures and motion controllers. You access the data for both sources of spatial input through the same APIs in Unity.

Unity provides two primary ways to access spatial input data for Windows Mixed Reality: the common Input.GetButton/Input.GetAxis APIs that work across multiple Unity VR SDKs, and an InteractionManager/GestureRecognizer API specific to Windows Mixed Reality that exposes the full set of spatial input data available.

This article has been updated for the final shipping Unity 2017.2 API shapes:

  • If you are using Unity 5.6, you will see an older version of these APIs under the UnityEngine.VR namespace rather than UnityEngine.XR. Beyond the namespace change, there are other minor breaking API changes between Unity 5.6 and Unity 2017.2 that Unity's script updater will fix for you when moving to 2017.2.
  • If you are using an earlier beta build of Unity 2017.2, you will see these APIs under UnityEngine.XR as expected, but you may see some differences from what is described below, as the initial 2017.2 beta builds contain an older version of the API shape.

Unity button/axis mapping table

The button and axis IDs in the table below are supported in Unity's Input Manager for Windows Mixed Reality motion controllers through the Input.GetButton/GetAxis APIs, while the "Windows MR-specific" column refers to properties available on the InteractionSourceState type. Each of these APIs is described in detail in the sections below.

The button/axis ID mappings for Windows Mixed Reality generally match the Oculus button/axis IDs.

The button/axis ID mappings for Windows Mixed Reality differ from OpenVR's mappings in two ways:

  1. The mapping uses touchpad IDs that are distinct from thumbstick, to support controllers with both thumbsticks and touchpads.
  2. The mapping avoids overloading the A and X button IDs for the Menu buttons, to leave those available for physical ABXY buttons.

| Input | Common Unity APIs (left hand / right hand) | Windows MR-specific Input API |
| --- | --- | --- |
| Select trigger pressed | Axis 9 = 1.0 / Axis 10 = 1.0 | selectPressed |
| Select trigger analog value | Axis 9 / Axis 10 | selectPressedAmount |
| Select trigger partially pressed | Button 14 (gamepad compat) / Button 15 (gamepad compat) | selectPressedAmount > 0.0 |
| Menu button pressed | Button 6 * / Button 7 * | menuPressed |
| Grip button pressed | Axis 11 = 1.0 (no analog values), Button 4 (gamepad compat) / Axis 12 = 1.0 (no analog values), Button 5 (gamepad compat) | grasped |
| Thumbstick X (left: -1.0, right: 1.0) | Axis 1 / Axis 4 | thumbstickPosition.x |
| Thumbstick Y (top: -1.0, bottom: 1.0) | Axis 2 / Axis 5 | thumbstickPosition.y |
| Thumbstick pressed | Button 8 / Button 9 | thumbstickPressed |
| Touchpad X (left: -1.0, right: 1.0) | Axis 17 * / Axis 19 * | touchpadPosition.x |
| Touchpad Y (top: -1.0, bottom: 1.0) | Axis 18 * / Axis 20 * | touchpadPosition.y |
| Touchpad touched | Button 18 * / Button 19 * | touchpadTouched |
| Touchpad pressed | Button 16 * / Button 17 * | touchpadPressed |
| 6DoF grip pose or pointer pose | Grip pose only: XR.InputTracking.GetLocalPosition (both hands) | Pass Grip or Pointer as an argument: sourceState.sourcePose.TryGetPosition |
| Tracking state | Position accuracy and source loss risk only available through MR-specific API | sourceState.sourcePose.positionAccuracy |

* These button/axis IDs differ from the IDs that Unity uses for OpenVR due to collisions in the mappings used by gamepads, Oculus Touch and OpenVR.
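
As a rough sketch of how the table maps onto code, the component below reads the left select trigger and the left Menu button. The axis name "MR_LEFT_SELECT" and the class name are hypothetical: you would need to create that Input Manager entry yourself and bind it to the 9th joystick axis, otherwise Input.GetAxis will report that the axis is not set up. The Menu button is read directly by its joystick button ID.

```csharp
using UnityEngine;

// Minimal sketch: reads two entries from the mapping table each frame.
public class LeftControllerInputExample : MonoBehaviour
{
    // Hypothetical axis name; create it in the Input Manager and bind it to joystick axis 9.
    private const string LeftSelectAxis = "MR_LEFT_SELECT";

    void Update()
    {
        // Analog select trigger value for the left hand (Axis 9 in the table).
        float selectAmount = Input.GetAxis(LeftSelectAxis);
        if (selectAmount >= 1.0f)
        {
            Debug.Log("Left select trigger fully pressed");
        }

        // The left Menu button is Button 6 in the table.
        if (Input.GetKeyDown("joystick button 6"))
        {
            Debug.Log("Left Menu button pressed this frame");
        }
    }
}
```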

Grip pose vs. pointing pose

Windows Mixed Reality supports motion controllers in a variety of form factors, with each controller's design differing in its relationship between the user's hand position and the natural "forward" direction that apps should use for pointing when rendering the controller.

To better represent these controllers, there are two kinds of poses you can investigate for each interaction source: the grip pose and the pointer pose. Both the grip pose and pointer pose coordinates are expressed by all Unity APIs in global Unity world coordinates.

Grip pose

The grip pose represents the location of either the palm of a hand detected by a HoloLens, or the palm holding a motion controller.

On immersive headsets, the grip pose is best used to render the user's hand or an object held in the user's hand, such as a sword or gun. The grip pose is also used when visualizing a motion controller, as the renderable model provided by Windows for a motion controller uses the grip pose as its origin and center of rotation.

The grip pose is defined specifically as follows:

  • The grip position: The palm centroid when holding the controller naturally, adjusted left or right to center the position within the grip. On the Windows Mixed Reality motion controller, this position generally aligns with the Grasp button.
  • The grip orientation's Right axis: When you completely open your hand to form a flat 5-finger pose, the ray that is normal to your palm (forward from left palm, backward from right palm).
  • The grip orientation's Forward axis: When you close your hand partially (as if holding the controller), the ray that points "forward" through the tube formed by your non-thumb fingers.
  • The grip orientation's Up axis: The Up axis implied by the Right and Forward definitions.

You can access the grip pose through either Unity's cross-vendor input API (XR.InputTracking.GetLocalPosition/Rotation) or through the Windows MR-specific API (sourceState.sourcePose.TryGetPosition/Rotation, requesting pose data for the Grip node).

Pointer pose

The pointer pose represents the tip of the controller pointing forward.

The system-provided pointer pose is best used to raycast when you are rendering the controller model itself. If you are rendering some other virtual object in place of the controller, such as a virtual gun, you should point with a ray that is most natural for that virtual object, such as a ray that travels along the barrel of the app-defined gun model. Because users can see the virtual object and not the physical controller, pointing with the virtual object will likely be more natural for those using your app.

Currently, the pointer pose is available in Unity only through the Windows MR-specific API, sourceState.sourcePose.TryGetPosition/Rotation, passing in InteractionSourceNode.Pointer as the argument.
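
The following is a minimal sketch of building a pointing ray from the pointer pose, assuming the polling API described later in this article. The class name and the 10-unit debug ray length are arbitrary choices made for illustration.

```csharp
using UnityEngine;
using UnityEngine.XR.WSA.Input;

// Minimal sketch: draws a debug ray along each motion controller's pointer pose.
public class PointerRayExample : MonoBehaviour
{
    void Update()
    {
        foreach (InteractionSourceState state in InteractionManager.GetCurrentReading())
        {
            // Only motion controllers are of interest in this sketch.
            if (state.source.kind != InteractionSourceKind.Controller)
            {
                continue;
            }

            Vector3 position;
            Quaternion rotation;

            // Both calls return false when the requested pose data is unavailable.
            if (state.sourcePose.TryGetPosition(out position, InteractionSourceNode.Pointer) &&
                state.sourcePose.TryGetRotation(out rotation, InteractionSourceNode.Pointer))
            {
                Ray pointerRay = new Ray(position, rotation * Vector3.forward);
                Debug.DrawRay(pointerRay.origin, pointerRay.direction * 10.0f, Color.green);
            }
        }
    }
}
```

Passing InteractionSourceNode.Grip instead would return the grip pose for the same source.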

Controller tracking state

Like the headsets, the Windows Mixed Reality motion controller requires no setup of external tracking sensors. Instead, the controllers are tracked by sensors in the headset itself.

If the user moves the controllers out of the headset's field of view, in most cases Windows will continue to infer controller positions and provide them to the app. When the controller has lost visual tracking for long enough, the controller's positions will drop to approximate-accuracy positions.

At this point, the system will body-lock the controller to the user, tracking the user's position as they move around, while still exposing the controller's true orientation using its internal orientation sensors. Many apps that use controllers to point at and activate UI elements can operate normally while in approximate accuracy without the user noticing.

The best way to get a feel for this is to try it yourself. Check out this video with examples of immersive content that works with motion controllers across various tracking states:

Reasoning about tracking state explicitly

Apps that wish to treat positions differently based on tracking state may go further and inspect properties on the controller's state, such as SourceLossRisk and PositionAccuracy:

| Tracking state | SourceLossRisk | PositionAccuracy | TryGetPosition |
| --- | --- | --- | --- |
| High accuracy | < 1.0 | High | true |
| High accuracy (at risk of losing) | == 1.0 | High | true |
| Approximate accuracy | == 1.0 | Approximate | true |
| No position | == 1.0 | Approximate | false |

These motion controller tracking states are defined as follows:

  • High accuracy: While the motion controller is within the headset's field of view, it will generally provide high-accuracy positions, based on visual tracking. Note that a moving controller that momentarily leaves the field of view or is momentarily obscured from the headset sensors (e.g. by the user's other hand) will continue to return high-accuracy poses for a short time, based on inertial tracking of the controller itself.
  • High accuracy (at risk of losing): When the user moves the motion controller past the edge of the headset's field of view, the headset will soon be unable to visually track the controller's position. The app knows when the controller has reached this FOV boundary by seeing the SourceLossRisk reach 1.0. At that point, the app may choose to pause controller gestures that require a steady stream of very high-quality poses.
  • Approximate accuracy: When the controller has lost visual tracking for long enough, the controller's positions will drop to approximate-accuracy positions. At this point, the system will body-lock the controller to the user, tracking the user's position as they move around, while still exposing the controller's true orientation using its internal orientation sensors. Many apps that use controllers to point at and activate UI elements can operate as normal while in approximate accuracy without the user noticing. Apps with heavier input requirements may choose to sense this drop from High accuracy to Approximate accuracy by inspecting the PositionAccuracy property, for example to give the user a more generous hitbox on off-screen targets during this time.
  • No position: While the controller can operate at approximate accuracy for a long time, sometimes the system knows that even a body-locked position is not meaningful at the moment. For example, a controller that was just turned on may have never been observed visually, or a user may put down a controller that's then picked up by someone else. At those times, the system will not provide any position to the app, and TryGetPosition will return false.
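
The table and definitions above could be turned into a small classification helper along these lines. This is only a sketch: the ControllerTrackingState enum and the TrackingStateExample class are hypothetical names introduced here, and the checks simply mirror the rows of the table.

```csharp
using UnityEngine;
using UnityEngine.XR.WSA.Input;

// Hypothetical enum mirroring the four rows of the tracking state table.
public enum ControllerTrackingState
{
    HighAccuracy,
    HighAccuracyAtRisk,
    ApproximateAccuracy,
    NoPosition
}

public static class TrackingStateExample
{
    public static ControllerTrackingState Classify(InteractionSourceState state)
    {
        // "No position": the system provides no position at all.
        Vector3 position;
        if (!state.sourcePose.TryGetPosition(out position))
        {
            return ControllerTrackingState.NoPosition;
        }

        // "Approximate accuracy": the position is body-locked to the user.
        if (state.sourcePose.positionAccuracy == InteractionSourcePositionAccuracy.Approximate)
        {
            return ControllerTrackingState.ApproximateAccuracy;
        }

        // High accuracy, split by whether the source has reached the FOV boundary.
        return state.properties.sourceLossRisk >= 1.0
            ? ControllerTrackingState.HighAccuracyAtRisk
            : ControllerTrackingState.HighAccuracy;
    }
}
```

An app might call Classify on each polled state and, for example, widen hit targets while the result is ApproximateAccuracy.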

Common Unity APIs (Input.GetButton/GetAxis)

Namespace: UnityEngine, UnityEngine.XR

Types: Input, XR.InputTracking

Unity currently uses its general Input.GetButton/Input.GetAxis APIs to expose input for the Oculus SDK, the OpenVR SDK and Windows Mixed Reality, including hands and motion controllers. If your app uses these APIs for input, it can easily support motion controllers across multiple VR SDKs, including Windows Mixed Reality.

Getting a logical button's pressed state

To use the general Unity input APIs, you'll typically start by wiring up buttons and axes to logical names in the Unity Input Manager, binding a button or axis ID to each name. You can then write code that refers to that logical button/axis name:

if (Input.GetButton("Fire1")) {
    // ...
}

Getting a physical button's pressed state directly

You can also access buttons manually by their fully-qualified name, using Input.GetKey:

if (Input.GetKey("joystick button 8")) {
    // ...
}

Getting a hand or motion controller's pose

You can access the position and rotation of the controller, using XR.InputTracking:

Vector3 leftPosition = InputTracking.GetLocalPosition(XRNode.LeftHand);
Quaternion leftRotation = InputTracking.GetLocalRotation(XRNode.LeftHand);

Note that this represents the controller's grip pose (where the user holds the controller), which is useful for rendering a sword or gun in the user's hand, or a model of the controller itself.

Note that the relationship between this grip pose and the pointer pose (where the tip of the controller is pointing) may differ across controllers. At this moment, accessing the controller's pointer pose is only possible through the MR-specific input API, described in the sections below.
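
As an illustration, a held object could follow the grip pose with a component like the one below. This is a sketch under one assumption: the GameObject is parented under the same transform as the main camera, so that the local pose values line up with the camera's tracking space. The class name is hypothetical.

```csharp
using UnityEngine;
using UnityEngine.XR;

// Minimal sketch: keeps this object at the grip pose of one hand or controller.
public class GripPoseFollower : MonoBehaviour
{
    // Which hand to follow; set to XRNode.RightHand in the Inspector for the other controller.
    public XRNode hand = XRNode.LeftHand;

    void Update()
    {
        // Local pose values are applied as-is, which assumes a shared parent with the camera.
        transform.localPosition = InputTracking.GetLocalPosition(hand);
        transform.localRotation = InputTracking.GetLocalRotation(hand);
    }
}
```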

Windows-specific APIs (XR.WSA.Input)

Namespace: UnityEngine.XR.WSA.Input

Types: InteractionManager, InteractionSourceState, InteractionSource, InteractionSourceProperties, InteractionSourceKind, InteractionSourceLocation

To get at more detailed information about Windows Mixed Reality hand input (for HoloLens) and motion controllers, you can choose to use the Windows-specific spatial input APIs under the UnityEngine.XR.WSA.Input namespace. This lets you access additional information, such as position accuracy or the source kind, letting you tell hands and controllers apart.

How to poll for the state of hands and motion controllers

You can poll for this frame's state for each interaction source (hand or motion controller) using the GetCurrentReading method.

var interactionSourceStates = InteractionManager.GetCurrentReading();

Each InteractionSourceState you get back represents an interaction source at the current moment in time. The InteractionSourceState exposes info such as:

  • Which kinds of presses are occurring, such as Select
if (interactionSourceState.selectPressed) {
    // ...
}
  • Other data specific to motion controllers, such as the touchpad and/or thumbstick's XY coordinates and touched state
if (interactionSourceState.touchpadTouched && interactionSourceState.touchpadPosition.x > 0.5) {
    // ...
}
  • The head pose at the moment in time when this gesture data was captured, which can be used to determine what the user was gazing at. This is especially useful for targeting a user's hand gestures, since there is some latency before hand poses are processed by the system and provided to the app.
  • The grip pose and pointing pose of the interaction source at that point in time
  • The InteractionSourceKind to know if the source is a hand or a motion controller
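
Putting these pieces together, a per-frame polling loop might look like the sketch below. The class name and log messages are illustrative only. Note that this overload of GetCurrentReading returns a new array on each call, which may matter for allocation-sensitive apps.

```csharp
using UnityEngine;
using UnityEngine.XR.WSA.Input;

// Minimal sketch: polls every interaction source once per frame.
public class SourcePollingExample : MonoBehaviour
{
    void Update()
    {
        InteractionSourceState[] states = InteractionManager.GetCurrentReading();
        foreach (InteractionSourceState state in states)
        {
            // Tell hands apart from controllers and other sources.
            bool isHand = state.source.kind == InteractionSourceKind.Hand;
            string label = isHand ? "Hand" : "Controller or other source";

            if (state.selectPressed)
            {
                Debug.Log(label + " " + state.source.id + " is pressing Select");
            }

            // Touchpad data is only meaningful for motion controllers.
            if (!isHand && state.touchpadTouched)
            {
                Debug.Log("Touchpad touched at " + state.touchpadPosition);
            }
        }
    }
}
```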

How to start handling an interaction event

If you prefer to handle events rather than poll each frame:

  • Register for an InteractionManager input event. Subscribe to each type of interaction event that you are interested in.
InteractionManager.SourcePressed += InteractionManager_SourcePressed;
  • Handle the event. Once you have subscribed to an interaction event, you will get the callback when appropriate. In the SourcePressed example, this will be after the source was detected and before it is released or lost.
void InteractionManager_SourcePressed(InteractionSourceState state)
{
    // state has information about:
    //   - the targeting head ray at the time when the event was triggered
    //   - whether the source is pressed or not
    //   - properties like position, velocity, source loss risk
    //   - source id (which hand id, for example) and source kind, like hand, voice, controller or other
}

How to stop handling an event

You need to stop handling an event when you are no longer interested in the event or you are destroying the object that has subscribed to the event. To stop handling the event, you unsubscribe from the event.

InteractionManager.SourcePressed -= InteractionManager_SourcePressed;

Input Source Change Events

These events are raised when an input source is:

  • detected (becomes active)
  • lost (becomes inactive)
  • updated (moves or otherwise changes some state)
  • pressed (tap, button press, or select uttered)
  • released (end of a tap, button released, or end of select uttered)


using UnityEngine.XR.WSA.Input;

void Start()
{
    InteractionManager.SourceDetected += InteractionManager_SourceDetected;
    InteractionManager.SourceUpdated += InteractionManager_SourceUpdated;
    InteractionManager.SourceLost += InteractionManager_SourceLost;
    InteractionManager.SourcePressed += InteractionManager_SourcePressed;
    InteractionManager.SourceReleased += InteractionManager_SourceReleased;
}

void OnDestroy()
{
    InteractionManager.SourceDetected -= InteractionManager_SourceDetected;
    InteractionManager.SourceUpdated -= InteractionManager_SourceUpdated;
    InteractionManager.SourceLost -= InteractionManager_SourceLost;
    InteractionManager.SourcePressed -= InteractionManager_SourcePressed;
    InteractionManager.SourceReleased -= InteractionManager_SourceReleased;
}

void InteractionManager_SourceDetected(InteractionSourceState state)
{
    // Source was detected.
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourceLost(InteractionSourceState state)
{
    // Source was lost. This will be after a SourceDetected event, and no other events
    // for this source id will occur until it is detected again.
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourceUpdated(InteractionSourceState state)
{
    // Source was updated. The source would have been detected before this point.
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourcePressed(InteractionSourceState state)
{
    // Source was pressed. This will be after the source was detected and before it is released or lost.
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourceReleased(InteractionSourceState state)
{
    // Source was released. The source would have been detected and pressed before this point.
    // This event will not fire if the source is lost.
    // state has the current state of the source including id, position, kind, etc.
}

High-level composite gesture APIs (GestureRecognizer)

Namespace: UnityEngine.XR.WSA.Input

Types: GestureRecognizer, GestureSettings, InteractionSourceKind

Your app can also recognize higher-level composite gestures for spatial input sources: Tap, Hold, Manipulation and Navigation gestures. You can recognize these composite gestures across both hands and motion controllers using the GestureRecognizer.

Each Gesture event on the GestureRecognizer provides the SourceKind for the input as well as the targeting head ray at the time of the event. Some events provide additional context specific information.

There are only a few steps required to capture gestures using a Gesture Recognizer:

  1. Create a new Gesture Recognizer
  2. Specify which gestures to watch for
  3. Subscribe to events for those gestures
  4. Start capturing gestures

Create a new Gesture Recognizer

To use the GestureRecognizer, you must first create one:

GestureRecognizer recognizer = new GestureRecognizer();

Specify which gestures to watch for

Specify which gestures you are interested in via SetRecognizableGestures():

recognizer.SetRecognizableGestures(GestureSettings.Tap | GestureSettings.Hold);

Subscribe to events for those gestures

Subscribe to events for the gestures you are interested in.

recognizer.TappedEvent += MyTapEventHandler;
recognizer.HoldEvent += MyHoldEventHandler;

Note: Navigation and Manipulation gestures are mutually exclusive on an instance of a GestureRecognizer.

Start capturing gestures

By default, a GestureRecognizer does not monitor input until StartCapturingGestures() is called. A gesture event may still be generated after StopCapturingGestures() is called if the input was performed before the frame in which StopCapturingGestures() was processed. Because the recognizer accounts for whether it was capturing at the time the gesture actually occurred, it is reliable to start and stop gesture monitoring depending on which object the player is currently gazing at.

recognizer.StartCapturingGestures();


Stop capturing gestures

To stop gesture recognition:

recognizer.StopCapturingGestures();


Removing a gesture recognizer

Remember to unsubscribe from subscribed events before destroying a GestureRecognizer object.

void OnDestroy()
{
    recognizer.TappedEvent -= MyTapEventHandler;
    recognizer.HoldEvent -= MyHoldEventHandler;
}

Rendering the motion controller model in Unity

(Image: Motion controller model and teleportation)

If you're looking to add the motion controller models that Windows provides into your Unity project, see our samples in the Mixed Reality Toolkit.

In particular, compare against the MotionControllerTest scene.

The files you will need to include are:

IMPORTANT: You will need the Windows Fall Creators Update SDK to build these scripts. Please install the 16299 UWP SDK.

Throwing objects

Throwing objects in virtual reality is a harder problem than it may at first seem. As with most physically based interactions, when throwing in a game behaves in an unexpected way, it is immediately obvious and breaks immersion. We have spent some time thinking deeply about how to represent a physically correct throwing behavior, and have come up with a few guidelines, enabled through updates to our platform, that we would like to share with you.

You can find an example of how we recommend implementing throwing here. This sample follows these four guidelines:

  • Use the controller’s velocity instead of position. In the November update to Windows, we introduced a change in behavior when in the "Approximate" positional tracking state. When in this state, velocity information about the controller will continue to be reported for as long as we believe it is high accuracy, which is often longer than position remains high accuracy.
  • Incorporate the angular velocity of the controller. This logic is all contained in the throwing.cs file in the GetThrownObjectVelAngVel static method, within the package linked above:
    1. As angular velocity is conserved, the thrown object must maintain the same angular velocity as it had at the moment of the throw:
      objectAngularVelocity = throwingControllerAngularVelocity;
    2. As the center of mass of the thrown object is likely not at the origin of the grip pose, it likely has a different velocity than that of the controller in the frame of reference of the user. The portion of the object’s velocity contributed in this way is the instantaneous tangential velocity of the center of mass of the thrown object around the controller origin. This tangential velocity is the cross product of the angular velocity of the controller with the vector representing the distance between the controller origin and the center of mass of the thrown object:
      Vector3 radialVec = thrownObjectCenterOfMass - throwingControllerPos;
      Vector3 tangentialVelocity = Vector3.Cross(throwingControllerAngularVelocity, radialVec);
    3. The total velocity of the thrown object is thus the sum of velocity of the controller and this tangential velocity:
      objectVelocity = throwingControllerVelocity + tangentialVelocity;
  • Pay close attention to the time at which we apply the velocity. When a button is pressed, it can take up to 20 ms for that event to bubble up through Bluetooth to the operating system. This means that if you poll for a controller state change from pressed to not pressed or vice versa, the controller pose information you get with it will actually be ahead of this change in state. Further, the controller pose presented by our polling API is forward predicted to reflect a likely pose at the time the frame will be displayed, which could be more than 20 ms in the future. This is good for rendering held objects, but compounds our time problem for calculating the trajectory for the moment the user threw the object. Fortunately, with the November update, when a Unity event like InteractionSourcePressed or InteractionSourceReleased is sent, the historical pose information from when that button was pressed is sent with it. This means that in order to get the pose that most closely reflects the state of the controller when the user signaled a throw, we must use the pose given in the event.
  • Use the grip pose. Angular velocity and velocity are reported relative to the grip pose, not pointer pose.
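
The velocity arithmetic from the guidelines above can be collected into one helper, sketched below. This is not the sample's GetThrownObjectVelAngVel method; the class, method, and parameter names are hypothetical, and the sketch assumes the controller values come from the grip pose delivered with the release event, with angular velocity in radians per second.

```csharp
using UnityEngine;

// Minimal sketch of the throw velocity calculation described in the guidelines.
public static class ThrowMathExample
{
    public static void ComputeThrow(
        Vector3 controllerVelocity,
        Vector3 controllerAngularVelocity,
        Vector3 controllerPosition,
        Vector3 objectCenterOfMass,
        out Vector3 objectVelocity,
        out Vector3 objectAngularVelocity)
    {
        // The thrown object keeps the controller's angular velocity.
        objectAngularVelocity = controllerAngularVelocity;

        // Offset from the controller origin to the object's center of mass.
        Vector3 radialVec = objectCenterOfMass - controllerPosition;

        // Tangential velocity of the center of mass as it swings around the controller origin.
        Vector3 tangentialVelocity = Vector3.Cross(controllerAngularVelocity, radialVec);

        // Total velocity is the controller's velocity plus the tangential contribution.
        objectVelocity = controllerVelocity + tangentialVelocity;
    }
}
```

For instance, with a controller angular velocity of (0, 2, 0) and a center of mass 0.5 units in front of the controller origin, at offset (0, 0, 0.5), the tangential contribution is (1, 0, 0). The two results could then be assigned to a Rigidbody's velocity and angularVelocity at the moment of release.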

Throwing will continue to improve with future Windows updates, and you can expect to find more information on it here.

See also