Gestures and motion controllers in Unity

There are two key ways to take action on your gaze in Unity: hand gestures and motion controllers. You access the data for both sources of spatial input through the same APIs in Unity.

Unity provides two primary ways to access spatial input data for Windows Mixed Reality: the common Input.GetButton/Input.GetAxis APIs, which work across multiple Unity VR SDKs, and an InteractionManager/GestureRecognizer API specific to Windows Mixed Reality, which exposes the full set of spatial input data available.

This article has been updated for the final shape of the Unity APIs expected in Unity 2017.2:

  • If you are using Unity 5.6, you will see an older version of these APIs under the UnityEngine.VR namespace rather than UnityEngine.XR. Beyond the namespace change, there are other minor breaking API changes between Unity 5.6 and Unity 2017.2 that Unity's script updater will fix for you when moving to 2017.2.
  • If you are using a beta build of Unity 2017.2, you will see these APIs under UnityEngine.XR as expected, but you may see some differences from what is described below, as the initial 2017.2 beta builds contain an older version of the API shape.

Unity button/axis mapping table

The button and axis IDs in the table below are supported in Unity's Input Manager for Windows Mixed Reality motion controllers through the Input.GetButton/GetAxis APIs, while the "Windows MR-specific" column refers to properties available off of the InteractionSourceState type. Each of these APIs is described in detail in the sections below.

The button/axis ID mappings for Windows Mixed Reality generally match the Oculus button/axis IDs.

The button/axis ID mappings for Windows Mixed Reality differ from OpenVR's mappings in two ways:

  1. The mapping uses touchpad IDs that are distinct from thumbstick IDs, to support controllers with both thumbsticks and touchpads.
  2. The mapping avoids overloading the A and X button IDs for the Menu buttons, to leave those available for physical ABXY buttons.
Input | Common Unity Input API (left hand) | Common Unity Input API (right hand) | Windows MR-specific Input API
----- | ----- | ----- | -----
Select trigger pressed | Axis 9 = 1.0 | Axis 10 = 1.0 | selectPressed
Select trigger analog value | Axis 9 | Axis 10 | selectPressedAmount
Select trigger partially pressed | Button 14 (gamepad compat) | Button 15 (gamepad compat) | selectPressedAmount > 0.0
Menu button pressed | Button 6 * | Button 7 * | menuPressed
Grip button pressed | Axis 11 = 1.0 (no analog values); Button 4 (gamepad compat) | Axis 12 = 1.0 (no analog values); Button 5 (gamepad compat) | grasped
Thumbstick X | Axis 1 | Axis 4 | thumbstickPosition.x
Thumbstick Y | Axis 2 | Axis 5 | thumbstickPosition.y
Thumbstick pressed | Button 8 | Button 9 | thumbstickPressed
Touchpad X | Axis 17 * | Axis 20 * | touchpadPosition.x
Touchpad Y | Axis 18 * | Axis 21 * | touchpadPosition.y
Touchpad touched | Button 18 * | Button 19 * | touchpadTouched
Touchpad pressed | Button 16 * | Button 17 * | touchpadPressed
6DoF grip pose | XR.InputTracking.GetLocalPosition/GetLocalRotation (XRNode.LeftHand) | XR.InputTracking.GetLocalPosition/GetLocalRotation (XRNode.RightHand) | sourcePose.TryGetPosition **; sourcePose.TryGetRotation **
Pointing pose | Not available in Unity's common input API yet | Not available in Unity's common input API yet | sourcePose.TryGetRay **

* These button/axis IDs differ from the IDs that Unity uses for OpenVR due to collisions in the mappings used by gamepads, Oculus Touch and OpenVR.

** These Windows-specific input APIs show the expected API naming at the final Unity 2017.2 release, after pending changes coming in future beta builds.
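For example, a minimal sketch of reading the select trigger's analog value through the common API, assuming you have defined Input Manager axes with the hypothetical names "Select Trigger Left" and "Select Trigger Right", bound to the 9th and 10th joystick axes per the table above:

// "Select Trigger Left"/"Select Trigger Right" are hypothetical Input Manager
// axis names, assumed to be bound to joystick axes 9 and 10 per the mapping table.
float leftSelect = Input.GetAxis("Select Trigger Left");
float rightSelect = Input.GetAxis("Select Trigger Right");

// An axis value of 1.0 corresponds to a fully pressed select trigger.
if (leftSelect > 0.95f)
{
    // Treat as a full select press
}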

Grip pose vs. pointing pose

Windows Mixed Reality supports motion controllers in a variety of form factors, with each controller's design differing in its relationship between the user's hand position and the natural "forward" direction that apps should use for pointing when rendering the controller.

To better represent these controllers, there are two kinds of poses you can investigate for each interaction source:

  • The grip pose, representing the location of either the palm of a hand detected by a HoloLens, or the palm holding a motion controller.
    • On immersive headsets, this pose is best used to render the user's hand or an object held in the user's hand, such as a sword or gun.
    • The grip position: The palm centroid when holding the controller naturally, adjusted left or right to center the position within the grip.
    • The grip orientation's Right axis: When you completely open your hand to form a flat 5-finger pose, the ray that is normal to your palm (forward from left palm, backward from right palm)
    • The grip orientation's Forward axis: When you close your hand partially (as if holding the controller), the ray that points "forward" through the tube formed by your non-thumb fingers.
    • The grip orientation's Up axis: The Up axis implied by the Right and Forward definitions.
    • You can access the grip pose through either Unity's cross-vendor input API (XR.InputTracking.GetLocalPosition/Rotation) or through the Windows MR-specific API (sourceState.sourcePose.TryGetPosition/Rotation, requesting pose data for the Grip node).
  • The pointer pose, representing the tip of the controller pointing forward.
    • This pose is best used to raycast when pointing at UI if you are rendering the controller model itself.
    • Currently, the pointer pose is available only through the Windows MR-specific API (sourceState.sourcePose.TryGetPosition/Rotation, requesting pose data for the Pointer node).

These pose coordinates are all expressed in Unity world coordinates.
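As a sketch of reading both poses through the Windows MR-specific polling API described later in this article (the InteractionSourceNode enum name below reflects the expected final 2017.2 naming noted above and is an assumption; earlier builds may differ):

foreach (var sourceState in InteractionManager.GetCurrentReading())
{
    Vector3 gripPosition;
    Quaternion pointerRotation;

    // Grip pose: where the palm is holding the controller; good for rendering held objects.
    if (sourceState.sourcePose.TryGetPosition(out gripPosition, InteractionSourceNode.Grip))
    {
        // Position a held-object model (sword, gun) at gripPosition
    }

    // Pointer pose: the controller's natural "forward" direction for pointing at UI.
    if (sourceState.sourcePose.TryGetRotation(out pointerRotation, InteractionSourceNode.Pointer))
    {
        // Raycast along pointerRotation * Vector3.forward
    }
}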

Common Unity APIs (Input.GetButton/GetAxis)

Namespace: UnityEngine, UnityEngine.XR

Types: Input, XR.InputTracking

Unity currently uses its general Input.GetButton/Input.GetAxis APIs to expose input for the Oculus SDK, the OpenVR SDK and Windows Mixed Reality, including hands and motion controllers. If your app uses these APIs for input, it can easily support motion controllers across multiple VR SDKs, including Windows Mixed Reality.

Getting a logical button's pressed state

To use the general Unity input APIs, you'll typically start by wiring up buttons and axes to logical names in the Unity Input Manager, binding button or axis IDs to each name. You can then write code that refers to that logical button/axis name:

if (Input.GetButton("Fire1"))
{
  // ...
}

Getting a physical button's pressed state directly

You can also access buttons manually by their fully-qualified name, using Input.GetKey:

if (Input.GetKey("joystick button 8"))
{
  // ...
}

Getting a hand or motion controller's pose

You can access the position and rotation of the controller using XR.InputTracking:

Vector3 leftPosition = InputTracking.GetLocalPosition(XRNode.LeftHand);
Quaternion leftRotation = InputTracking.GetLocalRotation(XRNode.LeftHand);

Note that this represents the controller's grip pose (where the user holds the controller), which is useful for rendering a sword or gun in the user's hand, or a model of the controller itself.

Note that the relationship between this grip pose and the pointer pose (where the tip of the controller is pointing) may differ across controllers. Currently, accessing the controller's pointer pose is only possible through the MR-specific input API, described in the sections below.

Windows-specific APIs (XR.WSA.Input)

Namespace: UnityEngine.XR.WSA.Input

Types: InteractionManager, InteractionSourceState, InteractionSource, InteractionSourceProperties, InteractionSourceKind, InteractionSourceLocation

To get more detailed information about Windows Mixed Reality hand input (for HoloLens) and motion controllers, you can use the Windows-specific spatial input APIs under the UnityEngine.XR.WSA.Input namespace. This lets you access additional information, such as position accuracy or the source kind, letting you tell hands and controllers apart.

How to poll for the state of hands and motion controllers

You can poll for this frame's state for each interaction source (hand or motion controller) using the GetCurrentReading method.

var interactionSourceStates = InteractionManager.GetCurrentReading();

Each InteractionSourceState you get back represents an interaction source at the current moment in time. The InteractionSourceState exposes info such as:

  • Which kinds of presses are occurring, such as the select button/trigger:

if (interactionSourceState.selectPressed) {
    // ...
}

  • Other data specific to motion controllers, such as the touchpad and/or thumbstick's XY coordinates and touched state:

if (interactionSourceState.touchpadTouched && interactionSourceState.touchpadPosition.x > 0.5f) {
    // ...
}
  • The head pose at the moment in time when this gesture data was captured, which can be used to determine what the user was gazing at. This is especially useful for targeting a user's hand gestures, since there is some latency before hand poses are processed by the system and provided to the app.
  • The grip pose and pointing pose of the interaction source at that point in time
  • The InteractionSourceKind to know if the source is a hand or a motion controller
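For example, a minimal sketch of telling hands and controllers apart while polling, using the source's kind:

foreach (var sourceState in InteractionManager.GetCurrentReading())
{
    // source.kind distinguishes hands, controllers, voice and other sources.
    if (sourceState.source.kind == InteractionSourceKind.Hand)
    {
        // HoloLens hand input
    }
    else if (sourceState.source.kind == InteractionSourceKind.Controller)
    {
        // Motion controller input
    }
}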

How to start handling an interaction event

If you prefer to handle events rather than poll each frame:

  • Register for an InteractionManager input event. For each type of interaction event that you are interested in, you need to subscribe to it.
InteractionManager.SourcePressed += InteractionManager_SourcePressed;
  • Handle the event. Once you have subscribed to an interaction event, you will get the callback when appropriate. In the SourcePressed example, this will be after the source was detected and before it is released or lost.
void InteractionManager_SourcePressed(InteractionSourceState state)
{
    // state has information about:
    // - the targeting head ray at the time the event was triggered
    // - whether the source is pressed or not
    // - properties like position, velocity and source loss risk
    // - the source id (which hand, for example) and the source kind (hand, voice, controller or other)
}

How to stop handling an event

You need to stop handling an event when you are no longer interested in it or when you are destroying the object that subscribed to it. To stop handling the event, unsubscribe from it.

InteractionManager.SourcePressed -= InteractionManager_SourcePressed;

Input Source Change Events

These events describe when an input source is:

  • detected (becomes active)
  • lost (becomes inactive)
  • updated (moves or otherwise changes some state)
  • pressed (tap, button press, or "select" uttered)
  • released (end of a tap, button released, or end of "select" uttered)

Example

using UnityEngine.XR.WSA.Input;

void Start ()
{
    InteractionManager.SourceDetected += InteractionManager_SourceDetected;
    InteractionManager.SourceUpdated += InteractionManager_SourceUpdated;
    InteractionManager.SourceLost += InteractionManager_SourceLost;
    InteractionManager.SourcePressed += InteractionManager_SourcePressed;
    InteractionManager.SourceReleased += InteractionManager_SourceReleased;
}

void OnDestroy()
{
    InteractionManager.SourceDetected -= InteractionManager_SourceDetected;
    InteractionManager.SourceUpdated -= InteractionManager_SourceUpdated;
    InteractionManager.SourceLost -= InteractionManager_SourceLost;
    InteractionManager.SourcePressed -= InteractionManager_SourcePressed;
    InteractionManager.SourceReleased -= InteractionManager_SourceReleased;
}

void InteractionManager_SourceDetected(InteractionSourceState state)
{
    // Source was detected
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourceLost(InteractionSourceState state)
{
    // Source was lost. This will be after a SourceDetected event and no other events for this source id will occur until it is Detected again
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourceUpdated(InteractionSourceState state)
{
    // Source was updated. The source would have been detected before this point
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourcePressed(InteractionSourceState state)
{
    // Source was pressed. This will be after the source was detected and before it is released or lost
    // state has the current state of the source including id, position, kind, etc.
}

void InteractionManager_SourceReleased(InteractionSourceState state)
{
    // Source was released. The source would have been detected and pressed before this point. This event will not fire if the source is lost
    // state has the current state of the source including id, position, kind, etc.
}

High-level composite gesture APIs (GestureRecognizer)

Namespace: UnityEngine.XR.WSA.Input

Types: GestureRecognizer, GestureSettings, InteractionSourceKind

Your app can also recognize higher-level composite gestures for spatial input sources: Tap, Hold, Manipulation, and Navigation gestures. You can recognize these composite gestures across both hands and motion controllers using the GestureRecognizer.

Each Gesture event on the GestureRecognizer provides the SourceKind for the input as well as the targeting head ray at the time of the event. Some events provide additional context specific information.
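For example, a sketch of a Tap handler that raycasts along the head ray provided with the event (the delegate shape shown follows the Unity 5.6-era TappedEvent signature used by the snippets below; it may differ in early 2017.2 betas):

void MyTapEventHandler(InteractionSourceKind source, int tapCount, Ray headRay)
{
    RaycastHit hitInfo;

    // Use the head ray captured at the moment of the tap to find the targeted object.
    if (Physics.Raycast(headRay, out hitInfo))
    {
        // hitInfo.collider.gameObject is what the user was gazing at when tapping
    }
}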

There are only a few steps required to capture gestures using a Gesture Recognizer:

  1. Create a new Gesture Recognizer
  2. Specify which gestures to watch for
  3. Subscribe to events for those gestures
  4. Start capturing gestures

Create a new Gesture Recognizer

To use the GestureRecognizer, first create a GestureRecognizer instance:

GestureRecognizer recognizer = new GestureRecognizer();

Specify which gestures to watch for

Specify which gestures you are interested in via SetRecognizableGestures():

recognizer.SetRecognizableGestures(GestureSettings.Tap | GestureSettings.Hold);

Subscribe to events for those gestures

Subscribe to events for the gestures you are interested in.

recognizer.TappedEvent += MyTapEventHandler;
recognizer.HoldStartedEvent += MyHoldStartedEventHandler;
recognizer.HoldCompletedEvent += MyHoldCompletedEventHandler;

Note: Navigation and Manipulation gestures are mutually exclusive on an instance of a GestureRecognizer.
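If your app needs both, one common pattern is to keep two recognizer instances, one per gesture family. A minimal sketch (the navigation event and delegate shape follow the same 5.6-era naming as the other snippets here and may shift in 2017.2 betas):

GestureRecognizer navigationRecognizer = new GestureRecognizer();
navigationRecognizer.SetRecognizableGestures(
    GestureSettings.NavigationX | GestureSettings.NavigationY | GestureSettings.NavigationZ);

// normalizedOffset reports the source's movement, normalized to the range -1 to 1 on each axis.
navigationRecognizer.NavigationUpdatedEvent += (source, normalizedOffset, headRay) =>
{
    // Scroll or zoom content based on normalizedOffset
};

navigationRecognizer.StartCapturingGestures();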

Start capturing gestures

By default, a GestureRecognizer does not monitor input until StartCapturingGestures() is called. A gesture event may still be generated after StopCapturingGestures() is called if input was performed before the frame where StopCapturingGestures() was processed. Because of this, it is reliable to start and stop gesture monitoring based on which object the user is currently gazing at.

recognizer.StartCapturingGestures();

Stop capturing gestures

To stop gesture recognition:

recognizer.StopCapturingGestures();

Removing a gesture recognizer

Remember to unsubscribe from subscribed events before destroying a GestureRecognizer object.

void OnDestroy()
{
    recognizer.TappedEvent -= MyTapEventHandler;
    recognizer.HoldStartedEvent -= MyHoldStartedEventHandler;
    recognizer.HoldCompletedEvent -= MyHoldCompletedEventHandler;
}
