Gesture design

Interaction on HoloLens is built around Gaze, Gesture, and Voice: Gaze targets an element, and Gesture or Voice acts upon whatever has been targeted. It should be stated up front that gesture, like any input, is well suited to some purposes and less suited to others.

Device support

Feature | HoloLens | Immersive headsets
Gesture | ✔️      |

The three key component gestures of HoloLens

HoloLens currently recognizes three key component gestures, which can form the foundation for a variety of possible user actions:

Bloom

Bloom is the "home" gesture and is reserved for that alone. It is equivalent to pressing the Windows key, and can be performed with either hand.

Bloom gesture

Air tap

Air tap is a tapping gesture with the hand held upright, similar to a mouse click. This is used in most HoloLens experiences for the equivalent of a "click" on UI elements after targeting with Gaze.

Finger in ready position and then a tap or click motion
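
As a rough illustration (a minimal sketch, not HoloLens SDK code; the Hologram and GazeTargeter types are invented for the example), an application typically routes a completed air tap to whatever element the gaze cursor currently targets:

```cpp
#include <iostream>
#include <string>

// Hypothetical scene element that can respond to a "click".
struct Hologram {
    std::string name;
    void OnSelect() { std::cout << "Selected: " << name << "\n"; }
};

// Hypothetical gaze targeter: reports the element under the gaze cursor, if any.
struct GazeTargeter {
    Hologram* target = nullptr;                 // null when the user is gazing at empty space
    Hologram* Current() const { return target; }
};

// Air tap is discrete: when the tap completes, act exactly once on the gaze target.
void OnAirTapped(const GazeTargeter& gaze) {
    if (Hologram* hit = gaze.Current()) {
        hit->OnSelect();                        // the "click" on the targeted element
    }                                           // taps on empty space are simply ignored
}

int main() {
    Hologram button{"Settings button"};
    GazeTargeter gaze{&button};                 // pretend the user is gazing at the button
    OnAirTapped(gaze);                          // a recognized air tap acts on that target
}
```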

Tap and hold

Hold is simply maintaining the downward finger position of the air tap. This allows "mousedown"/"mouseup" interactions, "click and drag" interactions, and more.

The combination of air tap and hold allows a variety of more complex interactions when combined with arm movement, similar to a single-button mouse. Use caution in these designs, however, as users tend to relax their hand posture over the course of an extended gesture.
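
The press/move/release pattern might be sketched as follows; the DragController type and its event names are hypothetical, standing in for whatever gesture events a given framework exposes. The hold start and end play the roles of "mousedown" and "mouseup", with arm movement driving the drag in between:

```cpp
#include <iostream>

// Hypothetical hand position in meters, reported while the hold is maintained.
struct Vec3 { float x, y, z; };

// Hypothetical drag controller: begins on finger-down, updates while held, ends on release.
class DragController {
public:
    void OnHoldStarted(Vec3 hand) { start_ = hand; dragging_ = true; }   // "mousedown"
    void OnHoldMoved(Vec3 hand) {                                        // arm movement while held
        if (!dragging_) return;
        Vec3 delta{hand.x - start_.x, hand.y - start_.y, hand.z - start_.z};
        std::cout << "drag delta: " << delta.x << " " << delta.y << " " << delta.z << "\n";
    }
    void OnHoldCompleted() { dragging_ = false; }                        // "mouseup": commit the move
    void OnHoldCanceled()  { dragging_ = false; }                        // e.g. posture relaxed or hand lost
private:
    Vec3 start_{};
    bool dragging_ = false;
};

int main() {
    DragController drag;
    drag.OnHoldStarted({0.0f, 0.0f, 0.5f});     // user taps and holds
    drag.OnHoldMoved({0.10f, 0.02f, 0.5f});     // user moves their arm while holding
    drag.OnHoldCompleted();                     // user releases to finish the drag
}
```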

Discrete vs. continuous gestures

In general, gestures can be thought of as being discrete or continuous in nature, and these have different potential uses:

  • Discrete gestures are those with a binary completed/not-completed state. The air tap is the simplest of these: the tap either is or is not completed, and an action is or is not taken within an experience. It is possible, though not recommended without a specific purpose, to create other discrete gestures from combinations of the main components (for example, a double-tap that means something different from a single tap).
  • Continuous gestures are those in which the amount of action within the gesture matters to the output; that is, the gesture takes place on a continuum. Movement is a key category of continuous gestures and is often used in HoloLens applications (invoked with air tap and hold) to let users manipulate objects by, for example, moving their arm in three dimensions. Continuous gestures allow a greater range of possible outcomes in an application, but also carry a greater risk of accidental activation of "nearby" gestures, interference with other goals, and user error as postures degrade or change over the course of the action. One simple guard against accidental activation is sketched after this list.
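
One simple guard against accidental activation is a small dead zone: ignore hand movement until it exceeds a threshold, so a slightly drifting hold still reads as a tap rather than as the start of a continuous manipulation. The sketch below is illustrative only; the 2 cm threshold is an assumed value, not a platform constant.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical hand position in meters.
struct Vec3 { float x, y, z; };

static float Distance(Vec3 a, Vec3 b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Illustrative dead zone: movement smaller than this is treated as hand jitter,
// not as the start of a continuous manipulation. The value is an assumption.
constexpr float kDeadZoneMeters = 0.02f;

// Returns true once the hold has moved far enough to count as a continuous gesture.
bool HasLeftDeadZone(Vec3 holdStart, Vec3 handNow) {
    return Distance(holdStart, handNow) > kDeadZoneMeters;
}

int main() {
    Vec3 start{0.0f, 0.0f, 0.5f};
    std::printf("jitter only: %d\n", HasLeftDeadZone(start, {0.005f, 0.0f, 0.5f}));  // 0: still a tap/hold
    std::printf("real move:   %d\n", HasLeftDeadZone(start, {0.05f, 0.0f, 0.5f}));   // 1: treat as manipulation
}
```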

Gesture frame

For both discrete and continuous gestures on HoloLens, the hand must be within a "gesture frame": a range that the gesture-sensing cameras can see appropriately (very roughly from nose to waist, and between the shoulders). Users need to be trained on this area of recognition, both for success of action and for their own comfort; many users will initially assume that the gesture frame must be within their view through HoloLens, and will hold their arms up uncomfortably in order to interact.

In the case of continuous gestures in particular, there is some risk of users moving their hands outside the gesture frame mid-gesture (while moving a holographic object, for example) and losing their intended outcome.

There are three things to consider:

  • Educate users on the gesture frame's existence and approximate boundaries (this is taught during HoloLens setup).
  • Notify users when their gestures are nearing or breaking the gesture frame boundaries within an application, to the degree that a lost gesture will lead to undesired outcomes. Research has identified the key qualities of such a notification, and the HoloLens shell provides a good example: visual feedback on the central cursor, indicating the direction in which the boundary crossing is taking place.
  • Minimize the consequences of breaking the gesture frame boundaries. In general, this means that the outcome of a gesture should stop at the boundary, but not be reversed: if a user is moving a holographic object across a room, movement should stop when the gesture frame is breached, but the object should not snap back to its starting point. The user may experience some frustration, but will more quickly learn the boundaries and will not have to restart the full intended action each time. A sketch of this "stop, don't reverse" behavior follows this list.
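
The second and third considerations can be sketched together. In the hypothetical fragment below, the gesture frame extents and warning margin are assumed values for illustration, not platform constants; the dragged object freezes at its last good position when the hand leaves the frame rather than resetting, and a warning is raised as the hand nears a boundary:

```cpp
#include <cstdio>

// Hypothetical hand position in a head-relative coordinate frame, in meters.
struct Vec3 { float x, y, z; };

// Illustrative gesture frame extents (assumed values, not platform constants):
// roughly nose-to-waist vertically and shoulder-to-shoulder horizontally.
constexpr float kFrameHalfWidth = 0.25f;
constexpr float kFrameTop       = 0.05f;
constexpr float kFrameBottom    = -0.45f;
constexpr float kWarningMargin  = 0.05f;    // start warning this close to a boundary

bool InsideFrame(Vec3 hand) {
    return hand.x > -kFrameHalfWidth && hand.x < kFrameHalfWidth &&
           hand.y > kFrameBottom     && hand.y < kFrameTop;
}

bool NearBoundary(Vec3 hand) {
    return InsideFrame(hand) &&
           (hand.x < -kFrameHalfWidth + kWarningMargin || hand.x > kFrameHalfWidth - kWarningMargin ||
            hand.y <  kFrameBottom    + kWarningMargin || hand.y > kFrameTop       - kWarningMargin);
}

int main() {
    Vec3 objectPos{0.0f, 0.0f, 1.0f};           // hologram being dragged
    Vec3 handPath[] = {{0.0f, 0.0f, 0.4f}, {0.22f, 0.0f, 0.4f}, {0.30f, 0.0f, 0.4f}};

    Vec3 dragStartHand = handPath[0];
    Vec3 dragStartObj  = objectPos;
    for (Vec3 hand : handPath) {
        if (!InsideFrame(hand)) {
            // Hand left the gesture frame: stop the drag where it is, do NOT reset the object.
            std::printf("gesture lost; object stays at x=%.2f\n", objectPos.x);
            break;
        }
        if (NearBoundary(hand)) {
            std::printf("warn: hand nearing gesture frame boundary\n");   // cue on the cursor, for example
        }
        // Continuous drag: move the object by the hand's displacement since the hold began.
        objectPos.x = dragStartObj.x + (hand.x - dragStartHand.x);
    }
}
```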

See also