Gaze, gesture and voice (GGV) are the primary means of interaction on HoloLens. Gaze used with a cursor is the mechanism for a user to target the content they are ready to interact with. Gesture or voice are the intention mechanisms. Gaze can be used with either gesture or voice to complete an interaction.
On immersive headsets, the primary means of interaction are gaze-and-commit and point-and-commit (with a motion controller). If the user has a headset with voice capabilities, voice can be used in combination with gaze or point to complete an action.
While designing apps, you should consider how you can make these interactions work together well.
| Input | HoloLens | Immersive headsets |
|-------|----------|--------------------|
| Voice | ✔️ | ✔️ (with headset attached) |
Consider adding voice commands to any experience that you build. Voice is a powerful and convenient way to control the system and apps. Because users speak with a variety of dialects and accents, choosing speech keywords carefully helps ensure that your users' commands are interpreted unambiguously.
Below are some practices that will aid in smooth speech recognition.
As a user targets any button through gaze or pointing, they can say the word "Select" to activate that button. "Select" is one of the low-power keywords that the system always listens for. Going further, a user can also use "button grammar" across the system or in apps. For example, while looking at an app, a user can say the command "Remove" (which appears in the app bar) to remove the app from the world.
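The two tiers of commands described above — an always-listening "Select" keyword plus app-level button grammar — can be sketched as a simple dispatcher. This is an illustrative sketch, not the actual HoloLens speech API; the class and method names (`VoiceCommandRouter`, `register`, `hear`) are hypothetical.

```python
# Illustrative sketch: a keyword dispatcher with a global "Select" command
# plus per-app "button grammar" keywords. Not the real platform API.

class VoiceCommandRouter:
    def __init__(self):
        # "Select" is always listened for; it activates whatever the user
        # is currently targeting with gaze or pointing.
        self.global_commands = {"select": lambda target: f"activated {target}"}
        self.app_commands = {}

    def register(self, keyword, handler):
        """Register an app-level keyword, e.g. "Remove" from the app bar."""
        self.app_commands[keyword.lower()] = handler

    def hear(self, phrase, gaze_target):
        """Dispatch a recognized phrase against the current gaze target."""
        keyword = phrase.strip().lower()
        if keyword in self.global_commands:
            return self.global_commands[keyword](gaze_target)
        if keyword in self.app_commands:
            return self.app_commands[keyword](gaze_target)
        return None  # unrecognized phrase: do nothing rather than guess
```

For example, after `router.register("Remove", ...)`, saying "Select" activates the targeted button and saying "Remove" invokes the app-bar command, while an unknown phrase is ignored.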
Windows Mixed Reality employs a "see it, say it" voice model in which the label on a button is identical to its voice command. Because there is no dissonance between the label and the command, users can better understand what to say to control the system. To reinforce this, dwelling on a button shows a "voice dwell tip" that communicates which buttons are voice-enabled.
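One way to guarantee the "see it, say it" property is to derive the voice keyword directly from the button's visible label, so the two can never diverge. The sketch below is a hypothetical illustration of that design choice, not platform code.

```python
# Illustrative sketch of "see it, say it": the spoken keyword is derived
# from the button's visible label, so label and command cannot diverge.

class Button:
    def __init__(self, label, action=None):
        self.label = label    # the text the user sees on the button
        self.action = action  # callback invoked when the button activates

    @property
    def voice_keyword(self):
        # The spoken command is exactly the label the user sees.
        return self.label.lower()

def build_grammar(buttons):
    """Map every visible label to its button for the speech recognizer."""
    return {b.voice_keyword: b for b in buttons}
```

Because the grammar is generated from the labels themselves, renaming a button automatically renames its voice command — there is no second list of keywords to keep in sync.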
How using voice can benefit the user

Voice input is a natural way to communicate our intents. Voice is especially good at interface traversal because it can help users cut through multiple steps of an interface (a user might say "go back" while looking at a web page, instead of having to go up and hit the back button in the app). This small time saving has a powerful emotional effect on the user's perception of the experience and gives them a small sense of superpower. Using voice is also a convenient input method when our arms are full or we are multitasking. On devices where typing on a keyboard is difficult, voice dictation can be an efficient alternative. Lastly, in some cases when the range of accuracy for gaze and gesture is limited, voice might be a user's only trusted method of input.
Voice also has some weaknesses. Fine-grained control is one of them: a user might say "louder," but can't easily say how much ("a little" is hard to quantify). Moving or scaling objects with voice is also difficult, because voice doesn't offer that granularity of control. Voice can also be imperfect: sometimes a voice system incorrectly hears a command, or fails to hear one at all, and recovering from such errors is a challenge in any interface. Lastly, voice may not be socially acceptable in public places, and there are some things that users can't or shouldn't say. Knowing these cliffs allows speech to be used for what it is best at.
When voice is applied properly, the user understands what they can say and gets clear feedback that the system heard them correctly. These two signals make the user feel confident in using voice as a primary input. Below is a diagram showing what happens to the cursor when voice input is recognized and how it communicates that to the user.
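The feedback loop described above can be thought of as a small state machine for the cursor. The states and function below are an assumed, simplified model for illustration only; the actual cursor visuals are defined by the platform.

```python
# Illustrative sketch (hypothetical states): the cursor signals whether the
# system is listening and confirms when a command was recognized.

def cursor_state(listening, recognized):
    """Return the cursor's feedback state for the current voice status."""
    if recognized:
        # Briefly confirm the heard command so the user trusts the input.
        return "confirmation"
    return "listening" if listening else "idle"
```

The key design point is the explicit confirmation state: without it, a user who speaks a command and sees nothing happen cannot tell a recognition failure from a slow response.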