Voice design

Users will use gaze, gesture, and voice (GGV) as the primary means of interacting with your content on HoloLens. Gaze is the targeting mechanism: users direct it at content, and a cursor shows what they are targeting. Gesture and voice are the intention mechanisms. Users will use these to indicate that they are ready to interact with the content they are targeting.

The interaction model on HoloLens is gaze plus gesture (GG) and gaze plus voice (GV). Users will use gaze with either gestures or voice to complete their interactions. While designing apps, you should consider how you can make these interactions work together well.

Voice

Voice is one of the three main inputs to HoloLens, so you should consider adding voice commands to any experience that you build. Voice is a powerful and convenient way to control the system and apps. Because HoloLens users speak with a variety of dialects and accents, choosing speech keywords carefully helps ensure that your users' commands are interpreted unambiguously. The following practices aid smooth speech recognition:

  • When possible, choose keywords of two or more syllables. One-syllable words tend to use different vowel sounds when spoken by people with different accents.
  • Make sure any action that can be taken by a speech command is non-destructive or easily undoable in case another person speaking near the HoloLens user accidentally triggers a command.
  • Avoid registering multiple speech commands that sound very similar.
  • When your app is not in a state in which a particular speech command is valid, consider unregistering it so that other commands are not confused for that one.
  • Test your app with users of different accents.
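
The advice about unregistering commands that are not valid in the current app state can be sketched as a small state-scoped registry. This is an illustrative sketch in Python, not the HoloLens speech API: the class `VoiceCommandRegistry`, the state names "browsing" and "playing", and the handlers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class VoiceCommandRegistry:
    """Hypothetical registry that only listens for keywords valid in the active state."""

    # Maps an app state name to the keywords (and handlers) valid in that state.
    commands_by_state: Dict[str, Dict[str, Callable[[], str]]] = field(default_factory=dict)
    active_state: str = ""

    def register(self, state: str, keyword: str, handler: Callable[[], str]) -> None:
        # Keywords are stored lowercased so matching is case-insensitive.
        self.commands_by_state.setdefault(state, {})[keyword.lower()] = handler

    def enter_state(self, state: str) -> None:
        # Switching state implicitly "unregisters" every keyword of the old state.
        self.active_state = state

    def active_keywords(self) -> Set[str]:
        return set(self.commands_by_state.get(self.active_state, {}))

    def handle(self, phrase: str) -> Optional[str]:
        # Phrases that are not valid in the active state are ignored.
        handlers = self.commands_by_state.get(self.active_state, {})
        handler = handlers.get(phrase.strip().lower())
        return handler() if handler else None


registry = VoiceCommandRegistry()
registry.register("browsing", "go back", lambda: "navigated back")
registry.register("playing", "pause video", lambda: "paused")
registry.enter_state("browsing")
```

While the app is in the "browsing" state, "pause video" is not among the active keywords, so a phrase that merely sounds like it cannot be mistaken for it.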

WHAT USERS CAN SAY
There are two primary things a user can say on HoloLens. As a user targets any button through gaze, they can say the word "select" to activate that button. "Select" is one of the low-power keywords HoloLens is always listening for. Going further, a user can also say a button's label in the shell or in apps. For example, while looking at an app, a user can say the command "Remove" (which is in the app bar) to remove the app from the world.

SEE IT SAY IT
HoloLens employs a "see it, say it" voice model where labels on buttons are identical to the associated voice commands. Because there isn’t any dissonance between the label and the voice command, users can better understand what to say to control the system. To reinforce this, when the user dwells on a button, a "voice dwell tip" appears to communicate which buttons are voice enabled.
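
One way to keep labels and voice commands from drifting apart is to derive the command from the label rather than storing it separately. The sketch below is a minimal illustration in Python under that assumption; it is not how the HoloLens shell is implemented. The `Button` class, the `dispatch` function, the wording of the dwell tip, and the "Close" button are all hypothetical; only "select" and "Remove" come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Button:
    label: str                  # text shown on the button; doubles as its voice command
    voice_enabled: bool = True

    def activate(self) -> str:
        return f"activated {self.label}"

    def dwell_tip(self) -> Optional[str]:
        # Hypothetical tip text shown while the user dwells on the button.
        return f'Say "{self.label}"' if self.voice_enabled else None


def dispatch(phrase: str,
             gazed_button: Optional[Button],
             visible_buttons: List[Button]) -> Optional[str]:
    """Route a recognized phrase to a button, or return None if nothing matches."""
    spoken = phrase.strip().lower()
    if spoken == "select":
        # "select" always acts on whatever button the user is gazing at.
        return gazed_button.activate() if gazed_button else None
    for button in visible_buttons:
        # See it, say it: the spoken phrase must equal the visible label.
        if button.voice_enabled and button.label.lower() == spoken:
            return button.activate()
    return None


# Buttons on the app bar of the app the user is looking at ("Close" is illustrative).
app_bar = [Button("Remove"), Button("Close")]
```

Because the command is read from `label`, renaming a button automatically renames what the user has to say.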

App bar buttons

Standard Buttons

VOICE'S STRENGTHS
Voice input is a natural way to communicate our intents. Voice is especially good at interface traversals because it can help users cut through multiple steps of an interface. (For example, a user might say "go back" while looking at a web page instead of having to go up and hit the back button in the app.) This small time savings has a powerful emotional effect on the user’s perception of the experience and gives them a small amount of superpower. Voice is also a convenient input method when our arms are full or we are multitasking. On devices where typing on a keyboard is difficult, voice dictation can be an efficient alternative. Lastly, in cases where the accuracy of gaze and gesture is limited, voice might be a user’s only trusted input method.

ENABLING VOICE CAN BENEFIT THE USER

  1. Reduces Time - Should make the end goal more efficient.
  2. Minimizes Effort - Should make tasks more fluid and effortless.
  3. Reduces Cognitive Load - Intuitive, easy to learn, and remember.
  4. Socially Acceptable - Should fit within societal norms in terms of behavior.
  5. Routine - Can readily become a habitual behavior.
  6. Beneficial - Using Voice has to be a significant improvement over how it’s done today.

SYSTEM RESERVED VOICE COMMANDS

There are some voice commands that are reserved for the system. These should not be used by applications.

  1. Hey Cortana
  2. Select

VOICE COMMANDS: DOs

  1. Concise commands
    Example: "Play video" is better than "Play the currently selected video"
  2. Use simple vocabulary
    Example: "Show note" is better than "Show placard"
  3. Voice command consistency
    Example: If "Go back" goes to the previous page, maintain this behavior in your applications.

VOICE COMMANDS: DONTs

  1. System commands
    Example: Don't register any of the system reserved commands listed in the section above.
  2. Similar sounding commands
    Example: "Show more" and "Show store" can be very similar sounding.
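
Both DON'Ts can be checked before an app ships. The following is a rough sketch in Python, assuming the reserved set is exactly the two commands listed earlier ("Hey Cortana" and "Select"). It compares spelling with the standard library's `difflib.SequenceMatcher`, which is only a crude stand-in for how commands actually sound. The function names and the 0.8 threshold are assumptions for illustration, not part of any HoloLens tooling.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple

# System reserved commands, taken from the list earlier in this article.
RESERVED = {"hey cortana", "select"}

# Assumed cutoff for "too similar"; tune it against real recognition results.
SIMILARITY_THRESHOLD = 0.8


def reserved_conflicts(commands: List[str]) -> List[str]:
    """Return the commands that collide with a system reserved command."""
    return [c for c in commands if c.strip().lower() in RESERVED]


def similar_pairs(commands: List[str],
                  threshold: float = SIMILARITY_THRESHOLD) -> List[Tuple[str, str]]:
    """Return pairs of commands whose spelling is suspiciously close."""
    pairs = []
    for first, second in combinations(commands, 2):
        ratio = SequenceMatcher(None, first.lower(), second.lower()).ratio()
        if ratio >= threshold:
            pairs.append((first, second))
    return pairs


commands = ["Show more", "Show store", "Play video", "Select"]
print(reserved_conflicts(commands))  # commands the app must not register
print(similar_pairs(commands))       # pairs worth renaming or testing with users
```

A spelling-based check misses words that are spelled differently but sound alike, so it complements testing with real speakers rather than replacing it.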

VOICE’S WEAKNESSES
Voice also has some weaknesses. Fine-grained control is one of them. (For example, a user might say "louder," but can’t say how much; "a little" is hard to quantify.) Moving or scaling things with voice is also difficult because voice does not offer that granularity of control. Voice can also be imperfect. Sometimes a voice system mishears a command or fails to hear it at all, and recovering from such errors is a challenge in any interface. Lastly, voice may not be socially acceptable in public places, and there are some things that users can’t or shouldn’t say. Recognizing these limits lets you use speech for what it is best at.

COMMON VOICE CONCERNS FROM USERS

  1. What can I say?
  2. How do I know the system heard me correctly?
  3. The system keeps getting my voice commands wrong.
    1. Doesn’t react when I give it a voice command.
    2. Reacts the wrong way when I give it a voice command.
  4. How do I target my voice to a specific app or app command?
  5. Can I use voice to command things outside the holographic frame on HoloLens?

VOICE FEEDBACK STATES
When Voice is applied properly, the user understands what they can say and gets clear feedback that the system heard them correctly. These two signals make the user feel confident in using Voice as a primary input. Below is a diagram showing what happens to the cursor when voice input is recognized and how it communicates that to the user.

Voice feedback states for cursor

TOP THINGS USERS SHOULD KNOW ABOUT USING "SPEECH" ON HOLOLENS

  1. Say "select" while targeting a button. (You can use this anywhere to click a button.)
  2. You can say labels on the app bar buttons and in some apps. (For example, while looking at an app, a user can say the command "Remove", which is in the app bar, to remove the app from the world. This saves you from having to click it with your hand.)
  3. You can initiate Cortana listening by saying "Hey Cortana." You can ask her questions ("Hey Cortana, how tall is the Eiffel Tower"), tell her to open an app ("Hey Cortana, open Netflix"), tell her to bring up the Start panel ("Hey Cortana, take me home"), and more.
