Holograms 212

Voice input gives us another way to interact with our holograms. Voice commands work in a very natural and easy way. Design your voice commands so that they are:

  • Natural
  • Easy to remember
  • Context appropriate
  • Sufficiently distinct from other options within the same context

In Holograms 101, we used the KeywordRecognizer to build two simple voice commands. In Holograms 212, we'll dive deeper and learn how to:

  • Design voice commands that are optimized for the HoloLens speech engine.
  • Make the user aware of what voice commands are available.
  • Acknowledge that we've heard the user's voice command.
  • Understand what the user is saying, using a Dictation Recognizer.
  • Use a Grammar Recognizer to listen for commands based on an SRGS, or Speech Recognition Grammar Specification, file.

In this course, we'll revisit Model Explorer, which we built in Holograms 210 and Holograms 211.

Prerequisites

Project files

  • Download the files required by the project.
  • Unarchive the files to your desktop or another easy-to-reach location.

Errata and Notes

  • "Enable Just My Code" needs to be disabled (unchecked) in Visual Studio under Tools->Options->Debugging in order to hit breakpoints in your code.

Unity Setup

Instructions

  • Start Unity.
  • Select Open.
  • Navigate to your Desktop and find the HolographicAcademy-Holograms-212-Voice folder you previously unarchived.
  • Find and select the Starting\ModelExplorer folder.
  • Click Select Folder.
  • In Unity's Project panel click on the Scenes folder.
  • Double-click the ModelExplorer scene to open it in Unity.
  • In Unity, select File > Build Settings.
  • If Scenes/ModelExplorer is not listed in Scenes In Build, click Add Open Scenes to add the scene.
  • Select Windows Store in the Platform list and click the Switch Platform button.
  • Set SDK to Universal 10 and UWP Build Type to D3D.
  • Check Unity C# Projects.
  • Click Build.
  • Create a New Folder named "App".
  • Single click the App folder.
  • Press Select Folder.
  • When Unity is done, a File Explorer window will appear.
  • Open the App Folder.
  • Open the ModelExplorer Visual Studio Solution.
  • Using the top toolbar in Visual Studio, change the target from Debug to Release and from ARM to x86.
  • Click on the drop down arrow next to the Device button, and select Remote Machine.
  • Enter your device IP address and set Authentication Mode to Universal (Unencrypted Protocol), then click Select. If you do not know your device's IP address, look in Settings > Network & Internet > Advanced Options on your HoloLens, or ask Cortana: "Hey Cortana, what's my IP address?"
  • Click Debug > Start Without Debugging in the menu, or press Ctrl + F5. If this is your first time deploying to the device, you will need to pair it with Visual Studio. Follow these instructions: Pairing your HoloLens with Visual Studio
  • Note: you might notice some red errors in the Visual Studio Errors panel. It is safe to ignore them; switch to the Output panel to view actual build progress. Errors in the Output panel will require a fix (most often they are caused by a mistake in a script).

Chapter 1 - Awareness

Objectives

  • Learn the Dos and Don'ts of voice command design.
  • Use KeywordRecognizer to add gaze-based voice commands.
  • Make users aware of voice commands using cursor feedback.

In this chapter, you'll learn about designing voice commands. When creating voice commands:

DO

  • Create concise commands. You don't want to use "Play the currently selected video", because that command is not concise and would easily be forgotten by the user. Instead, you should use "Play Video", because it is concise and easy to remember.
  • Use a simple vocabulary. Always try to use common words and phrases that are easy for the user to discover and remember. For example, if your application had a note object that could be displayed or hidden from view, you would not use the command "Show Placard", because "placard" is a rarely used term. Instead, you would use the command: "Show Note", to reveal the note in your application.
  • Be consistent. Voice commands should be kept consistent across your application. Imagine that you have two scenes in your application and both scenes contain a button for closing the application. If the first scene used the command "Exit" to trigger the button, but the second scene used the command "Close App", then the user is going to get very confused. If the same functionality persists across multiple scenes, then the same voice command should be used to trigger it.

DON'T

  • Use single syllable commands. As an example, if you were creating a voice command to play a video, you should avoid using the simple command "Play", as it is only a single syllable and could easily be missed by the system. Instead, you should use: "Play Video", because it is concise and has multiple syllables.
  • Use system commands. The "Select" command is reserved by the system to trigger a Tap event for the currently focused object. Do not re-use the "Select" command in a keyword or phrase, as it might not work as you expect. For example, if the voice command for selecting a cube in your application was "Select cube", but the user was looking at a sphere when they uttered the command, then the sphere would be selected instead. Similarly, app bar commands are voice-enabled. Don't use the following speech commands in your CoreWindow View:
    1. Go Back
    2. Scroll Tool
    3. Zoom Tool
    4. Drag Tool
    5. Adjust
    6. Remove
  • Use similar sounds. Try to avoid using voice commands that rhyme. If you had a shopping application which supported "Show Store" and "Show More" as voice commands, then you would want to disable one of the commands while the other was in use. For example, you could use the "Show Store" button to open the store, and then disable that command when the store was displayed so that the "Show More" command could be used for browsing.
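
One way to implement that enable/disable behavior: a KeywordRecognizer's keyword list is fixed when it is constructed, so rather than tearing the recognizer down, you can keep a state flag and contextually ignore a command in the handler. The sketch below is a hypothetical illustration of this pattern, not code from the tutorial:

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical sketch: "disabling" one of two similar-sounding commands.
// A state flag in the handler lets us ignore a command contextually.
public class StoreCommands : MonoBehaviour
{
    private KeywordRecognizer keywordRecognizer;
    private bool storeOpen;

    void Start()
    {
        keywordRecognizer = new KeywordRecognizer(new[] { "Show Store", "Show More" });
        keywordRecognizer.OnPhraseRecognized += args =>
        {
            if (args.text == "Show Store" && !storeOpen)
            {
                storeOpen = true;                  // open the store UI here;
                Debug.Log("Store opened");         // "Show Store" is now ignored
            }
            else if (args.text == "Show More" && storeOpen)
            {
                Debug.Log("Browsing more items");  // only valid while the store is open
            }
        };
        keywordRecognizer.Start();
    }
}
```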

Instructions

  • In Unity's Hierarchy panel, use the search tool to find the holoComm_screen_mesh object.
  • Double-click on the holoComm_screen_mesh object to view it in the Scene. This is the astronaut's watch, which will respond to our voice commands.
  • In the Inspector panel, locate the Keyword Manager (Script) component.
  • Expand the Keywords and Responses section to see the supported voice command: Open Communicator.
  • Double-click on KeywordManager.cs to open it in Visual Studio.
  • Explore KeywordManager.cs to understand how it uses the KeywordRecognizer to add voice commands and respond to them with delegates.
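
For reference, the core pattern that KeywordManager.cs implements looks roughly like the sketch below: map each keyword to a delegate, start a KeywordRecognizer over the keyword set, and invoke the matching delegate when a phrase is recognized. This is a minimal sketch of the UnityEngine.Windows.Speech API, not the tutorial's actual script; the "Open Communicator" handler body is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.Speech;

// Minimal sketch of the keyword -> delegate pattern used by KeywordManager.cs.
public class KeywordSketch : MonoBehaviour
{
    private KeywordRecognizer keywordRecognizer;
    private readonly Dictionary<string, Action> keywords = new Dictionary<string, Action>();

    void Start()
    {
        // Illustrative handler; the real script wires its responses in the Inspector.
        keywords.Add("Open Communicator", () => Debug.Log("Opening communicator..."));

        keywordRecognizer = new KeywordRecognizer(keywords.Keys.ToArray());
        keywordRecognizer.OnPhraseRecognized += args =>
        {
            Action action;
            if (keywords.TryGetValue(args.text, out action))
            {
                action.Invoke();   // run the response registered for this keyword
            }
        };
        keywordRecognizer.Start();
    }

    void OnDestroy()
    {
        if (keywordRecognizer != null)
        {
            keywordRecognizer.Dispose();   // release the recognizer's resources
        }
    }
}
```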

Build and Deploy

  • In Unity, use File > Build Settings to rebuild the application.
  • Open the App folder.
  • Open the ModelExplorer Visual Studio Solution.

(If you already built/deployed this project in Visual Studio during set-up, then you can open that instance of VS and click 'Reload All' when prompted).

  • In Visual Studio, click Debug > Start Without Debugging or press Ctrl + F5.
  • After the application deploys to the HoloLens, dismiss the fit box using the air-tap gesture.
  • Gaze at the astronaut's watch.
  • When the watch has focus, verify that the cursor changes to a microphone. This provides feedback that the application is listening for voice commands.
  • Verify that a tooltip appears on the watch. This helps users discover the "Open Communicator" command.
  • While gazing at the watch, say "Open Communicator" to open the communicator panel.

Chapter 2 - Acknowledgement

Objectives

  • Record a message using the Microphone input.
  • Give feedback to the user that the application is listening to their voice.

Note

The Microphone capability must be declared for an app to record from the microphone. This is done for you already in Holograms 212, but keep this in mind for your own projects.

  1. In the Unity Editor, go to the player settings by navigating to "Edit > Project Settings > Player"
  2. Click on the "Windows Store" tab
  3. In the "Publishing Settings > Capabilities" section, check the Microphone capability

Instructions

  • In Unity's Hierarchy panel, verify that the holoComm_screen_mesh object is selected.
  • In the Inspector panel, find the Astronaut Watch (Script) component.
  • Click on the small, blue cube which is set as the value of the Communicator Prefab property.
  • In the Project panel, the Communicator prefab should now have focus.
  • Click on the Communicator prefab in the Project panel to view its components in the Inspector.
  • Look at the Microphone Manager (Script) component, which will allow us to record the user's voice.
  • Notice that the Communicator object has a Keyword Manager (Script) component for responding to the Send Message command.
  • Look at the Communicator (Script) component and double-click on the script to open it in Visual Studio.

Communicator.cs is responsible for setting the proper button states on the communicator device. This will allow our users to record a message, play it back, and send the message to the astronaut. It will also start and stop an animated waveform to acknowledge to the user that their voice was heard.
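
As background, the recording itself builds on Unity's Microphone API. The sketch below shows that underlying API in isolation; the buffer length and sample rate are illustrative values, not the tutorial's settings, and this is not MicrophoneManager.cs itself.

```csharp
using UnityEngine;

// Background sketch of the Unity Microphone API; not MicrophoneManager.cs itself.
public class RecordingSketch : MonoBehaviour
{
    private AudioClip recording;

    public void StartRecording()
    {
        // null = default microphone; loop off, 10-second buffer, 44.1 kHz.
        recording = Microphone.Start(null, false, 10, 44100);
    }

    public void StopAndPlay(AudioSource source)
    {
        Microphone.End(null);     // stop capturing on the default device
        source.clip = recording;
        source.Play();            // play the recorded message back
    }
}
```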

  • In Communicator.cs, delete the following lines from the Start method. This will enable the 'Record' button on the communicator.

Communicator.cs

Build and Deploy

  • In Visual Studio, rebuild your application and deploy to the device.
  • Gaze at the astronaut's watch and say "Open Communicator" to show the communicator.
  • Press the Record button (microphone) to start recording a verbal message for the astronaut.
  • Start speaking, and verify that the wave animation plays on the communicator, which provides feedback to the user that their voice is heard.
  • Press the Stop button (left square), and verify that the wave animation stops running.
  • Press the Play button (right triangle) to play back the recorded message and hear it on the device.
  • Press the Stop button (right square) to stop playback of the recorded message.
  • Say "Send Message" to close the communicator and receive a 'Message Received' response from the astronaut.

Chapter 3 - Understanding and the Dictation Recognizer

Objectives

  • Use the Dictation Recognizer to convert the user's speech to text.
  • Show the Dictation Recognizer's hypothesized and final results in the communicator.

In this chapter, we'll use the Dictation Recognizer to create a message for the astronaut. When using the Dictation Recognizer, keep in mind that:

  • You must be connected to Wi-Fi for the Dictation Recognizer to work.
  • Timeouts occur after a set period of time. There are two timeouts to be aware of:
    • If the recognizer starts and doesn't hear any audio for the first five seconds, it will time out.
    • If the recognizer has given a result but then hears silence for twenty seconds, it will time out.
  • Only one type of recognizer (Keyword or Dictation) can run at a time.

Note

The Microphone capability must be declared for an app to record from the microphone. This is done for you already in Holograms 212, but keep this in mind for your own projects.

  1. In the Unity Editor, go to the player settings by navigating to "Edit > Project Settings > Player"
  2. Click on the "Windows Store" tab
  3. In the "Publishing Settings > Capabilities" section, check the Microphone capability

Instructions

We're going to edit MicrophoneManager.cs to use the Dictation Recognizer. This is what we'll add:

  1. When the Record button is pressed, we'll start the DictationRecognizer.
  2. Show the hypothesis of what the DictationRecognizer understood.
  3. Lock in the results of what the DictationRecognizer understood.
  4. Check for timeouts from the DictationRecognizer.
  5. When the Stop button is pressed, or the mic session times out, stop the DictationRecognizer.
  6. Restart the KeywordRecognizer, which will listen for the Send Message command.
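
These six steps map onto Unity's DictationRecognizer API roughly as sketched below. This is not the finished MicrophoneManager.cs from exercise 3.a; DictationDisplay is a hypothetical output field, and the timeout values shown mirror the defaults described earlier.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Rough sketch of the dictation lifecycle; not the tutorial's finished script.
public class DictationSketch : MonoBehaviour
{
    public TextMesh DictationDisplay;          // hypothetical output field
    private DictationRecognizer dictationRecognizer;

    void Start()
    {
        dictationRecognizer = new DictationRecognizer();
        dictationRecognizer.InitialSilenceTimeoutSeconds = 5f;   // no audio heard yet
        dictationRecognizer.AutoSilenceTimeoutSeconds = 20f;     // silence after a result

        // 2. Show the hypothesis while the user is still speaking.
        dictationRecognizer.DictationHypothesis += text =>
            DictationDisplay.text = text + "...";

        // 3. Lock in each final result.
        dictationRecognizer.DictationResult += (text, confidence) =>
            DictationDisplay.text = text;

        // 4. Fires on timeout as well as after an explicit Stop().
        dictationRecognizer.DictationComplete += cause =>
            PhraseRecognitionSystem.Restart();   // 6. "Send Message" works again
    }

    // 1. Called when the Record button is pressed.
    public void StartDictation()
    {
        PhraseRecognitionSystem.Shutdown();      // only one recognizer type at a time
        dictationRecognizer.Start();
    }

    // 5. Called when the Stop button is pressed.
    public void StopDictation()
    {
        dictationRecognizer.Stop();              // triggers DictationComplete above
    }
}
```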

Let's get started.

  • Complete all coding exercises for 3.a in MicrophoneManager.cs, or copy and paste the finished code found below:

MicrophoneManager.cs

Build and Deploy

  • Rebuild in Visual Studio and deploy to your device.
  • Dismiss the fit box with an air tap gesture.
  • Gaze at the astronaut's watch and say "Open Communicator".
  • Select the Record button (microphone) to record your message.
  • Start speaking. The Dictation Recognizer will interpret your speech and show the hypothesized text in the communicator.
  • Try saying "Send Message" while you are recording a message. Notice that the Keyword Recognizer does not respond because the Dictation Recognizer is still active.
  • Stop speaking for a few seconds. Watch as the Dictation Recognizer completes its hypothesis and shows the final result.
  • Begin speaking and then pause for 20 seconds. This will cause the Dictation Recognizer to time out.
  • Notice that the Keyword Recognizer is re-enabled after the above timeout. The communicator will now respond to voice commands.
  • Say "Send Message" to send the message to the astronaut.

Chapter 4 - Grammar Recognizer

Objectives

  • Use the Grammar Recognizer to recognize the user's speech according to an SRGS, or Speech Recognition Grammar Specification, file.
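
In Unity, SRGS grammars are consumed through the GrammarRecognizer class. As a rough sketch of loading and listening (assuming the grammar file ships in StreamingAssets, as it does in this project; this is not the tutorial's actual wiring):

```csharp
using System.IO;
using UnityEngine;
using UnityEngine.Windows.Speech;

// Sketch: load an SRGS grammar from StreamingAssets and listen for phrases.
public class GrammarSketch : MonoBehaviour
{
    private GrammarRecognizer grammarRecognizer;

    void Start()
    {
        grammarRecognizer = new GrammarRecognizer(
            Path.Combine(Application.streamingAssetsPath, "SRGSColor.xml"));

        grammarRecognizer.OnPhraseRecognized += args =>
        {
            Debug.Log("Heard: " + args.text);
            if (args.semanticMeanings != null)
            {
                // Semantics arrive as key/value pairs defined by the grammar.
                foreach (var meaning in args.semanticMeanings)
                {
                    Debug.Log(meaning.key + " -> " + string.Join(" ", meaning.values));
                }
            }
        };
        grammarRecognizer.Start();
    }
}
```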

Note

The Microphone capability must be declared for an app to record from the microphone. This is done for you already in Holograms 212, but keep this in mind for your own projects.

  1. In the Unity Editor, go to the player settings by navigating to "Edit > Project Settings > Player"
  2. Click on the "Windows Store" tab
  3. In the "Publishing Settings > Capabilities" section, check the Microphone capability

Instructions

  1. In the Hierarchy panel, search for Jetpack_Center and select it.
  2. Look for the Interactible Action script in the Inspector panel.
  3. Click the little circle to the right of the Object To Tag Along field.
  4. In the window that pops up, search for SRGSToolbox and select it from the list.
  5. Take a look at the SRGSColor.xml file in the StreamingAssets folder.
  • The SRGS design spec can be found on the W3C website.
  • In our SRGS file, we have three types of rules:
    • A rule which lets you say one color from a list of twelve colors.
    • Three rules which listen for a combination of the color rule and one of the three shapes.
    • The root rule, colorChooser, which listens for any combination of the three "color + shape" rules. The shapes can be said in any order and in any amount from just one to all three. This is the only rule that is listened for, as it's specified as the root rule at the top of the file in the initial <grammar> tag.
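
To make those three rule types concrete, here is an abridged, hypothetical grammar in the same shape as SRGSColor.xml: three colors instead of twelve, and only the circle rule written out (the square and triangle rules follow the same pattern). It is not the project's actual file.

```xml
<grammar version="1.0" xml:lang="en-US" root="colorChooser"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Rule type 1: say one color from a list. -->
  <rule id="color">
    <one-of>
      <item>red</item>
      <item>blue</item>
      <item>yellow</item>
    </one-of>
  </rule>

  <!-- Rule type 2: a "color + shape" combination. -->
  <rule id="colorCircle">
    <ruleref uri="#color"/>
    <item>circle</item>
  </rule>

  <!-- Rule type 3: the root rule listens for one or more "color + shape" phrases. -->
  <rule id="colorChooser">
    <item repeat="1-">
      <one-of>
        <ruleref uri="#colorCircle"/>
        <!-- refs to #colorSquare and #colorTriangle would also go here -->
      </one-of>
    </item>
  </rule>
</grammar>
```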

Build and Deploy

  • Rebuild the application in Unity, then build and deploy from Visual Studio to experience the app on HoloLens.
  • Dismiss the fit box with an air tap gesture.
  • Gaze at the astronaut's jetpack and perform an air tap gesture.
  • Start speaking. The Grammar Recognizer will interpret your speech and change the colors of the shapes based on the recognition. An example command is "blue circle, yellow square".
  • Perform another air tap gesture to dismiss the toolbox.

The End

Congratulations! You have now completed Holograms 212 - Voice.

  • You know the Dos and Don'ts of voice commands.
  • You saw how tooltips were employed to make users aware of voice commands.
  • You saw several types of feedback used to acknowledge that the user's voice was heard.
  • You know how to switch between the Keyword Recognizer and the Dictation Recognizer, and how these two features understand and interpret your voice.
  • You learned how to use an SRGS file and the Grammar Recognizer for speech recognition in your application.