Multi User Speech and Gesture Interaction

Basic premise

When people work face to face they often use a combination of speech and gestures to interact with the computer. We now have technologies that can detect the speech and gesture actions of multiple people over a digital table. In this project you'll have an opportunity to explore multi user speech and gesture interaction over digital walls and tables.


When multiple people interact over a shared digital tabletop, they often communicate with one another through speech and gestural actions. Example applications show that multi user speech and gestures can be used to interact with existing single user applications such as Google Earth, Warcraft III, and The Sims. These actions (e.g., saying "Fly to Calgary") serve a dual purpose: they are commands to the computer and they provide awareness for other collaborators.

Recently, tools have been developed that dramatically simplify the development of speech and gesture enabled applications in two ways. First, wrappers can be built that convert speech and gesture actions into mouse and keyboard events on an existing single user application. Second, custom speech and gesture applications can be built from the ground up (e.g., GSI Colour Blender). However, there are only a limited number of applications in this area. Your responsibility will be to develop applications that explore multi user speech and gesture interaction over digital tables.
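To make the first approach more concrete, here is a minimal Python sketch of a wrapper that turns speech and gesture actions into mouse and keyboard events. It is only an illustration: the `Action` and `InputEvent` types, the mapping tables, and the `translate` function are all hypothetical names invented for this sketch, and none of them are part of GSI Demo or any other toolkit. A real wrapper would also have to inject the resulting events into the running application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A speech or gesture action from one user (hypothetical structure)."""
    user: int
    kind: str   # "speech" or "gesture"
    name: str   # e.g. "fly to calgary" or "tap"
    x: int = 0  # table coordinates, used by gestures only
    y: int = 0


@dataclass(frozen=True)
class InputEvent:
    """A synthetic mouse or keyboard event for the wrapped application."""
    device: str  # "keyboard" or "mouse"
    detail: str
    x: int = 0
    y: int = 0


# Assumed mappings, for illustration only.
SPEECH_MAP = {
    "fly to calgary": [
        InputEvent("keyboard", "type:Calgary"),
        InputEvent("keyboard", "key:Enter"),
    ],
}
GESTURE_MAP = {
    "tap": "left_click",
    "one finger drag": "left_drag",
}


def translate(action):
    """Return the events that stand in for one speech or gesture action."""
    if action.kind == "speech":
        # Unrecognized phrases produce no events.
        return list(SPEECH_MAP.get(action.name.lower(), []))
    if action.kind == "gesture":
        detail = GESTURE_MAP.get(action.name)
        if detail is None:
            return []
        # Gestures keep their location so the mouse event lands in place.
        return [InputEvent("mouse", detail, action.x, action.y)]
    return []
```

Because the single user application only ever sees ordinary mouse and keyboard events, it needs no changes. The trade-off is that the `user` field is lost in translation, so the wrapped application cannot tell collaborators apart.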


Air Traffic Control Game

Air Traffic Control

Here's an image of air traffic control in the real world. Perhaps you could think of how this might work in a digital world using speech and gesture commands.

Multi Display Interaction

Clifton Forlines explored how multiple displays can be used for interaction with digital tabletop displays. Imagine exploring multiple display applications using speech and gestures. Some modern games also provide multiple display support (e.g., Supreme Commander).

Are you talking to me or the computer?

Always-on speech recognition often causes errors during regular conversation, because it is hard to know when someone is talking to the computer vs. talking to another person. One solution would be to track where people are looking. Natural Point sells a cheap head tracking device that can be used to find the x, y, z position of a person's head. This can be used for gaze tracking and might be useful for finding out when someone is looking at another person vs. looking at the screen. It might be interesting to run a study to see whether this is actually effective.
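As a rough illustration of the idea, the Python sketch below guesses who a speaker is addressing by comparing the direction their head is facing with the direction to each candidate target. It rests on assumptions that go beyond the text above: that a facing direction is available in addition to the head position (position alone is not enough), that the positions of the screen and the other people are known, and that head orientation is an acceptable stand-in for gaze. The function names and the 30 degree threshold are hypothetical and are not taken from any tracker's API.

```python
import math


def angle_between(u, v):
    """Angle in radians between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return math.pi  # treat a degenerate vector as "not facing"
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def likely_target(head, facing, targets, max_angle_deg=30.0):
    """Return the name of the target the head is most nearly facing.

    head:    (x, y, z) position of the speaker's head
    facing:  (x, y, z) direction the head is pointing (assumed available)
    targets: dict mapping a name to an (x, y, z) position
    Returns None when no target lies within max_angle_deg.
    """
    best_name = None
    best_angle = math.radians(max_angle_deg)
    for name, position in targets.items():
        to_target = tuple(p - h for p, h in zip(position, head))
        angle = angle_between(facing, to_target)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name
```

A speech recognizer could then accept commands only while `likely_target` returns the screen. Whether such a threshold actually separates the two cases in practice is exactly the kind of question a study would need to answer.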

Interaction on the Digital Wall

On a large digital wall it is sometimes exceedingly difficult to interact with items at a distance. We have some software that allows graphics to be drawn on a large digital wall display. You could develop applications to perform tasks or play games over the large display.

Visualization of Multi User Input

While existing systems such as Camtasia can synchronize mouse and keyboard actions with a live video, there are few tools that allow this to be viewed for multiple users simultaneously. You could develop novel visualizations to show the speech and gesture actions of multiple concurrent users on a digital table.


GSI Demo is a toolkit that supports multi user speech and gesture interaction over large digital displays. Also check out the SDG Toolkit for multiple mice support. Contact Ed Tse if you'd like to learn more about using GSI Demo or the SDG Toolkit.


  • Speech recognition is not perfect; many recognition errors occur during everyday use.