Just ahead of next week’s public unveiling of its Project Natal motion control system for the Xbox 360, Microsoft has filed for several patents covering technology for controlling PCs and game systems with gestures and motion tracking.
The patent applications reiterate that Microsoft's work on gesture input systems predates Natal and goes far beyond the game console. They also make it clear that this technology is no longer the realm of science fiction, as in the movie "Minority Report."
The applications don't mention Natal specifically, but they describe related technology: depth-sensing cameras and voice commands that serve as interfaces for controlling computers with large visual displays.
They describe a system that could also use a wireless sensor along with the 3-D sensors and voice control.
“MSR [Microsoft Research] has been working on this for a long, long time,” said Andy Wilson, a Microsoft researcher named on the applications. “Now that the buzz has been turned up a couple notches around Natal, it’s good to keep in mind that we’ve been doing this stuff for a long time.”
Blogger Manan Kakkar called out the application for a PC control system that was published today.
When I looked into it, I found a related application for a “system and method for executing a game process” published June 3. It involves “a 3-D imaging system for recognition and interpretation of gestures to control a computer. The system includes a 3-D imaging system that performs gesture recognition and interpretation based on a previous mapping of a plurality of hand poses and orientations to user commands for a given user.”
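The "previous mapping of a plurality of hand poses and orientations to user commands for a given user" amounts to a per-user lookup table keyed by pose. As a rough illustration only — the names and structure below are my own, not from the filing — such a mapping might look like this in Python:

```python
# Hypothetical sketch of the per-user mapping the application describes:
# hand poses and orientations are keyed to user commands for a given user.
from dataclasses import dataclass


@dataclass(frozen=True)
class HandPose:
    shape: str        # e.g. "open_palm", "fist", "point"
    orientation: str  # e.g. "up", "left", "right"


class GestureMapper:
    """Interprets a recognized pose via a previously registered per-user mapping."""

    def __init__(self):
        self._mappings = {}  # user -> {HandPose: command}

    def register(self, user, pose, command):
        # Build up the "previous mapping" for a given user.
        self._mappings.setdefault(user, {})[pose] = command

    def interpret(self, user, pose):
        # Return the command mapped for this user's pose, or None if unmapped.
        return self._mappings.get(user, {}).get(pose)


mapper = GestureMapper()
mapper.register("alice", HandPose("open_palm", "left"), "dismiss")
mapper.register("alice", HandPose("point", "up"), "select")
print(mapper.interpret("alice", HandPose("open_palm", "left")))  # dismiss
```

The frozen dataclass makes each pose hashable, so it can serve directly as a dictionary key; the real system would of course derive the pose from the 3-D imaging pipeline rather than from labeled strings.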
Natal captures all sorts of motion by tracking players’ skeletons and overlaying voice commands. It also draws on the vast library of research Microsoft has done over the years.
The patent applications describe a standard set of gestures that can be combined with voice control and the use of a remote sensing device. They also hint at what could be one of the challenges with Natal: getting users to remember particular control gestures, and the complexity of recognizing and processing such inputs.
The PC application covers a "perceptual user interface" architecture that "comprises alternative modalities for controlling computer application programs and manipulating on-screen objects through hand gestures or a combination of hand gestures and verbal commands. The perceptual user interface system includes a tracking component that detects object characteristics of at least one of a plurality of objects within a scene, and tracks the respective object."
From the application:
“A small set of very simple gestures can offer significant bits of functionality where they are needed most. For example, dismissing a notification window can be accomplished by a quick gesture to the one side or the other, as in shooing a fly. Another example is gestures for ‘next’ and ‘back’ functionality found in web browsers, presentation programs (e.g., PowerPoint) and other applications.
Note that in many cases the surface forms of these various gestures can remain the same throughout these examples, while the semantics of the gestures depends on the application at hand. Providing a small set of standard gestures eases problems users have in recalling how gestures are performed, and also allows for simpler and more robust signal processing and recognition processes.”
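The key idea in that passage — a small fixed set of surface gestures whose meaning is resolved by the active application — could be sketched, hypothetically, like this (the gesture names and application bindings are illustrative, not from the filing):

```python
# Hypothetical sketch: a small standard gesture set whose semantics are
# bound per application, as the quoted patent text describes.

STANDARD_GESTURES = {"swipe_left", "swipe_right", "flick_aside"}

# Each application binds the same surface gestures to its own commands.
APP_BINDINGS = {
    "browser": {
        "swipe_left": "back",
        "swipe_right": "forward",
        "flick_aside": "dismiss_notification",
    },
    "presentation": {
        "swipe_left": "previous_slide",
        "swipe_right": "next_slide",
        "flick_aside": "dismiss_notification",
    },
}


def dispatch(app, gesture):
    """Resolve a recognized standard gesture to the active app's command."""
    if gesture not in STANDARD_GESTURES:
        raise ValueError(f"unrecognized gesture: {gesture}")
    return APP_BINDINGS[app][gesture]


print(dispatch("browser", "swipe_right"))       # forward
print(dispatch("presentation", "swipe_right"))  # next_slide
```

Keeping the gesture vocabulary small and fixed is what eases both the user's recall burden and the recognizer's job: the signal-processing layer only ever has to distinguish a handful of surface forms.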
A few images from the applications: