With the introduction of the iPhone 4S, one heavily hyped feature of iOS was Siri, touted as a "mobile concierge" of sorts that would recognize your voice and carry out spoken commands or answer queries. Since its introduction, Siri has more or less failed to become a dominant way for iPhone users to interact with their phones.
This certainly hasn't stopped Android developers from trying to bring their platform up to speed. Apps like Iris, Utter and Vlingo are among the many attempts to offer voice command functionality, each executing with varying degrees of success. Despite the numerous options, no solution on Android or iOS has really become a popular way of interacting with a phone.
Voice Commands: A Difficult Technical Problem
Voice command technology has been difficult to implement correctly because it relies on a number of different technologies working in concert, none of which is easy to implement alone, much less as a package. Voice recognition software is far from perfect, though it has been refined over the years.
Dragon has led the charge on voice recognition for years, and even now its recognition is far from perfect. That said, Nuance's Dragon Dictate mobile app provides one of the better instances of voice recognition for transcription purposes. It uses the Dragon NaturallySpeaking engine: you speak into the handset, the app sends the audio to a server, presumably running the latest version of the recognition software, and a transcription is returned.
The app produces surprisingly effective results, but because the recognition magic happens on the back end, an Internet connection is required. The same is true of many other voice command offerings.
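The round trip described above can be sketched in a few lines. Everything here is a stand-in: `fake_server` plays the role of the remote recognition engine, since the real service and its protocol aren't public details, and a real client would ship the audio over the network rather than call a local function.

```python
def transcribe(audio: bytes, send_to_server) -> str:
    """Client side of the flow: ship raw audio to a recognition
    server and return the transcription it sends back. All of the
    heavy lifting happens remotely, which is why such apps need a
    network connection."""
    if send_to_server is None:
        # No connection: the client cannot transcribe on its own.
        raise ConnectionError("voice recognition requires an Internet connection")
    return send_to_server(audio)

# Hypothetical stand-in for the server-side engine.
def fake_server(audio: bytes) -> str:
    return "call mom"  # a pretend recognition result

print(transcribe(b"\x00\x01", fake_server))  # prints "call mom"
```

The design point the sketch captures is the dependency: the handset is only a microphone and a display, so when the server (or the connection) is unavailable, the whole feature is.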
Voice commands, however, involve more than voice recognition. Where recognition has its own issues, particularly in mobile solutions that typically don't require users to train the software to their voice, voice command software must also make semantic sense of the recognized speech and map it to a set of commands for the phone to execute.
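That second stage, turning a transcription into an executable command, can be illustrated with a deliberately simple sketch. The patterns and command names below are invented for illustration; production systems rely on statistical natural language understanding rather than hand-written rules, which is exactly why the problem is so hard.

```python
import re

# Toy command grammar: each pattern maps a recognized utterance to a
# named command plus its arguments. These patterns are invented for
# this sketch, not taken from any real assistant.
COMMAND_PATTERNS = [
    (re.compile(r"^call (?P<contact>.+)$"), "dial"),
    (re.compile(r"^text (?P<contact>.+?) saying (?P<body>.+)$"), "send_sms"),
    (re.compile(r"^set an alarm for (?P<time>.+)$"), "set_alarm"),
]

def parse_command(transcription: str):
    """Return (command, arguments) for a recognized utterance,
    or None when the transcription maps to no known command."""
    text = transcription.strip().lower()
    for pattern, command in COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            return command, match.groupdict()
    return None  # recognition succeeded, but intent mapping failed

print(parse_command("Call Alice"))  # ('dial', {'contact': 'alice'})
```

Even this toy version shows where things go wrong: a single mis-recognized word ("tall Alice") falls through every pattern, so errors in the recognition stage compound into failures in the command stage.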
This is an incredibly challenging problem, and one that is at the forefront of technology research. Indeed, IBM's labs gained a great deal of PR by showcasing their progress in semantic language understanding when the "Watson" computer appeared on the television game show Jeopardy!. Natural language processing is a hard problem in its own right, and combined with the already issue-laden problem of voice recognition, it becomes harder still. In many ways, it's quite impressive that solutions like Siri are as effective as they are.
A Solution in Search of a Problem?
An even bigger issue for the whole spectrum of mobile voice command offerings is whether customers are truly interested in interacting with their phones in that manner, or whether offerings like Siri are merely solutions searching for a problem.
The general impression given by the current class of voice offerings is that these programs capture the imagination of users in the short term, but few have any real staying power. It's difficult to know whether this is due to current technical limitations, or whether users simply aren't interested in controlling phones with their voice. After all, the average smartphone user has become comfortable enough with the touch interface that it's debatable whether voice commands save much time at all. And users often feel quite uncomfortable talking into a phone when there's no human at the other end.
Certainly, the next generation of voice command software will provide improvements over the current disappointing crop, and perhaps we'll gain more insight into whether mobile voice commands are a method of interaction that's here to stay.