Wonder why everything isn't speech controlled?
Last November, I wrote a post titled "Top 10 technology flops." One of the 10 was speech recognition. Judging by the feedback I got from all over the Web, you'd think I'd said Apple was a flop or Bush was a great president.
What I meant, at the time, was that I was disappointed that we're not rid of all the keyboards, buttons, and remote controls by now. So I did some research and discovered that speech technology is indeed proliferating in some industries: defense, medical, call centers, and rudimentary capability for cell phones, edutainment, and high-end automobiles.
That said, I don't really care that American Airlines can recognize my voice responses on the phone. The only speech application that actually benefits me on a day-to-day basis is on my cell phone, and that's pretty basic stuff.
For the most part, we're still banging away on computer keyboards and drowning in a sea of proprietary consumer electronics devices and remote controls.
And now I know why. When it comes to speech technology, one company is holding just about all the cards: Nuance Communications.
Courtesy of dozens of mergers and acquisitions (M&A) over the past 13 years, Nuance now owns much of the speech technology on planet Earth. The company boasts a $3.5 billion market cap on annual sales that will likely top $800 million this fiscal year but, remarkably, has never been profitable. I can see why. Nuance has been so busy acquiring companies it hasn't had a chance to worry about a little thing like profitability.
The company's history is a tribute to M&A gluttony. Let's see if I can get this right. In 1980, Xerox (so many of these stories begin with Xerox) bought inventor Raymond Kurzweil's optical character recognition (OCR) company and ultimately renamed it ScanSoft.
In 1999, a scanner software company called Visioneer bought ScanSoft and adopted the name. That seems to be about when current Nuance Chairman and CEO Paul Ricci entered the picture, and that's when all the fun began.
In late 2001, ScanSoft bought Lernout & Hauspie, a Belgian company that had previously acquired a host of other companies, including Berkeley Speech Technologies and Dragon Systems. Amazingly, L&H, the leader in the speech technology field, was bankrupt, so ScanSoft got it for a song: $39.5 million.
ScanSoft went on to acquire about a dozen other companies, including some that were themselves made up of acquired companies.
While all this was happening on the East Coast, Nuance spun off from Stanford Research Institute (SRI) in 1994. In 1996, the Menlo Park company deployed its first large-scale, call-center-based commercial speech application.
In September 2005, ScanSoft merged with Nuance, and the combined company adopted the Nuance Communications name. Since then, Nuance has gobbled up another dozen companies, the largest of which were Dictaphone, for $357 million, and eScription, for $400 million.
According to my math, the current incarnation of Nuance Communications is actually made up of 42 companies, with a $180 million acquisition of SNAPin Software in the works and an unsolicited offer of $40 million for Zi on the table. Got all that?
The company lists its competitors as AT&T, IBM, and Microsoft. Sounds formidable, but each of these giants competes with Nuance in specific, limited markets. Nuance is far and away the 800-pound gorilla of speech technology.
As for its business strategy, Nuance seems to have done a good job of focusing its limited resources on the largest vertical markets where it can optimize profit margins. The company's primary focus is on helping businesses improve efficiency and productivity while reducing costs.
The fact that the company says little about aggressively driving its technology into the consumer space is telling. That's simply not its business plan, and I can certainly understand why. The consumer electronics market is highly fragmented with thin margins and high support costs. And if Nuance wishes to avoid that, well, there really isn't much competition left to twist its arm.
I'd say Ricci is a shrewd businessman.
Still, the next time you get off the phone with an automated call-center that communicates eerily well, only to fumble around with the myriad of keyboards, buttons, and remote controls in your own life, at least you'll know what name to curse: Nuance Communications.
Steve Tobak is managing partner of Invisor Consulting LLC. He is a member of the CNET Blog Network, and is not an employee of CNET.
The problem we are seeing here is that this is essentially not occurring. Developing speech reco technology and making it work reasonably well is really hard. Essentially all speech reco originates in academia and then branches out into licensing and proprietary applications and solutions.
We need a revolution: something disruptive, like the iPhone was to the cell-phone market.
For me, the next big thing in speech reco is not the cellphone... it's speech-enabled Flash in the browser. You tackle that problem (small footprint) and you've tackled the cellphone.
Rob Mitchell
Product Manager
neuroLanguage
www.neuroLanguage.com
I'm not so sure that what you propose is different from that. Get something that works at the low end. The rest follows.
"For me the next big thing in speech reco is not the cellphone....its speech enabled flash in the browser. You tackle that problem (small footprint) and you've tackled the cellphone."
This is possible using open-source software: Red5 (open-source Flash media server), BlazeDS (open-source Flash-to-Java back-end server), and Sphinx (open-source ASR).
Ross Hendrickson
Software Developer
PSST Research Group
http://psst.byu.edu/
I have also been wondering why voice recognition isn't in the consumer space.
(I figured out the Bush question two elections ago.)
Tapping away at a keyboard, or worse still a mobile keypad, is a patently anti-customer-centric way of using all these new gizmos, regardless of what millennials may think.
Maybe at some point we'll catch up to Roddenberry's communicator vision.
Till then, let's all QWERTY.
Cheers,
Miro
I believe there are a few reasons that speech recognition is not being widely used.
The first is because it actually has very limited practical application (unless you work and live alone). A room full of people can type on keyboards and interact with a mouse without disturbing their coworkers, but imagine if all of them were dictating documents. Activating lights in your home and turning on the stereo might work fine if you live alone, but if you have kids...forget it.
The second reason I think voice rec. is not widely used is because when writing, people edit on the fly. They backspace, move paragraphs and delete entire sections, and it's just too difficult to do that with voice rec.
So where could it work - I think the practical applications come into play in situations where people are already speaking - here are a few examples:
* Sales rep on the phone with a client: the phone call is documented in a CRM system using voice recognition
* Police reports over the phone
* Assistance to people with disabilities
* University lecture transcripts
* Updating a contact record on your cell phone (e.g., someone gives you a phone number during the conversation, and at the end of the call, any phone numbers that were mentioned are listed in context with the option to save them)
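The last idea, pulling phone numbers out of a call, can be sketched in a few lines once a recognizer has produced a transcript. This is a minimal sketch under stated assumptions: the transcript is plain text with numbers already rendered as digits, only North American 10-digit formats are handled, and the names `PHONE_PATTERN` and `extract_phone_numbers` are hypothetical, not part of any speech product.

```python
import re

# Hypothetical pattern for North American 10-digit numbers such as
# "555-123-4567", "(555) 123-4567" or "555 123 4567".
# Assumes the recognizer already wrote spoken numbers as digits.
PHONE_PATTERN = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})\b")


def extract_phone_numbers(transcript: str, context_chars: int = 30) -> list[tuple[str, str]]:
    """Return (normalized_number, surrounding_text) pairs found in a call transcript."""
    results = []
    for match in PHONE_PATTERN.finditer(transcript):
        # Normalize every match to the NNN-NNN-NNNN form.
        number = "-".join(match.groups())
        # Keep a little text on either side so the user sees the number in context.
        start = max(0, match.start() - context_chars)
        end = min(len(transcript), match.end() + context_chars)
        results.append((number, transcript[start:end].strip()))
    return results
```

At the end of a call, the phone could show each `(number, context)` pair with a "save" option; handling spoken forms like "five five five" would need an extra normalization step that this sketch leaves out.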
I think most of the reasonable applications for voice recognition will be in the telephony space.
Cheers,
Keith www.endsville.ca
"Speech on"
"Command Prompt"
"C colon"
"CD backslash"
"Del star dot star slash s"
Just to prove how insecure it was. :-)
I think speech has come a long way. I just called the cable company and said to the automated system "my cable is out". It understood me.
Just yesterday I called my doctor's office...
Answering system:
... if you would like to speak to an operator, say "operator"
Me:
"operator"
Answering system:
I think you said "operator" is that correct?
Me:
"Yes"
Answering system:
I'm sorry I didn't understand your response. I'll connect you to the operator.
Me:
"Ha Ha Ha"
I despise talking to inanimate objects.
To the issue of kids or others being around: each person has a unique voice print that devices can recognize (the technology already exists), plus any command is of course preceded by a key word that gets the device's attention. Think of controlling relatively simple consumer devices, e.g. TVs, DVRs, and phones, as opposed to issuing complex computer commands. And yes, a crowded meeting or conference room is a unique situation.
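The two safeguards described above, a voice print plus an attention key word, amount to a simple gate in front of the command handler. The sketch below assumes a separate, unspecified speaker-verification step has already labeled each utterance with a speaker ID and that the recognizer supplies the text; `Utterance` and `accept_command` are hypothetical names for illustration, not how any particular product works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    speaker_id: str  # assumed output of a separate voice-print (speaker verification) step
    text: str        # assumed output of the speech recognizer


def accept_command(utt: Utterance, wake_word: str, enrolled: set[str]) -> Optional[str]:
    """Return the command text only if an enrolled speaker said the wake word first."""
    if utt.speaker_id not in enrolled:
        return None  # unrecognized voice print: ignore it
    words = utt.text.strip().lower().split()
    if not words or words[0] != wake_word.lower():
        return None  # no wake word: treat it as ordinary conversation
    command = " ".join(words[1:])
    return command or None  # a bare wake word carries no command
```

With `enrolled={"alice"}` and wake word `"tv"`, Alice saying "TV mute" yields `"mute"`, while "mute the TV", or anyone else saying "TV mute", is ignored.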
Nuance's technology lead and patent portfolio is of course the issue. As a result of their shrewd acquisition strategy, they have a commanding lead over everyone, including IBM who has been at this speech game for decades. When it takes hundreds of millions of investment capital and a decade or two of R&D to bring an advanced technology like this to market, you can forget "open source." There is no trendy phrase that will magically produce competition.
Steve Tobak
Now to the main topic. We're not interested in running our technology by speaking because speech is a highly voluntary activity, meaning we exercise great care in what we say, since saying the wrong thing can get us into serious trouble. We're taught that as children and spend our adult lives mastering it.
On the other hand, we're quite relaxed about using our other motor skills, because there's far less risk involved and because our brains are designed to handle them in the background. We drive, eat, and work with our computers by depending on motor skills that are almost automatic. We can type with cheerful abandon because we can edit before we send. Speech is different: we have to be careful with what we say because most of the time someone is listening.
Nothing in the foreseeable future is going to change that, and as long as that's true, speaking to computers is going to feel unnatural and uncomfortable.
--Michael W. Perry, editor of Chesterton on War and Peace: Battling the Ideas and Movements that Led to Nazism and World War II
How can Bush be a "marvelous" president compared to two people that never were? Typical Fox "logic".
I guess it is true: you can fool 21% of the people all of the time. Too bad your hero is going to go down as the worst president ever, one who brought totalitarianism to our shores and scared people into accepting it.
But why am I bothering? You've apparently already been hopelessly programmed by the Nazi repuke party. Do your country a favor, please, and miss the next election. You haven't got the brains to vote like an adult.
Tone the hate down, son, your comment smacks a bit of schoolyard idiocy. You might as well have said "I heard that if you looked directly at Vista the wrong way it melts down burning a hole to the center of the earth that releases a demon which finds and violates your dear-sweet-saint of a grandmother".
The commentary gets old, and I'm not defending either, but what the hell does Bush have to do with this? Or did I miss the part where we were reminded that he was never mentioned as having been in any way related to or involved in anything that was mentioned anywhere in the entirety of this post?
Embracing new tech for its own sake is total nonsense. If it doesn't increase productivity then there is no point, which is exactly why it is a niche application.
Like it or not, they're stuck with it because typing has become too physically painful. Failing to improve this technology is really a health issue. As we age, and spend more of our lives in front of keyboards, the need for voice-activated software is going to be imperative, not just convenient.
But in a hands-busy, eyes-busy environment, it is an absolute requirement for interfacing with most practical technology. I have witnessed some of the first disastrous attempts at using speech as a gimmick. Years ago the Chrysler folks decided that speech in a car would sell. What did the marketing types pick? "Your washer fluid is low" and, best of all, "Your door is ajar." (The latter helped launch Eddie Murphy's career as he mocked the absurdity.) When we mere techies suggested that it might be more useful to let the driver check and change things like the radio station or the heating and ventilation system without pulling their eyes down to the controls for 20 seconds while traveling at 65, we were dismissed. So was the stupid speech in cars, and that helped kill the entire concept.
So I am still sitting in front of my "portable" computer that is "Light" at 2.5 pounds and typing on a 14 inch keyboard. Thanks Eddie Murphy!
by reddot1975
August 24, 2008 6:27 AM PDT
Hopefully, this will be interesting information. I have been in the speech rec industry for quite some time, and even worked at Nuance for a while. The main reason Nuance is not making speech recognition consumer devices is that the company is not a hardware company. If a consumer product company came to Nuance and wanted to license its embedded speech rec technology, Nuance would do it. I definitely recall one, perhaps two, projects over the years to put speech recognition into a remote control, for example. It doesn't seem like they really went anywhere. One example of embedded speech recognition that has been deployed (and that people supposedly like) is Ford Sync, by Microsoft, which uses the Nuance embedded engine.
Speech recognition is really several different markets: embedded (where recognition is done solely on the device), mobile (which could be either solely on the device or a hybrid model), desktop (like Dragon), medical/legal, and enterprise telephony.
For those of you who doubt how good desktop dictation is nowadays, you probably have not tried one of the most recent versions of Dragon NaturallySpeaking. For dictation, it is pretty awesome. There's a minimal training period, and in general it's able to get quite good accuracy. Correcting dictation mistakes with speech is also very easy. Where it does not work as well is in command and control of your computer. That is still a slow, annoying proposition.
People always want a "Star Trek"-like speech rec experience, partly because it's cool, but also because whenever a reporter writes an article, they start off with that. In any case, as someone pointed out earlier, speech rec will need to know much more context about its user and how they work. That's coming, but slowly. The latest version of Dragon now has shortcuts to do various types of Internet searches.
In the area of enterprise telephony (IVR) speech recognition, these systems are still expensive and require a lot of custom development, so companies tend not to upgrade very often. In fact, a number of systems out there are still using roughly 10-year-old technology. If you want to call a speech application with a modern recognizer, call United Airlines. You'll find that its ability to filter out noise is much better than in the past.
A friend of mine has a clapper-like device that uses speech recognition, or perhaps more accurately, noise recognition. It's a nice gimmick, but a light switch works better. If you're looking to consolidate your remote control setup, get a nice universal, like a Harmony. Those things are pretty sweet.
In general, we're seeing a lot more competition in a few sectors of the speech rec market: mobile and enterprise. There are a number of up and coming enterprise speech rec engines, and a lot of services companies. Plus the technology keeps getting better and cheaper. It's an exciting time for speech recognition.