August 22, 2008 6:05 AM PDT

Wonder why everything isn't speech controlled?

Posted by Steve Tobak

Last November, I wrote a post titled "Top 10 technology flops." One of the 10 was speech recognition. Judging by the feedback I got from all over the Web, you'd think I'd said Apple was a flop or Bush was a great president.

What I meant, at the time, was that I was disappointed that we're not rid of all the keyboards, buttons, and remote controls by now. So I did some research and discovered that speech technology is indeed proliferating in some industries: defense, medical, call centers, and rudimentary capability for cell phones, edutainment, and high-end automobiles.

That said, I don't really care that American Airlines can recognize my voice responses on the phone. The only speech application that actually benefits me on a day-to-day basis is on my cell phone, and that's pretty basic stuff.

For the most part, we're still banging away on computer keyboards and drowning in a sea of proprietary consumer electronics devices and remote controls.


And now I know why. When it comes to speech technology, one company is holding just about all the cards: Nuance Communications.

Courtesy of dozens of mergers and acquisitions (M&A) over the past 13 years, Nuance now owns much of the speech technology on planet Earth. The company boasts a $3.5 billion market cap on annual sales that will likely top $800 million this fiscal year but, remarkably, has never been profitable. I can see why. Nuance has been so busy acquiring companies it hasn't had a chance to worry about a little thing like profitability.

The company's history is a tribute to M&A gluttony. Let's see if I can get this right. In 1980, Xerox (so many of these stories begin with Xerox) bought inventor Raymond Kurzweil's optical character recognition (OCR) company and ultimately renamed it ScanSoft.

In 1999, a scanner software company called Visioneer bought ScanSoft and adopted the name. That seems to be about when current Nuance Chairman and CEO Paul Ricci entered the picture, and that's when all the fun began.

In late 2001, ScanSoft bought Lernout & Hauspie, a Belgian company that had previously acquired a host of other companies including Berkeley Speech Technologies and Dragon Systems. Amazingly, L&H--the leader in the speech technology field--was bankrupt so ScanSoft got it for a song: $39.5 million.

ScanSoft went on to acquire about a dozen other companies, including some that were themselves made up of acquired companies.

While all this was happening on the East Coast, Nuance spun off from Stanford Research Institute (SRI) in 1994. In 1996, the Menlo Park company deployed its first large-scale, call-center-based commercial speech application.

In September 2005, ScanSoft merged with Nuance and the combined company adopted the Nuance Communications name. Since then, Nuance has gobbled up another dozen companies, the largest being Dictaphone for $357 million and eScription for $400 million.

According to my math, the current incarnation of Nuance Communications is actually made up of 42 companies, with a $180 million acquisition of SNAPin Software in the works and an unsolicited offer of $40 million for Zi on the table. Got all that?

The company lists its competitors as AT&T, IBM, and Microsoft. Sounds formidable, but each of these giants competes with Nuance in specific, limited markets. Nuance is far and away the 800-pound gorilla of speech technology.

As for its business strategy, Nuance seems to have done a good job of focusing its limited resources on the largest vertical markets where it can optimize profit margins. The company's primary focus is on helping businesses improve efficiency and productivity while reducing costs.

The fact that the company says little about aggressively driving its technology into the consumer space is telling. That's simply not its business plan, and I can certainly understand why. The consumer electronics market is highly fragmented with thin margins and high support costs. And if Nuance wishes to avoid that, well, there really isn't much competition left to twist its arm.

I'd say Ricci is a shrewd businessman.

Still, the next time you get off the phone with an automated call-center that communicates eerily well, only to fumble around with the myriad of keyboards, buttons, and remote controls in your own life, at least you'll know what name to curse: Nuance Communications.

Steve Tobak is managing partner of Invisor Consulting LLC. He is a member of the CNET Blog Network, and is not an employee of CNET.
25 comments
by pinchio August 22, 2008 7:00 AM PDT
I completely agree. I recently had the pleasure of listening to Clayton M. Christensen give a presentation. He is the academic who coined the phrase "The Innovator's Dilemma". In this presentation he spoke about the history of business innovation and how companies such as Nuance are not driven by low-margin markets and focus on innovating in higher-margin markets. This lack of focus on the low end paves the way for newer, more nimble companies to come in and dominate the market.

The problem we are seeing here is that this is essentially not occurring. Developing speech reco technology and making it work reasonably well is really hard. Essentially all speech reco originates from academia and then branches out for licensing and proprietary applications and solutions.

We need a revolution, something disruptive like the iPhone was to the cell-phone market.

For me the next big thing in speech reco is not the cellphone....it's speech enabled flash in the browser. You tackle that problem (small footprint) and you've tackled the cellphone.

Rob Mitchell
Product Manager
neuroLanguage
www.neuroLanguage.com
by Renegade Knight August 22, 2008 7:42 AM PDT
Toyota capitalized on the low end market to start out. It's now poised for a permanent takeover of #2 and #1 isn't far behind. It still serves the low end market it came from.

I'm not so sure that what you propose is different from that. Get something that works in the low end. The rest follows.
by komarue August 22, 2008 12:21 PM PDT
Rob,

"For me the next big thing in speech reco is not the cellphone....it's speech enabled flash in the browser. You tackle that problem (small footprint) and you've tackled the cellphone."

This is possible using open source software, Red 5 (Open source flash media server), BlazeDS(Open source flash -> java backend server), SPHINX (Open source ASR).

Ross Hendrickson
Software Developer
PSST Research Group
http://psst.byu.edu/
by miroslodki August 22, 2008 7:05 AM PDT
Thxs for that review
I have also been wondering why voice recognition isn't in the consumer space
(I figured out the Bush question 2 elections ago)

tapping away at a keyboard or worse still (mobile pad) is a patently anti-customer centric way of using all these new gizmos regardless of what millennials may think

maybe at some point we'll catch up to Roddenberry's communicator vision
till then let's all qwerty

cheers
Miro
by Perry_Clease August 22, 2008 7:17 AM PDT
We have more than enough cacophony in the world as it is.
by August 22, 2008 8:12 AM PDT
First I agree with Perry.

I believe there are a few reasons that speech recognition is not being widely used

The first is because it actually has very limited practical application (unless you work and live alone). A room full of people can type on keyboards and interact with a mouse without disturbing their coworkers, but imagine if all of them were dictating documents. Activating lights in your home and turning on the stereo might work fine if you live alone, but if you have kids...forget it.
The second reason I think voice rec. is not widely used is because when writing, people edit on the fly. They backspace, move paragraphs and delete entire sections, and it's just too difficult to do that with voice rec.

So where could it work - I think the practical applications come into play in situations where people are already speaking - here are a few examples:

* Sales rep on the phone with a client - phone call is documented in a CRM system using Voice recognition
* Police Reports over the phone
* assistance to people with disabilities
* University Lecture Transcripts
* Updating a contact record on your cell phone (i.e., someone gives you a phone number during the conversation, and at the end of the call, any phone numbers that were mentioned are listed in context with the option to save them)

I think most of the reasonable applications for vr will be in the telephony space.

cheers,

Keith www.endsville.ca
by mszlazak August 24, 2008 5:58 PM PDT
Keith, I agree with you almost entirely that speech recognition is mostly limited to where speech is used. One place it could be used is in an automated appointment scheduling service or application for small businesses like doctors' offices, beauty salons, spas, etc. One such service is www.angelspeech.com but it looks like it needs some polishing with regard to how self-scheduling is handled, since providers don't like giving up too much control to their clients over appointment calendars.
by Peet42 August 22, 2008 8:31 AM PDT
I don't know how much truth there is in this story, but I heard that when Vista first launched with speech control enabled by default there was a van with a PA system driving around the business district of one large city shouting out:

"Speech on"
"Command Prompt"
"C colon"
"CD backslash"
"Del star dot star slash s"

Just to prove how insecure it was. :-)
by yelocab August 22, 2008 8:50 AM PDT
I think everything isn't speech controlled because the things we do (just on computers, for example) are much more complex than we *think* they are. For example, I could say: "find the e-mail from Erin Daylor from last week". First of all, the computer would have to understand what I said and that "Erin Daylor" is not "Eric Taylor" (another co-worker). Or how about the command: "Create a new layout from the proposal template"? We know that it is an InDesign template, located on the server, in the department folder, within the templates folder, but the computer doesn't know that. We would have to say everything in specific steps, like we were talking to a child. Think about having a person sitting at a computer, and you telling them exactly what to do. It would get really annoying pretty fast and we would finally just say "move over, I will do it myself". Even with something simple like "Switch to the text selection tool" (or "text selection"), it's easier/faster to just click the toolbar rather than have to say all that and wait for the computer to react and understand. After the computer messed up a few times, we would no longer trust the computer and give up on speech.
I think speech has come a long way. I just called the cable company and said to the automated system "my cable is out". It understood me.
by hawkeyeaz1 August 22, 2008 9:41 AM PDT
I don't expect it will happen (from Nuance at least) any time too soon, but Free/Open Source Software would potentially be the answer to the thin profit, high support cost issue, as well as improving it to resolve the latter.
by username74 August 22, 2008 9:52 AM PDT
Who says that call center VR is good?

Just yesterday I called my doctor's office...

Answering system:
... if you would like to speak to an operator, say "operator"

Me:
"operator"

Answering system:
I think you said "operator" is that correct?

Me:
"Yes"

Answering system:
I'm sorry I didn't understand your response. I'll connect you to the operator.

Me:
"Ha Ha Ha"

I despise talking to inanimate objects.
by MadLyb August 22, 2008 9:54 AM PDT
You forgot to mention the incredibly deep War Chest of Patents that Nuance owns. This is more stifling than them ignoring the consumer market because it would be very risky for a start-up to try to tackle that market with the dark cloud of possible litigation always following them around.
by craigber August 22, 2008 10:08 AM PDT
The problem with using voice for everything in the office is the noise it would create. Imagine an office with even a couple of dozen people huddled in their cubicles, all talking to their computer. No thanks.
by stobak August 22, 2008 10:15 AM PDT
Good comments.

To the issue of kids or others being around, each person has a unique voice print that devices can recognize (technology already exists), plus any command is of course preceded by a keyword that gets the device's attention. Think controlling relatively simple consumer devices, i.e. TVs, DVRs, phones, etc., relative to complex computer commands. And yes, a crowded meeting or conference room is a unique situation.

Nuance's technology lead and patent portfolio are of course the issue. As a result of their shrewd acquisition strategy, they have a commanding lead over everyone, including IBM, which has been at this speech game for decades. When it takes hundreds of millions of investment capital and a decade or two of R&D to bring an advanced technology like this to market, you can forget "open source." There is no trendy phrase that will magically produce competition.

Steve Tobak

by InklingBooks August 22, 2008 10:34 AM PDT
You brought it up, so it's on topic. Bush has been a marvelous President, particularly in comparison to Gore and Kerry. With either of the latter two in the White House, Iraq would still be a murderous dictatorship, funding terrorists (as indeed it was, paying money to the families of those who killed Jews). Under any liberal Democrat, the Arab-Muslim world wouldn't now be building its first democracy. And yes, I learned years ago that liberals don't care about sharing the benefits of democracy with others. Stalin, Castro, Vietnam, the last decade and a half of the Cold War all demonstrate that. Iraq merely confirms that "Freedom for me but not for thee" is Rule 1 for liberals. Liberals aren't being clever when they bash Bush. They're simply displaying their chronic indifference to the suffering of others, an indifference that begins before birth with legalized abortion.

Now to the main topic. We're not interested in running our technology by speaking because speech is a highly voluntary activity, meaning we exercise great care in what we say, since saying the wrong thing can get us into serious trouble. We're taught that as children and spend our adult lives mastering it.

On the other hand, we're quite relaxed with using our other motor skills, because there's far less risk involved and because our brains are designed to deal with them in the background. We drive, eat, and work with our computers by depending on motor skills that are almost automatic. We can type with cheerful abandon because we can edit before we send. On the other hand, we have to be careful with what we say because most of the time someone is listening.

Nothing in the foreseeable future is going to change that, and as long as that's true, speaking to computers is going to feel unnatural and uncomfortable.

--Michael W. Perry, editor of Chesterton on War and Peace: Battling the Ideas and Movements that Led to Nazism and World War II
by The_Decider August 22, 2008 11:33 AM PDT
Wow, this is what Fox produces, people who still think Iraq had anything to do with Al-Qaeda or 9/11.

How can Bush be a "marvelous" president compared to two people that never were? Typical Fox "logic".

I guess it is true, you can fool 21% of the people all of the time. Too bad your hero is going to go down as the worst president ever. One who brought totalitarianism to our shores and scared people into accepting it.
by Dalkorian August 22, 2008 12:09 PM PDT
Your "marvelous" president has really done a marvelous job on our country, hasn't he. He's brought us such joys as 9/11 (HE KNEW!), unjustified war for oil (Iraq), a trashing economy, depressed dollar, treason, torture and the virtual abolition of the Constitution (ever heard the name Padilla? Did you know he was an AMERICAN CITIZEN? Look up what your "marvelous" fuhrer bushit has done with him and ask what's preventing him from doing the same to any other citizen).

But why am I bothering, you've apparently already been hopelessly programmed by the nazi repuke party. Do your country a favor please and miss the next election. You haven't got the brains to vote like an adult.
by gridwerk August 22, 2008 10:45 AM PDT
I think to assume that a business computer can be wiped simply by shouting commands at it from a loudspeaker in a passing car is the very top of the naivety pyramid. Yes, it's apparently hip to hate Vista, but no one is dumb enough to create a system flaw with that level of oversight yet still have the intelligence to maintain a job as a programmer for one of history's most successful companies.

Tone the hate down, son, your comment smacks a bit of schoolyard idiocy. You might as well have said "I heard that if you looked directly at Vista the wrong way it melts down burning a hole to the center of the earth that releases a demon which finds and violates your dear-sweet-saint of a grandmother".

The commentary gets old- and I'm not defending either- but what the hell does Bush have to do with this? Or did I miss the part where we were reminded that he was never mentioned as having been in any way related to or involved at all in any way to anything that was mentioned anywhere in the entirety of this post?
by Tsee August 22, 2008 10:48 AM PDT
Exactly. For me talking to voice menus is like talking to a tree. Give me a key-based menu and I'll get to where I want to be. Plus, you don't need to worry about being on a noisy street or loud bar. Thank goodness voice systems haven't taken over everything.
by The_Decider August 22, 2008 11:35 AM PDT
For all the reasons given here, VR will never replace a keyboard in many settings. It is also not good enough to even spend time with it dictating letters yet. It has come a long way, but it is not good enough and would cut productivity by 90%.

Embracing new tech for its own sake is total nonsense. If it doesn't increase productivity then there is no point, which is exactly why it is a niche application.
by meles78 August 22, 2008 12:00 PM PDT
Honestly, I can't believe no one has mentioned this before: voice activated software has the potential to be not only more convenient than a keyboard, but healthier. For some people crippled by repetitive stress injuries, voice activated software - in all its clunkiness and glitches - is simply the only option.

Like it or not, they're stuck with it because typing has become too physically painful. Failing to improve this technology is really a health issue. As we age, and spend more of our lives in front of keyboards, the need for voice-activated software is going to be imperative, not just convenient.
by cheepsh0t August 22, 2008 12:27 PM PDT
I see that the writer of this article and his editor could not resist giving us their opinions of Bush. Maybe the writer of this article can tell us which presidents he thinks are good. Let me guess....Perhaps Carter or Clinton??? Here's an opinion from me: Cnet news is no good. Here is a fact: I'll no longer be visiting Cnet news. Zero tolerance!
by gsigas August 22, 2008 12:49 PM PDT
The argument that this technology will not take off because of noise in the office is shortsighted. One of the primary purposes of an office is voice communication (via telephone or face to face). Talking to a computer would probably be via a headset and would be no different than phone calls in the office (and background noise and loudspeaker commands over a PA would not be an issue since the computer is only listening to the headset mic). In the same way, accidental commands, when not talking to the computer, can be avoided with a mute button on the headset. A combination of voice and gesture is much more efficient than keyboard and mouse and will probably be the way we interface with computers in the future. The main limitation is not the voice recognition but the computer's intelligence. Voice and gesture are the way humans naturally communicate; eventually (easily within our lifetime) computers will be smart enough to respond.
by TomMariner August 23, 2008 4:54 AM PDT
I was in at the beginning of practical speech synthesis and recognition. It's tough. Not recognizing 95% or 98% of one speaker's controlled utterances when the beginning of speech is known, but as one approaches the only practical 99.99% in an uncontrolled conversation, the difficulty increases exponentially.

But in a hands- and eyes-busy environment it is an absolute requirement for interfacing with most practical technology. I have been witness to some of the first disastrous attempts at using speech as a gimmick. Years ago the Chrysler folks decided that speech in a car would sell. What did the marketing types pick? "Your washer fluid is low" and, best of all, "Your door is ajar". (The latter one helped launch Eddie Murphy's career as he mocked the absurdity.) When we mere techies opined that maybe letting the driver check and change things like radio station or heat / ventilation systems while not pulling one's eyes down to the controls for 20 seconds while travelling at 65 would be more useful, we were dismissed. So was the stupid speech in cars -- and it helped kill the entire concept.

So I am still sitting in front of my "portable" computer that is "Light" at 2.5 pounds and typing on a 14 inch keyboard. Thanks Eddie Murphy!
by reddot1975 August 24, 2008 6:27 AM PDT
Hopefully, this will be interesting information. I have been in the speech rec industry for quite some time, and even worked at Nuance for a while. The main reason that Nuance is not making speech recognition consumer devices is that the company is not a hardware company. If a consumer product company came to Nuance and wanted to license their embedded speech rec technology, Nuance would do it. I recall definitely 1, perhaps 2, projects over the years to put speech recognition into a remote control, for example. It doesn't seem like they really went anywhere. An example of embedded speech recognition that has been used (and that supposedly people like) is the Ford Sync, by Microsoft, using the Nuance embedded engine.

Speech recognition is really several different markets: embedded (where recognition is done solely on the device); mobile (which could either be solely on the device or a hybrid model), desktop (like Dragon), medical / legal, and enterprise telephony.

For those of you who are doubting how good desktop dictation is nowadays, you probably have not tried one of the most recent versions of Dragon Naturally Speaking. For dictation, it is pretty awesome. There's a minimal training period, and in general it's able to get quite good accuracy. Correcting dictation mistakes with speech is also very easy. Where it does not work well is in command and control of your computer. That is still a slow, annoying proposition.

People always want a "Star Trek" like speech rec experience. I think partly because it's cool, but also because whenever a reporter writes an article, they start off with that... In any case, as someone pointed out earlier, speech rec will need to know much more context about its user and how they work. That's coming, but slowly. The latest version of Dragon now has shortcuts to do various types of internet searches.

In the area of enterprise telephony (IVR) speech recognition, these systems are still expensive and require a lot of custom development. So, companies tend not to upgrade very often. If you want to call a speech application with a modern recognizer, call United Airlines. You'll find that the ability to filter out noise is much better than in the past. In fact, there are still a number of systems out there that are still using ~10 year old technology.

A friend of mine has a clapper-like device that uses speech recognition, or perhaps more accurately, noise recognition. It's a nice gimmick, but a light switch works better. If you're looking to consolidate your remote control setup, get a nice universal, like a Harmony. Those things are pretty sweet.

In general, we're seeing a lot more competition in a few sectors of the speech rec market: mobile and enterprise. There are a number of up and coming enterprise speech rec engines, and a lot of services companies. Plus the technology keeps getting better and cheaper. It's an exciting time for speech recognition.
About Train Wreck

Steve Tobak is a marketing consultant and former chip industry executive. Train Wreck provides insight into dysfunctional corporate behavior, among other things. When he's not airing the industry's dirty laundry, Steve likes to hang around the house, make believe he's working, and drive his wife crazy. Find out more at www.invisor.net or email Steve at trainwreck@invisor.net. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure.
