By Hiawatha Bray, Globe Columnist, 4/28/2003
Two of the leading speech recognition companies merged the other day, and the most interesting aspect of the deal is how little it seems to matter. That's no knock on the companies involved. ScanSoft Inc. and SpeechWorks International Inc., the company it acquired, each offer powerful and well-regarded technologies that enable computers to recognize human speech. After the merger, the new ScanSoft may well establish itself as the leading player in this market.
It's just that the market is still rather small. Most American homes and virtually all businesses have computers, and speech recognition software for home computers has been widely available since the mid-1990s. But hardly anybody composes documents or surfs the Web by speaking instead of typing. And there's a good reason for this. After nearly a half century of research, computers are still lousy at interpreting the subtleties of human speech.
The effort hasn't been a total loss -- far from it. Speech software is doing just fine in a handful of specialized niches well-suited to its technical limitations. You can dial 800 directory assistance and a computer will look up the number you ask for. That's telephone-based speech recognition, one of SpeechWorks' specialties, and according to research firm Frost & Sullivan, it's a $100 million market in North America alone.
There's also a noteworthy boomlet in embedded speech devices -- the cellphone that dials a number when you say "home," or the luxury car that'll obey your order to switch on the radio.
But plop down in front of the household PC and ask it to jump over to the CNN Web page, then flip to the sports section. There are ways to do this with voice commands, but by the time you've figured it out, baseball season may be over. Never mind the latest Pentium 4 or PowerPC chip; your computer is still too dumb for such simple tasks.
It's not that the computer can't recognize your words correctly. Today's speech recognition software is remarkably good at that. Install a copy of IBM's ViaVoice on a late-model PC with lots of memory. It takes less than half an hour to "train" the software to recognize your voice and the sound properties of the room. From then on, the program will transcribe your words with over 90 percent accuracy. Yes, it'll make the occasional gaudy error, but so do human transcriptionists. It was a person who blundered last week while transcribing an ABC News broadcast for closed-captioning. Viewers were informed that Federal Reserve chairman Alan Greenspan "was in the hospital for an enlarged prostitute."
An amusing blunder, and a revealing one. A human reader would immediately and correctly guess that the word should have been "prostate." That's because people understand context. We analyze the overall meaning of the speaker's words to ensure correct understanding. "Speech recognition alone is not enough," said Victor Zue, the speech recognition expert who runs the Laboratory for Computer Science at the Massachusetts Institute of Technology. "You have to understand what people say." Alas, even today's most powerful computers have no idea what our words mean. Prostate or prostitute -- it's all the same to a piece of silicon. And this digital ignorance has unfortunate consequences for speech recognition.
Consider the first sentence of the previous paragraph. If you read it to an English-speaking human, you wouldn't need to read the punctuation marks. Your tone of voice would convey that information. But here's how you'd read that sentence to a speech recognition program: "New paragraph -- an amusing blunder comma and a revealing one period." The computer needs explicit punctuation commands. It no more understands your vocal inflections than it does the words you're uttering.
ScanSoft calls its dictation program NaturallySpeaking. But there's nothing natural about having to say punctuation marks aloud. In fact, it's a time-consuming nuisance, and one reason people give up on dictation software.
Here's another reason: noise. Your brain filters it out, ignores it. But speech recognition software treats all sound as speech. That's why dictation programs need special microphones that filter out extraneous noise, and must be used in a fairly quiet room. That's also why they've never caught on in corporate America, where few people work in soundproof booths.
Still, computers do all right with human speech, when we don't ask too much of them. Medical specialties like radiology are extremely complex, yet radiologists have a surprisingly limited vocabulary for their work -- so limited that many use speech recognition software to transcribe their diagnoses. That's the same reason the technology works for directory assistance or travel reservation services. Give the computer a small enough vocabulary to work with -- "Boston, New York, aisle seat" -- and it'll do OK. There are enough such limited applications to keep ScanSoft's engineers happily employed for years.
The rest of us will only speak to our computers when we swear at them for crashing. And even then, we'll be wasting our time.
Hiawatha Bray can be reached at bray@globe.com.
This story ran on page C2 of the Boston Globe on 4/28/2003.
© Copyright 2003 Globe Newspaper Company.