Our voice is so much a part of our identity that it’s hard to imagine not having one of our own. Yet the 2.5 million Americans with speech impairments can choose from only a handful of generic computerized voices.
This sad fact was driven home for associate professor Rupal Patel several years ago while she was attending a conference on speech pathologies. As she entered the convention center, she overheard a little girl and an older man talking to each other, and they were using the same computerized voice.
“There was something that really didn’t feel right about that,” says Patel, who holds a joint appointment in the speech pathology and computer science departments.
“We would never think of giving a little girl the prosthetic limb of an older man, so we shouldn’t do this with voices either.”
Mobilized by her epiphany, Patel took action. For the past five years she has been developing VocaliD, a technology that creates personalized voices by blending the basic vocal sounds of the speech-impaired person with those of a “voice donor.”
The process works like this:
1. Patel records the speech-impaired client making a variety of simple vowel sounds. This captures the personal quality of the person’s voice, even though the client can only make the most basic sounds.
2. She then enlists a “voice donor” to read hundreds of simple sentences that include all the sound combinations used in the English language. It is essential to find a voice donor who matches the client well. “Each of us has a unique voiceprint that reflects our age, our size, and sometimes even our lifestyle,” says Patel, noting that Henry Wadsworth Longfellow referred to the human voice as “the organ of the soul.”
3. Patel’s computer program, VocaliD, chops up these sentences into speech fragments, forming a “speech bank” that can be accessed to create sentences never before spoken by the donor. Because the two voices have been blended, the computer-generated voice sounds human and matches the client’s natural voice.
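For readers curious about the "speech bank" idea in step 3, here is a toy sketch in Python. It is an illustration only, not VocaliD's actual method, which this article does not describe in detail. The function names `build_speech_bank` and `synthesize` are hypothetical, whole words stand in for the much smaller speech fragments a real system would use, and text labels stand in for recorded audio. The blending with the client's own vowel sounds is not modeled at all.

```python
# Toy illustration only: text labels stand in for recorded audio clips.
# This is NOT VocaliD's algorithm; all names here are hypothetical.

def build_speech_bank(donor_sentences):
    """Chop donor sentences into fragments (here: whole words) and index them."""
    bank = {}
    for sentence in donor_sentences:
        for word in sentence.lower().split():
            # A real system would store an audio snippet, not a label.
            bank.setdefault(word, f"<clip:{word}>")
    return bank


def synthesize(bank, text):
    """Assemble an utterance from stored fragments, in order."""
    words = text.lower().split()
    missing = [w for w in words if w not in bank]
    if missing:
        raise KeyError(f"no donor fragment for: {missing}")
    return [bank[w] for w in words]


donor_sentences = ["the dog ran home", "she saw the cat"]
bank = build_speech_bank(donor_sentences)

# "she saw the dog" was never spoken by the donor, but every piece is in the bank.
new_utterance = synthesize(bank, "she saw the dog")
print(new_utterance)
```

The point of the sketch is the reuse: the donor never said "she saw the dog," yet the sentence can be assembled from fragments of what the donor did say. This is why the donor's recordings need to cover every sound combination in the language.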
Patel says that her 6-year-old daughter captured the process perfectly when she called it “mixing colors for painting voices.”
According to Patel, the key to developing customized voices on a large scale is the creation of an international voice bank. On Dec. 5, while giving a TED Talk in San Francisco, she announced the creation of such a bank (see VocaliD.org).
So far, Patel has seen this process through to completion three times. Her first client was a 9-year-old boy named William. When she first played the new voice for the boy’s family, his mother lit up and said, “This is what William would have sounded like if he had been able to speak.”
But it was William’s own reaction Patel found most moving. When he heard the voice, the 9-year-old smiled broadly and said, “I never heard me before.”