Stephen Hawking and a 9-​​year-​​old girl with a speech dis­order most likely use the same syn­thetic voice. It’s called Per­fect Paul and it’s easy to under­stand, espe­cially in acousti­cally chaotic envi­ron­ments like class­rooms full of chil­dren. While new, more natural-​​sounding voices are avail­able, Per­fect Paul remains the most oft-​​used syn­thetic voice in the com­mu­nity of dis­or­dered speakers.

But Per­fect Paul con­veys none of the per­son­ality inherent in vocal iden­tity, explains Rupal Patel, an asso­ciate pro­fessor of com­puter sci­ence and speech lan­guage pathology and audi­ology.

What we’re trying to do is improve the quality,” she said, “but also increase the per­son­al­iza­tion of those voices, by not just making it a little kid’s voice, but making it that little kid’s voice.”

Backed by a grant from the National Sci­ence Foun­da­tion, Patel and her research team are devel­oping ways to create per­son­al­ized syn­thetic voices that resemble users’ vocal iden­ti­ties while remaining as under­stand­able as those of the healthy donors.

In the first iter­a­tion of the project, which Patel calls VocaliD (pro­nounced vocality, for Vocal Iden­tity), her team com­pu­ta­tion­ally merged the acoustics of a sus­tained vowel sound from a child with a speech dis­order, like this:

with the acoustics of a full sen­tence spoken by a healthy speaker of the same demo­graphic, like this:

The result is a clear, syn­thetic voice with the per­son­ality of the end user:

These voices have already elicited great responses from par­ents; one said, “If [my son] had been able to talk, this is what he would sound like.” How­ever, the early ver­sion of VocaliD used a difficult-​​to-​​scale  approach that is not easily repro­ducible. Patel said, “We’d like to be able to allow users to create new voices as they mature in the same way a nat­ural voice evolves.”

With the sup­port of another grant from the National Sci­ence Foun­da­tion, her team is cur­rently adding phys­i­o­log­ical infor­ma­tion on top of the acoustics.  “When you hear speech, it’s a com­bi­na­tion of your source and your filter,” Patel said. The source, she explained, derives from the voice box in the larynx whereas the filter is deter­mined by the shape and length of the vocal tract.

Vocal characteristics—such as pitch, breath­i­ness, and loudness—all emerge from the vocal folds in the larynx and give rise to vocal iden­tity. Mod­u­lating those fea­tures by changing the shape of our mouths and moving our tongues gives rise to dis­tinct vowel and con­so­nant sounds, which, Patel said, are typ­i­cally impaired in dis­or­dered speech.

Using data from a set of sen­sors placed on par­tic­i­pants’ tongues and mouths, the researchers will deter­mine the most effi­cient way to approx­i­mate the phys­ical aspects of the dis­or­dered speaker’s vocal tract. They can then add this infor­ma­tion into the voice-​​synthesis soft­ware to create voices that will grow and change as the users mature.

Image cour­tesy of Rupal Patel.

The aca­d­emic com­mu­nity has long accepted the source-​​filter theory of speech, but more work needs to be done in order to under­stand it, according to Patel, espe­cially as researchers develop more advanced speech tech­nolo­gies for secu­rity and other applications.

Patel’s work in par­tic­ular also aims to inform basic research ques­tions such as, “How much do both the source and filter con­tribute to the iden­tity of a speaker’s output?”

Patel’s soft­ware is com­pat­ible across assis­tive tech­nology plat­forms, including main­stream touch-​​pad devices, a fea­ture she hopes will increase its adop­tion within the com­mu­nity. Patel spec­u­lates that assis­tive com­mu­ni­ca­tion devices will even­tu­ally appeal to healthy people as a new way of learning, com­mu­ni­cating, and interacting.

The iPad rev­o­lu­tion is helping to break down bar­riers and increasing the emphasis on user inter­face issues,” said Patel, who has been working to improve assis­tive com­mu­ni­ca­tion tech­nolo­gies for more than 16 years. “Lots of kids, both healthy and impaired, are using screens to interact now.”