Artificial intelligence programs that learn to convert text into speech, rather than relying on rigid hand-coded rules, are not entirely new, but Google’s newest, called Tacotron, is one of the most advanced and forgiving of the bunch. Tacotron can dynamically shift pitch and tone based on context, which covers prosody: the rhythm, stress, and intonation of speech. The kicker is just how smart Tacotron is about what it’s reading; it can not only read through some seriously bad typos without missing a beat, but can also work out which pronunciation to use when the same spelling is pronounced differently in different contexts, or when words look alike but sound different.
Tacotron’s list of talents is long. It’s sensitive to punctuation and capitalization as cues for emphasis, and can figure out how a sentence is supposed to sound based on both those cues and context clues in the sentence and surrounding phrases. Tacotron is itself a sequence-to-sequence (seq2seq) model with attention, and gated recurrent units (GRUs) are among its building blocks; these are components of the system rather than rival text-to-speech programs. Because it generates speech a frame at a time rather than sample by sample, it is also substantially faster than sample-level autoregressive approaches. The headline feature of Tacotron is the ability to handle words it has never seen before by sounding them out, much as children do when they first encounter a word in print. Tacotron can even cope with spelling errors, reading straight through typos or even mostly incoherent blobs of text as if there were nothing wrong with them.
While Tacotron is quite advanced as far as machine-learning text-to-speech programs go, Google readily admits that it doesn’t yet sound quite as natural as text-to-speech engines that piece together recordings of humans talking. While that facet could improve with time, the primary reasons for research into Tacotron are that it’s cheaper, less time-consuming, and far more flexible to implement than a speech synthesizer built on pre-recorded audio. The white paper for Tacotron, along with a few audio samples, is available through the source link on GitHub, though Tacotron is currently not open-source.