Google's Word Recognition Error Rate Is Down To 4.9%

Google CEO Sundar Pichai took the stage on the first day of this year’s Google I/O to make several announcements, one of which was that Google’s voice recognition software has reached a word recognition error rate of only 4.9%. While roughly 1 in 20 words being misrecognized sounds like a lot on paper, conditions like background noise, unfamiliar words, and unusual pronunciation all have to be taken into consideration. Such an error rate is far from perfect, but it is close to being on par with the average human listener. According to Pichai, Google achieved this level of accuracy through the use of neural networks.
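To put a figure like 4.9% in context, speech recognition accuracy is commonly reported as a word error rate: the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in that transcript. The Python sketch below computes it that way. It is only an illustration of the standard metric, not Google's evaluation code, and the function name and sample sentences are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one wrong word out of eight.
reference = "ok google turn on the living room lights"
hypothesis = "ok google turn on the living room light"
print(word_error_rate(reference, hypothesis))  # 0.125
```

By this measure, a 4.9% rate corresponds to about one word in twenty needing a correction.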

Neural networks are also behind the “beamforming” audio technique used in Google Home, which allowed Google to release the unit with only two microphones instead of eight, greatly reducing its cost. That technique, combined with neural network-based speech recognition on Google’s end, allows Google Home to recognize speech accurately even when the noise level around the unit is high or the person speaking is far away. The same neural network backend, trained on millions of examples of human speech, is why other Google-powered devices, such as your smartphone, are able to hear what you’re saying with a decent degree of accuracy in noisy places like running cars, stores, or houses with children running around.
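For readers curious about what beamforming does, the Python sketch below shows the classic "delay-and-sum" version with two microphones. This is a much simpler, hand-written technique than the neural approach the article describes, and it is not Google's method. It assumes the arrival delay between the two microphones is already known as a whole number of samples, and every name and number in it is hypothetical.

```python
import math
import random


def delay_and_sum(mic_a, mic_b, delay_samples):
    """Line up mic_b with mic_a and average the two signals.

    Assumes the voice reaches mic_b `delay_samples` samples after mic_a.
    """
    if not 0 <= delay_samples < len(mic_b):
        raise ValueError("delay_samples must fit inside the recording")
    usable = min(len(mic_a), len(mic_b) - delay_samples)
    return [(mic_a[i] + mic_b[i + delay_samples]) / 2 for i in range(usable)]


def rms_error(estimate, truth):
    """Root-mean-square difference over the overlapping samples."""
    pairs = list(zip(estimate, truth))
    return math.sqrt(sum((e - t) ** 2 for e, t in pairs) / len(pairs))


# Simulated recording: the same tone at both microphones, each with its
# own independent background noise, arriving 3 samples later at mic_b.
random.seed(0)
delay = 3
speech = [math.sin(2 * math.pi * i / 50) for i in range(2000)]
noisy_a = [s + random.gauss(0, 0.5) for s in speech]
noisy_b = [0.0] * delay + [s + random.gauss(0, 0.5) for s in speech[:-delay]]

combined = delay_and_sum(noisy_a, noisy_b, delay)
# The second number should be the smaller of the two.
print(rms_error(noisy_a, speech), rms_error(combined, speech))
```

Once the two recordings are lined up, the voice adds together while the unrelated noise at each microphone partly averages out, which is the basic reason an array of microphones can hear better than a single one.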

Neural networks are a type of machine learning model in which many simple computing nodes are linked together, with each one acting loosely like a neuron in a brain. Essentially, the nodes work together to process information from many angles at once: each node weighs the signals it receives and passes its result on to others, which lets the network handle its input more thoroughly than any single node could. Back in January, Google announced that it had cut some 30% off of its error rate for speech recognition, mostly thanks to advancements in machine learning and neural networks, such as Google’s creation of its own custom machine learning chip, the Tensor Processing Unit. According to Pichai, voice recognition will continue to improve as the models receive more training and as further advancements are made in both audio recording and computer-based recognition.
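To make the idea of nodes acting like neurons a little more concrete, here is a toy Python sketch of a two-layer network passing a set of inputs forward. It is purely illustrative: the weights are arbitrary made-up numbers, the sigmoid activation is just one common choice, and real speech models are vastly larger and have their weights learned from training data rather than typed in by hand.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden nodes -> 1 output node.
network = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
print(forward([0.9, 0.1, 0.4], network))
```

Training a network means nudging those weights, over many examples, until the outputs match the right answers more often, which is why more training data tends to bring the error rate down.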