Google's Advanced Language Parser Just Went Open Source

When you say, “OK Google”, you’re not just talking to Google’s servers. You’re talking to Parsey McParseface, the English-language parser that ships with SyntaxNet. The aptly, if a bit strangely, named parsing tool is responsible for taking natural human language and converting it into something a computer can use as instructions. Parsey takes each word of a sentence and works out its part of speech, then works out how the words relate to one another, and from that the sentence's structure and meaning, in a process known as dependency parsing. If this sounds like a process that computers may have a hard time with, that’s because it is. Google says Parsey is nonetheless the most accurate parser of its kind, scoring within a few percentage points of the rate at which trained linguists agree with one another when analyzing the same texts. SyntaxNet, meanwhile, is the neural network framework, built on TensorFlow, that Parsey is trained with. Google announced today that Parsey and SyntaxNet are now open source, opening up the vast possibilities that they present to anyone willing to use them.

Much like the ancient game of Go, human language presents nearly infinite possibilities in even the simplest sentences. For example, one could say, “What is the time in England right now?” The meaning is obvious to a human, but a machine sees several possible readings. The sentence could mean you want to know the current time in England, it could mean you want to know the definition of the phrase “the time” as used in England, you may be asking for the time in a place called England Right Now, or, if the parser doesn’t catch that it’s a question, you could be telling it that the current time in England is “what”. It’s the parser’s job to figure out which one you mean based on the way the sentence is arranged, any punctuation (or fluctuations in tone, in the case of speech recognition), and which meaning is most likely based on the data it has seen.
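To make the idea concrete, here is a minimal sketch of what a dependency parse of that sentence might look like as data. It does not use SyntaxNet or Parsey; the `Token` class, the helper functions, the tags and the attachments are all hand-written assumptions for illustration, loosely in the style of common dependency treebanks. It shows how two of the readings above differ only in which word "now" is attached to.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    index: int   # 1-based position in the sentence
    text: str
    pos: str     # coarse part-of-speech tag (hand-assigned, illustrative)
    head: int    # index of the governing token; 0 marks the root
    label: str   # relation to the head (hand-assigned, illustrative)


# One plausible analysis: "right now" modifies the question as a whole.
SENTENCE = [
    Token(1, "What", "PRON", 0, "root"),
    Token(2, "is", "AUX", 1, "cop"),
    Token(3, "the", "DET", 4, "det"),
    Token(4, "time", "NOUN", 1, "nsubj"),
    Token(5, "in", "ADP", 6, "case"),
    Token(6, "England", "PROPN", 4, "nmod"),
    Token(7, "right", "ADV", 8, "advmod"),
    Token(8, "now", "ADV", 1, "advmod"),
    Token(9, "?", "PUNCT", 1, "punct"),
]


def subtree(tokens, index):
    """Return the token at `index` plus everything it governs, in sentence order."""
    keep = {index}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in keep and t.index not in keep:
                keep.add(t.index)
                changed = True
    return [t for t in tokens if t.index in keep]


def phrase(tokens, index):
    """Join the words of the subtree rooted at `index` into a string."""
    return " ".join(t.text for t in subtree(tokens, index))


def reattach(tokens, index, new_head, new_label):
    """Return a copy of the parse with one token attached to a different head."""
    return [
        replace(t, head=new_head, label=new_label) if t.index == index else t
        for t in tokens
    ]


# Alternative reading: "Right Now" is part of the place name.
PLACE_NAME_READING = reattach(SENTENCE, 8, 6, "flat")

if __name__ == "__main__":
    print(phrase(SENTENCE, 6))            # phrase headed by "England", reading 1
    print(phrase(PLACE_NAME_READING, 6))  # phrase headed by "England", reading 2
```

The words and their order never change between the two parses; only one head pointer does. Choosing the more likely set of pointers is the decision a parser like Parsey has to make.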

Releasing Parsey to the open source community could mean that natural language recognition develops further as developers outside of Google look at the project with fresh eyes. It could also mean that we will be talking to Parsey a lot more often, whether by giving voice commands to a Linux distribution or by chatting with a bot that relies on it to understand us. The source code for SyntaxNet and Parsey is available for download right now via the source link.