Google’s core product, Search, has required double the bandwidth (or more) every year for the last decade. That growth demanded more than an off-the-shelf design for Google’s server farms; the company realized early on (in 2004) that it would be better off building its own hardware. Now, a decade after its first datacenter network was pieced together, Google is explaining how its networking and datacenter management reached this scale and why the current network, called Jupiter, can deliver a petabit per second of throughput. Its 100,000 servers can each talk to any other at 10 Gb/s, roughly a hundred times the capacity of the first-generation datacenter network built back in 2005. The skills and knowledge Google has gained over those ten years are one reason it is presenting a networking paper at the SIGCOMM conference in London, going into the details of its five generations of datacenter network architecture.
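As a quick back-of-the-envelope check (purely illustrative, not taken from the paper), those headline figures are consistent with one another: 100,000 servers at 10 Gb/s apiece works out to a petabit per second.

```python
# Back-of-the-envelope check: 100,000 servers x 10 Gb/s each.
servers = 100_000
per_server_gbps = 10

total_gbps = servers * per_server_gbps   # 1,000,000 Gb/s
total_pbps = total_gbps / 1_000_000      # 10^6 Gb/s = 1 Pb/s

print(f"{total_pbps:.0f} Pb/s aggregate")  # -> 1 Pb/s
```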
The reason Google built its own networking hardware, using components from traditional vendors such as Qualcomm, was to leverage its existing in-house expertise: software. Off-the-shelf switches, as used in conventional deployments, involve “manual and error prone” management techniques that could not scale to meet Google’s needs. Software-driven switching, by contrast, is cheaper to design and considerably easier to control remotely, which has become increasingly important for a company with datacenters spread across the world. It is also very much a Google thing to do; the company is, at its core, a software business.
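To make the contrast with box-by-box manual configuration concrete, here is a minimal sketch of the centralized idea: one policy, rendered by software into per-switch configurations, rather than settings typed into each device. Every name and field below is invented for illustration and does not reflect Google’s actual tooling.

```python
# Illustrative only: a single central policy is rendered programmatically
# into per-switch configurations, replacing manual, per-box setup.
from dataclasses import dataclass

@dataclass
class SwitchConfig:
    switch_id: str        # hypothetical top-of-rack switch identifier
    vlan: int
    uplink_ports: list

def render_configs(switch_ids, policy):
    """Derive every switch's configuration from one central policy."""
    return [
        SwitchConfig(switch_id=s,
                     vlan=policy["vlan"],
                     uplink_ports=list(policy["uplinks"]))
        for s in switch_ids
    ]

central_policy = {"vlan": 42, "uplinks": [1, 2, 3, 4]}
for cfg in render_configs([f"tor-{i:03d}" for i in range(3)], central_policy):
    print(cfg)
```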
Google’s blog details some of the benefits of the software-driven approach – including the fact that engineers do not need to optimize their code around layers of bandwidth hierarchy, because Google’s architecture is built as a single high-performance network supporting tens of thousands of servers with flat bandwidth between them. This configuration allows applications to be highly scalable, and for a business that has only recently begun promoting and marketing its cloud computing platform to other companies, the ability to scale a client’s application is very important. Google has rolled out this relatively flat structure across all of its cloud computing farms. It has also built related technologies such as Bandwidth Enforcer, a means of allocating wide-area bandwidth across many thousands of individual applications based on a central configuration policy, and TIMELY, which manages datacenter bandwidth allocation while keeping responsiveness high and latency low. Throughout, Google’s efforts have remained focused on keeping its server and cloud computing clusters scalable across the globe.
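To give a flavour of what “allocating bandwidth according to a central policy” can mean in practice, here is a toy max-min fair-share allocator. Max-min fairness is a generic textbook technique, sketched here only as an illustration; it is not a reconstruction of Bandwidth Enforcer or TIMELY, and the application names are made up.

```python
# Toy example: centrally allocating one shared link among applications
# using max-min fairness. A generic technique, not Google's own system.
def max_min_fair(capacity, demands):
    """Return per-application allocations (same units as capacity)."""
    remaining = dict(demands)
    allocation = {app: 0.0 for app in demands}
    while remaining and capacity > 1e-9:
        share = capacity / len(remaining)
        satisfied = [app for app, d in remaining.items() if d <= share]
        if not satisfied:
            # Every remaining app wants more than the equal share:
            # split what is left evenly and stop.
            for app in remaining:
                allocation[app] += share
            break
        for app in satisfied:
            allocation[app] += remaining[app]
            capacity -= remaining.pop(app)
    return allocation

# A 100 Gb/s link shared by three hypothetical applications (Gb/s):
print(max_min_fair(100, {"search": 70, "ads": 20, "backup": 40}))
# -> {'search': 40.0, 'ads': 20.0, 'backup': 40.0}
```

In this example the two smaller demands are met in full, while the largest is trimmed to its fair share of the remaining capacity, which is the kind of policy decision a central allocator can make consistently across thousands of applications.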