Introduction
There are only a couple of devices that spring to mind running the Nvidia Tegra K1 processor: the Shield Tablet, which uses the 32-bit version, and the HTC Nexus 9, which uses the 64-bit version. The 32-bit K1, referred to as the K1-32, combines a quad-core ARM Cortex-A15 CPU with Nvidia’s powerhouse of a graphics processing unit (GPU). The first Android tablet with an ARM Cortex-A15 processor was the Samsung Nexus 10, which still delivers respectable performance today. The 64-bit version, the K1-64, replaces the Cortex-A15 with Nvidia’s own custom dual-core Denver CPU. Both versions use the same 192-core GPU, which pulls mobile graphics performance up to desktop-class levels. My experience of the 64-bit Tegra K1 is too limited for me to judge how it performs in the real world, but now Nvidia have unveiled a new processor (or, more accurately, a system on a chip, or SoC) at CES, the Consumer Electronics Show. This is brilliant news: it means I can pore over specifications and technical documents and summarise them in an article for Android Headlines. We’ve already introduced the Nvidia Tegra X1, but let’s take a deeper look at the SoC.
Processor and Video Architecture
I need to separate the CPU (central processing unit) from the GPU, and I’m going to write about the CPU first. The X1’s CPU isn’t an evolution of the K1-64’s dual-core Denver unit; instead, it uses a familiar arrangement of eight ARM reference cores: four high-performance Cortex-A57 cores and four high-efficiency Cortex-A53 cores, all 64-bit. So far, this sounds like the Qualcomm Snapdragon 810, right? Sure, except that Nvidia have implemented their own core and power management arrangement. The X1 isn’t using the typical big.LITTLE arrangement we’ve seen from other processor manufacturers (whereby the A53s handle lightweight duties and the A57s are brought in when more performance is needed). Instead, all eight cores are available to the device at all times. The processor is manufactured on a 20nm process, and Nvidia claim it offers 140% of the performance of Samsung’s Exynos 5433 at the same power consumption. It isn’t yet clear how the X1 achieves this, but it’s believed to come from more efficient throttling, turbo and core management. In short, apart from ARM’s reference CPU cores, everything else is Nvidia-designed. This is an interesting departure: the Denver core used in the Tegra K1-64 is Nvidia’s own design, and Nvidia have stated that their long-term roadmap still includes Denver. The X1 was developed on a short timescale, and that’s why Nvidia dropped in ARM’s reference A53/A57 cores and bolted on their own GPU.
Let me turn to the GPU, as this is Nvidia’s strength. By the numbers, the X1 increases the number of GPU cores from 192 (in the K1) to 256. The number of texture units doubles from 8 to 16, and the GPU clock speed gets a small bump from around 950 MHz to 1.0 GHz. There’s a big increase in memory clock speed, from 930 MHz to 1.6 GHz, and the X1 moves to newer-generation LPDDR4 memory in place of the K1’s LPDDR3.
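To put those GPU figures side by side, here is a small Python sketch that turns them into ratios and a rough theoretical peak. The peak-throughput formula is my assumption, not an Nvidia figure: it counts one fused multiply-add (two floating point operations) per GPU core per clock, and uses the approximate 950 MHz K1 clock quoted above.

```python
# GPU figures for the Tegra K1 and Tegra X1 as quoted in this article.
K1 = {"gpu_cores": 192, "texture_units": 8, "gpu_clock_mhz": 950, "mem_clock_mhz": 930}
X1 = {"gpu_cores": 256, "texture_units": 16, "gpu_clock_mhz": 1000, "mem_clock_mhz": 1600}


def uplift(old: dict, new: dict) -> dict:
    """Return new/old for every spec in the old dict."""
    return {key: new[key] / old[key] for key in old}


def peak_fp32_gflops(spec: dict) -> float:
    """Theoretical peak, ASSUMING one FMA (2 FLOPs) per core per clock."""
    return spec["gpu_cores"] * 2 * spec["gpu_clock_mhz"] / 1000


if __name__ == "__main__":
    for name, ratio in uplift(K1, X1).items():
        print(f"{name}: {ratio:.2f}x")
    print(f"K1 peak FP32: {peak_fp32_gflops(K1):.1f} GFLOPS")
    print(f"X1 peak FP32: {peak_fp32_gflops(X1):.1f} GFLOPS")
```

Under those assumptions it reports 1.33x the cores, 2.00x the texture units, 1.05x the GPU clock and 1.72x the memory clock, with 32-bit peaks of about 365 GFLOPS for the K1 and 512 GFLOPS for the X1. Real-world performance also depends on bandwidth, drivers and thermals, so treat these as ceilings rather than predictions.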
The Tegra X1 uses Nvidia’s Maxwell architecture, which was designed for Tegra rather than ported to Tegra from desktop-class systems. One result of this mobile-first development is that desktop Maxwell-based video cards show industry-leading power efficiency, and those benefits have cascaded, and will continue to cascade, down into the Tegra SoC. The X1 adds features over and above the already impressive K1, but a key advantage is the improvement in memory bandwidth, plus some intelligent memory compression. New GPU features such as volumetric tiled resources and multi-frame anti-aliasing need a lot of memory bandwidth to keep the GPU fed, because a GPU left waiting for data simply stalls. Third-generation delta color compression helps here by reducing the amount of data that has to cross the memory bus in the first place.
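Delta color compression is easier to picture with a toy example. The Python sketch below is not Nvidia’s algorithm, whose exact workings aren’t covered here; it simply stores the first pixel of a tile in full and every other pixel as a difference from it, with all differences sized to fit the largest one. The 8-pixel tile, single 8-bit channel and 4-bit header are hypothetical choices for illustration.

```python
"""Toy delta encoding of a pixel tile, in the spirit of delta color compression.

NOT Nvidia's scheme: tile size, channel depth and header width are assumptions.
"""
from typing import List, Tuple

HEADER_BITS = 4  # hypothetical field recording the shared delta width


def signed_bits(d: int) -> int:
    """Smallest two's-complement width that can represent d."""
    bits = 1
    while not (-(1 << (bits - 1)) <= d <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def tile_sizes(tile: List[int], pixel_bits: int = 8) -> Tuple[int, int]:
    """Return (raw_bits, delta_encoded_bits) for one tile of single-channel pixels."""
    anchor = tile[0]                          # first pixel kept at full precision
    deltas = [p - anchor for p in tile[1:]]   # everything else relative to it
    width = max((signed_bits(d) for d in deltas), default=1)
    raw = pixel_bits * len(tile)
    encoded = HEADER_BITS + pixel_bits + width * len(deltas)
    return raw, encoded


if __name__ == "__main__":
    smooth = [100, 101, 102, 103, 104, 105, 106, 107]  # gentle gradient
    noisy = [0, 255, 0, 255, 0, 255, 0, 255]           # worst case
    for name, tile in (("smooth", smooth), ("noisy", noisy)):
        raw, enc = tile_sizes(tile)
        print(f"{name}: raw {raw} bits, delta-encoded {enc} bits")
```

The smooth gradient shrinks from 64 bits to 40, while the noisy tile would grow to 75 bits, which is why a practical scheme needs a fallback to storing a tile uncompressed. Smooth regions are common in rendered frames, so savings like this mean less traffic on the memory bus.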
The Tegra X1 uses what Nvidia call double-speed FP16 (16-bit floating point): its 32-bit floating point units can process a pair of 16-bit operations in parallel. Fewer bits mean less mathematical precision, but much of Android’s graphics work runs comfortably at 16-bit precision, so this one little hardware feature may greatly accelerate certain Android functions. Nvidia quote the Tegra X1 as approximately three times quicker at FP16 operations than the Tegra K1.
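Two things in that paragraph can be checked with a few lines of Python: what 16-bit precision costs, and where “approximately three times quicker” might come from. The standard `struct` module’s `e` format reads and writes IEEE 754 half-precision values, so the sketch round-trips a number through FP16 and packs two FP16 values into one 32-bit word. That packing only illustrates the data layout; it says nothing about how the hardware schedules the work. The throughput figures rest on my own assumptions, not Nvidia’s published method: one fused multiply-add per core per clock, FP16 at the FP32 rate on the K1, and FP16 at double rate on the X1.

```python
import struct
from typing import Tuple


def fp16_round_trip(x: float) -> float:
    """Store x as IEEE 754 half precision and read it back, exposing rounding loss."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def pack_two_halves(a: float, b: float) -> int:
    """Pack two FP16 values into one 32-bit word (a in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_two_halves(word: int) -> Tuple[float, float]:
    """Split a 32-bit word back into its two FP16 values."""
    return struct.unpack("<2e", struct.pack("<I", word))


# Peak-throughput arithmetic under the ASSUMPTIONS stated above (GFLOPS).
K1_FP16_GFLOPS = 192 * 2 * 950 / 1000      # K1: FP16 runs at its FP32 rate
X1_FP32_GFLOPS = 256 * 2 * 1000 / 1000
X1_FP16_GFLOPS = 2 * X1_FP32_GFLOPS        # X1: two FP16 ops per 32-bit lane

if __name__ == "__main__":
    print(fp16_round_trip(0.1))
    word = pack_two_halves(1.5, -2.0)
    print(hex(word), unpack_two_halves(word))
    print(f"X1/K1 FP16 peak: {X1_FP16_GFLOPS / K1_FP16_GFLOPS:.2f}x")
```

Under those assumptions the ratio comes out at about 2.8, in line with the roughly threefold FP16 speed-up quoted above, and storing 0.1 in FP16 gives back 0.0999755859375, a reminder of the precision being traded away.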
Power and Heat
Every new generation of processor claims to increase performance whilst reducing power consumption, with a side helping of lower heat output. Heat is especially important in mobile devices because many chipsets quickly become thermally limited: they are forced to underclock to protect their sensitive electronics, which in turn limits how responsive the device is. It was relatively easy to make my LG Nexus 4 throttle back, and it was obvious in use, too. With new high-end processors, manufacturers need to ensure that a chipset can run at full performance for longer than a few seconds before it throttles itself back. To that end, the processor manufacturer defines a TDP, or Thermal Design Power. Depending on how it is defined, this can mean the heat output under maximum load but, more typically, it refers to whatever the manufacturer determines a “typical load” to be, running a mix of applications. The device chassis must be designed to remove this much heat from the chipset without resorting to a fan stuck on the back of the tablet. Nvidia’s reference hardware was a typical tablet that could dissipate around 5W. The X1 is manufactured on a 20nm process, and this helps, because smaller transistors can generally be switched at a lower supply voltage. Voltage matters because the switching power consumed (and hence the heat produced) is proportional to the square of the applied voltage, so a small reduction in voltage has a much greater knock-on effect on power consumption.
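The square-law relationship comes from the standard approximation for switching (dynamic) power in CMOS logic, P = αCV²f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The Python sketch below applies it; the capacitance, activity and frequency values are hypothetical and are not Tegra X1 data, and leakage power is ignored.

```python
def dynamic_power_watts(activity: float, capacitance_f: float,
                        voltage_v: float, freq_hz: float) -> float:
    """Switching power of CMOS logic: P = alpha * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def power_ratio_from_voltage(voltage_scale: float) -> float:
    """New/old dynamic power when only the supply voltage changes."""
    return voltage_scale ** 2


if __name__ == "__main__":
    # Hypothetical chip parameters, chosen only to make the numbers readable.
    base = dynamic_power_watts(activity=0.2, capacitance_f=2e-9, voltage_v=1.0, freq_hz=2e9)
    lower = dynamic_power_watts(activity=0.2, capacitance_f=2e-9, voltage_v=0.9, freq_hz=2e9)
    print(f"baseline: {base:.2f} W, at 0.9 V: {lower:.2f} W")
    print(f"10% lower voltage -> {1 - power_ratio_from_voltage(0.9):.0%} less power")
```

A 10% cut in voltage (1.0 V to 0.9 V) reduces dynamic power by 19%, from 0.80 W to about 0.65 W in this made-up example, which is the outsized knock-on effect described above.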
Performance
I’m not a fan of synthetic benchmarks, especially when they’re run on reference hardware, which is exactly how Nvidia demonstrated the Tegra X1. In the interests of brevity I’m not going to cite all of the sources’ benchmark scores, but I’ll summarise: the Tegra X1 polished off the Apple A8X, which Nvidia consider the most powerful SoC available today, like a cheesecake. In certain benchmarks the X1 is around twice as fast as the K1; where it falls short of double the performance, it’s at around 150%. The Tegra X1 is fast! Perhaps more important is power consumption: at equal performance, the X1 draws around 60% of the power of the Apple A8X. In other words, the X1 uses a little over half the power of the Apple A8X. With the X1, you can have that cheesecake and eat it too.
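Claims like these are easiest to compare as performance per watt. Here is a minimal Python sketch that takes Nvidia’s claims at face value: 140% of the Exynos 5433’s performance at the same power, and the same performance as the Apple A8X at about 60% of the power. These are vendor figures from reference hardware, not independent measurements.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Efficiency of chip A relative to chip B.

    perf_ratio  = performance(A) / performance(B)
    power_ratio = power(A) / power(B)
    """
    return perf_ratio / power_ratio


# Nvidia's claims as reported in this article (reference hardware, not retail devices).
X1_VS_EXYNOS_5433 = perf_per_watt_ratio(perf_ratio=1.4, power_ratio=1.0)
X1_VS_APPLE_A8X = perf_per_watt_ratio(perf_ratio=1.0, power_ratio=0.6)

if __name__ == "__main__":
    print(f"X1 vs Exynos 5433: {X1_VS_EXYNOS_5433:.2f}x perf/W")  # 1.40x
    print(f"X1 vs Apple A8X:   {X1_VS_APPLE_A8X:.2f}x perf/W")    # 1.67x
```

The A8X figure works out to about 1.67 times the performance per watt, which is the same statement as “a little over half the power” seen from the other side.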
Availability and Wrap Up
About twice the performance, using less power; where does the Tegra X1 put a foot wrong? We’ll get a better idea once it starts to arrive in devices. Meanwhile, we can’t say when the processor will start to ship, but “later this year” is a good bet. And it’s great to see the Snapdragon 810 facing some stiff competition. I can’t wait for a showdown between new high-end devices running different processor configurations! Bring it on, Qualcomm, Samsung, MediaTek!