Nvidia's most significant news was revealed the week before its 2019 GPU Technology Conference (GTC) in San Jose, California (held March 17–21): its agreement to acquire Mellanox for $6.9bn, the largest acquisition in Nvidia's history. Mellanox off-loads processing from the CPU onto ASICs built into its network switches, which improves bandwidth and performance for devices that benefit from direct connection, such as distributed GPUs. Timed to coincide with GTC, AWS announced that it is creating an EC2 instance called G4 for running Nvidia Tesla T4 GPUs, which are Nvidia's play for machine learning (ML) inference in the cloud. At the other end of the spectrum, Nvidia launched the Jetson Nano Developer Kit at GTC (priced at $99 at the conference) for embedding in edge devices and running AI applications on 128 CUDA GPU cores.
The Mellanox acquisition indicates the importance of east-west traffic in the data center
Keynoting at GTC, Nvidia's CEO, Jensen Huang, talked about the exponential rise in east-west traffic in the data center. This is a consequence of the distributed computing models that come with cloud-native computing (i.e., containerization and microservices), as well as the use of distributed GPUs for scaling applications, including artificial intelligence (AI). Mellanox is a leading player in intelligent networks, and its technology is used in several leading supercomputers. Its technology focus is off-loading workloads from the CPU, and it does this in four key ways:
Adaptive routing. Conventional static routing can result in bottlenecks, whereas adaptive routing can route traffic dynamically, choosing optimum routes for maximum bandwidth.
Remote direct memory access (RDMA). This enables one computer to access the memory of another directly, without going through TCP/IP and therefore without involving either computer's operating system or CPU, which reduces latency.
GPUDirect. Developed with Nvidia over the last eight years, this technology provides peer-to-peer data paths directly between GPU memory and Mellanox InfiniBand host channel adapters. It significantly decreases GPU-to-GPU communication latency by removing the need to involve the CPU in GPU-to-GPU communications across the network.
Scalable hierarchical aggregation and reduction protocol (SHARP). This improves on conventional message-passing interface (MPI) implementations by off-loading collective operations to the network. In the context of machine learning workloads on GPUs, SHARP allows aggregation and reduction calculations to be performed on the network, bypassing the need for CPU access.
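To make the idea of hierarchical aggregation concrete, the following is a minimal Python sketch of a tree-shaped sum reduction, the kind of calculation that SHARP is described as performing on the network. It is an illustration only and not Mellanox's implementation or API: the function name `tree_reduce`, the `radix` parameter (how many inputs each simulated switch combines), and the sample gradient values are all assumptions introduced here.

```python
from typing import List, Sequence


def tree_reduce(vectors: Sequence[Sequence[float]], radix: int = 2) -> List[float]:
    """Sum equal-length vectors level by level, as a tree of switches might.

    Hypothetical illustration: each group of `radix` inputs stands for the
    children of one switch, which forwards a single summed vector upward.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    if radix < 2:
        raise ValueError("radix must be at least 2")

    level = [list(v) for v in vectors]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), radix):
            group = level[i:i + radix]
            # Element-wise sum of the vectors arriving at this simulated switch.
            next_level.append([sum(column) for column in zip(*group)])
        level = next_level
    return level[0]


# Made-up gradient vectors from four GPUs (illustrative values only).
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
total = tree_reduce(gradients)
print(total)
```

In this sketch the four inputs are combined in two levels rather than being sent to a single host for summation, which is the general shape of the benefit: the partial sums are formed along the path, so no one CPU has to receive and add every vector.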
The Mellanox acquisition is expected to complete by the end of 2019, so neither party can yet say anything about next steps. However, we can clearly expect Nvidia to build on its existing relationship with Mellanox and create the next generation of distributed GPUs on intelligent networks.
As ML technology permeates into real-world applications, the need for inferencing will rise
Nvidia is the acknowledged king of training deep neural networks on GPUs, but the market also needs accelerators for running ML applications in inference mode, and here the field is more open. To address this gap, in September 2018 Nvidia launched the Tesla T4 GPU (based on Turing, its most advanced architecture) specifically for inference workloads. AWS has now announced that it will offer the T4 on its EC2 G4 instance.
Nvidia aims to be positioned one level above the end node in edge computing
Edge computing is increasingly topical as technologies such as 5G, AI, and containerization converge on the edge, a convergence that is expected to drive innovation. Nvidia believes its strength lies in being one level above the individual nodes on the edge, where more computation is required for aggregating data traffic from multiple nodes. The launch of the Nvidia Jetson Nano Developer Kit shows where Nvidia wishes to play in edge computing: the Jetson Nano can be embedded in, for example, robots and small devices that need to process data from multiple sensors.
Michael Azoff, Distinguished Analyst, IT Infrastructure