At its recent Xilinx Developer Forum (XDF) 2018 in San Jose, Xilinx, until now known as a field-programmable gate array (FPGA) company, announced more details about Versal (formerly the Everest project), its new adaptive compute acceleration platform (ACAP) device. On the back of this latest generation of devices, it also began describing itself as a compute platform company. The ACAP is fabricated in the latest 7nm technology and hosts multiple tightly coupled processors, including dual-core ARM Cortex-A72 and Cortex-R5 processors, an FPGA, and a new artificial intelligence (AI) engine. It targets a wide range of applications, including 5G, cloud, autonomous driving, and edge computing, as well as AI, primarily for inference but also for training. The first samples of Versal will appear in 2019.
Xilinx is already a successful FPGA company and sees future growth in taking a share of the growing market for AI applications. It realizes that the key is to appeal to the general class of software developers, and not just its traditional embedded programmable hardware specialists, by providing a middle layer that abstracts away the complexity of programming the ACAP. This is the right approach for Xilinx to take to broaden the appeal of its product portfolio. With several new rival AI accelerator processors due to appear over the next few years, this market will become highly competitive, and having a software abstraction layer will be as important as having the hardware.
Appealing to software developers
While Versal abstracts away much of the complexity, even at the highest level software developers will still need to understand how to allocate application workloads to the different specialized processors on the platform. Xilinx will make a simulator available so that developers can optimize how they partition workloads across the platform, and identify the most suitable Versal device from the six series in the range. At the highest level, programming in C/C++ and Python will be supported, as will popular AI frameworks such as Caffe, MXNet, and TensorFlow. Developers with hardware knowledge can program the platform at lower levels of the stack. Xilinx is also making its software available in containers pre-packaged with dependencies to ease development. Libraries and pre-designed domain-specific architectures (DSAs) from partners will also ease the adoption path.
At the event, customers appeared on stage to talk about how they plan to use Versal. Nokia talked about the need for machine learning optimization in 5G beamforming, and AMD talked about the second generation of cache-coherent interconnect for accelerators (CCIX) for interfacing its CPUs with FPGAs and accelerating them. Xilinx also revealed benchmarks showing the superiority of Versal over competing technologies such as CPUs and GPUs (although Versal must itself be used in combination with a CPU).
Accelerating the whole application and not just a subset of compute workloads
In benchmarking devices, it is typical to compare throughput, power consumption, and latency for specific workloads such as convolutional neural network classification tasks. However, Xilinx says that the whole application, not just specific workloads, needs to be considered in the total cost of ownership. For example, the power consumed by AI accelerators can become critical when many inference devices are run together, as they would be by a cloud service. Xilinx customers are attracted to FPGA technology over alternative accelerators when they trial proofs of concept (PoCs) for the whole application. Xilinx has signed up to the independent MLPerf benchmarking standard and will release its results for these benchmarks in the future.
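One way to see why the whole application matters, rather than a single accelerated workload, is Amdahl's law: the overall speedup is limited by the part of the application that is not accelerated. The following is a minimal Python sketch of that calculation. The function name and the fractions and speedup factor used in the loop are illustrative assumptions, not figures from Xilinx or from any benchmark.

```python
def overall_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Whole-application speedup under Amdahl's law.

    accelerated_fraction: share of the original run time spent in the
        workload that is moved to the accelerator (0.0 to 1.0).
    accelerator_speedup: how many times faster that workload runs
        on the accelerator (must be positive).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0.0:
        raise ValueError("accelerator_speedup must be positive")
    # Time left on the host plus the shortened accelerated time,
    # both expressed as a fraction of the original run time.
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


if __name__ == "__main__":
    # Hypothetical example: a workload that runs 10x faster on an accelerator.
    for fraction in (0.5, 0.9, 1.0):
        gain = overall_speedup(fraction, 10.0)
        print(f"accelerated share {fraction:.0%}: whole application {gain:.2f}x faster")
```

Under these assumed numbers, a large speedup on one workload translates into a much smaller gain for the application as a whole unless most of the run time is accelerated, which is consistent with the argument for assessing the complete application.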
A critical factor in working with FPGAs is the cycle time needed to make a change. Whereas a CPU and GPU combination can be changed at the speed of normal software programming (and its lifecycle needs), the range is broader for FPGA devices and depends on what is being changed. A software change is rapid, whereas a change to a domain-specific architecture (DSA) implemented in an FPGA can take days or weeks. However, the adaptability of DSAs is a strength in achieving optimum designs, because a custom ASIC requiring a change would have a far longer change lifecycle, stretching to months.
Choosing the Appropriate Hardware Acceleration for AI Systems, Ovum INT002-000149 (August 2018)
Michael Azoff, Distinguished Analyst, Information Management