Intel Vision 2024 was the stage where Intel showcased its Gaudi 3 AI accelerator. As Intel’s latest dedicated AI accelerator, the Gaudi 3 represents a major step up from the previous Gaudi 2, and it is set to enter mass production in the second half of 2024.
Spotting the Intel Gaudi 3 128GB HBM2e AI Chip
Most readers will want the specifications first, and the Gaudi 3 is worth the curiosity. The new chip stays with HBM2e, but uses eight stacks to reach 128GB of capacity. It delivers up to 1.835 PFLOPS of FP8 compute; FP8 support itself is notable, since many accelerators still lack it. The chip packs 64 tensor processor cores alongside 8 matrix math engines.
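As a quick sanity check, the capacity and compute figures above can be tied together with some back-of-envelope arithmetic (the 16GB-per-stack figure is an assumption, chosen to be consistent with the stated 128GB total):

```python
# Back-of-envelope check on the Gaudi 3 memory and FP8 figures quoted above.
HBM_STACKS = 8
GB_PER_STACK = 16          # assumed HBM2e stack capacity (8 x 16GB = 128GB)
FP8_PFLOPS = 1.835         # peak FP8 compute quoted by Intel

total_hbm_gb = HBM_STACKS * GB_PER_STACK
print(f"Total HBM2e capacity: {total_hbm_gb} GB")

# A rough "memory per unit of compute" ratio, useful when sizing models:
gb_per_pflops = total_hbm_gb / FP8_PFLOPS
print(f"~{gb_per_pflops:.1f} GB of HBM per PFLOPS of FP8 compute")
```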
Intel acquired Habana Labs in 2019, during the Gaudi 1 generation, at a time when competition over the choice of AI acceleration was heating up, with Facebook’s accelerator choice reportedly a major factor. After the acquisition, the company focused on the Gaudi 2 generation, which started gaining traction in late 2022 as AI demand began to soar. Now the stage is set for the Gaudi 3, which brings substantial gains in compute and bandwidth, along with a process shrink from 7nm to 5nm.
To give you an idea of the transformation, just consider the size difference between Gaudi 3 and Gaudi 2.
When you hold Gaudi 2 and Gaudi 3 in your hands, it is quite apparent that the Gaudi 3 silicon package is considerably larger.
The silicon package houses two dies, each with 48MB of SRAM, half of the chip’s tensor processor cores, and a media engine. Gaudi 3 continues the distinctive approach Habana introduced with Gaudi 1: using Ethernet to scale both up and out. This generation provides 24 network interfaces at 200GbE, a big jump from Gaudi 2’s 24x 100GbE and Gaudi 1’s 10x 100GbE. Ethernet is Intel’s choice for linking AI accelerators within a chassis as well as for scaling to many accelerators across a data center. That differs from NVIDIA, which uses NVLink/NVSwitch, InfiniBand, and Ethernet in an HGX H100 platform.
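To put the networking jump in perspective, the aggregate per-accelerator Ethernet bandwidth across the three generations works out as follows (a simple sketch using the port counts above):

```python
# Aggregate per-accelerator Ethernet bandwidth by generation (ports x speed).
generations = {
    "Gaudi 1": (10, 100),   # 10x 100GbE
    "Gaudi 2": (24, 100),   # 24x 100GbE
    "Gaudi 3": (24, 200),   # 24x 200GbE
}

for name, (ports, gbe) in generations.items():
    total_gbps = ports * gbe
    # 8 bits per byte; the GB/s figure is per direction on a full-duplex link
    print(f"{name}: {ports}x {gbe}GbE = {total_gbps} Gbps "
          f"(~{total_gbps / 8:.0f} GB/s per direction)")
```

Gaudi 3 therefore doubles Gaudi 2’s aggregate Ethernet bandwidth per accelerator.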
Here is a closer look at the die. The thermal paste is there because this was a working card pulled from a system. If you are wondering what sits between the 8x 16GB HBM2e packages, we were told it is filler silicon that provides structural support for the package.
One can also see the line between the two pieces of silicon that make up the main compute, SRAM, and networking portion of the AI accelerator.
Here is the OAM package bottom for the Gaudi 3.
The key concept behind Habana’s Gaudi architecture was using Ethernet to scale up. Network administrators generally do not want to manage many different types of data fabrics, and Ethernet is ubiquitous. This concept also predates the advent of Ultra Ethernet. As of 2024, network switch speeds have increased significantly, with modern 51.2T switches capable of handling large numbers of 200GbE devices. By contrast, in 2019, when Gaudi 1 launched, 32-port 100GbE was still considered high-end, and network bandwidth and topologies were far less advanced. On the OAM package, 21 of the 24 links are used internally, with 3x 200GbE connecting to each of the other seven OAM packages. The remaining 3x 200GbE links go to OSFP connectors at the rear of the chassis.
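The port budget described above can be verified with a short sketch: an all-to-all mesh of eight OAM packages with three 200GbE links per pair consumes exactly 21 of each package’s 24 ports, leaving three for the external OSFP cages.

```python
from itertools import combinations

OAMS = 8
LINKS_PER_PAIR = 3     # 3x 200GbE between every pair of OAM packages
PORTS_PER_OAM = 24

used = {i: 0 for i in range(OAMS)}
for a, b in combinations(range(OAMS), 2):   # every unordered pair once
    used[a] += LINKS_PER_PAIR
    used[b] += LINKS_PER_PAIR

# Each package links to the 7 others: 7 x 3 = 21 ports on the internal mesh.
assert all(n == 21 for n in used.values())

# The remaining ports per package go to the rear OSFP connectors.
external = {i: PORTS_PER_OAM - n for i, n in used.items()}
assert all(n == 3 for n in external.values())
print("21 internal + 3 external ports per OAM package")
```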
According to Intel, the new Gaudi 3 is more energy-efficient and potentially faster than NVIDIA’s H100 at inference. That positions Gaudi 3 well for the AI inference market.
Intel also claims the Gaudi 3 outperforms the NVIDIA H100 in training speed. The NVIDIA H200 has been announced, but we still await its training benchmarks as it is only now ramping production, and NVIDIA’s Blackwell generation is expected to raise performance again later this year. Gaudi 3, meanwhile, is competing on price. The power-efficiency debate should regain strength in 2024 as organizations reassess their energy footprints and weigh keeping or replacing machines to conserve power capacity.
Intel also unveiled the Gaudi 3 PCIe CEM add-in card, the HL-338, with a 600W TDP. It is remarkable how quickly PCIe accelerators have gone from the 300-350W parts of the past year or two to 600W today.
All of these accelerators are scheduled to sample in the first half of this year, with the air-cooled and liquid-cooled models entering production in the second half. Notably, that timing lines up with the debut of Granite Rapids-AP Xeon 6 platforms, such as the Supermicro system shown.
Keep in mind that these parts will launch alongside the NVIDIA H200 and Blackwell, so aggressive pricing from Intel is expected. The next step, in 2025, will be Falcon Shores, a GPU architecture designed for AI. Intel has assured Gaudi 3 users that the transition to the newer architecture will be straightforward.
ColoCrossing excels in providing enterprise Colocation Services, Dedicated Servers, VPS, and a variety of Managed Solutions, operating from 8 data center locations nationwide. We cater to the diverse needs of businesses of any size, offering tailored solutions for your unique requirements. With our unwavering commitment to reliability, security, and performance, we ensure a seamless hosting experience.
For inquiries or to receive a personalized quote, please reach out to us through our contact form here or email us at sales@colocrossing.com.