Nvidia Blackwell data center GPUs could face further delays due to overheating problems

Nvidia's Blackwell chips face delays due to overheating and design flaws.

Nvidia's Blackwell AI chips are experiencing delays due to overheating in server racks and an earlier design flaw found by TSMC. The overheating occurs in server racks that hold 72 processors and consume up to 120kW per rack, while the earlier design flaw involved a thermal expansion mismatch that caused chip failures. Despite these setbacks, Nvidia is working with cloud providers to manage the delays, and demand for Blackwell remains high.

Nvidia's new Blackwell AI chips have been delayed by overheating issues and earlier design flaws. The chips overheat when installed in server racks designed to hold 72 processors, with each rack consuming up to 120kW, which has prompted Nvidia to repeatedly ask suppliers to redesign the racks for better cooling.

Previously, chip manufacturer TSMC identified a design flaw in the processor die that affected the GB100 and GB200 chips. The flaw was linked to a mismatch in thermal expansion between components, which led to warping and system failures during testing.

Despite these challenges, Nvidia remains in discussions with major cloud providers such as Microsoft, Google, and Meta to manage the production timetable. CEO Jensen Huang has said that demand for Blackwell AI chips is high, so any further delays could disappoint these key partners.