Before we discuss why thermal resistance is important in data center cooling, we probably need to explain what it is.
Thermal resistance is defined as the CPU's temperature rise above the incoming water or air temperature, divided by the chip power. It is reported in °C/watt and measures a cooling system's ability to remove heat. Think of it as an efficiency metric for data center cooling systems, like MPG for a car, except that lower is better: the lower the °C/watt, the better the cooling.
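The definition reduces to a one-line calculation. The sketch below is a minimal Python illustration; the function name `thermal_resistance` and its parameters are hypothetical, chosen only for this example.

```python
def thermal_resistance(cpu_temp_c: float, inlet_temp_c: float, power_w: float) -> float:
    """Thermal resistance in °C/W: CPU temperature rise over the inlet coolant, per watt."""
    if power_w <= 0:
        raise ValueError("chip power must be positive")
    return (cpu_temp_c - inlet_temp_c) / power_w


# A 100 W CPU running at 30 °C with 20 °C inlet water.
print(thermal_resistance(30.0, 20.0, 100.0))
```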
For example, if the thermal resistance of a heat sink or cold plate is 0.1 °C/watt, a 100-watt CPU would run 10°C warmer than the incoming water or air. So, if the system's water was at 20°C, the chip would run at 30°C. Typical CPUs have a maximum temperature of about 85°C, so in this example the water could in principle be as warm as 75°C (85°C - 10°C) before the chip reached its limit. That means you could use warm water for cooling, which is energy-efficient and cost-effective. In some cases, you can even reuse the heat from the computers.
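Rearranging the definition gives the chip temperature for a given inlet temperature, and the warmest inlet the chip can tolerate. This is a minimal sketch assuming a simple steady-state model (chip temperature = inlet temperature + resistance × power) and the 85°C limit quoted above; the function names are illustrative.

```python
def chip_temperature(inlet_temp_c: float, resistance_c_per_w: float, power_w: float) -> float:
    """Steady-state chip temperature: inlet temperature plus resistance times power."""
    return inlet_temp_c + resistance_c_per_w * power_w


def max_inlet_temperature(t_max_c: float, resistance_c_per_w: float, power_w: float) -> float:
    """Warmest inlet coolant that keeps the chip at or below t_max_c."""
    return t_max_c - resistance_c_per_w * power_w


# 0.1 °C/W cooler, 100 W CPU, 20 °C water, 85 °C assumed chip limit.
print(chip_temperature(20.0, 0.1, 100.0))
print(max_inlet_temperature(85.0, 0.1, 100.0))
```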
To collect the data needed to calculate thermal resistance, data center operators can measure the CPU temperature and power with a CPU monitoring program like CoreTemp (https://www.alcpu.com/CoreTemp/). The incoming water or air temperature is generally already set or recorded.
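Once readings have been logged, the calculation can be applied sample by sample. The sketch below assumes the readings were exported to a CSV file with hypothetical columns `cpu_temp_c`, `cpu_power_w`, and `inlet_temp_c`. This is not Core Temp's actual log format, and the sample values are made up for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical log layout and illustrative values; not Core Temp's own format.
SAMPLE_LOG = """cpu_temp_c,cpu_power_w,inlet_temp_c
30.5,101.0,20.0
29.8,98.5,20.1
30.2,100.2,20.0
"""


def resistance_from_log(text: str) -> float:
    """Average per-sample thermal resistance (°C/W) from CSV text."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        power = float(row["cpu_power_w"])
        if power <= 0:
            continue  # skip idle or invalid samples
        rise = float(row["cpu_temp_c"]) - float(row["inlet_temp_c"])
        values.append(rise / power)
    if not values:
        raise ValueError("no usable samples")
    return mean(values)


print(f"{resistance_from_log(SAMPLE_LOG):.3f} C/W")
```

Averaging the per-sample ratios is one choice; dividing the total temperature rise by the total power would instead weight high-power samples more heavily. Readings taken under steady load give the most meaningful number.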
As CPU power increases, a low thermal resistance becomes more important. For example, if a future 800-watt CPU is cooled with water from an outdoor radiator in a hot location where the cooling water is 60°C (140°F), we need a thermal resistance of 0.031 °C/watt or lower ((85°C - 60°C) / 800 W). That is beyond what an air-cooled system can deliver.
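The required value is simply the temperature headroom divided by the chip power. A minimal sketch reproducing the 800-watt example; `required_resistance` is a hypothetical helper name.

```python
def required_resistance(t_max_c: float, inlet_temp_c: float, power_w: float) -> float:
    """Highest thermal resistance (°C/W) that keeps the chip at or below t_max_c."""
    headroom = t_max_c - inlet_temp_c
    if headroom <= 0:
        raise ValueError("inlet coolant is already at or above the chip limit")
    return headroom / power_w


# 800 W CPU, 60 °C cooling water, 85 °C assumed chip limit.
print(required_resistance(85.0, 60.0, 800.0))  # 0.03125
```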
Below are some thermal resistance numbers with references for the various systems used today.
Cold plate data includes the thermal grease interface resistance of about 0.01 °C/watt. Lower thermal resistance means better cooling performance.
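Because the grease layer and the cold plate sit in series, their resistances add, and each one's share of the temperature rise scales with power. The sketch below assumes the 0.01 °C/watt grease figure above and a hypothetical 0.02 °C/watt cold plate (an illustrative value, not a measured one). At 800 W the grease alone accounts for about 8°C of the 25°C headroom in the 800-watt example.

```python
def series_resistance(*stages_c_per_w: float) -> float:
    """Thermal resistances in series add, like electrical resistors in series."""
    return sum(stages_c_per_w)


def temperature_rise(resistance_c_per_w: float, power_w: float) -> float:
    """Temperature rise (°C) across a resistance at a given power."""
    return resistance_c_per_w * power_w


GREASE = 0.01      # thermal grease interface, from the text (°C/W)
COLD_PLATE = 0.02  # hypothetical cold plate value, for illustration only (°C/W)

total = series_resistance(GREASE, COLD_PLATE)
print(f"total: {total:.3f} C/W")
print(f"grease rise at 800 W: {temperature_rise(GREASE, 800.0):.1f} C")
print(f"total rise at 800 W: {temperature_rise(total, 800.0):.1f} C")
```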
For the data center, thermal resistance is a key consideration when selecting a cooling system. If the thermal resistance of a cooling system is not low enough, the processors will throttle and performance will be reduced. ASHRAE recently reported that thermal resistance will need to be lower for future high-power, low-temperature CPUs and GPUs.
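Throttling can be approximated with the same steady-state model: if inlet temperature plus resistance × power exceeds the chip's limit, the processor would have to slow down. This rough sketch assumes the 85°C limit and 60°C inlet from the earlier example and two illustrative resistances (0.1 and 0.03 °C/watt). Real throttling behavior is set by the processor's firmware and is not modeled here, and the function names are hypothetical.

```python
T_MAX_C = 85.0  # assumed maximum CPU temperature, from the text


def will_throttle(power_w: float, inlet_temp_c: float, resistance_c_per_w: float,
                  t_max_c: float = T_MAX_C) -> bool:
    """True if the steady-state chip temperature would exceed t_max_c."""
    return inlet_temp_c + resistance_c_per_w * power_w > t_max_c


def max_sustained_power(inlet_temp_c: float, resistance_c_per_w: float,
                        t_max_c: float = T_MAX_C) -> float:
    """Highest power (W) the cooler can hold at or below t_max_c."""
    return (t_max_c - inlet_temp_c) / resistance_c_per_w


# An 800 W CPU with 60 °C water under two illustrative coolers.
for r in (0.1, 0.03):
    print(r, will_throttle(800.0, 60.0, r), round(max_sustained_power(60.0, r)))
```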