A Comparative Study and In-Depth Analysis of Cooling Methods Through Optimal Cold Plate Design

There is a lot of marketing hype around liquid cooling and around the best approach to supporting next-generation data centers and HPC facilities. Chilldyne conducted an in-depth technical study of cold plate optimization and cooling methods, focusing on managing the thermal output of powerful GPUs such as the 700-watt NVIDIA H100. This research was initiated at a customer's request to aid in the development of natural convection oil-cooled heat sinks. We performed a mathematical analysis, validated it through experiments, and determined the optimal fin spacing for cooling.

Key Findings

1. Performance of Direct-to-Chip Water-Based Liquid Cooling

The thermal resistance of water-based direct-to-chip liquid cooling is more than 700% better than that of air cooling.

A liquid-cooled cold plate at 0.7 lpm would provide a thermal resistance of 0.02°C/W, resulting in a 14°C rise (0.02°C/W × 700 W) for the configuration with two 700-watt NVIDIA H100 GPUs in series.

Direct-to-chip liquid cooling is more suitable for managing the thermal output of high-performance GPUs like the H100, especially in dense configurations where other cooling methods are inadequate. Its superior thermal performance allows for operation of these powerful processors in data center environments. (For context, lower thermal resistance indicates better cooling performance.)
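
The 14°C figure follows from multiplying thermal resistance by heat load. The short Python sketch below reproduces that arithmetic. The function name is illustrative, and using the 0.147°C/W Dynatron B4A value quoted in the next section as the air-cooled baseline is an assumption about which reference the 700% figure was measured against.

```python
def temperature_rise(thermal_resistance_c_per_w: float, power_w: float) -> float:
    """Steady-state temperature rise (°C) across a given thermal resistance."""
    return thermal_resistance_c_per_w * power_w


R_WATER = 0.02     # °C/W, liquid-cooled cold plate at 0.7 lpm (from the text)
R_AIR = 0.147      # °C/W, Dynatron B4A at 25 CFM (assumed air baseline)
GPU_POWER = 700.0  # W, one NVIDIA H100

rise = temperature_rise(R_WATER, GPU_POWER)
ratio = R_AIR / R_WATER  # how many times lower the water-cooled resistance is

print(f"Temperature rise per 700 W GPU: {rise:.1f} °C")
print(f"Air / water thermal resistance ratio: {ratio:.2f}")
```

Under that assumed baseline, the ratio comes out a little above 7, which appears consistent with the "more than 700%" statement.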

2. Oil Immersion vs. Air Cooling

The heat sink performs 43% better in oil than in air.

Our experiments revealed that oil immersion cooling outperforms traditional air cooling by 43%. This finding aligns with previous studies on oil vs. air cooling, such as “Thermal Performance and Efficiency of a Mineral Oil Immersed Server Over Varied Environmental Operating Conditions” [1]. However, this advantage comes with limitations when dealing with the most powerful processors.

For reference, we compared our experimental results to the Dynatron B4A, a similar air-cooled heat sink with a thermal resistance of 0.147°C/W at 25 CFM.

3. Limitations of Oil Immersion for High-Performance GPUs

When we applied our findings to a scenario with two 700-watt NVIDIA H100 GPUs in series (common in 8-GPU servers) and 53 × 280 × 17 mm cold plates, we found that, using the same mineral oil, the heat sink temperature would reach 97°C with 27°C coolant.

This temperature is expected to exceed the safe operating range for H100 GPUs.

Despite its improvement over air cooling, oil immersion struggles to manage the extreme heat output of densely packed, high-performance processors like the H100.
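
One way to read the 97°C result is to back out the thermal resistance it implies. The sketch below does this under the simplifying assumption that the full 70°C rise above the coolant is driven by a single 700-watt GPU. How the study accounts for the second GPU in series is not stated here, so the number is indicative only, and the function name is illustrative.

```python
def implied_thermal_resistance(t_sink_c: float, t_coolant_c: float, power_w: float) -> float:
    """Thermal resistance (°C/W) implied by a sink temperature, coolant temperature, and heat load."""
    return (t_sink_c - t_coolant_c) / power_w


T_SINK = 97.0      # °C, predicted heat sink temperature in oil (from the text)
T_COOLANT = 27.0   # °C, oil coolant temperature (from the text)
GPU_POWER = 700.0  # W, assumed to be the heat load behind the full rise

r_oil = implied_thermal_resistance(T_SINK, T_COOLANT, GPU_POWER)
print(f"Implied oil-cooled thermal resistance: {r_oil:.3f} °C/W")
```

Under that assumption the implied value is about 0.1°C/W, which sits between the 0.147°C/W air-cooled reference and the 0.02°C/W water cold plate.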

Full calculations and analysis are available in the reference documentation.

The Experiment Setup

We conducted the comprehensive study using the following setup:

  • Test Subjects: LGA3647 (Skylake) heat sinks
  • Cooling Medium: Immersion via Royco 602 PAO Avionics Coolant, with properties similar to Compuzol™ fluid (used for coolant calculations); single-phase
  • Heat Source: A 275-watt heater (thermal test vehicle) simulating a Skylake CPU
  • Cooling System: Oil bath with a pump flowing at 1.5 lpm, paired with a fan and radiator typical of desktop liquid cooling systems
  • Measurements:
    • Cold plate temperature: Measured by a sensor inside the cold plate
    • Oil temperature: Measured by a sensor at the bottom of the oil bath
  • Circulation: Oil was drawn from the top of the bath, cooled, and returned to the lower portion under the heat sink
  • Heat Sink Positioning: Spaced approximately 20 mm from the bottom of the bath

A thermal image of the glass container (Figure 2) revealed that most of the oil remained cool, with only a thin top layer showing increased temperature.

Figure 1: Oil immersion test setup.


Figure 2: Thermal image of fluid container.

Purpose of the Experiment

Our primary goal was to experimentally verify the theoretical analysis of the Skylake heat sink and use this validated model to estimate the thermal performance of heat sinks for the H100 GPU, particularly in a configuration with two GPUs in series.

Experimental Result for Oil Cooling

Figure 3: Thick fin heat sink and thin fin standard model.

Critical Insights

  1. The spacing and thickness of fins optimized for air cooling are similar to those required for optimized oil cooling.
  2. Thinner fins create more flow paths but also increase thermal resistance due to fin conduction. This trade-off is crucial in determining the optimal thickness and spacing for heat sinks in natural convection setups.
  3. The optimal thickness and spacing for natural convection set the upper limit for cold plate thermal resistance in both air and oil cooling.

Implications for Data Center Cooling

These findings have significant implications for cooling HPC systems and data centers:

  • While oil immersion cooling offers a 43% improvement over traditional air cooling, it falls short for the most demanding applications, such as densely packed H100 GPUs.
  • Direct-to-chip liquid cooling emerges as the most effective solution for managing the extreme heat loads of next-generation processors, enabling higher power densities in data centers.
  • As computational power and TDP continue to increase, the limitations of natural convection cooling (in both air and oil) become more apparent.

At Chilldyne, we're using these insights to develop cooling solutions tailored to the most demanding computing environments. Our direct-to-chip liquid cooling systems, featuring unique negative pressure technology, are designed to address the thermal challenges posed by the latest high-performance processors while eliminating the risk of coolant leaks.

Detailed Calculations and Miscellaneous Information

The heat transfer was analyzed using methods from Section 9.7 of Fundamentals of Heat and Mass Transfer [2] and from the paper “Thermally Optimum Spacing of Vertical, Natural Convection Cooled, Parallel Plates” [3].


Figure 4: Mathematical Analysis Nomenclature and Boundary Conditions from Incropera, DeWitt [2]
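
For a feel for the kind of calculation involved, the Bar-Cohen and Rohsenow result for symmetric isothermal vertical plates, as tabulated in Incropera and DeWitt, gives an optimum spacing of S_opt = 2.71 (Ra_S / (S^3 L))^(-1/4). The Python sketch below evaluates that expression. The choice of the isothermal case, the function name, the plate length, the temperature difference, and the air property values (approximate textbook values near 300 K) are all illustrative assumptions, not the inputs used in the study.

```python
G = 9.81  # m/s^2, gravitational acceleration


def optimum_plate_spacing(beta: float, delta_t: float, alpha: float,
                          nu: float, plate_length: float) -> float:
    """Optimum spacing (m) between symmetric isothermal vertical plates.

    beta: thermal expansion coefficient (1/K)
    delta_t: plate-to-fluid temperature difference (K)
    alpha: thermal diffusivity (m^2/s)
    nu: kinematic viscosity (m^2/s)
    plate_length: plate height in the flow direction (m)
    """
    # Ra_S / (S^3 L) = g * beta * dT / (alpha * nu * L), which does not depend on S
    p = G * beta * delta_t / (alpha * nu * plate_length)
    return 2.71 * p ** -0.25


# Assumed example: air near 300 K, 30 K temperature difference, 50 mm tall fins
s_opt_air = optimum_plate_spacing(
    beta=1.0 / 300.0,   # ideal-gas approximation, 1/K
    delta_t=30.0,       # K (assumed)
    alpha=22.5e-6,      # m^2/s (approximate value for air)
    nu=15.9e-6,         # m^2/s (approximate value for air)
    plate_length=0.05,  # m (assumed)
)
print(f"Optimum fin spacing in air (illustrative): {s_opt_air * 1000:.1f} mm")
```

The same function could be evaluated with the oil's properties to compare spacings. Those property values are not given in this article, so none are assumed here.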

For the full calculations and detailed analysis, please download the document here.
