The importance of Node Interleaving on AMD compute nodes
An explanation of Node Interleaving can be found here
The end result: a 4-5x increase in memory bandwidth.
In our lab we have several 64 core AMD nodes with the following specs:
- Supermicro H8QGL-6F/H8QGL-iF
- Supermicro 1042-LTF SuperServer
Processor | AMD 6274 |
---|---|
Nickname | Interlagos |
Clock (GHz) | 2.2 |
Sockets/Node | 4 |
Cores/Socket | 16 |
NUMA/Socket | 2 |
DP GFlops/Socket | 140.8 |
Memory/Socket | 32 GB |
Bandwidth/Socket | 102.4 GB/s |
DDR3 | 1333 MHz |
L1 cache (excl.) | 16KB |
L2 cache/# cores | 2MB/2 |
L3 cache/# cores | 8MB/8 |
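The DP GFlops/Socket figure in the table can be reproduced from the clock speed and core count. The sketch below is only a sanity check of that arithmetic; the value of 4 double-precision flops per cycle per core is an assumption chosen because it reproduces the 140.8 figure, not something stated in the spec sheet above.

```python
# Values taken from the spec table above.
CLOCK_GHZ = 2.2
CORES_PER_SOCKET = 16
SOCKETS_PER_NODE = 4

# Assumption: 4 double-precision flops per cycle per core.
# This is the value that makes the arithmetic match the table.
FLOPS_PER_CYCLE_PER_CORE = 4


def peak_gflops(clock_ghz, cores, flops_per_cycle):
    """Theoretical peak in GFlops: clock (GHz) x cores x flops per cycle."""
    return clock_ghz * cores * flops_per_cycle


socket_peak = peak_gflops(CLOCK_GHZ, CORES_PER_SOCKET, FLOPS_PER_CYCLE_PER_CORE)
node_peak = socket_peak * SOCKETS_PER_NODE

print(f"Peak per socket: {socket_peak:.1f} GFlops")  # 140.8
print(f"Peak per node:   {node_peak:.1f} GFlops")    # 563.2
```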
I noticed a few days ago that one of the nodes was performing horribly compared to another, so I decided to do some digging. I installed the AMD APP SDK on both machines and ran the clpeak benchmark, with the following results:
Bad Compute Node:
Good Compute Node:
There is a 4-5x difference in memory bandwidth! I omitted the FLOP rates of both nodes as they were identical. Enabling Node Interleaving increases the memory bandwidth dramatically.
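If installing an OpenCL SDK is not convenient, a very rough host-side comparison between two nodes can be made with a plain memory copy. This is a minimal sketch, not a replacement for clpeak: it is single-threaded, it measures a simple copy rather than OpenCL kernels, and it will not reproduce clpeak's numbers. The function name and the buffer size are my own choices for illustration.

```python
import time


def copy_bandwidth_gbs(size_mb=64, repeats=5):
    """Return the best observed copy bandwidth in GB/s.

    Counts both the bytes read from the source and the bytes
    written to the destination.
    """
    n = size_mb * 1024 * 1024
    # Fill the source with non-zero bytes so its pages are really allocated.
    src = bytearray(b"\x01") * n
    dst = bytearray(n)
    src_view = memoryview(src)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src_view  # bulk copy of n bytes into the existing buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)

    return 2 * n / best / 1e9


if __name__ == "__main__":
    print(f"Copy bandwidth: {copy_bandwidth_gbs():.2f} GB/s")
```

Running the same script on both nodes gives two numbers that can be compared with each other, which is all this sketch is meant for.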
Section 1. BIOS Configuration
Note that I will be talking about BIOS version 2.0 here.
Below is the BIOS configuration of the faster machine for the CPU and memory options.
BIOS -> Advanced -> Processor & Clock Options
BIOS -> Advanced -> Advanced Chipset Control -> NorthBridge Configuration
BIOS -> Advanced -> Advanced Chipset Control -> NorthBridge Configuration -> Memory Configuration
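After changing the BIOS settings, one way to see the effect from the operating system is to count the NUMA nodes the kernel reports. The sketch below assumes a Linux system that exposes `/sys/devices/system/node/online`. With the specs above (4 sockets, 2 NUMA domains per socket) I would expect 8 nodes without interleaving. With Node Interleaving enabled, the firmware typically presents memory as a single node, but treat that as an expectation to verify on your own hardware rather than a guarantee.

```python
from pathlib import Path

# Linux sysfs file listing the online NUMA nodes, e.g. "0-7" or "0".
NODE_ONLINE = Path("/sys/devices/system/node/online")


def parse_id_list(text):
    """Expand a kernel list string such as '0-3,6' into [0, 1, 2, 3, 6]."""
    ids = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def online_numa_nodes(path=NODE_ONLINE):
    """Return the NUMA node ids the kernel reports, or None if unavailable."""
    try:
        return parse_id_list(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    nodes = online_numa_nodes()
    if nodes is None:
        print("NUMA information not available (non-Linux or no sysfs)")
    else:
        print(f"{len(nodes)} NUMA node(s) online: {nodes}")
```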