The importance of Node Interleaving on AMD compute nodes
An explanation of Node Interleaving can be found here
The end result: a roughly 4-5x increase in memory bandwidth.
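To make the idea concrete, here is a toy model of round-robin node interleaving. It is an illustration, not AMD's actual address-mapping logic. It assumes 8 NUMA nodes (4 sockets x 2 NUMA nodes per socket, as in the spec table below) and a hypothetical 4 KiB interleave granularity; the real hardware granularity may differ. Without interleaving, Linux's default first-touch policy tends to place a buffer initialized by one thread on a single node, so only one memory controller serves it. With interleaving, consecutive pages rotate across all controllers.

```python
# Toy model of round-robin node interleaving (illustrative only).
# Assumptions: 8 NUMA nodes (4 sockets x 2 per socket) and a 4 KiB
# interleave granularity; real hardware may use a different granularity.

PAGE_SIZE = 4096  # assumed interleave granularity in bytes
NUM_NODES = 8     # 4 sockets x 2 NUMA nodes per socket


def node_for_address(addr: int, interleaved: bool, home_node: int = 0) -> int:
    """Return the NUMA node that backs physical address `addr` in this model."""
    if interleaved:
        # Consecutive pages rotate across all memory controllers.
        return (addr // PAGE_SIZE) % NUM_NODES
    # Without interleaving, assume the whole buffer was first touched by a
    # thread on `home_node`, so every page lives there.
    return home_node


def pages_per_node(buffer_bytes: int, interleaved: bool) -> list[int]:
    """Count how many pages of a buffer land on each node."""
    counts = [0] * NUM_NODES
    for addr in range(0, buffer_bytes, PAGE_SIZE):
        counts[node_for_address(addr, interleaved)] += 1
    return counts


if __name__ == "__main__":
    size = 1 << 20  # 1 MiB buffer = 256 pages
    print("interleaved:", pages_per_node(size, True))   # 32 pages on each of 8 nodes
    print("local only:", pages_per_node(size, False))   # all 256 pages on node 0
```

In the interleaved case a streaming read touches all eight memory controllers instead of one, which is one plausible explanation for the bandwidth gap measured below.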
In our lab we have several 64-core AMD nodes with the following specs:
- Supermicro H8QGL-6F/H8QGL-iF
- Supermicro 1042-LTF SuperServer
Processor | AMD 6274 |
---|---|
Nickname | Interlagos |
Clock (GHz) | 2.2 |
Sockets/Node | 4 |
Cores/Socket | 16 |
NUMA/Socket | 2 |
DP GFlops/Socket | 140.8 |
Memory/Socket | 32 GB |
Bandwidth/Socket | 102.4 GB/s |
DDR3 | 1333 MHz |
L1 cache (excl.) | 16KB |
L2 cache/# cores | 2MB/2 |
L3 cache/# cores | 8MB/8 |
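The per-socket figures in the table can be cross-checked and rolled up to node totals with a little arithmetic. The sketch below assumes 4 double-precision FLOPs per core per cycle, which follows from each two-core Interlagos module sharing an FPU with two 128-bit FMA pipes; that constant is the only value not taken from the table.

```python
# Roll the per-socket spec-table figures up to node totals.
# Assumption: 4 DP FLOPs per core per cycle (two 128-bit FMA pipes shared
# by each two-core Interlagos module = 8 DP FLOPs/module/cycle).

CLOCK_GHZ = 2.2
SOCKETS = 4
CORES_PER_SOCKET = 16
NUMA_PER_SOCKET = 2
MEM_PER_SOCKET_GB = 32
DP_FLOPS_PER_CORE_PER_CYCLE = 4  # assumption, see above

gflops_per_socket = CLOCK_GHZ * CORES_PER_SOCKET * DP_FLOPS_PER_CORE_PER_CYCLE
gflops_per_node = gflops_per_socket * SOCKETS
total_cores = SOCKETS * CORES_PER_SOCKET
numa_nodes = SOCKETS * NUMA_PER_SOCKET
cores_per_numa_node = total_cores // numa_nodes
mem_per_node_gb = MEM_PER_SOCKET_GB * SOCKETS

print(f"DP GFlops/socket: {gflops_per_socket:.1f}")  # 140.8, matches the table
print(f"DP GFlops/node:   {gflops_per_node:.1f}")    # 563.2
print(f"cores/node: {total_cores}, NUMA nodes: {numa_nodes}, "
      f"cores per NUMA node: {cores_per_numa_node}")  # 64, 8, 8
print(f"memory/node: {mem_per_node_gb} GB")           # 128 GB
```

The 8 NUMA nodes per machine are what Node Interleaving spreads memory across.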
A few days ago I noticed that one of the nodes was performing horribly compared to another, so I decided to do some digging. I installed the AMD APP SDK on both machines and ran the clpeak benchmark, with the following results:
Bad Compute Node:
Platform: AMD Accelerated Parallel Processing
Device: AMD Opteron(TM) Processor 6274
Driver version : 1214.3 (sse2,avx,fma4) (Linux x64)
Compute units : 64
Clock frequency : 2200 MHz
Global memory bandwidth (GBPS)
float : 9.22
float2 : 9.64
float4 : 9.95
float8 : 10.16
float16 : 9.99
...
Good Compute Node:
Platform: AMD Accelerated Parallel Processing
Device: AMD Opteron(TM) Processor 6274
Driver version : 1214.3 (sse2,avx,fma4) (Linux x64)
Compute units : 64
Clock frequency : 2205 MHz
Global memory bandwidth (GBPS)
float : 37.66
float2 : 42.27
float4 : 58.08
float8 : 55.39
float16 : 43.31
...
There is a roughly 4-5x difference in memory bandwidth! I omitted the FLOP rates of both nodes, as they were identical. By enabling Node Interleaving, the memory bandwidth increases dramatically.
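The size of the gap per vector width can be computed directly from the two clpeak runs. This sketch only re-uses the "Global memory bandwidth (GBPS)" numbers reported above.

```python
# Good-node vs. bad-node clpeak global memory bandwidth (GBPS),
# using the values reported above.
bad = {"float": 9.22, "float2": 9.64, "float4": 9.95, "float8": 10.16, "float16": 9.99}
good = {"float": 37.66, "float2": 42.27, "float4": 58.08, "float8": 55.39, "float16": 43.31}

# Speedup of the good node over the bad node for each vector width.
ratios = {k: good[k] / bad[k] for k in bad}

for k, r in ratios.items():
    print(f"{k:8s} {bad[k]:6.2f} -> {good[k]:6.2f} GBPS  ({r:.2f}x)")
print(f"range: {min(ratios.values()):.2f}x to {max(ratios.values()):.2f}x")
```

The ratio runs from about 4.1x (float) up to about 5.8x (float4), so the wider vector loads show the largest gap.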
Section 1. BIOS Configuration
Note that I will be talking about BIOS version 2.0 here.
Below is the BIOS configuration of the faster machine for the CPU and memory options:
BIOS -> Advanced -> Processor & Clock Options
GART Error [Disabled]
Microcode Update [Enabled]
Secure Virtual Machine Mode [Disabled]
PowerNow [Enabled]
C State Mode [Disabled]
PowerCap [P-state 0]
HPC Mode [Disabled]
CPB Mode [Auto]
CPU DownCore Mode [Disabled]
C1E Support [Auto]
Clock Spread Spectrum [Disabled]
BIOS -> Advanced -> Advanced Chipset Control -> NorthBridge Configuration
HT Speed Support [Auto]
IOMMU [Enabled]
BIOS -> Advanced -> Advanced Chipset Control -> NorthBridge Configuration -> Memory Configuration
Bank Interleaving [Auto]
Node Interleaving [Auto] THE MOST IMPORTANT CHANGE
Channel Interleaving [Auto]
CS Sparing Enable [Disabled]
Bank Swizzle Mode [Enabled]
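After rebooting, one way to confirm from Linux that the setting took effect is to count the NUMA nodes the kernel exposes under /sys/devices/system/node. The sketch below assumes the usual firmware behavior: with Node Interleaving active, all memory is typically presented as a single node (so it should report 1), while with it disabled this 4-socket Interlagos machine would be expected to report 8. The helper name `count_numa_nodes` is hypothetical.

```python
# Count the NUMA nodes Linux exposes in sysfs (hypothetical helper).
# Assumption: with BIOS Node Interleaving active, firmware typically presents
# all memory as one node (expect 1); with it disabled, a 4-socket Interlagos
# box would be expected to show 8 nodes (4 sockets x 2 per socket).
import os
import re

NODE_DIR_RE = re.compile(r"node\d+")


def count_numa_nodes(sysfs_node_dir: str = "/sys/devices/system/node") -> int:
    """Return the number of nodeN entries in the sysfs NUMA directory."""
    try:
        entries = os.listdir(sysfs_node_dir)
    except FileNotFoundError:
        # Kernels built without NUMA support may lack this directory entirely.
        return 1
    # Skip non-node entries such as "online", "possible" or "has_cpu".
    return sum(1 for name in entries if NODE_DIR_RE.fullmatch(name))


if __name__ == "__main__":
    n = count_numa_nodes()
    print(f"NUMA nodes visible to Linux: {n}")
    if n == 1:
        print("Looks like a single (interleaved) memory node.")
    else:
        print("Memory is split across nodes; Node Interleaving is likely off.")
```

`numactl --hardware` reports the same information in its "available: N nodes" line.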