CFD Online Discussion Forums - Intel announces "Xeon Max" CPUs with HBM2

Intel announces "Xeon Max" CPUs with HBM2

https://www.servethehome.com/intel-x...pids-hbm-line/

Taking a page out of Apple's playbook, Intel is calling their next generation of Xeon CPUs with HBM2e in the package "Xeon Max".
The cliff notes:

multiple dies/chiplets ("tiles" in Intel's nomenclature)
up to 56 cores
(up to?) 64GB of HBM2e in the CPU package
"over 1TB/s" of HBM2 bandwidth
8-channel DDR5-4800 memory (with 1DPC)
2-socket (2S) scalability; other CPUs from this generation can do 4S and 8S
3 different operation modes for HBM2, one of which runs entirely without DRAM
expected availability mid 2023
up to 350W TDP

Intel claims a ~5x increase in the STREAM Triad benchmark compared to AMD's current-gen Epyc 7773X.
These will no doubt be very interesting CPUs for CFD and FEA.

For sure, both of them promise a generational leap in CFD performance we haven't seen in a long time.
By the way, I am aware of the peculiar timing for this announcement from Intel so far ahead of release. It's because AMD introduced their new Epyc "Genoa" lineup yesterday. But that's just what marketing has come to these days :rolleyes:

Of course we're all waiting to see Genoa-X benchmarked against Sapphire Rapids HBM for CFD, but with most CFD solvers now supporting GPU compute in some capacity, Sapphire Rapids HBM is probably also going to be compared against GPUs.

The LBM solver I use (PowerFLOW) licenses GPUs as 32 FP32 cores=1 CPU core, so H100 or RTX 6000 Ada GPUs may or may not be a better bang for buck vs. 56-core 'Xeon Max' CPUs. The 'Xeon Max' may come out ahead of GPUs in this licensing paradigm, at least for scenarios like ours where the per-core licensing costs FAR outstrip the hardware costs.
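To put rough numbers on that licensing rule, here is a minimal sketch of the arithmetic. It assumes the "32 FP32 cores = 1 CPU core" ratio applies linearly to the whole GPU, and the 16,896 FP32 core count is only an illustrative figure for a large datacenter GPU; check the vendor spec sheet and the actual license terms before relying on it. The function name is made up for this example.

```python
def gpu_license_core_equivalents(fp32_cores: int, fp32_per_cpu_core: int = 32) -> float:
    """CPU-core license equivalents consumed by one GPU under a
    'N FP32 cores = 1 CPU core' rule (N = 32 per the post above)."""
    return fp32_cores / fp32_per_cpu_core


# Assumption: illustrative FP32 core count, not taken from the thread.
ASSUMED_GPU_FP32_CORES = 16_896
XEON_MAX_CORES = 56  # top Xeon Max core count from the announcement

gpu_equiv = gpu_license_core_equivalents(ASSUMED_GPU_FP32_CORES)
print(f"One GPU consumes about {gpu_equiv:.0f} CPU-core licenses")
print(f"That is roughly {gpu_equiv / XEON_MAX_CORES:.1f}x a 56-core CPU")
```

Whether the GPU is worth that many licenses then depends on how much faster it actually runs the case than the equivalent number of CPU cores.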

That's one of those things that always rubbed me the wrong way with GPU acceleration for commercial solvers. Making it artificially viable through lower license costs, instead of improving the implementation to a point where GPUs are just a no-brainer.
I'll always root for faster CPUs. I don't want to cross my fingers every time I need a slightly more "obscure" solver feature, that may or may not yet work on GPUs.

Quote:

Originally Posted by flotus1 (Post 839418)

I'll always root for faster CPUs. I don't want to cross my fingers every time I need a slightly more "obscure" solver feature, that may or may not yet work on GPUs.

Yeah, I don't particularly want to spend my days beta testing buggy solvers that don't quite work right on GPUs. So if DS makes my decision easy by having CPUs be a slam dunk under their licensing paradigm, that's fine by me.

Besides, CPUs and GPUs are starting to converge on memory bandwidth vs. cost, with GPUs really only jumping ahead when electricity costs are paramount (we should be so lucky). But for someone who routinely runs simulations requiring 1TB+ of memory, CPUs running with system RAM are always going to be a necessity. I'm sure eight or more H100s are very fast, but they cost as much as a house to acquire. You can source a 2P server with 1TB of RAM for peanuts.

Most Milan CPUs have a memory bandwidth of 205 GB/s, and that has increased to 461 GB/s for Genoa. A huge step up!

A bandwidth of 1 TB/s is about 2.2 times that of Genoa. It will be interesting to see some prices on Intel's Xeon Max with HBM2e. If prices for the HBM2e CPUs are not too high, then both Intel and AMD have made some very interesting CPUs for CFD calculations.
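For reference, the 205 and 461 GB/s figures fall out of the usual theoretical-peak formula: channels × transfer rate × 8 bytes per 64-bit channel. A minimal sketch, assuming 8-channel DDR4-3200 for Milan, 12-channel DDR5-4800 for Genoa, and taking Intel's "over 1TB/s" claim as a flat 1000 GB/s. The function name is made up for illustration, and these are theoretical peaks, not measured STREAM numbers.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s.

    Each DDR channel is 64 bits (8 bytes) wide, so the peak is
    channels * MT/s * 8 bytes, converted from MB/s to GB/s.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


milan = peak_bandwidth_gbs(8, 3200)          # 8-channel DDR4-3200
genoa = peak_bandwidth_gbs(12, 4800)         # 12-channel DDR5-4800
xeon_max_ddr5 = peak_bandwidth_gbs(8, 4800)  # 8-channel DDR5-4800, DRAM only
xeon_max_hbm = 1000.0                        # assumption: "over 1TB/s" taken as 1000 GB/s

print(f"Milan:            {milan:.1f} GB/s")
print(f"Genoa:            {genoa:.1f} GB/s")
print(f"Xeon Max (DDR5):  {xeon_max_ddr5:.1f} GB/s")
print(f"HBM2e vs. Genoa:  {xeon_max_hbm / genoa:.2f}x")
```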

Quote:

Originally Posted by ErikAdr (Post 839453)

It's a simplification, but I think of the memory subsystem as 'feeding' the cores. For CFD, the rule of thumb up to now was that cores would start to go a bit hungry if you couldn't 'feed' them with ~8 GB/s or so of memory bandwidth per core (meaning you'd get nonlinear scaling if you dropped below that). HBM2e will 'feed' the top Xeon Max with ~18 GB/s per core, which may actually be more memory bandwidth than the cores can make use of.

With Genoa, you're looking at:
32-core = 14.4 GB/s per core (not too far off the 56-core Xeon Max CPU)
48-core = 9.6 GB/s per core
64-core = 7.2 GB/s per core
96-core = 4.8 GB/s per core

So I expect you'll probably get linear speedup up to around 48 or 64 cores.
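The per-core numbers above are just peak bandwidth divided by core count. A rough sketch of that back-of-the-envelope calculation, assuming 460.8 GB/s theoretical peak for Genoa (12-channel DDR5-4800), 1000 GB/s for Xeon Max HBM2e, and treating the ~8 GB/s per core rule of thumb as a hard cutoff, which it really isn't. All names here are hypothetical.

```python
GENOA_PEAK_GBS = 460.8           # assumption: 12-channel DDR5-4800 theoretical peak
XEON_MAX_HBM_GBS = 1000.0        # assumption: "over 1TB/s" taken as 1000 GB/s
RULE_OF_THUMB_GBS_PER_CORE = 8.0  # rough CFD threshold from the post above


def bandwidth_per_core(peak_gbs: float, cores: int) -> float:
    """Peak memory bandwidth available to each core, in GB/s."""
    return peak_gbs / cores


parts = [
    ("Xeon Max 56-core (HBM2e)", XEON_MAX_HBM_GBS, 56),
    ("Genoa 32-core", GENOA_PEAK_GBS, 32),
    ("Genoa 48-core", GENOA_PEAK_GBS, 48),
    ("Genoa 64-core", GENOA_PEAK_GBS, 64),
    ("Genoa 96-core", GENOA_PEAK_GBS, 96),
]

for name, peak, cores in parts:
    share = bandwidth_per_core(peak, cores)
    verdict = "ok" if share >= RULE_OF_THUMB_GBS_PER_CORE else "likely bandwidth-limited"
    print(f"{name:26s} {share:5.1f} GB/s per core  ({verdict})")
```

The 64-core part lands just under the 8 GB/s line, which is why the estimate is "around 48 or 64 cores" rather than a sharp number.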

Obviously, there are other major factors in play (latency, cache, IPC, clocks, etc.), but back-of-the-envelope calcs indicate that the 64-core Genoa part will likely outpace the 56-core Xeon Max part (at a much lower cost), and Genoa-X (with all that 3D cache) may obliterate Xeon Max.

We're still at least 6 months away from Genoa-X, but I do find it curious that leaks so far have the Genoa-X lineup skipping from 32 straight to 96 cores. I hope they actually include 48 and/or 64-core Genoa-X CPUs, as those should be optimal for CFD (at least with per-core licensing).