November 24, 2021, 15:02 |
Cache usage in CFD, and AMD 3D V-cache
#1 |
New Member
Harris Snyder
Join Date: Aug 2018
Posts: 24
Rep Power: 7 |
I was under the impression that CFD workloads, at least finite volume methods, didn't benefit too much from cache size, but in AMD's recent Milan-X announcement, they're claiming outrageous performance improvements in OpenFOAM and Fluent from the introduction of 3D V-cache. So either I'm wrong and having lots of cache really matters, or they're using a small enough case that some substantial fraction of the decomposed domain fits into the 700+MB of cache they have. I'm curious to hear from anyone who might have deeper knowledge on this subject.
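A back-of-the-envelope check of that second possibility (my own sketch; the ~1 KB-per-cell figure is an illustrative assumption for an unstructured finite-volume solver's fields plus connectivity, not a number from AMD's materials):

```python
# Rough check: can each socket's share of a decomposed CFD case fit in L3?
# bytes_per_cell is an illustrative assumption, not a measured value.
def fits_in_cache(cells, sockets, cache_per_socket_mb, bytes_per_cell=1000):
    per_socket_bytes = cells * bytes_per_cell / sockets
    return per_socket_bytes <= cache_per_socket_mb * 1e6

# A 100M-cell case across 16 Milan-X sockets (768 MB L3 each): not close.
print(fits_in_cache(100e6, 16, 768))  # False
# A 10M-cell case across the same 16 sockets: borderline yes.
print(fits_in_cache(10e6, 16, 768))   # True
```

So for benchmark-sized cases on enough nodes, a large fraction of the working set fitting in cache is at least plausible.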
November 25, 2021, 03:24 |
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
The CFD benchmarks I have seen so far seem to be honest in that regard https://www.tomshardware.com/news/mi...n-x-benchmarks
The cases are way too big to fit into the cache entirely, and they also show results for different node counts. You can also see where the advantage of Milan-X starts to break down: the "Combustor_830m" case for Fluent only shows a minor performance increase at first, which then grows at higher node counts. Cache size has always mattered a bit for unstructured CFD, but so far, last-level caches were an order of magnitude smaller. Now we suddenly get cache sizes in the GB range for a dual-socket system, instead of MB. Intel will go a similar route in the future, presumably with slower but even larger HBM inside the CPU package. Last edited by flotus1; November 25, 2021 at 07:16. |
November 25, 2021, 10:11 |
#3 |
New Member
Harris Snyder
Join Date: Aug 2018
Posts: 24
Rep Power: 7 |
Interesting. What property of the combustor case is responsible for this difference? Just the outright size?
November 25, 2021, 10:28 |
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
I would think so, yes. Especially since it starts speeding up again at higher node counts.
February 18, 2022, 07:44 |
#5 |
New Member
Andrew
Join Date: Apr 2012
Posts: 15
Rep Power: 14 |
Well, talking about caches: of course take it with a grain of salt, but Intel recently bragged about the serious performance uplift that on-package fast memory (HBM) gives its upcoming server products.
Have a look here: Intel claims a 2.8x performance increase for Sapphire Rapids with HBM versus (some) current-gen Xeon or EPYC Milan. As usual, it isn't clear exactly which CPU and in what configuration, but most likely one that matches the Xeon's performance, so probably not top of the line. More interestingly, it is 1.75x faster than Sapphire Rapids without HBM. A 28-million-cell grid is not a huge case, but it is quite relevant for engineering use. |
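The two vendor numbers quoted above actually imply a third one, via simple arithmetic (no additional vendor data here): dividing the 2.8x claim by the 1.75x claim gives the implied speedup of plain Sapphire Rapids over the unnamed baseline CPU.

```python
# Intel's claims: SPR+HBM = 2.8x some baseline CPU, and 1.75x SPR without HBM.
# Implied: plain Sapphire Rapids = 2.8 / 1.75 = 1.6x that same baseline.
hbm_vs_baseline = 2.8
hbm_vs_plain_spr = 1.75
plain_spr_vs_baseline = hbm_vs_baseline / hbm_vs_plain_spr
print(round(plain_spr_vs_baseline, 2))  # 1.6
```

So even without HBM, the new generation would already be claimed to be well ahead of the baseline, with the same grain-of-salt caveat.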
February 19, 2022, 13:06 |
#6 |
New Member
Join Date: Aug 2021
Posts: 22
Rep Power: 4 |
What is the effect of L3 cache size on the performance of non-CFD simulation programs, such as metal forming?
Does the simulation time of mechanical problems, for example metal forming (implicit or explicit), decrease with increasing L3 cache capacity? In the Abaqus 2022 presentation at https://www.youtube.com/watch?v=i0MW66Mly8E the performance topic starts around the 30-minute mark, with a few examples of processor performance. |
March 22, 2022, 01:52 |
#7 |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 9 |
Hi,
Do you think the new 3D V-Cache could raise the maximum useful core count per node? There was a rule of thumb that (core count) / (number of memory channels) should not exceed 4, e.g. no more than 32 cores per socket for an 8-channel-memory CPU. Could we now use 64 cores on 8 memory channels and still get linear scale-up? Thanks |
July 11, 2022, 16:26 |
#8 |
Member
Matt
Join Date: May 2011
Posts: 43
Rep Power: 14 |
Quote:
Since current Xeons typically get linear scaling up to ~24 cores per socket for most CFD workloads, that would mean Xeons with HBM2E could scale linearly up to ~96 cores per socket. Unfortunately, Sapphire Rapids will top out at 60 cores per chip, so while Sapphire Rapids+HBM will have over 2X the memory bandwidth per socket as EPYC Genoa (even with its 12 channels of DDR5), Sapphire Rapids+HBM Xeons may not actually have enough CPU horsepower to make use of all that memory bandwidth for CFD workloads. A 2P 64-core Genoa (or especially Genoa-X) node could well perform very similarly to a 2P 60-core Sapphire Rapids-HBM node, even for simulations that fit within the 128 GB of HBM. I'm anxious to see the 3rd-party Sapphire Rapids+HBM vs. Genoa-X CFD benchmarks in early 2023. Despite VERY different approaches, it wouldn't surprise me if the end result is quite similar. Throw GPU CFD into the mix, and things will get really interesting; with the next generation of HPC CPUs, the memory bandwidth gulf between CPUs and GPUs will shrink dramatically, so GPU vs. CPU will most likely come down to things like licensing costs, hardware costs, simulation memory requirements, power/space efficiency, etc. |
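The scaling argument above can be sketched numerically. The ~24-core baseline and the ~4x HBM bandwidth ratio are taken from the post itself; the cap at the physical core count is the point being made:

```python
# If a CPU scales linearly to baseline_cores at baseline bandwidth,
# estimate the linear-scaling limit at a higher bandwidth, capped by
# how many cores the chip physically has.
def linear_scaling_limit(bandwidth_ratio, baseline_cores=24, physical_cores=None):
    estimate = bandwidth_ratio * baseline_cores
    if physical_cores is not None:
        estimate = min(estimate, physical_cores)
    return estimate

# ~4x bandwidth from HBM2E would support ~96 cores, but Sapphire Rapids
# tops out at 60, so the chip runs out of cores before bandwidth:
print(linear_scaling_limit(4.0, physical_cores=60))  # 60
```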
July 11, 2022, 17:07 |
#9 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
Quote:
With real-world applications, some of the data has to come from slower memory at some point, so performance/scaling will be somewhat lower. Exactly how much is the exciting part, and it will probably also depend on the application and model size. Quote:
My guess is that there will be noticeable differences depending on the application, and on whether a code gets architecture-specific optimizations. Anyway, we can look forward to quite substantial performance increases over current-gen architectures with both approaches. I am more worried about pricing and availability. Neither of these will be cheap, there is going to be product segmentation, and hyperscalers will take the first bites. Last edited by flotus1; July 12, 2022 at 03:47. |
July 12, 2022, 09:27 |
#10 |
Member
Matt
Join Date: May 2011
Posts: 43
Rep Power: 14 |
Quote:
Yeah, as much as we'd all love to pick the perfect hardware for our CFD workloads, in reality we'll be constrained by what we can actually get our hands on. It was 4-6 months between the 'launches' of EPYC Milan and Xeon 'Cascade Lake Refresh' and when regular Joes could actually buy them at retail. Milan-X processors are still hard to come by for anywhere near MSRP, Threadripper Pro 5000WX is STILL unobtainium four months after 'launch', yet AMD says Genoa is 'launching' imminently. It's gotten to the point where the hyperscalers are always a CPU generation ahead of the rest of the market. |
July 13, 2022, 08:51 |
#11 |
New Member
Daniel
Join Date: Jun 2010
Posts: 12
Rep Power: 15 |
Jumping in just to highlight two things:
- The 5800X3D result in the benchmarking thread here is the fastest single-core run I could find; depending on what one plans to run, that chip is a beast. - Yet another Phoronix test with the latest AMD huge-cache server chip (7773) was recently published https://www.phoronix.com/scan.php?pa...3x-redux&num=3 and may deserve a look (with a grain of salt, as usual). Cheers! |