ThunderX2?

June 8, 2018, 09:47 | #1
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
So the ARM-based Cavium ThunderX2 seems to score fairly well with OpenFOAM:
https://www.anandtech.com/show/12694...ver-reality/12
The price/performance looks really attractive. It would be nice to have it in the OpenFOAM benchmark comparison. Any thoughts?

June 8, 2018, 23:16 | #2
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
Looks like similar results to EPYC, which makes sense because they have similar memory bandwidth. Price is similar as well, I believe.
I'd stick with x86 just because who knows what problems you'd see on ARM with other software. If one of these ARM server chip companies would drop 16-32 GB of high-bandwidth memory onto the CPU package, they'd immediately be the value leader for CFD. 16 DIMMs add $2000+ to a system.
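For what it's worth, the bandwidth-bound behaviour is easy to reproduce with a STREAM-style triad loop. This is just an illustrative sketch in plain C++, nothing OpenFOAM-specific:

Code:
#include <cstddef>
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // STREAM-style triad: a[i] = b[i] + s*c[i]. Three 8-byte accesses and
    // one multiply-add per element, so throughput is set by DRAM bandwidth
    // rather than by core count or clock speed -- much like the sparse
    // linear algebra inside a CFD solver.
    const std::size_t n = 1u << 26; // 64Mi doubles per array, ~512 MB each
    std::vector<double> a(n), b(n, 1.0), c(n, 2.0);
    const double s = 3.0;

    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
    const std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - t0;

    // 24 bytes moved per element (two reads + one write).
    std::printf("effective bandwidth: %.1f GB/s\n",
                24.0 * n / dt.count() / 1e9);
    return a[n / 2] > 0 ? 0 : 1; // keep the loop from being optimized away
}

Compile with -O2; because the arrays are far larger than any cache, the reported figure tracks DRAM bandwidth rather than the core, which is why EPYC and ThunderX2 land so close despite very different CPU designs.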

June 9, 2018, 04:11 | #3
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
Yeah, that would be awesome.
From the AnandTech article:
"Similar to AMD's EPYC, Cavium's ThunderX2 is likely to shine in the "sparse matrix" HPC market. This is thanks to its 33% greater theoretical memory bandwidth and a high core/thread count. However as we've seen in the case of AMD's design, EPYC's L3-cache is slow once you need data that is not in the local 8 MB cache slice. The ThunderX2, by comparison, is a lot more sophisticated with a dual ring architecture, which seems to be similar to the ring architecture of the Xeon v4 (Broadwell-EP). According to Cavium, this ring structure is able to offer up to 6 TB/s of bandwidth and is non-blocking."

June 9, 2018, 10:42 | #4
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
I don't think that particular cache deficiency of EPYC makes much difference for CFD. It only comes into play on domain boundaries within a socket, so you're taking a modest cache-latency hit on maybe <1% of cell interfaces.
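A back-of-envelope check of that, assuming an idealized cubic decomposition (numbers purely illustrative): for a partition of n^3 hex cells the boundary-face fraction scales like 2/n, so it shrinks as the per-domain cell count grows.

Code:
#include <cmath>
#include <cstdio>

int main() {
    // Idealized cubic partition of n^3 hex cells: ~3*n^2*(n-1) internal
    // faces, ~6*n^2 faces shared with neighbouring partitions, so the
    // boundary fraction falls off like 2/n as the partition grows.
    for (double cells : {5.0e4, 5.0e5, 5.0e6}) {
        const double n = std::cbrt(cells);
        const double internal = 3.0 * n * n * (n - 1.0);
        const double boundary = 6.0 * n * n;
        std::printf("%8.0f cells/domain: ~%.1f%% of faces on the boundary\n",
                    cells, 100.0 * boundary / (internal + boundary));
    }
}

For realistic per-domain cell counts this puts the affected share of faces in the low single-digit percent range, and only those faces pay the latency penalty, so the aggregate hit stays small.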
|
June 10, 2018, 10:32 | #5
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
I would subscribe to that. In my opinion, if an application relies heavily on fast, low-latency core-to-core data transfer, it does not really qualify as HPC: scaling such an application beyond a single node would be problematic anyway.
Benchmarks with MPI-parallel CFD software have shown that EPYC performs pretty well despite its partially high cache-access latency. And even for the OpenMP-parallel codes I have tried so far, both scaling and absolute performance were pretty good on a dual-socket machine. It sure would be nice to see improvements to the Infinity Fabric and cache latency in future generations, but even in the first generation these issues don't hurt performance to the point where they become a problem. At least not for all applications...
I don't have a strong opinion on these ARM chips yet, and no plans to leave x86 behind. But as we could clearly see over the last 1.5 years, the CPU market was in dire need of more competition. People like us, who have to buy these chips and always need more computing power, can only benefit from more competitive players in the market. Intel pulling marketing stunts like their latest "28-core 5 GHz CPU" (utterly useless as it is) is a good indicator that we will see more innovation in the market in the years to come.
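On the node-scaling point: the gap is easy to measure with a standard MPI ping-pong between two ranks. A minimal sketch using only core MPI calls; the numbers depend entirely on the interconnect:

Code:
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    const int msg = 8;          // 8-byte message: pure latency, no bandwidth
    std::vector<char> buf(msg);

    MPI_Barrier(MPI_COMM_WORLD);
    const double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {
            // Rank 0 sends, then waits for the echo.
            MPI_Send(buf.data(), msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf.data(), msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            // Rank 1 echoes every message straight back.
            MPI_Recv(buf.data(), msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf.data(), msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    const double dt = MPI_Wtime() - t0;
    if (rank == 0)  // each iteration is one round trip = two one-way hops
        std::printf("one-way latency: ~%.2f us\n", 1e6 * dt / (2.0 * iters));
    MPI_Finalize();
    return 0;
}

Run across two nodes this typically reports on the order of a microsecond or more, one to two orders of magnitude above on-die cache-to-cache latency, which is why latency-sensitive applications stop scaling at the node boundary.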