ThunderX2?

June 8, 2018, 09:47 | #1
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
So the ARM-based Cavium ThunderX2 seems to score fairly well with OpenFOAM:
https://www.anandtech.com/show/12694...ver-reality/12
The price/performance looks really attractive. It would be nice to have it in the OpenFOAM benchmark comparison. Any thoughts?

June 8, 2018, 23:16 | #2
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
Looks like similar results to EPYC, which makes sense because they have similar memory bandwidth. Price is similar as well, I believe.
I'd stick with x86 just because who knows what problems you'd see on ARM with other software. If one of these ARM server chip companies would drop 16-32 GB of high-bandwidth memory onto the CPU package, they'd immediately be the value leader for CFD. 16 DIMMs add $2000+ to a system.
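For what it's worth, the bandwidth-bound behaviour is easy to reproduce with a STREAM-style triad loop. This is just an illustrative sketch in plain C++, nothing OpenFOAM-specific:

Code:
#include <cstddef>
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // STREAM-style triad: a[i] = b[i] + s*c[i]. Three 8-byte accesses and
    // one multiply-add per element, so throughput is set by DRAM bandwidth
    // rather than by core count or clock speed -- much like the sparse
    // linear algebra inside a CFD solver.
    const std::size_t n = 1u << 26; // 64Mi doubles per array, ~512 MB each
    std::vector<double> a(n), b(n, 1.0), c(n, 2.0);
    const double s = 3.0;

    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
    const std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - t0;

    // 24 bytes moved per element (two reads + one write).
    std::printf("effective bandwidth: %.1f GB/s\n",
                24.0 * n / dt.count() / 1e9);
    return a[n / 2] > 0 ? 0 : 1; // keep the loop from being optimized away
}

Compile with -O2; because the arrays are far larger than any cache, the reported figure tracks DRAM bandwidth rather than the core, which is why EPYC and ThunderX2 land so close despite very different CPU designs.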

June 9, 2018, 04:11 | #3
Senior Member
Join Date: May 2012
Posts: 546
Rep Power: 15
Yeah, that would be awesome.
From the AnandTech article:
"Similar to AMD's EPYC, Cavium's ThunderX2 is likely to shine in the "sparse matrix" HPC market. This is thanks to its 33% greater theoretical memory bandwidth and a high core/thread count. However as we've seen in the case of AMD's design, EPYC's L3-cache is slow once you need data that is not in the local 8 MB cache slice. The ThunderX2, by comparison, is a lot more sophisticated with a dual ring architecture, which seems to be similar to the ring architecture of the Xeon v4 (Broadwell-EP). According to Cavium, this ring structure is able to offer up to 6 TB/s of bandwidth and is non-blocking."

June 9, 2018, 10:42 | #4
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18
I don't think that particular cache deficiency of EPYC makes much difference for CFD. It only comes into play on domain boundaries within a socket, so you're taking a modest cache-latency hit on maybe <1% of cell interfaces.
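A back-of-envelope check of that, assuming an idealized cubic decomposition (numbers purely illustrative): for a partition of n^3 hex cells the boundary-face fraction scales like 2/n, so it shrinks as the per-domain cell count grows.

Code:
#include <cmath>
#include <cstdio>

int main() {
    // Idealized cubic partition of n^3 hex cells: ~3*n^2*(n-1) internal
    // faces, ~6*n^2 faces shared with neighbouring partitions, so the
    // boundary fraction falls off like 2/n as the partition grows.
    for (double cells : {5.0e4, 5.0e5, 5.0e6}) {
        const double n = std::cbrt(cells);
        const double internal = 3.0 * n * n * (n - 1.0);
        const double boundary = 6.0 * n * n;
        std::printf("%8.0f cells/domain: ~%.1f%% of faces on the boundary\n",
                    cells, 100.0 * boundary / (internal + boundary));
    }
}

For realistic per-domain cell counts this puts the affected share of faces in the low single-digit percent range, and only those faces pay the latency penalty, so the aggregate hit stays small.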
|
June 10, 2018, 10:32 | #5
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
I would subscribe to that. In my opinion, if an application relies heavily on fast, low-latency core-to-core data transfer, it does not really qualify as HPC: scaling such an application beyond a single node would be problematic anyway.
Benchmarks with MPI-parallel CFD software have shown that EPYC performs pretty well despite its partially high cache-access latency. And even for the OpenMP-parallel codes I have tried so far, both scaling and absolute performance were pretty good on a dual-socket machine. It sure would be nice to see improvements to the Infinity Fabric and cache latency in future generations, but even in the first generation these issues don't hurt performance to the point where they become a problem. At least not for all applications...
I don't have a strong opinion on these ARM chips yet, and no plans to leave x86 behind. But as we could clearly see over the last 1.5 years, the CPU market was in dire need of more competition. People like us, who have to buy these chips and always need more computing power, can only benefit from more competitive players in the market. Intel pulling marketing stunts like their latest "28-core 5 GHz CPU" (utterly useless as it is) is a good indicator that we will see more innovation in the market in the years to come.
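On the node-scaling point: the gap is easy to measure with a standard MPI ping-pong between two ranks. A minimal sketch using only core MPI calls; the numbers depend entirely on the interconnect:

Code:
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    const int msg = 8;          // 8-byte message: pure latency, no bandwidth
    std::vector<char> buf(msg);

    MPI_Barrier(MPI_COMM_WORLD);
    const double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {
            // Rank 0 sends, then waits for the echo.
            MPI_Send(buf.data(), msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf.data(), msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            // Rank 1 echoes every message straight back.
            MPI_Recv(buf.data(), msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf.data(), msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    const double dt = MPI_Wtime() - t0;
    if (rank == 0)  // each iteration is one round trip = two one-way hops
        std::printf("one-way latency: ~%.2f us\n", 1e6 * dt / (2.0 * iters));
    MPI_Finalize();
    return 0;
}

Run across two nodes this typically reports on the order of a microsecond or more, one to two orders of magnitude above on-die cache-to-cache latency, which is why latency-sensitive applications stop scaling at the node boundary.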