8 channel vs 4 channel memory bandwidth using only 8 cores!

June 27, 2020, 21:18 | #1
Habib-CFD (Member, Join Date: Oct 2019)
Hi friends,
I am very curious about the effect of high memory bandwidth with low-core-count CPUs in CFD applications.

Can an Epyc 7252 provide performance similar to a 7302P limited to 8 cores (assuming all memory channels are populated)?
The 7252 benefits from a 100 MHz higher base clock, but its L3 cache and memory bandwidth are halved.


I found these results in the OpenFOAM benchmark thread:
2x 7302, 8 cores total: 81.4 s
1x 3960X, 8 cores total: 101.5 s

How is that possible? In an 8-core run, 4-channel bandwidth should be enough!!

Last edited by Habib-CFD; June 28, 2020 at 01:28.

June 28, 2020, 01:24 | #2
Allen_ (New Member, Join Date: Jan 2020)
Quote (Habib-CFD):
"2x 7302P, 8 cores total: 81.4 s
1x 3960X, 8 cores total: 101.5 s

How is that possible? In an 8-core run, 4-channel bandwidth should be enough!!"
This is a pretty apples-and-oranges comparison. The 7302P is a single-socket-only chip, so any test with two of them is also going to include an interconnect. I think a safe assumption here is that each 7302P is running four cores with an InfiniBand connection. As fast as InfiniBand is, it's not as fast as the memory on a single socket, so these Rome chips are being saddled with a pretty big disadvantage from the slower message passing they're being forced to do. I suspect the results of running 8 cores on a single 7302P would be better, and that might give a better idea of 4 vs. 8 memory channels. Of course, the best comparison would be to test a 7302 against one of the 64 MB cache models. If I had the funds to do such a comparison I would, but alas...

Also, the TR has a decent clock-speed advantage, and it should have pretty similar IPC to the 7302 since they're both Zen 2 chips, so that's two big advantages it has going for it here, yet it's still ~20 seconds slower. I might be making some incorrect assumptions, but to me this comparison is pretty good evidence that 8 channels are better than 4, even at these small core counts.
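For reference, the theoretical peak bandwidth gap behind this argument can be sketched with a few lines of Python (a rough sketch only: DDR4-3200 at 8 bytes per channel per transfer gives a theoretical upper bound, not sustained throughput, and the `ddr4_peak_gbs` helper is just illustrative):

```python
# Theoretical peak DDR4 bandwidth: channels * transfer rate * 8 bytes/transfer.
# These are upper bounds; sustained bandwidth in CFD codes is noticeably lower.

def ddr4_peak_gbs(channels, mts=3200):
    """Theoretical peak memory bandwidth in GB/s for DDR4 at `mts` MT/s."""
    return channels * mts * 8 / 1000  # MT/s * 8 B = MB/s, then -> GB/s

print(ddr4_peak_gbs(4))   # TR 3960X, quad channel: 102.4 GB/s
print(ddr4_peak_gbs(8))   # one Epyc 7302, octa channel: 204.8 GB/s
print(ddr4_peak_gbs(16))  # 2x Epyc 7302: 409.6 GB/s aggregate
```

Even before counting interconnect overhead, the dual-Epyc setup has 4x the aggregate peak bandwidth of the Threadripper.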

June 28, 2020, 01:34 | #3
Habib-CFD (Member, Join Date: Oct 2019)
Thank you, Allen.

I checked the results again; the benchmark comes from 2x 7302 (my mistake typing the "P"). But it is still difficult to accept such a difference at 8 cores.

June 28, 2020, 03:22 | #4
Allen_ (New Member, Join Date: Jan 2020)
Quote (Habib-CFD):
"the benchmark comes from 2x 7302"

Ahh OK, that makes more sense.

Quote (Habib-CFD):
"But it is still difficult to accept such a difference at 8 cores."

We can try to isolate the difference caused by the 8-channel (actually 16 channels available in this case) vs. 4-channel chips, at least to some degree, by comparing the amount of work done on a per-core, per-clock basis, though we'll have to make some assumptions:
1) Both chips are running at their stock memory speed of DDR4-3200, i.e. the TR is not overclocked in any way.

2) The chips have similar IPC since they are both Zen 2, though this may be a moot point since we already have the product of their work.

7302: 8 cores at 3.3 GHz do the work in 81.4 s, so with perfect scaling 1 core at 3.3 GHz would do it in 651.2 s, and at 1 GHz it would take 651.2 × 3.3 ≈ 2149 s.

3960X: 101.5 s × 8 = 812 s; at 1 GHz, 812 s × 4.5 = 3654 s.

3654 / 2149 ≈ 1.7

OK, so in this case a 7302 core does the work 1.7x faster per GHz than a 3960X core, but it had 4x the memory bandwidth. There obviously is some difference, but going any further with the available information is beyond me. I'd also be interested to see how the results would differ with all 8 of the 7302's cores on a single socket. Hopefully someone with more knowledge will chime in.
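The back-of-the-envelope normalization above can be written out as a short script (a sketch under the stated assumptions: perfect intra-socket scaling, similar IPC; the `normalized_seconds` helper is just illustrative):

```python
# Normalize each benchmark result to the time a single 1 GHz core would need,
# assuming perfect scaling across cores (a simplification, as noted above).

def normalized_seconds(wall_time_s, cores, clock_ghz):
    """Wall time scaled to an equivalent single core running at 1 GHz."""
    return wall_time_s * cores * clock_ghz

epyc_7302 = normalized_seconds(81.4, cores=8, clock_ghz=3.3)   # 2x Epyc 7302
tr_3960x  = normalized_seconds(101.5, cores=8, clock_ghz=4.5)  # TR 3960X

ratio = tr_3960x / epyc_7302
print(f"Epyc: {epyc_7302:.0f} s, TR: {tr_3960x:.0f} s, ratio: {ratio:.2f}")
```

The ratio comes out to about 1.70, matching the ~1.7x per-GHz advantage computed above.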

June 28, 2020, 10:39 | #5
Habib-CFD (Member, Join Date: Oct 2019)
What about the effect of the CPU cache, and not only the RAM bandwidth?


Cores   2x 7302           3960X             7252
1       3.3 GHz (723 s)   4.5 GHz (550 s)   3.2 GHz (?)
2       3.3 GHz (328 s)   4.0 GHz (299 s)   3.1 GHz (?)
4       3.0 GHz (164 s)   3.8 GHz (162 s)   3.1 GHz (?)
8       3.0 GHz (81 s)    3.8 GHz (102 s)   3.1 GHz (effective, ??)

June 29, 2020, 11:26 | #6
flotus1 (Super Moderator, Alex, Join Date: Jun 2012, Location: Germany)
Quote (Habib-CFD):
"How is that possible? In an 8-core run, 4-channel bandwidth should be enough!!"
Any particular reason for that assumption? I would be inclined to disagree with it.

Anyway, we can dissect the results at hand further and do our own strong-scaling analysis:
[Attachment: scaling.png]
It clearly shows that the 3960X is well below ideal scaling at 8 threads, while the 2x Epyc 7302 even show super-linear scaling with up to 16 threads.
My conclusion: the TR 3960X is being held back by the performance of its memory subsystem. Even at a thread count of 8, memory bandwidth is not enough to keep those fast cores busy. The effect might be more pronounced here than it should be; the OP of that benchmark never responded to the comments he got.
And the Epyc 7302 perform much better thanks to an abundance of available memory bandwidth. The super-linear effect might be explained by effectively doubling the available L3 cache with two CPUs. Other 7302 results are closer to the ideal line, so some caution is required when drawing conclusions like these.
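For anyone who wants to redo this strong-scaling analysis from the timings posted earlier in the thread, a minimal sketch (the `scaling` helper is hypothetical; the single-core run is taken as the baseline, so efficiency above 1.0 indicates super-linear scaling):

```python
# Parallel efficiency from a strong-scaling run: ideal scaling means the
# n-core time is exactly t1/n, i.e. efficiency 1.0 at every core count.

def scaling(times_by_cores):
    """Map core count -> parallel efficiency relative to the 1-core run."""
    t1 = times_by_cores[1]
    return {n: t1 / (n * t) for n, t in times_by_cores.items()}

epyc = {1: 723, 2: 328, 4: 164, 8: 81}    # 2x Epyc 7302, seconds
tr   = {1: 550, 2: 299, 4: 162, 8: 102}   # TR 3960X, seconds

print(scaling(epyc))  # efficiency > 1.0 at 8 cores: super-linear
print(scaling(tr))    # efficiency well below 1.0 at 8 cores
```

With these numbers the Epyc pair lands above 1.0 at 8 cores while the 3960X drops to roughly two-thirds of ideal, which is what the plot shows.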

June 29, 2020, 20:13 | #7
Habib-CFD (Member, Join Date: Oct 2019)
Thank you, flotus1.
Actually, I started looking into the 8-core Epyc line-up because of the poor scaling of my CFD software. Going by your explanation, the 7262 should provide a noticeable performance advantage over the 7252 in CFD applications. The 10900X is also available for roughly the same budget as the 7252, but I am not interested in dealing with VRM and CPU temperatures.

June 30, 2020, 09:20 | #8
flotus1 (Super Moderator, Alex, Join Date: Jun 2012, Location: Germany)
If it is memory bandwidth that is holding your code back (and that's a big if), then the Epyc 7262 might be an option.
I can't remember each discussion we had on that topic in detail: have you already tried a monolithic CPU with low latency and high clock speeds, such as the Intel 10900X?
But at this point, I think no CPU would give you a significant performance improvement.

Wait a minute... you already tried an Epyc 7302P, right? If that's the case, forget about the 7262. It's the same CPU, just with fewer active cores.

June 30, 2020, 10:55 | #9
Habib-CFD (Member, Join Date: Oct 2019)
Quote (flotus1):
"Wait a minute... you already tried an Epyc 7302P, right? If that's the case, forget about the 7262. It's the same CPU, just with fewer active cores."

You are right; you taught me how to scale FLOW-3D using the 7302P. We concluded that FLOW-3D scaling falls off rapidly beyond 4 cores, and now they announce a new HPC release for in-house clusters!! I am still using the previous version due to license issues.

Anyway, the reason I am interested in the Epyc line-up is its low operating temperatures. The 7302P reaches a maximum of 42 °C with 16 cores at 100 percent load, which is really awesome. I am planning a budget PC to use for a long time without worrying about VRM temperatures or needing a liquid cooler, something like a used 7252 bundle. Best performance per budget (1000 to 1200$ total), not the fastest. I have not tested the 10900X; there is no additional budget for a replacement if it fails under a 24-hour workload. Every time I look at X299 motherboards, the thick heatsinks warn me that something might get very hot.

June 30, 2020, 17:45 | #10
flotus1 (Super Moderator, Alex, Join Date: Jun 2012, Location: Germany)
Where do I start...

A decent-sized heatsink is not a sign that a component runs hot. It is a sign that the manufacturer put some thought into the thermal design. At least if we are talking about actual heatsinks, and not those slabs of massive metal that some manufacturers started to put on their boards. This was one of the reasons why the first batch of X299 boards was universally torn apart by reviewers: those were all form over function.
But still: those catastrophic results were mostly obtained on open-air test benches with CPU water cooling, meaning absolutely zero airflow over the VRMs. In a case with case fans and an air cooler on the CPU, the situation would have been much less severe.

VRM temperatures on boards for AMD Epyc are also a "hot" topic. VRMs on these boards are not as over-engineered as on most X299 boards, meaning they need to operate closer to their maximum capacity and thus run hotter. The tiny heatsinks, which mostly rely on plenty of airflow for adequate temperatures, don't help either.
VRMs on my H11DSi in a well-ventilated workstation case regularly went over 90 °C before I switched to water cooling. That's not an issue per se, but it shows that the VRM design and cooling solution is inferior to many consumer boards.

Which finally brings us to the CPU. Intel CPUs run hot, right? Well, most of them don't, at least when they are not overclocked. In order to get an edge in marketing over their competitors, some motherboard manufacturers have resorted to overclocking CPUs out of the box: "Look, using our board, you will get 2% more performance than with brand X." And they do so with borderline insane voltage settings. With power limits enforced and voltages in a normal range, modern Intel CPUs can be quite easy to cool.
This also ties in with VRM temperatures. Of course the 18-core X299 SKU will run hot and cause more heat in the VRMs. But you are looking at 8-10 cores, where temperatures are much easier to keep in check.

With all that out of the way, the go-to option for X299 is the ASRock X299 Extreme4. A sub-200$ part with adequate VRM cooling, capable of handling an 18-core CPU, so 8-10 cores will be no challenge. And it has pretty much all the features one would ever need.
Since you are looking at 8-core Epyc CPUs, the closer match would be the i7-9800X at ~350$. A decent air cooler for 50$ or more will get the job done, along with some additional fans in the case.

June 30, 2020, 20:15 | #11
Habib-CFD (Member, Join Date: Oct 2019)
Dear Alex, thank you for the thorough explanation.