RAM Bandwidth on Dual CPU Config.

Old   January 7, 2023, 03:14
  #1
New Member
 
Tarık Yaman
Join Date: Dec 2021
Location: Turkey
Posts: 9
Hello Everyone,

We have shortlisted three processors for our workstation.
The workstation will run in a dual-CPU configuration.


Software : (Ansys Cfx, Fluent, Mechanical)
There is no problem with the HPC license.



These are :

-Xeon Gold 2nd Gen 5218R
-Xeon Gold 2nd Gen 5220R
-Xeon Gold 2nd Gen 6230R

In order to use these processors at full performance, that is, to avoid bottlenecks, we need to install the optimal memory configuration.

The 6230R supports up to 6 channels of DDR4-2933 memory (max memory size 1 TB).
The 5220R and 5218R support up to 6 channels of DDR4-2667 memory (max memory size 1 TB).

My first question: We will use two CPUs in our workstation. With two CPUs, can we multiply the max memory size by 2?
So if we have 2 TB of RAM on the motherboard, can we use it efficiently?

My second question: If there are 16 RAM slots on the motherboard and all of them are populated, will we get octa-channel performance from the RAM?
Will this be good or bad?

There are some calculations on the internet about theoretical maximum memory bandwidth and processor compatibility.

For example ;
Memory (12 x 64 GB DDR4-2933 ECC)


Memory Type : DDR4
Transfer Rate : 2933 MT/s
Channel Number : 6 (Hexa Channel, per CPU)

2933 MT/s * 8 bytes per transfer * 6 channels = 140784 MB/s ≈ 140.8 GB/s
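
A minimal Python sketch of the arithmetic above, for illustration only. The function name `peak_memory_bandwidth_gbs` and its parameters are hypothetical. It assumes a 64-bit data bus per DDR4 channel (8 bytes per transfer) and decimal units (1 GB/s = 1000 MB/s), and it gives a theoretical peak, not a measured value.

```python
def peak_memory_bandwidth_gbs(transfer_rate_mts: float,
                              channels: int,
                              bus_width_bits: int = 64,
                              sockets: int = 1) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB/s = 1000 MB/s).

    transfer_rate_mts: memory speed in mega-transfers per second (e.g. 2933)
    channels:          populated memory channels per socket
    bus_width_bits:    data width of one channel (assumed 64 bits for DDR4)
    sockets:           number of CPU sockets, each with its own controller
    """
    bytes_per_transfer = bus_width_bits / 8
    mb_per_s = transfer_rate_mts * bytes_per_transfer * channels * sockets
    return mb_per_s / 1000.0


if __name__ == "__main__":
    # One Xeon Gold 6230R: six channels of DDR4-2933
    print(peak_memory_bandwidth_gbs(2933, 6))
    # Two sockets, each with six channels
    print(peak_memory_bandwidth_gbs(2933, 6, sockets=2))
    # One 5218R or 5220R: six channels of DDR4-2667
    print(peak_memory_bandwidth_gbs(2667, 6))
```

The `sockets` argument reflects that every CPU brings its own memory controller, so the per-socket figure simply scales with the socket count.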

Cpu

Bus Speed : 10.4 GT/s (Intel Ultra Path Interconnect) (Gold 6230R)

10.4 GT/s = 83.2 GB/s


My third question: Do we need to multiply the bus speed by 2 when we use dual processors? If we do, will the memory bandwidth (140 GB/s) be insufficient, since 83.2*2 = 166.4 GB/s?



My fourth question: If there are 16 RAM slots on the motherboard and all of them are filled, so that the number of channels is 8, will the RAM bandwidth be 186.6 GB/s?


Thank you.

Old   January 7, 2023, 04:20
Default
  #2
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Quote:
My first question: We will use two CPUs in our workstation. With two CPUs, can we multiply the max memory size by 2?
So if we have 2 TB of RAM on the motherboard, can we use it efficiently?
Yes, these days having two CPUs also doubles the maximum supported memory capacity you find in the CPUs spec sheet.
No, 2TB of memory can not be used effectively with the CPUs you picked. They have 6-channel memory controllers, there is no way to get a balanced memory population of 2TB on 12 memory channels.
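
To illustrate why 2 TB cannot be reached with a balanced population on 12 channels, here is a small Python sketch. It assumes identical DIMMs on every channel, at most two DIMMs per channel, and a hypothetical list of common module sizes (`STANDARD_DIMM_SIZES_GB`). None of these names come from a vendor tool, and whether a given board or CPU accepts each size is a separate question.

```python
# Assumed list of common registered DDR4 module sizes, in GB (illustrative only)
STANDARD_DIMM_SIZES_GB = (8, 16, 32, 64, 128, 256)


def balanced_capacities_gb(total_channels: int,
                           max_dimms_per_channel: int = 2,
                           dimm_sizes_gb=STANDARD_DIMM_SIZES_GB) -> dict:
    """Map each reachable total capacity (GB) to the balanced layouts giving it.

    A layout is (number of DIMMs, size of each DIMM in GB). "Balanced" here
    means every channel carries the same number of identical DIMMs.
    """
    layouts = {}
    for dimms_per_channel in range(1, max_dimms_per_channel + 1):
        dimm_count = total_channels * dimms_per_channel
        for size in dimm_sizes_gb:
            layouts.setdefault(dimm_count * size, []).append((dimm_count, size))
    return layouts


if __name__ == "__main__":
    twelve = balanced_capacities_gb(total_channels=12)   # 2 CPUs x 6 channels
    print("2048 GB reachable on 12 channels:", 2048 in twelve)
    print("Layouts for 768 GB:", twelve[768])
    sixteen = balanced_capacities_gb(total_channels=16)  # 2 CPUs x 8 channels
    print("2048 GB reachable on 16 channels:", 2048 in sixteen)
```

With 12 channels, the nearest balanced totals around 2 TB in this sketch are 1536 GB and 3072 GB.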

Quote:
My second question: If there are 16 RAM slots on the motherboard and all of them are populated, will we get octa-channel performance from the RAM?
Will this be good or bad?
No, motherboards with 16 DIMM slots are not an ideal fit for these CPUs.
Best case: you only fill the correct 12 slots and have a performance impact that is barely measurable. It stems from the trace layout of the DIMM slots.
Worst case: You fill all 16 DIMMs and end up with weird and confusing performance issues.

Side-note here on socket interconnects:
Quote:
Cpu
Bus Speed : 10.4 GT/s (Intel Ultra Path Interconnect) (Gold 6230R)
10.4 GT/s = 83.2 GB/s
The CPU has two of these UPI links, doubling the theoretical inter-socket bandwidth of these CPUs. Whether both links are used depends on the implementation of the motherboard.
But it doesn't matter a whole lot: that's non-uniform memory access, which should be avoided anyway for memory-sensitive applications. With the system configured correctly (no socket interleaving), Fluent and CFX mostly access local memory, with very limited traffic between sockets.
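
One way to check that the sockets are exposed as separate NUMA nodes (i.e. that socket interleaving is off) is to look at what the operating system reports. The following is a minimal sketch that assumes a Linux system exposing `/sys/devices/system/node/nodeN/cpulist`. The helper names `parse_cpulist` and `numa_nodes` are hypothetical, and on other operating systems the script simply reports nothing.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as '0-3,8-11' into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Return {node id: [logical cpu ids]}; empty if sysfs_root is missing."""
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for node_dir in sorted(root.glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        nodes[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in numa_nodes().items():
        print(f"node {node_id}: {len(cpus)} logical CPUs")
```

On a dual-socket machine without socket interleaving, one would expect at least two nodes here. A single node covering all CPUs suggests the memory is interleaved across sockets.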

Quote:
My third question: Do we need to multiply the bus speed by 2 when we use dual processors? If we do, will the memory bandwidth (140 GB/s) be insufficient, since 83.2*2 = 166.4 GB/s?
Yes, that is the nice thing about dual CPU, compared to just getting a single CPU with more cores.
Two CPUs effectively double the available memory bandwidth.

Quote:
My fourth question: If there are 16 RAM slots on the motherboard and all of them are filled, so that the number of channels is 8, will the RAM bandwidth be 186.6 GB/s?
No, creating an unbalanced memory population does not increase performance. Quite the opposite.
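
A small Python sketch of what "unbalanced" means in practice. It assumes the DIMMs are spread as evenly as possible over the available channels. How the slots are actually wired to channels is board-specific, and the function names are hypothetical.

```python
def dimms_per_channel(dimm_count: int, channel_count: int) -> list[int]:
    """Spread DIMMs over channels as evenly as possible (assumed wiring)."""
    base, extra = divmod(dimm_count, channel_count)
    return [base + 1 if i < extra else base for i in range(channel_count)]


def is_balanced(dimm_count: int, channel_count: int) -> bool:
    """True if every channel ends up with the same, non-zero number of DIMMs."""
    return dimm_count > 0 and dimm_count % channel_count == 0


if __name__ == "__main__":
    # 16 filled slots on two 6-channel CPUs (12 channels in total)
    print(dimms_per_channel(16, 12), is_balanced(16, 12))
    # 12 filled slots on the same CPUs
    print(dimms_per_channel(12, 12), is_balanced(12, 12))
```

Filling 16 slots does not create extra channels: the CPUs still have 12, and some of them now carry twice the capacity of the others.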


With that out of the way: you are obviously putting a lot of thought into this. So let me give you some pointers beyond your original questions.
1) IF you are determined to use one of these CPUs, get a compatible motherboard with 12 DIMM slots. And fill each of the slots with identical DIMMs. E.g. 12x16GB, 12x32GB, 12x64GB...
2) Also check the memory support section of your motherboard. I am reading between the lines that you want to use A LOT of memory, in the TB range. You probably need LRDIMM for that. May I ask why you need so much memory?
3) There are better CPUs available for your intended application, from both Intel and AMD.
Intel has their newer "Ice Lake" CPUs which have 8-channel DDR4-3200 memory controllers.
And AMD has Epyc "Milan", "Milan-X" (with larger L3 caches, perfect for CFD and FEA) as well as the latest Epyc "Genoa" which support 12-channel DDR5-4800. The latter is quite expensive though, especially if you need a lot of RAM.
A side-effect: all of these CPUs support much larger memory capacity than the "Cascade Lake" CPUs you picked.

Old   January 7, 2023, 19:15
Default
  #3
New Member
 
Tarık Yaman
Join Date: Dec 2021
Location: Turkey
Posts: 9
Thank you for your response, Alex.

Quote:
2) Also check the memory support section of your motherboard. I am reading between the lines that you want to use A LOT of memory, in the TB range. You probably need LRDIMM for that. May I ask why you need so much memory?
This workstation will be used for structural, CFD, and dynamic analysis. The mesh size and type may vary depending on the type of analysis and may reach 100-200 million cells.

Many websites recommend at least 2 GB of RAM per 1 million cells and at least 8 GB of memory per core. (see: https://www.ansys.com/blog/hardware-...ate-simulation)

But as things stand, it looks like we will use 12*64 GB (768 GB) of memory.

We do not want to get an insufficient memory error.
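
The rule of thumb quoted above can be turned into a quick sanity check. This is only a rough sketch: the 2 GB per million cells figure is the one mentioned in this post, while the `headroom` factor (keeping 20% of RAM free for the OS and pre/post-processing) and the function names are assumptions. Actual memory use depends heavily on solver, models and precision.

```python
def estimated_memory_gb(cells_millions: float,
                        gb_per_million_cells: float = 2.0) -> float:
    """Rule-of-thumb solver memory estimate in GB."""
    return cells_millions * gb_per_million_cells


def fits_in_memory(cells_millions: float,
                   installed_gb: float,
                   gb_per_million_cells: float = 2.0,
                   headroom: float = 0.8) -> bool:
    """True if the estimate stays within `headroom` of the installed RAM.

    headroom=0.8 is an assumed safety margin, not a vendor recommendation.
    """
    needed = estimated_memory_gb(cells_millions, gb_per_million_cells)
    return needed <= installed_gb * headroom


if __name__ == "__main__":
    for cells in (100, 200):
        need = estimated_memory_gb(cells)
        ok = fits_in_memory(cells, installed_gb=768)
        print(f"{cells} million cells: ~{need:.0f} GB, fits in 768 GB: {ok}")
```

By this estimate, a 200 million cell case needs roughly 400 GB, which leaves a comfortable margin on a 768 GB system.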

Quote:
3) There are better CPUs available for your intended application, from both Intel and AMD.
Intel has their newer "Ice Lake" CPUs which have 8-channel DDR4-3200 memory controllers.
And AMD has Epyc "Milan", "Milan-X" (with larger L3 caches, perfect for CFD and FEA) as well as the latest Epyc "Genoa" which support 12-channel DDR5-4800. The latter is quite expensive though, especially if you need a lot of RAM.
A side-effect: all of these CPUs support much larger memory capacity than the "Cascade Lake" CPUs you picked.
Our budget is 12500 US Dollars. But if you have a better suggestion for this price, it is always welcome.

--

The reason I pay so much attention to memory bandwidth and processor compatibility: I have a workstation of my own, with 2 x Xeon E5-2682 v4 and 8*32 GB 2400 MHz memory (32 cores and 64 threads in total | 256 GB RAM, quad channel per CPU).

While running analyses in CFX, I get the fastest result when I set the partition value to 16. When I set it to 64, the run time almost doubles.

The official Ansys website says: Selecting a processor with the highest number of cores is usually not recommended because it can negatively affect memory bandwidth if the CPU memory isn’t increased along with the core count. A large number of cores may decrease the performance of CFX, Fluent and LS-DYNA, which usually run on large clusters.

End-

Old   January 7, 2023, 19:53
Default
  #4
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
No need to explain the importance of memory bandwidth to me.
Recommendations for "amount of memory per core" can be ignored. Maybe that was important 20 years ago. But all you need is enough memory to fit your largest models.

Quote:
Our budget is 12500 US Dollars. But if you have a better suggestion for this price, I'm always welcome.
What you get for 12500$ hugely depends on where you buy.
12500$ for parts will get you much better hardware than paying the same amount for an OEM workstation, just to state the obvious.
If you buy from an OEM or SI, and you need tons of memory, you can make a lot of room in your budget by getting a minimal memory configuration, and upgrading RAM yourself. DDR4 is ridiculously cheap these days. Less than 3€/GB for DDR4-3200 reg ECC https://geizhals.eu/samsung-rdimm-64...7.html?hloc=de
OEMs will charge much more than that.

Either way, I would still recommend you get more recent CPUs. Either Intel Xeon "Ice Lake" (Xeon Gold 63xx) or AMD Epyc "Milan" (Epyc 7xx3). They both have 8-channel DDR4-3200 memory controllers, and have newer/faster cores than Cascade Lake.
Which CPUs exactly can be squeezed into your budget depends on where you buy.

Quote:
While running analysis in cfx, I get the fastest result when I set the partition value to 16. When I set it to 64 it almost doubles the time.
That's a different issue. It is still recommended to turn SMT off for dedicated CFD/FEA workstations. If you leave it on, you can still get similar performance. But only if you take full control of thread binding. It's just less hassle to turn it off.
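
If SMT is left on, "taking control of thread binding" starts with knowing which logical CPUs share a physical core. A minimal sketch of that step, assuming a Linux system that exposes `topology/physical_package_id` and `topology/core_id` under `/sys/devices/system/cpu/cpuN/`. The function names are hypothetical, and the sketch assumes `core_id` is unique within a package.

```python
from pathlib import Path


def one_cpu_per_core(topology: dict[int, tuple[int, int]]) -> list[int]:
    """Pick the lowest-numbered logical CPU of every physical core.

    topology maps logical cpu id -> (package id, core id).
    """
    chosen = {}
    for cpu in sorted(topology):
        chosen.setdefault(topology[cpu], cpu)  # first (lowest) cpu wins
    return sorted(chosen.values())


def read_topology(sysfs_root: str = "/sys/devices/system/cpu") -> dict[int, tuple[int, int]]:
    """Read (package id, core id) for every online logical CPU from sysfs."""
    topology = {}
    for topo_dir in Path(sysfs_root).glob("cpu[0-9]*/topology"):
        cpu = int(topo_dir.parent.name[len("cpu"):])
        package = int((topo_dir / "physical_package_id").read_text())
        core = int((topo_dir / "core_id").read_text())
        topology[cpu] = (package, core)
    return topology


if __name__ == "__main__":
    cpus = one_cpu_per_core(read_topology())
    print(f"{len(cpus)} physical cores, one logical CPU each: {cpus}")
```

The resulting list could be handed to an affinity mechanism such as `os.sched_setaffinity` or `taskset -c` on Linux. How a given solver's own launcher handles binding is outside the scope of this sketch.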

Old   January 8, 2023, 11:59
Default
  #5
New Member
 
Tarık Yaman
Join Date: Dec 2021
Location: Turkey
Posts: 9
Quote:
Either way, I would still recommend you get more recent CPUs. Either Intel Xeon "Ice Lake" (Xeon Gold 63xx) or AMD Epyc "Milan" (Epyc 7xx3). They both have 8-channel DDR4-3200 memory controllers, and have newer/faster cores than Cascade Lake.

1 * Amd Epyc 7763 & 8*64 GB 3200 Mhz DDR4


What do you think of the above configuration? Can we use all cores with full performance?

Isn't the memory bandwidth insufficient for this processor?

Will this cause a bottleneck if all cores are used?

Old   January 8, 2023, 13:24
Default
  #6
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
64 cores is too much for these CPUs. 2x32 cores would be much better. Even 2x24 cores would be preferable.
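
The reasoning can be made concrete by dividing theoretical memory bandwidth by core count. A minimal sketch, assuming 8 channels of DDR4-3200 per socket (as stated earlier in the thread for Milan) and 8 bytes per transfer per channel. The function name is hypothetical and the numbers are theoretical peaks, not benchmark results.

```python
def bandwidth_per_core_gbs(sockets: int,
                           cores_per_socket: int,
                           channels_per_socket: int = 8,
                           transfer_rate_mts: float = 3200) -> float:
    """Theoretical peak memory bandwidth per core in GB/s."""
    bytes_per_transfer = 8  # 64-bit DDR4 channel
    total_gbs = (sockets * channels_per_socket
                 * transfer_rate_mts * bytes_per_transfer) / 1000.0
    return total_gbs / (sockets * cores_per_socket)


if __name__ == "__main__":
    for sockets, cores in ((1, 64), (2, 32), (2, 24)):
        per_core = bandwidth_per_core_gbs(sockets, cores)
        print(f"{sockets} x {cores} cores: {per_core:.2f} GB/s per core")
```

With the same total of 64 cores, the dual-socket layout has twice the memory channels, so each core gets twice the theoretical bandwidth.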

Old   January 9, 2023, 03:51
Default
  #7
New Member
 
Tarık Yaman
Join Date: Dec 2021
Location: Turkey
Posts: 9
Dear Flotus1, your advice is very valuable to us.

Thanks to your advice, we are looking at newer processors, especially the 4th gen AMD processors (9004 series).


4th gen AMD processors stand out from other competitors with their core frequencies, cache amounts, and the number and type of memory channels they support.


We created some configs as our budget allows. We'd love to hear from you and other readers (if any, I don't think so).

2 X 9174F & 24 X 32 GB DDR5 4800 MHZ
2 X 9274F & 24 X 32 GB DDR5 4800 MHZ
2 X 9254 & 24 X 32 GB DDR5 4800 MHZ

**[Why is the 9274F more expensive than the 9174F? Everything is the same, just a slight change in core frequencies.]

In addition, why and how important is the L3 cache size? Is there a really noticeable difference between 128 and 256 MB?

Old   January 9, 2023, 04:08
Default
  #8
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
I recently wrote a buyers guide specifically for AMD Genoa: AMD Epyc 9004 "Genoa" buyers guide for CFD
If your budget allows it, I would definitely recommend the versions with 256MB of L3 cache, i.e. 8 active CCDs. Before dropping down to the lower tier SKUs with 128MB L3, I think it is better to look for discounted higher-end Milan systems.

Quote:
**[Why is the 9274F more expensive than the 9174F? Everything is the same, just a slight change in core frequencies.]
I guess you mean the other way round, why is the 16-core 9174F more expensive than the 24-core 9274F. It's a software license thing. There are other enterprise software packages where the license cost depends on the number of physical cores in the system. Hence having fewer cores, even with slightly higher frequency, pays off.
Nothing we need to worry about, just avoid the 9174F.

While they are the latest and greatest, Genoa with a lot of RAM will be way over your original budget. No problem if you can stretch your budget this far.
A good compromise is Milan-X with the increased L3 cache. The 32-core 7573X is kind of good value these days at less than 4000€ retail. And you can use cheap DDR4 memory with it.

Old   February 8, 2023, 10:33
Default
  #9
New Member
 
Tarık Yaman
Join Date: Dec 2021
Location: Turkey
Posts: 9
We decided to buy a system with dual 75F3 (2*32 cores, 2*64 threads). We will install 16 * 64 GB in this system. We chose the 75F3 over the 74F3 for budget reasons. Although the number of cores has increased, the base frequency has decreased. The cache sizes are the same.

Hopefully, we can effectively use all the cores and not experience bottlenecks.

Dear Flotus, thank you for all your advice. I will share the benchmark results with you and if there are any tests you want, I will share their results with you.

Old   February 8, 2023, 12:44
Default
  #10
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,167
Flotus1 mentioned this already, but it may have been overlooked. When you ran 64 threads for the benchmark, you were using twice the number of physical cores you have. You do not want to work on virtual cores. It would be beneficial to disable hyperthreading in the BIOS so the OS only uses the 32 physical cores. On AMD chips, the BIOS calls this SMT.

Old   March 3, 2023, 08:57
Default
  #11
Senior Member
 
Blanco
Join Date: Mar 2009
Location: Torino, Italy
Posts: 193
Hi all,

thanks for the interesting discussion.

Flotus I have a question:

Quote:
Originally Posted by flotus1
I recently wrote a buyers guide specifically for AMD Genoa: AMD Epyc 9004 "Genoa" buyers guide for CFD
The 32-core 7573X is kind of good value these days at less than 4000€ retail. And you can use cheap DDR4 memory with it.
Do you think the extra cost associated w/ 7573X compared to 75F3 (+1000€/CPU) is worth the additional L3 cache? Or, in other words, will the bigger L3 have a direct impact on simulation speed? Let's think about a 2-socket configuration w/ no other bottlenecks, w/ the same mesh and models.

Old   March 3, 2023, 10:19
Default
  #12
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
75F3: 4600€ https://geizhals.eu/amd-epyc-75f3-10...-a2491883.html
7573X: 3700€ https://geizhals.eu/amd-epyc-7573x-1...-a2697336.html

Even if it was the other way round due to regional prices or availability, the 7573X would still be worth it. At the high end, factoring in all platform costs, it is still the best price/performance CPU.
Yes, the huge L3 caches make the 7573X the faster CPU for CFD and FEA. By how much exactly depends on the case. But definitely enough to warrant 1000€ more per CPU, if you are shopping in that price region anyway.