Memory bandwidth problem?

May 7, 2019, 11:11   #1
MSF
New Member
 
Join Date: Apr 2014
Location: Germany
Posts: 24
Rep Power: 12
MSF is on a distinguished road
Hi,


our lab has a workstation with two Intel Xeon Gold 6150 CPUs (18 cores each) and 6x 16GB DDR4-2666 ECC. Each processor has 6 memory channels, which makes 12 in total, so only half of them are in use.

I have to run a large number of rather small cases (~40,000 cells each), so I run them in parallel by starting e.g. 20 simulations at once. I noticed that the simulations get very slow (they take more than 3 times as long) if I start more than ~24 at once. Our workstation has more than 24 cores, so I do not think that the processor is the bottleneck. I have read a lot about memory bandwidth problems in this forum, so I was wondering whether this is one. I therefore removed 4 of the DIMMs and ran 20 simulations at once. I expected them to run noticeably slower, but they were only 5-10% slower. Is this a memory bandwidth problem? Does anyone have an idea where the bottleneck is?






Best,
Moritz

May 7, 2019, 14:51   #2
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,171
Rep Power: 23
evcelica is on a distinguished road
Get 12 identical memory modules and install them in the correct slots according to your motherboard manual.
If you stay with only 6 modules, check the manual for the best way to balance them (3 per CPU), but you really should be using all 12 channels.
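
For reference, the theoretical peak bandwidth behind this advice can be estimated with a little arithmetic. The sketch below assumes one DIMM per channel and the nominal 8 bytes per transfer of a 64-bit DDR4 channel. Sustained bandwidth in practice is lower, and the function name is only illustrative.

```python
# Theoretical peak memory bandwidth for the workstation described in post #1.
# Assumption: one DIMM per channel, so "populated channels" == number of DIMMs.
BYTES_PER_TRANSFER = 8  # a DDR4 channel is 64 bits wide


def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) over all populated channels."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    jobs = 20  # number of simultaneous simulations mentioned in post #1
    for channels in (2, 6, 12):
        total = peak_bandwidth_gb_s(2666, channels)
        print(f"{channels:2d} channels: {total:6.1f} GB/s total, "
              f"{total / jobs:5.1f} GB/s per job with {jobs} jobs")
```

With DDR4-2666 this gives roughly 128 GB/s for the whole machine with 6 channels populated and roughly 256 GB/s with all 12.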

May 7, 2019, 15:16   #3
New Member
 
Joshua Brickel
Join Date: Nov 2013
Posts: 26
Rep Power: 12
JoshuaB is on a distinguished road
See https://lenovopress.com/lp0742.pdf for why populating only half the memory channels is, to put it mildly, not good.

May 7, 2019, 16:56   #4
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Rep Power: 47
flotus1 has a spectacular aura about
The fact that the simulations do not become much slower when you remove DIMMs would normally indicate that memory bandwidth is not an issue here. Then again, we don't know whether the 6 DIMMs that are populated now are populated correctly to give at least 2x triple-channel. Did you check for that?
Something else that might be happening with these very small cases: a single computation mostly fits into L3 cache, and running more of them at once causes more and more cache misses.
Or you could have a thermal issue where stressing more cores leads to very low CPU frequencies. This would need to be checked.
Or, for some reason, several of your smaller simulations get scheduled on the same physical cores. Again, this needs to be investigated before drawing conclusions. Disabling Hyper-Threading could be one way to solve this. Alternatively, on Linux you could use taskset to make sure each computation gets pinned to a different physical core.
After that, I would highly recommend getting 12 identical DIMMs and making sure they get populated correctly.
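
A minimal sketch of the taskset idea, assuming a Linux machine where `lscpu -p=CPU,CORE,SOCKET` (from util-linux) is available. The helper names and the `./run_case.sh` job command are hypothetical and only serve as placeholders. The code picks one logical CPU per physical core and alternates between sockets when assigning jobs.

```python
def one_cpu_per_core(lscpu_text: str) -> list[tuple[int, int]]:
    """Return (socket, cpu) pairs, one logical CPU for each physical core.

    Expects the output of `lscpu -p=CPU,CORE,SOCKET`: comment lines start
    with '#', data lines look like "cpu,core,socket".
    """
    seen = set()
    picked = []
    for line in lscpu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cpu, core, socket = (int(x) for x in line.split(",")[:3])
        if (socket, core) not in seen:  # skip Hyper-Threading siblings
            seen.add((socket, core))
            picked.append((socket, cpu))
    return picked


def pinned_commands(lscpu_text: str, jobs: list[str]) -> list[str]:
    """Prefix each job with `taskset -c <cpu>`, alternating between sockets."""
    by_socket = {}
    for socket, cpu in one_cpu_per_core(lscpu_text):
        by_socket.setdefault(socket, []).append(cpu)
    # Interleave the per-socket CPU lists so jobs are spread evenly over both CPUs.
    queues = [list(cpus) for _, cpus in sorted(by_socket.items())]
    order = []
    while any(queues):
        for queue in queues:
            if queue:
                order.append(queue.pop(0))
    if len(jobs) > len(order):
        raise ValueError("more jobs than physical cores")
    return [f"taskset -c {cpu} {job}" for cpu, job in zip(order, jobs)]
```

The resulting strings can be written to a launch script or started with `subprocess`. Raising an error when there are more jobs than physical cores is a deliberate choice here, since oversubscribing cores is exactly what the pinning is meant to rule out.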

May 8, 2019, 06:22   #5
MSF
New Member
 
Join Date: Apr 2014
Location: Germany
Posts: 24
Rep Power: 12
MSF is on a distinguished road
I checked the following:


1. The DIMMs are populated correctly. (And as soon as there is some money I will ask for more DIMMs.)
2. The simulations are not running on the same physical core.
3. The CPU temperature is at 90°C (lm_sensors), but the CPU is not throttling: cat /proc/cpuinfo | grep "MHz" tells me that all cores are running at around 3400 MHz. (By the way, why is the CPU running at 3400 MHz when the model name is Gold 6150 CPU @ 2.70GHz?)
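
The clock check in point 3 can also be scripted. The following is a rough sketch, assuming the usual Linux `/proc/cpuinfo` layout with one `cpu MHz` line per logical CPU; the function names are made up for illustration. It should be run while the simulations are active, because idle cores legitimately clock below the base frequency.

```python
import re

BASE_MHZ = 2700.0  # base clock taken from the model name "Gold 6150 CPU @ 2.70GHz"


def core_clocks_mhz(cpuinfo_text: str) -> list[float]:
    """Extract every 'cpu MHz' value from /proc/cpuinfo-style text."""
    return [float(m) for m in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.M)]


def cores_below_base(cpuinfo_text: str, base_mhz: float = BASE_MHZ) -> list[int]:
    """Indices (in file order) of logical CPUs clocked below the base frequency."""
    return [i for i, mhz in enumerate(core_clocks_mhz(cpuinfo_text)) if mhz < base_mhz]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:  # Linux only
            text = f.read()
    except OSError:
        text = ""
    clocks = core_clocks_mhz(text)
    if clocks:
        print(f"min {min(clocks):.0f} MHz, max {max(clocks):.0f} MHz, "
              f"{len(cores_below_base(text))} logical CPUs below base clock")
```

Loaded cores that sit well below the base clock would hint at throttling, although a single snapshot is only an indication and sampling repeatedly during a run is more reliable.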


So I conclude that the bottleneck is the L3 cache size. Is there anything I can do?



Thanks to all.


Best,
Moritz

May 8, 2019, 07:16   #6
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Rep Power: 47
flotus1 has a spectacular aura about
Quote:
(By the way, why is the CPU running at 3400 MHz when the model name is Gold 6150 CPU @ 2.70GHz?)
2.7GHz is the base clock speed for all cores running non-AVX code. The CPU can decide to clock higher than that depending on parameters like power consumption, number of active cores, temperature and a few others. This is called Turbo Boost; all modern CPUs have some variant of this feature.

Quote:
So I conclude that the bottleneck is the L3 cache size. Is there anything I can do?
All you could do with your current hardware is try to find the optimum number of simultaneous simulations for maximum throughput, and make sure that the simulations are distributed equally across the physical CPUs. Again, taskset is your friend.
Aside from that, fully populating all memory channels will help at least a bit. Last-level cache misses mean that the data has to be funnelled through memory.
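
Finding that optimum comes down to comparing throughput, i.e. simulations finished per unit of wall time, for different batch sizes. A small sketch follows, with HYPOTHETICAL timings: only the trend of "more than 3 times as long beyond ~24 jobs" comes from post #1, the individual numbers are invented for illustration and would have to be replaced by measurements.

```python
def throughput(n_jobs: int, wall_time: float) -> float:
    """Completed simulations per unit of time when n_jobs run side by side."""
    return n_jobs / wall_time


def best_job_count(timings: dict[int, float]) -> int:
    """Return the batch size with the highest throughput.

    timings maps number of simultaneous jobs -> wall time of one batch.
    """
    return max(timings, key=lambda n: throughput(n, timings[n]))


# HYPOTHETICAL relative wall times per batch (1.0 = time of a 20-job batch).
example_timings = {16: 1.0, 20: 1.0, 24: 1.1, 28: 3.2, 36: 3.5}

if __name__ == "__main__":
    for n, t in sorted(example_timings.items()):
        print(f"{n:2d} jobs: {throughput(n, t):5.1f} simulations per time unit")
    print("best batch size:", best_job_count(example_timings))
```

Running more jobs at once only pays off as long as the wall time grows more slowly than the job count, which is why a batch that takes three times as long has to contain more than three times as many simulations to break even.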

Edit: I forgot one thing. Another possible bottleneck could be data I/O, in case your simulations read or write a lot of data from or to disk, or perform a high number of small reads/writes.

May 8, 2019, 07:28   #7
MSF
New Member
 
Join Date: Apr 2014
Location: Germany
Posts: 24
Rep Power: 12
MSF is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
All you could do with your current hardware is try to find the optimum number of simultaneous simulations for maximum throughput, and make sure that the simulations are distributed equally across the physical CPUs. Again, taskset is your friend.
Aside from that, fully populating all memory channels will help at least a bit. Last-level cache misses mean that the data has to be funnelled through memory.

Edit: I forgot one thing. Another possible bottleneck could be data I/O, in case your simulations read or write a lot of data from or to disk, or perform a high number of small reads/writes.

Thanks for the advice. I/O is not an issue.
