
Need help with subpar performance

March 22, 2022, 13:00, #1
Need help with subpar performance
Ron Burnett (rnburne), Member
Join Date: Feb 2013, Posts: 42
A description of the recently purchased machine:

HP DL560 G8
4x E5-4610v2
16x 8GB 2Rx4 PC3L-12800R-11-12 (discovered that one module is actually 16GB but otherwise identical)
P420i raid controller
750 watt power supply

To this I added a 240 GB SSD, Ubuntu 20.04, and OpenFOAM 8 (from OpenFoam.org). All BIOS settings have
been biased toward "performance", cooling set to "enhanced" (loaded CPU temps < 52 C),
interleaving enabled, hyperthreading off, and all DIMMs in the correct sockets (on this machine, the
white ones).
Using bench_template_v02, it's obvious something is not right when compared with other, older
four-socket machines such as Kailee (post #416), wildemam (#339), Morland (#158), and kstuart (#260).
Monitoring all cores in real time with watch -n1 "grep '^[c]pu MHz' /proc/cpuinfo"
shows a consistent 2.49 GHz under load.
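For anyone repeating this check, the cpufreq governor is worth a look too. A quick sketch using the standard sysfs interface (assuming the cpufreq driver is loaded on your kernel):
Code:
# List the active frequency scaling governor for every core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c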

Can anyone shed some light on the problem?
Attached Files
run results.txt (586 Bytes)
dmidecode.txt (26.6 KB)
lshw.txt (154.8 KB)
numactl.txt (487 Bytes)

March 22, 2022, 14:21, #2
Alex (flotus1), Super Moderator
Join Date: Jun 2012, Location: Germany, Posts: 3,399
At first glance, nothing obvious stands out.
I'm not a huge fan of mismatched memory; it can lead to weird performance regressions. So if you could get your hands on another identical 8GB stick, that would rule out one potential issue.
On the software level, another interesting test would be probing each CPU individually, i.e. running the simulation with 8 threads pinned to the hardware threads of each CPU: 4 tests in total.

The same can be done with other synthetic benchmarks, like STREAM or this one here: Benchmark fpmem.
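For example, a single-socket STREAM run pinned to NUMA node 0 and its local memory would look something like this (a sketch; assumes a compiled stream binary in the working directory):
Code:
# Run STREAM on NUMA node 0 only, allocating memory from that node
numactl --cpunodebind=0 --membind=0 ./stream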

Have you tried clearing caches first, and then directly running the benchmark on 32 threads?
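For reference, what I mean by clearing caches is something like this (needs root):
Code:
# Flush dirty pages to disk, then drop page cache, dentries and inodes
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches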

On a completely unrelated note: I wish more posts here were as thought out as yours.

March 22, 2022, 20:23, #3
Ron Burnett (rnburne), Member
Join Date: Feb 2013, Posts: 42
Alex, the memory mismatch was disappointing, especially since I tried to impress upon the tech person the need for uniformity. It's easy enough to correct, and maybe a software test will point the way. My knowledge of MPI commands is limited; what does it take to check each CPU by itself?

Clearing caches: yes, I've tried that, with no change.

I appreciate the compliment... and your help.

March 22, 2022, 22:02, #4
The test indicates a problem with your memory speed
Will Kernkamp (wkernkamp), Senior Member
Join Date: Jun 2014, Posts: 316
Your technician may have used the 16GB module because one of the 8GB modules he planned to use turned out bad. If one was bad, another could be as well.

You were specific about everything except the BIOS settings. Look at all memory-related settings and use "auto" where possible. There is rank interleaving and bank interleaving; with unequal-size DIMMs, you risk hurting performance by forcing bank interleaving.

Try forcing a lower speed for the memory. A lot of the higher-speed modules started life at a lower clock and were reprogrammed for the higher speed by resellers. At 1333 MHz, your machine should still benchmark just above 70 seconds, I would think.
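To see what each DIMM claims versus what it actually runs at, dmidecode is a quick check (needs root; the exact field names vary a bit between dmidecode versions):
Code:
# Compare each DIMM's rated "Speed" with its "Configured Clock Speed"
sudo dmidecode -t memory | grep -i "speed"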

March 23, 2022, 02:42, #5
Alex (flotus1), Super Moderator
Join Date: Jun 2012, Location: Germany, Posts: 3,399
Quote:
Originally Posted by rnburne
It's easy enough to correct, and maybe a software test will point the way. My knowledge of MPI commands is limited; what does it take to check each CPU by itself?
Running on each CPU separately, especially the one that forms the NUMA node with the 16GB DIMM, should show us whether this is part of the problem.
According to the numactl output, your OS mapped threads to cores in order, i.e. cores on the first socket/NUMA node are numbered 0-7, and so on.
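For anyone who wants to check that mapping on their own box, it is visible in the output of:
Code:
# Show which core IDs and how much memory belong to each NUMA node
numactl --hardware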
There are a few ways to bind threads with MPI; one of them is "--cpu-set":
Code:
mpirun --cpu-set 24-31 --bind-to core -np 8 ...
This should run the benchmark on the fourth socket only.
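The other three sockets follow the same pattern; keep your usual solver invocation in place of the trailing dots:
Code:
mpirun --cpu-set 0-7 --bind-to core -np 8 ...
mpirun --cpu-set 8-15 --bind-to core -np 8 ...
mpirun --cpu-set 16-23 --bind-to core -np 8 ...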

March 29, 2022, 12:46, #6
Ron Burnett (rnburne), Member
Join Date: Feb 2013, Posts: 42
The mismatched module was indeed the problem; running a benchmark on each CPU showed as much. It was replaced with one that matches the other 15, and wow, what a difference. New results are posted in the benchmark thread.

Quote:
Originally Posted by wkernkamp
Your technician may have used the 16GB because one of the 8GB he was planning to use turned out bad.
The performance loss may have been due to a faulty module, the mismatch itself, or both. At some point in the future I may buy a new 16GB module and rerun everything.

Quote:
Originally Posted by wkernkamp
There is rank interleaving and bank interleaving
Settings are as follows: channel interleaving enabled, node interleaving disabled.

Quote:
Originally Posted by wkernkamp
At 1333 MHz, your machine should still benchmark just above 70 seconds
Out of curiosity, I did just that. With the new module it ran in 61 seconds.
