CFD Online
www.cfd-online.com

Low CFD performance on two socket motherboard

March 21, 2023, 11:08
#1
Member
 
Georgy
Join Date: Apr 2011
Location: Russia
Posts: 32
Hi everyone.

I have a computer with a two-socket motherboard, i.e. two CPUs on one motherboard.
Each CPU has 24 cores, so 48 cores in total. Hyper-Threading is disabled in the BIOS.
If I run on 12 cores (threads), the CPU load is about 40%.
If I run on 24 cores (threads), the CPU load is about 80%.
Going from 12 to 24 cores, the computational time drops by approximately a factor of two.

If I run on all 48 cores (threads), the CPU load is 100%, but the computational time drops only by a factor of about 1.2 (a gain of roughly 20%). So doubling the number of cores (threads) yields only about 20% more performance.
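These numbers can be expressed as relative parallel efficiency, i.e. the measured speedup divided by the ideal speedup for the given increase in core count. A minimal Python sketch using only the speedup factors quoted above (2.0 from 12 to 24 cores, about 1.2 from 24 to 48 cores); the function name is just illustrative:

```python
def relative_efficiency(speedup: float, cores_before: int, cores_after: int) -> float:
    """Measured speedup divided by the ideal (linear) speedup."""
    ideal = cores_after / cores_before
    return speedup / ideal

# Speedup factors reported in the post: (cores before, cores after, speedup)
runs = [(12, 24, 2.0), (24, 48, 1.2)]
for before, after, speedup in runs:
    eff = relative_efficiency(speedup, before, after)
    print(f"{before:>2} -> {after:>2} cores: speedup {speedup:.1f}x, efficiency {eff:.0%}")
```

With these inputs the first doubling comes out at 100% efficiency and the second at about 60%, which is the drop-off being asked about.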

Does anyone know why this happens? And is there any way to improve performance?

March 21, 2023, 12:22
#2
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Some amount of less-than-linear scaling is to be expected.
If we want to get to the bottom of this, we need more information:
1) Please be more specific about the hardware. Is it some OEM workstation? Which CPUs exactly? How is the memory populated?
2) Which software are you running? Operating system, CFD solver...
3) More information about the case you are testing with, especially the cell count.

March 22, 2023, 08:25
#3
Member
 
Georgy
Join Date: Apr 2011
Location: Russia
Posts: 32
flotus1,

Thank you for the prompt reply.

Yes, I agree that some deviation from linear scaling is normal, but not one as large as what I am seeing.

1) Motherboard: ASUS Z11PR-D16.
CPUs: two identical Intel Xeon Gold 5220R.
RAM: Kingston KSM26RS4/32HAI (DDR4). All 16 slots are populated, each with a 32 GB module.

Could you please explain in more detail what you mean by OEM?

2) Operating system: Windows Server 2016.
Solver: ANSYS CFX.

3) The mesh has approximately 60 million hexahedral cells (~6e7 cells), without interfaces.

March 22, 2023, 08:42
#4
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
By OEM I mean one of the large PC companies like Dell, HP, Lenovo and the like.
Since you have a regular old retail motherboard, we don't need that information.

The mesh is definitely large enough not to cause scaling issues here; even on 48 cores it is still about 1.25 million cells per core. Apart from the obvious one, a memory bandwidth limit, I see a few other possibilities:
1) Your CPUs have 6 memory channels each, 12 in total. Yet you have 16 DIMMs installed, so some channels hold two DIMMs while others hold only one. This *should* not have a huge impact on performance, but it is certainly not ideal for HPC. And god only knows what happens with memory management on Windows.
For testing, you could remove the 4 DIMMs sitting in the black memory slots.
2) Cooling issues.
I don't know how good the cooling in your workstation is. There is always the possibility that some temperatures exceed their limits when all computing resources are in use.
Please note that thermal problems are not limited to CPU temperatures. The memory modules or a VRM on the motherboard could also be overheating. If you can't find a way to monitor all relevant temperatures, a good compromise is to monitor the CPU frequency while the benchmark is running.
3) Lots of other tasks running in the background?
4) Some hardware defect, like missing memory channels or defective DIMMs.
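Regarding point 1): with 2 sockets x 6 channels there are 12 channels, so 16 DIMMs cannot be spread evenly however they are placed. The Python sketch below shows the most even possible spread and, for reference, the theoretical peak bandwidth per socket. It assumes DDR4-2666 (the rated speed of the KSM26RS4 modules and, as far as I know, the maximum the 5220R supports); the function names are hypothetical, and peak bandwidth is an upper bound, not what the solver will actually achieve:

```python
CHANNELS_PER_SOCKET = 6      # Xeon Gold 5220R
SOCKETS = 2
TRANSFERS_PER_S = 2666e6     # assumption: DDR4-2666
BYTES_PER_TRANSFER = 8       # one 64-bit DDR4 channel

def dimms_per_channel(dimms: int, channels: int) -> list[int]:
    """Most even possible spread of DIMMs over memory channels."""
    base, extra = divmod(dimms, channels)
    return [base + 1] * extra + [base] * (channels - extra)

def peak_bandwidth_gbs(channels: int) -> float:
    """Theoretical peak memory bandwidth in GB/s for the given channel count."""
    return channels * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9

channels = CHANNELS_PER_SOCKET * SOCKETS
for dimms in (16, 12):
    layout = dimms_per_channel(dimms, channels)
    balanced = len(set(layout)) == 1
    print(f"{dimms} DIMMs on {channels} channels: {layout} balanced={balanced}")

per_socket = peak_bandwidth_gbs(CHANNELS_PER_SOCKET)
print(f"peak per socket: {per_socket:.0f} GB/s, per core: {per_socket / 24:.1f} GB/s")
```

Even the most even spread of 16 DIMMs leaves four channels with two DIMMs and eight with one, whereas 12 DIMMs give exactly one per channel, which is the point of removing 4 modules for the test.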

But just to clarify: when you are running on, let's say, 20 threads, you are already using both of your CPUs by default. That's one of the reasons why scaling drops off sharply when approaching the maximum number of threads.
If you were to bind a 20-thread simulation exclusively to cores on the first CPU, and then compare it to a 40-thread run without core binding, scaling would look much better, despite the 40-thread run performing exactly the same.
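The effect of the baseline choice can be illustrated with a deliberately simple, hypothetical model in which each socket delivers throughput proportional to its busy cores until it hits a memory-bandwidth ceiling. The numbers (1 unit per core, a ceiling of 12 units per socket) are invented for illustration and are not measurements of this machine:

```python
PER_CORE = 1       # hypothetical work units per core
SOCKET_CAP = 12    # hypothetical memory-bandwidth ceiling per socket

def throughput(threads_per_socket: list[int]) -> int:
    """Sum over sockets of per-socket throughput, capped by bandwidth."""
    return sum(min(t * PER_CORE, SOCKET_CAP) for t in threads_per_socket)

spread_20 = throughput([10, 10])   # 20 threads, default spread over both CPUs
bound_20 = throughput([20, 0])     # 20 threads bound to the first CPU
spread_40 = throughput([20, 20])   # 40 threads on both CPUs

print(f"speedup vs. spread 20-thread run: {spread_40 / spread_20:.1f}x")
print(f"speedup vs. bound 20-thread run:  {spread_40 / bound_20:.1f}x")
```

The 40-thread result is identical in both comparisons; only the baseline changes, which is why the apparent scaling differs so much.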

March 22, 2023, 10:04
#5
Member
 
Georgy
Join Date: Apr 2011
Location: Russia
Posts: 32
1) Yes, that is a good idea. I had suspected this as well.
2) The cooling seems to be fine, but I'll monitor it.
3) The only things running in the background are Total Commander and CFX-Pre, so that is not the reason.
4) The hardware has been checked and no errors were found, so I do not think this is the reason.

Do you know how to find out which core belongs to which CPU? For example, do cores 1-20 belong to the first CPU and cores 21-40 to the second one, or do the odd cores belong to the first CPU and the even cores to the second one?

Is it possible to tell ANSYS CFD which CPU to use?

March 22, 2023, 11:52
#6
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,400
Quote:
Do you know how to find out which core belongs to which CPU? For example, do cores 1-20 belong to the first CPU and cores 21-40 to the second CPU
That was the default on every system I have checked, at least with SMT disabled, though I don't think it is guaranteed.
On Linux, tools like lstopo tell you which logical threads belong to which socket and core. Not sure if similar tools exist for Windows.
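On Linux, the same information is also exposed in sysfs: each logical CPU has a file topology/physical_package_id containing its socket number. A minimal Python sketch that groups logical CPUs by socket, assuming the standard /sys/devices/system/cpu layout (it does not apply to Windows); the function name is hypothetical:

```python
import glob
import os
import re
from collections import defaultdict

def cpus_by_socket(root: str = "/sys/devices/system/cpu") -> dict[int, list[int]]:
    """Map socket (physical package) id -> sorted logical CPU ids."""
    sockets: defaultdict[int, list[int]] = defaultdict(list)
    for path in glob.glob(os.path.join(root, "cpu[0-9]*")):
        match = re.search(r"cpu(\d+)$", path)
        if match is None:
            continue
        pkg_file = os.path.join(path, "topology", "physical_package_id")
        try:
            with open(pkg_file) as f:
                package = int(f.read())
        except FileNotFoundError:  # e.g. an offline CPU without topology info
            continue
        sockets[package].append(int(match.group(1)))
    return {pkg: sorted(cpus) for pkg, cpus in sorted(sockets.items())}

if __name__ == "__main__":
    for pkg, cpus in cpus_by_socket().items():
        print(f"socket {pkg}: {cpus}")
```

If the output shows CPUs 0-23 on socket 0 and 24-47 on socket 1, the numbering is contiguous per socket, as described above.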

Quote:
Is it possible to tell ANSYS CFD which CPU to use?
Maybe. On Linux, you could try e.g. taskset. On Windows, you can set process affinity via Task Manager, but only after a program has already started, which would be a horrible idea for HPC, because at that point memory allocation is already done. And there is no guarantee that programs adhere to this setting anyway.
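To set affinity at launch time instead of through Task Manager, both Linux taskset and the Windows cmd start command (its /affinity option) take a hexadecimal CPU mask. A small Python helper to build such a mask; it assumes the contiguous numbering mentioned above (logical CPUs 0-23 on the first socket, 24-47 on the second), which should be verified first, and the helper names are hypothetical. Whether CFX and its MPI launcher respect an inherited mask is not something I can confirm:

```python
CORES_PER_SOCKET = 24  # Xeon Gold 5220R, SMT disabled

def socket_cores(socket: int, cores_per_socket: int = CORES_PER_SOCKET) -> range:
    """Logical CPU ids of one socket, ASSUMING contiguous numbering per socket."""
    start = socket * cores_per_socket
    return range(start, start + cores_per_socket)

def affinity_mask(cpus) -> str:
    """Hex affinity mask (no 0x prefix) with one bit set per logical CPU id."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

print(affinity_mask(socket_cores(0)))  # ffffff (first socket)
print(affinity_mask(socket_cores(1)))  # ffffff000000 (second socket)
print(affinity_mask(range(20)))        # fffff (20 threads on the first socket)
```

The resulting string would be passed as e.g. `taskset ffffff <command>` on Linux or `start /affinity ffffff <program>` from cmd on Windows.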

March 22, 2023, 16:21
#7
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,171
It's #1. An unbalanced memory configuration is surely a problem here. The machine will be much faster and scale better with a balanced memory configuration.

To check which CPU is being utilized: in Task Manager, go to the Performance tab, right-click on the graphs and select Change graph to > NUMA nodes. It will then show you the CPU0 and CPU1 utilization separately.

All times are GMT -4.