Efficiency of dual socket node in Fluent

#1 | August 20, 2018, 17:45
halowine (Arthur Piquet), New Member
Join Date: Mar 2013 | Posts: 18
Hi everybody!

I’m running some scalability tests on a machine using Fluent v18.2 and I’ve found some strange behavior in bandwidth, latency and CP.

The machine is an HPC cluster with 32 nodes, each with dual i5 CPUs of 16 cores each, i.e. 32 cores per node and 1024 overall. No hyper-threading. The machine is not mine.

First, in my scalability test I found that Fluent starts losing efficiency at around 100k cells per process. Isn’t that strange? According to the Fluent benchmark, efficiency only starts to degrade around 10k cells per process. My test case has 40M cells with the k-epsilon model (keps), really simple. I only change the number of processes to reduce the number of cells per process. I’m using MPI over InfiniBand.
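To put the numbers above in perspective, here is a minimal sketch of the cells-per-core arithmetic for this case. It only uses the figures quoted in the post (40M cells, 32 nodes of 32 cores); the function names are made up for illustration.

```python
TOTAL_CELLS = 40_000_000   # mesh size quoted in the post
CORES_PER_NODE = 32        # 2 sockets x 16 cores, as described in the post


def cells_per_core(total_cells: int, cores: int) -> float:
    """Average number of cells each core has to handle."""
    return total_cells / cores


def cores_for_load(total_cells: int, target_cells_per_core: int) -> int:
    """Smallest core count at which the load per core is at or below the target."""
    return -(-total_cells // target_cells_per_core)  # ceiling division


if __name__ == "__main__":
    for nodes in (1, 2, 4, 8, 16, 32):
        cores = nodes * CORES_PER_NODE
        load = cells_per_core(TOTAL_CELLS, cores)
        print(f"{nodes:2d} nodes, {cores:4d} cores: {load:>12,.0f} cells/core")
    print("cores needed for 100k cells/core:", cores_for_load(TOTAL_CELLS, 100_000))
    print("cores needed for  10k cells/core:", cores_for_load(TOTAL_CELLS, 10_000))
```

With this mesh, 100k cells per core corresponds to 400 cores, while 10k cells per core would need 4000 cores. On the full 1024-core machine the load never drops below roughly 39k cells per core, so the 10k regime cannot be reached with this case.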

Secondly, when I test bandwidth and latency with 32 processes on one node, I find that the connectivity between the two processors is poor. Inside a processor (16 cores) I get 10Mb/s, but between the two processors I only get 2Mb/s of bandwidth.
If I run my 40M case on 512 cores mapped as 32*16 (16 nodes, nodes full), it takes more time than when it is mapped as 16*32 (32 nodes, nodes half full).

My question: what is the purpose of having dual processors per node if only half of the node is useful for running my case? I’m losing half of my HPC here...

Maybe I need to activate something in the BIOS to improve the inter-processor connectivity? Inter-node connectivity is good though (InfiniBand), ~7Mb/s.

Thanks!

#2 | August 21, 2018, 04:08
flotus1 (Alex), Super Moderator
Join Date: Jun 2012 | Location: Germany | Posts: 3,399
In order not to draw false conclusions about inter-node scaling, you should first check intra-node scaling, i.e. run the job on one node with 1-32 cores.
Here you will probably see one of the reasons for what you observed when running the cluster nodes full vs. half-full: scaling on a single node will be less than ideal due to memory bandwidth limitations.
This should make it obvious why running all nodes half full is more efficient than running half the nodes fully occupied. Running jobs like this is common practice in memory-bound HPC with per-core licensing.
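The memory-bandwidth argument can be illustrated with a toy model. This is only a sketch under a strong assumption: performance scales linearly with the number of active cores on a socket until the memory bandwidth of that socket is saturated, and stays flat afterwards. The saturation point of 8 cores used below is a hypothetical value, not a measurement of the machine in question.

```python
SOCKETS_PER_NODE = 2  # dual-socket nodes, as described in the thread


def socket_speedup(cores_used: int, saturation_cores: float) -> float:
    """Speedup of one socket relative to a single core (toy model).

    Linear until `saturation_cores` cores are busy, flat afterwards.
    """
    return min(float(cores_used), float(saturation_cores))


def cluster_throughput(nodes: int, cores_per_socket_used: int,
                       saturation_cores: float) -> float:
    """Aggregate throughput in single-core units, ignoring network cost."""
    per_node = SOCKETS_PER_NODE * socket_speedup(cores_per_socket_used,
                                                 saturation_cores)
    return nodes * per_node


if __name__ == "__main__":
    SATURATION = 8.0  # hypothetical: bandwidth saturates at 8 cores per socket
    full = cluster_throughput(nodes=16, cores_per_socket_used=16,
                              saturation_cores=SATURATION)
    half = cluster_throughput(nodes=32, cores_per_socket_used=8,
                              saturation_cores=SATURATION)
    print(f"512 cores on 16 full nodes:      {full:.0f} core-equivalents")
    print(f"512 cores on 32 half-full nodes: {half:.0f} core-equivalents")
```

In this model the same 512 cores deliver twice the throughput when spread over 32 nodes, simply because twice as many memory controllers are in use. Real behavior will be less clear-cut, which is why measuring intra-node scaling first is the sensible step.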

Inter-socket bandwidth in a NUMA system will always be worse than intra-socket bandwidth. This is nothing to worry about, just a consequence of the implementation, where data has to be sent over some kind of interconnect between the sockets.
Non-ideal scaling when comparing a fully occupied node with a half-occupied node does not necessarily stem from poor inter-socket bandwidth and latency. MPI with default settings should distribute the processes across both CPUs even when only half of all cores are used. Again, it is more likely a consequence of memory-bound execution, unless you pinned the 16 processes to the first CPU when running 16 processes per node.
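The difference between spreading 16 processes over both sockets and pinning them all to the first one can be sketched as follows. The policy names "scatter" and "compact" are just labels chosen for this illustration; they are not options of Fluent or of any particular MPI library, and the actual default placement depends on the MPI implementation and its settings.

```python
from collections import Counter


def place_ranks(n_ranks: int, sockets: int = 2, cores_per_socket: int = 16,
                policy: str = "scatter") -> list:
    """Return the socket index assigned to each rank on a single node.

    "compact": fill the first socket completely before using the next.
    "scatter": alternate between sockets (round-robin).
    """
    if n_ranks > sockets * cores_per_socket:
        raise ValueError("more ranks than cores on the node")
    if policy == "compact":
        return [rank // cores_per_socket for rank in range(n_ranks)]
    if policy == "scatter":
        return [rank % sockets for rank in range(n_ranks)]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("compact", "scatter"):
        per_socket = Counter(place_ranks(16, policy=policy))
        print(policy, dict(sorted(per_socket.items())))
```

With 16 ranks, the compact placement leaves the second socket and its memory channels idle, while the scatter placement puts 8 ranks on each socket. On Linux, the binding of a running process can be inspected with tools such as `taskset -p <pid>` or `numactl --show`.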

Edit:
Quote:
First, on my scalability test I found that Fluent starts losing efficiency from 100k/proc. Isn’t that strange ? According to Fluent benchmark, efficiency starts to be bad around 10k/proc.
The number of cells per core at which scaling deteriorates depends on many factors. Just taking arbitrary numbers from Ansys will lead to false conclusions. If you really want to check inter-node scaling on your system, run the benchmark with 32 cores per node and increase the number of nodes from 1 to 32.
Since the system is not yours, make sure that it is not running other heavy jobs during your tests. Load on both the nodes and the node interconnect could distort your findings.
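Once the wall times from such a node sweep are available, speedup and parallel efficiency relative to the smallest run can be computed along these lines. This is a minimal sketch; the timings in the example are placeholders chosen for illustration, not measurements from any real system.

```python
def scaling_table(wall_times: dict) -> list:
    """Compute (nodes, speedup, efficiency) rows from measured wall times.

    wall_times maps node count -> time per iteration (any consistent unit).
    The run with the fewest nodes serves as the baseline.
    """
    base_nodes = min(wall_times)
    base_time = wall_times[base_nodes]
    rows = []
    for nodes in sorted(wall_times):
        speedup = base_time / wall_times[nodes]
        ideal = nodes / base_nodes          # perfect linear scaling
        rows.append((nodes, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    # Placeholder timings (seconds per iteration), NOT real measurements.
    example = {1: 64.0, 2: 32.0, 4: 20.0}
    for nodes, speedup, efficiency in scaling_table(example):
        print(f"{nodes:2d} nodes: speedup {speedup:5.2f}, "
              f"efficiency {efficiency:6.1%}")
```

Using a one-node run with all 32 cores as the baseline separates inter-node scaling from the intra-node memory-bandwidth effect discussed above.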

Tags
benchmarking, dual cpu, fluent, hpc


All times are GMT -4.