Case running extremely slow on cluster in parallel mode

#1, November 10, 2021, 03:34
Venkat Ganesh (Venky_94), Member, Cincinnati, Ohio
Hello,

This is my first time running OpenFOAM on a cluster, and I noticed that my simulation runs much slower there on multiple processors than on my local PC.

I did a scaling analysis: for each processor count, I recorded the execution time of around 50 iterations and averaged them. Performance deteriorates sharply as soon as the processor count is increased to 4. I've attached a photo of the timings and efficiency.

For comparison, the execution time is less than a second on my local PC using 8 cores. A single node on the cluster has 40 processors, and I'd like to use the available computing power to speed up my simulation. I'm using the scotch decomposition method and a custom solver based on interFoam. I'm also attaching the case setup. Please let me know what I could change to improve performance.
Attached Images: Scaling Analysis.JPG (39.4 KB)
Attached Files: VOF.zip (16.1 KB)
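
For anyone repeating the scaling analysis described in this post, here is a minimal Python sketch of the bookkeeping. It assumes the solver log contains the usual cumulative `ExecutionTime = ... s` lines. The function names and the sample timings in the usage note are illustrative placeholders, not values from the attached image.

```python
import re
from statistics import mean

# Matches lines such as "ExecutionTime = 12.34 s  ClockTime = 13 s"
EXEC_TIME = re.compile(r"^ExecutionTime = ([0-9.eE+-]+) s", re.MULTILINE)


def mean_step_time(log_text, last_n=50):
    """Average wall time per time step over the last `last_n` steps of a log."""
    stamps = [float(t) for t in EXEC_TIME.findall(log_text)]
    # ExecutionTime is cumulative, so the cost of one step is the
    # difference between two consecutive entries.
    steps = [b - a for a, b in zip(stamps, stamps[1:])]
    if not steps:
        raise ValueError("need at least two ExecutionTime lines")
    return mean(steps[-last_n:])


def scaling_table(step_times):
    """Map core count -> (speedup, efficiency) relative to the smallest run.

    `step_times` maps a core count to its mean time per step in seconds.
    """
    base_cores = min(step_times)
    base_time = step_times[base_cores]
    table = {}
    for cores, t in sorted(step_times.items()):
        speedup = base_time / t
        efficiency = speedup * base_cores / cores
        table[cores] = (speedup, efficiency)
    return table
```

As a usage example with made-up numbers, `scaling_table({1: 8.0, 4: 4.0})` gives a speedup of 2.0 and an efficiency of 0.5 on 4 cores, which is the kind of early drop-off described above.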

#2, November 10, 2021, 04:03
Santiago Lopez Castano (Santiago), Senior Member
This is pretty common when running unstructured codes on a single node. It is basically due to the number of memory channels and the L1/L2 cache of your blade. My suggestion: repeat the same performance analysis, but treat the node as the smallest "CPU unit". That is, use 40, 80, and 120 processors.

#3, November 15, 2021, 10:56
Venkat Ganesh (Venky_94), Member, Cincinnati, Ohio
Thanks for your suggestion. I tried it and found that, while all three options are faster than my local PC, the single node gave the best results. I apologize for the delayed response; the jobs had a long wait before they could run.

I ran all three options for 24 hours. The simulated time reached in that period was 2.5 s on a single node (40 processors), 2 s on 80 processors, and 1.9 s on 120 processors.

While the single-node performance is good enough for my case at the moment, I'm genuinely curious about the results.
  1. Why was the performance so poor on 20 cores but much better on 40 cores? Is it because the whole node is then at the job's disposal, giving OpenFOAM access to more memory?
  2. Why do 40 cores perform better than the higher core counts? Is it because of longer communication times between the cores (too few cells per core)?
  3. And finally, what would I have to do if I needed to speed up my simulation further?
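
To put the three 24-hour runs on a common footing, here is a small Python sketch of the arithmetic. It treats the simulated time reached per 24 hours of wall-clock time as a throughput measure, which is only a rough proxy if the time step adapts during the run. The function name is illustrative. The total cell count is not stated in the thread, so cells per core is not computed here.

```python
# Simulated seconds reached after the same 24 h of wall-clock time,
# as reported in the post above (cores -> simulated time in s).
simulated_s = {40: 2.5, 80: 2.0, 120: 1.9}


def relative_throughput(results):
    """Map core count -> (speedup, efficiency) relative to the smallest run."""
    base_cores = min(results)
    base = results[base_cores]
    out = {}
    for cores, sim in sorted(results.items()):
        speedup = sim / base                     # > 1 means faster than the base run
        efficiency = speedup * base_cores / cores
        out[cores] = (speedup, efficiency)
    return out


for cores, (speedup, eff) in relative_throughput(simulated_s).items():
    print(f"{cores:>4} cores: speedup {speedup:.2f}, efficiency {eff:.2f}")
```

With the numbers from the post, the 80-core run reaches 0.80 times the single-node throughput (efficiency 0.40), and the 120-core run reaches 0.76 times (efficiency about 0.25). In other words, adding nodes slowed this case down instead of speeding it up.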

Last edited by Venky_94; November 19, 2021 at 23:30.

Tags
cluster, parallel