How can I run OpenFOAM to benchmark/compare two environment performance

Old   November 28, 2016, 08:44
Wink How can I run OpenFOAM to benchmark/compare two environment performance
  #1
New Member
 
Join Date: Nov 2016
Posts: 7
Rep Power: 5
ZhouYoung is on a distinguished road
Hello everyone, I'm a newcomer here; thank you for your attention.

I was asked to test the performance of OpenFOAM on a cloud cluster, but I have no experience with this. As I understand it, OpenFOAM is a simulation platform, somewhat like Matlab. So can I run a typical case such as pitzDaily or cavity on the cloud cluster and then judge or compare the performance by the running time?

I have two questions:
1. If I want to run a case (by modifying one from the tutorials), I need to increase the computational work so that it is worth running on a cluster with hundreds of CPUs. Is there a big case among the tutorials? Or how can I quickly make an existing case big enough?

2. Currently I can run interFoam with foamJob, but I get an error with "/usr/lib64/openmpi/bin/mpirun --hostfile ~/hostfil -np 6 simpleFoam -case /root/OpenFOAM/root-3.0.0/run/pitzDaily -parallel":
--> FOAM FATAL ERROR:
Cannot read "/root/OpenFOAM/root-3.0.0/run/pitzDaily/system/decomposeParDict"
Does anyone have an idea about this? Thank you!

Old   November 28, 2016, 09:41
Default
  #2
Senior Member
 
floquation's Avatar
 
Kevin van As
Join Date: Sep 2014
Location: TU Delft, The Netherlands
Posts: 253
Rep Power: 16
floquation will become famous soon enough
Quote:
Originally Posted by ZhouYoung View Post
I was asked to test the performance of OpenFOAM on a cloud cluster, but I have no experience with this. As I understand it, OpenFOAM is a simulation platform, somewhat like Matlab. So can I run a typical case such as pitzDaily or cavity on the cloud cluster and then judge or compare the performance by the running time?
For benchmarking you'll be interested in the execution time of a given simulation. Accuracy shouldn't matter, as the simulation result should be independent of the hardware, and I reckon you want to compare different hardware?

Quote:
Originally Posted by ZhouYoung View Post
I have two questions:
1. If I want to run a case (by modifying one from the tutorials), I need to increase the computational work so that it is worth running on a cluster with hundreds of CPUs. Is there a big case among the tutorials? Or how can I quickly make an existing case big enough?
Most tutorials are designed to be small so that they run quickly. If you want to increase the runtime, the easiest method is either to force the solver to do more work (lower the value of "tolerance" in system/fvSolution so that the linear solvers perform more iterations, or lower the value of "maxDeltaT" in system/controlDict to limit the timestep so that more timesteps are needed), or to use a finer mesh (constant/polyMesh/blockMeshDict). Evidently, 3D simulations will require considerably more time than 2D simulations.
The most effective method will be to use a finer mesh.

With such a 'sufficiently fine' mesh, any tutorial will do. For example, the "damBreak" case of the "interFoam" application.

Quote:
Originally Posted by ZhouYoung View Post
2. Currently I can run interFoam with foamJob, but I get an error with "/usr/lib64/openmpi/bin/mpirun --hostfile ~/hostfil -np 6 simpleFoam -case /root/OpenFOAM/root-3.0.0/run/pitzDaily -parallel":
--> FOAM FATAL ERROR:
Cannot read "/root/OpenFOAM/root-3.0.0/run/pitzDaily/system/decomposeParDict"
Does anyone have an idea about this? Thank you!
You have to decompose the mesh before running a simulation on multiple processors. The file named in the error, system/decomposeParDict, is the dictionary that controls this decomposition; judging by the message, your pitzDaily case does not have one. Have a look at the following tutorial (especially Sec. 2.3.11 "running in parallel"):
http://cfd.direct/openfoam/user-guid...x7-630002.3.11
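
As a rough illustration of the decomposition step, the sketch below generates the text of a minimal system/decomposeParDict with the "simple" method. It is only a sketch under assumptions: the helper name decompose_par_dict is made up for this example, and the 3 x 2 x 1 split is just one way to obtain 6 subdomains to match "-np 6". The dictionary keywords (numberOfSubdomains, method, simpleCoeffs, n, delta) are the ones used in the OpenFOAM tutorials.

```python
def decompose_par_dict(nx: int, ny: int, nz: int) -> str:
    """Return the text of a minimal system/decomposeParDict ('simple' method).

    nx, ny, nz are the number of pieces in each direction; their product
    must equal the number of MPI processes passed to mpirun via -np.
    """
    n_sub = nx * ny * nz
    return f"""FoamFile
{{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}}

numberOfSubdomains {n_sub};

method          simple;

simpleCoeffs
{{
    n               ({nx} {ny} {nz});
    delta           0.001;
}}
"""


if __name__ == "__main__":
    # 3 x 2 x 1 = 6 subdomains, an illustrative split for "mpirun -np 6"
    print(decompose_par_dict(3, 2, 1))
```

With the printed text saved as system/decomposeParDict in the case directory, running decomposePar on the case creates the processor0 ... processor5 directories that the parallel solver expects.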

Old   November 28, 2016, 15:00
Default
  #3
Senior Member
 
Elvis
Join Date: Mar 2009
Location: Sindelfingen, Germany
Posts: 611
Blog Entries: 5
Rep Power: 20
elvis will become famous soon enough
http://openfoamwiki.net/index.php/Sig_HPC

What about taking test cases from the SIG Turbomachinery?
http://openfoamwiki.net/index.php/Sig_Turbomachinery

http://openfoamwiki.net/index.php/Si...ion_test_cases

Old   November 29, 2016, 03:13
Default
  #4
New Member
 
Join Date: Nov 2016
Posts: 7
Rep Power: 5
ZhouYoung is on a distinguished road
Quote:
Originally Posted by floquation View Post
For benchmarking you'll be interested in the execution time of a given simulation. Accuracy shouldn't matter, as the simulation result should be independent of the hardware, and I reckon you want to compare different hardware?
http://cfd.direct/openfoam/user-guid...x7-630002.3.11
Yes, that is exactly what I mean! I need a big OpenFOAM job so that I can easily compare the performance of two environments by the time to completion.

Quote:
Originally Posted by floquation View Post
Most tutorials are designed to be small so that they run quickly. If you want to increase the runtime, the easiest method is either to force the solver to do more work (lower the value of "tolerance" in system/fvSolution so that the linear solvers perform more iterations, or lower the value of "maxDeltaT" in system/controlDict to limit the timestep so that more timesteps are needed), or to use a finer mesh (constant/polyMesh/blockMeshDict). Evidently, 3D simulations will require considerably more time than 2D simulations.
The most effective method will be to use a finer mesh.

With such a 'sufficiently fine' mesh, any tutorial will do. For example, the "damBreak" case of the "interFoam" application.
http://cfd.direct/openfoam/user-guid...x7-630002.3.11
Yes, I just tried this (using the damBreak case, changing tolerance to 1e-15 and maxDeltaT to 0.00001), and the runtime of interFoam on 4 CPUs increased to 18 minutes. Quite useful and easy for me!

But regarding the finer mesh you mentioned: I found there is no constant/polyMesh/blockMeshDict under damBreak. Is it the same as system/blockMeshDict?
Do you mean that I should add more elements to the blockMeshDict, or is there some parameter that I can change easily?

Quote:
Originally Posted by floquation View Post
You have to decompose the mesh before running a simulation on multiple processors. Have a look at the following tutorial (especially Sec. 2.3.11 "running in parallel"):
http://cfd.direct/openfoam/user-guid...x7-630002.3.11
OK, thank you very much! Currently I can run foamJob on two machines. I will read the tutorial and the foamJob script to find out what I am doing wrong.
Thank you very much again!

Old   November 29, 2016, 03:15
Default
  #5
New Member
 
Join Date: Nov 2016
Posts: 7
Rep Power: 5
ZhouYoung is on a distinguished road
These seem a little old. I'm using OpenFOAM 3.0.0, and I don't know whether they are compatible.

Old   November 29, 2016, 03:30
Default
  #6
Senior Member
 
floquation's Avatar
 
Kevin van As
Join Date: Sep 2014
Location: TU Delft, The Netherlands
Posts: 253
Rep Power: 16
floquation will become famous soon enough
Quote:
Originally Posted by ZhouYoung View Post
But regarding the finer mesh you mentioned: I found there is no constant/polyMesh/blockMeshDict under damBreak. Is it the same as system/blockMeshDict?
Do you mean that I should add more elements to the blockMeshDict, or is there some parameter that I can change easily?
It appears that the location of that file was changed in a recent version, so yes, that is the same file.

Find lines like these:
Code:
blocks
(
    hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
);
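And increase the cell counts, i.e. the numbers in the second set of parentheses on each line (shown in bold in the original post, e.g. (23 8 1)), to use more cells = a finer mesh.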
(Note, these numbers are linked: 19+4=23 and 42=42, etc. So if you change something, simply multiply all numbers by 2 for example to retain those properties.)
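
A minimal sketch of the "multiply all numbers by the same factor" idea is given below. It assumes the blocks entry uses the plain "hex (...) (nx ny nz)" layout shown above; the function name refine_blocks and the regular expression are made up for this example and are not part of OpenFOAM. By default only the x and y counts are scaled, because the case is 2D with a single cell in z.

```python
import re

# Matches: hex (<8 vertex labels>) (<nx> <ny> <nz>)
HEX_RE = re.compile(r"(hex\s*\([^)]*\)\s*)\((\d+)\s+(\d+)\s+(\d+)\)")


def refine_blocks(text: str, factor: int, scale_z: bool = False) -> str:
    """Multiply the cell counts of every hex block by the same factor.

    Using one common factor keeps the linked counts consistent
    (e.g. 19 + 4 = 23 stays true after scaling).
    """
    def repl(match: re.Match) -> str:
        nx, ny, nz = (int(match.group(i)) for i in (2, 3, 4))
        nz_new = nz * factor if scale_z else nz
        return f"{match.group(1)}({nx * factor} {ny * factor} {nz_new})"

    return HEX_RE.sub(repl, text)


if __name__ == "__main__":
    line = "hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)"
    print(refine_blocks(line, 2))
```

The simpleGrading triplet is left untouched because the pattern only matches the parentheses that directly follow the vertex list.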

Old   December 4, 2016, 07:59
Exclamation testing result strange
  #8
New Member
 
Join Date: Nov 2016
Posts: 7
Rep Power: 5
ZhouYoung is on a distinguished road
Quote:
Originally Posted by floquation View Post
It appears that the location of that file was changed in a recent version, so yes, that is the same file.

Find lines like these:
Code:
blocks
(
    hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
);
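And increase the cell counts, i.e. the numbers in the second set of parentheses on each line (shown in bold in the original post, e.g. (23 8 1)), to use more cells = a finer mesh.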
(Note, these numbers are linked: 19+4=23 and 42=42, etc. So if you change something, simply multiply all numbers by 2 for example to retain those properties.)
OK, I changed the blocks as described in the user guide and used a lower tolerance, and now it runs much longer! Thank you for your patient guidance.

But I have some further questions about the running time of the modified cases:
1. I changed the tolerance to 1e-15 and maxDeltaT to 0.00001. First I ran with the default setting (mpirun -np 4, simpleCoeffs 2 2 1). On two VMs (2 CPUs per VM) the run sometimes takes 17 minutes and sometimes 25 minutes (a large spread).
Then I changed it to 12 processes (2*6*1), and it ran even longer (about 28 minutes), while 9 processes (3*3*1) also took 17 minutes.
Quite strange, with no pattern to follow! So what is the benefit of multiple cores?

PS: my test environment is connected by Mellanox InfiniBand NICs, so I think it is not a network problem.

2. I changed the blocks setting following the user guide and then set the tolerance to 1e-14.
This case runs longer than the one above, and the trend is the same:
default 4 = 2*2 takes 28 minutes
9 = 3*3 takes 38 minutes
16 = 4*4 takes more than 680 minutes, so I terminated it

3. Separately, I found a 3D case called motorBike (incompressible/pisoFoam/les/motorBike). Its directory layout is not quite the same as that of cavity or damBreak: it provides an "Allrun" script and runs with 6 processes by default. From the Allrun script I see that it uses the runParallel function defined in bin/tools/RunFunctions. So could I simply change the definition of runParallel in order to run the motorBike case on an MPI cluster?
I added "--hostfile hostfile" to the mpirun line of the runParallel function.

Last edited by ZhouYoung; December 4, 2016 at 21:14. Reason: change for the result

Old   December 5, 2016, 04:13
Default
  #9
Senior Member
 
floquation's Avatar
 
Kevin van As
Join Date: Sep 2014
Location: TU Delft, The Netherlands
Posts: 253
Rep Power: 16
floquation will become famous soon enough
Quote:
Originally Posted by ZhouYoung View Post
1. I changed the tolerance to 1e-15 and maxDeltaT to 0.00001. First I ran with the default setting (mpirun -np 4, simpleCoeffs 2 2 1). On two VMs (2 CPUs per VM) the run sometimes takes 17 minutes and sometimes 25 minutes (a large spread).
Then I changed it to 12 processes (2*6*1), and it ran even longer (about 28 minutes), while 9 processes (3*3*1) also took 17 minutes.
Quite strange, with no pattern to follow! So what is the benefit of multiple cores?

PS: my test environment is connected by Mellanox InfiniBand NICs, so I think it is not a network problem.
I don't know much about hardware.
In terms of software: you are limiting maxDeltaT to a very small value. Whether this value is really small or large depends on the physics, but for most cases it will be small.
As a consequence, I think that virtually nothing changes during a single timestep. The processors therefore finish calculating almost instantly and then have to communicate. If this is the case, adding more processors will simply lead to more communication without any added benefit, making the simulation slower.
In other words, you must give the processors more tasks to do to outweigh the time spent on communication.

How to fix:
Set the following in system/controlDict:
Code:
runTimeModifiable yes;
adjustTimeStep  yes;
maxCo           0.5;
maxAlphaCo      0.5;
maxDeltaT       1;
Then OpenFOAM will automatically choose the largest possible Δt. This will yield more work per timestep and thereby decrease the relative time spent on communication.
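
To give a feeling for what an adjustable timestep means, here is a simplified sketch based on the textbook one-dimensional Courant number, Co = |U| Δt / Δx. It is not OpenFOAM's actual algorithm, which evaluates the Courant number from face fluxes and also limits how fast Δt may grow between steps; the function name and the numbers for cell size and velocity are assumptions chosen purely for illustration.

```python
def courant_limited_dt(max_co: float, dx: float, u_mag: float,
                       max_delta_t: float) -> float:
    """Largest time step with Co = u_mag * dt / dx <= max_co, capped at max_delta_t."""
    if u_mag <= 0.0:
        return max_delta_t  # fluid at rest: only the cap applies
    return min(max_co * dx / u_mag, max_delta_t)


if __name__ == "__main__":
    # Hypothetical values, not taken from the thread
    dx = 0.01     # cell size in m
    u_mag = 1.0   # velocity magnitude in m/s
    print(courant_limited_dt(0.5, dx, u_mag, 1.0))       # coarse mesh
    print(courant_limited_dt(0.5, dx / 10, u_mag, 1.0))  # mesh refined 10x
```

In this simplified picture the allowed Δt is proportional to the cell size, so refining the mesh by a factor of 10 also requires roughly 10 times as many timesteps for the same simulated time.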

Quote:
Originally Posted by ZhouYoung View Post
3. Separately, I found a 3D case called motorBike (incompressible/pisoFoam/les/motorBike). Its directory layout is not quite the same as that of cavity or damBreak: it provides an "Allrun" script and runs with 6 processes by default. From the Allrun script I see that it uses the runParallel function defined in bin/tools/RunFunctions. So could I simply change the definition of runParallel in order to run the motorBike case on an MPI cluster?
I added "--hostfile hostfile" to the mpirun line of the runParallel function.
Those are the tutorial run functions. I personally never use them for my own cases - I write my own scripts - but I reckon they could be of use to you.

Old   December 5, 2016, 07:14
Unhappy Still the same , result goes worse with more cores
  #10
New Member
 
Join Date: Nov 2016
Posts: 7
Rep Power: 5
ZhouYoung is on a distinguished road
Quote:
Originally Posted by floquation View Post
I don't know much about hardware.
In terms of software: you are limiting maxDeltaT to a very small value. Whether this value is really small or large depends on the physics, but for most cases it will be small.
As a consequence, I think that virtually nothing changes during a single timestep. The processors therefore finish calculating almost instantly and then have to communicate. If this is the case, adding more processors will simply lead to more communication without any added benefit, making the simulation slower.
In other words, you must give the processors more tasks to do to outweigh the time spent on communication.
So the problem comes back to the model being too small again?
The amount of data to compute is not that large, so it would be better to just run it locally.

Quote:
Originally Posted by floquation View Post
How to fix:
Set the following in system/controlDict:
Code:
runTimeModifiable yes;
adjustTimeStep  yes;
maxCo           0.5;
maxAlphaCo      0.5;
maxDeltaT       1;
Then OpenFOAM will automatically choose the largest possible Δt. This will yield more work per timestep and thereby decrease the relative time spent on communication.
OK, I've tried this on a public cloud:
fvSolution: tolerance changed to 1e-14
controlDict: changed as above
blockMesh: changed as in the user guide (a little bigger)

Then I ran it on a 32-core VM:
mpirun -np 16 (decomposePar set to 4*4*1) takes only 32 s!
mpirun -np 32 (decomposePar set to 4*8*1) takes 44 s!
Then I ran it on a cluster of two 32-core VMs with -np 64 (8*8*1): it takes 11 minutes, more than 20 times as long as on 16 cores!

So changing Δt does not make this behaviour go away.
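
For reference, these three timings can be turned into speedup and parallel efficiency relative to the 16-process run. The sketch below uses only the numbers quoted above (32 s, 44 s and 11 minutes = 660 s); the function name parallel_metrics is made up for this example, and the 16-process run is taken as the baseline simply because no serial timing was reported.

```python
def parallel_metrics(timings: dict, base_np: int) -> dict:
    """Map each process count to (speedup, efficiency) relative to base_np.

    speedup    = t(base_np) / t(n)
    efficiency = speedup * base_np / n   (1.0 would be ideal scaling)
    """
    t_base = timings[base_np]
    result = {}
    for n, t in sorted(timings.items()):
        speedup = t_base / t
        result[n] = (speedup, speedup * base_np / n)
    return result


if __name__ == "__main__":
    # Wall-clock times in seconds as reported in the post above
    timings = {16: 32.0, 32: 44.0, 64: 11 * 60.0}
    for n, (speedup, eff) in parallel_metrics(timings, 16).items():
        print(f"np={n:3d}  speedup={speedup:.3f}  efficiency={eff:.3f}")
```

A speedup below 1 means that adding processes made the run slower, which is the case for both the 32-process and the 64-process runs here.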


Quote:
Originally Posted by floquation View Post
Those are the tutorial run functions. I personally never use them for my own cases - I write my own scripts - but I reckon they could be of use to you.
OK, I will test this case to check whether the behaviour is still there!

Old   December 5, 2016, 08:36
Default
  #11
Senior Member
 
floquation's Avatar
 
Kevin van As
Join Date: Sep 2014
Location: TU Delft, The Netherlands
Posts: 253
Rep Power: 16
floquation will become famous soon enough
Quote:
Originally Posted by ZhouYoung View Post
So the problem comes back to the model being too small again?
I didn't say that.
I said the workload per timestep is too small, meaning too much communication is required for a parallel computation to be worth it.


Quote:
Originally Posted by ZhouYoung View Post
Then I ran it on a 32-core VM:
mpirun -np 16 (decomposePar set to 4*4*1) takes only 32 s!
mpirun -np 32 (decomposePar set to 4*8*1) takes 44 s!
Then I ran it on a cluster of two 32-core VMs with -np 64 (8*8*1): it takes 11 minutes, more than 20 times as long as on 16 cores!

So changing Δt does not make this behaviour go away.
It does if you make the case sufficiently big. Your case runs within a minute - that's far too small a workload.

Increase the number of cells by a big factor, say 10 in each direction, while using the controlDict settings for Δt I mentioned in my last post. Then the total workload should increase by a factor of 1000 for a 2D simulation, which consists of a factor of ~10 more timesteps and a factor of ~100 more work per timestep. As you can see, this will increase the work that each processor has to do by a greater factor (x100) than the increase in the number of times they must communicate (x10).
If that doesn't work, use a yet bigger factor.

In a 3D case this is even more effective, as refining your mesh by a factor of 10 means a factor of 1000 more work per communication step. If your case requires that much work, it becomes beneficial to use more processors, as the time spent communicating becomes (in a relative sense) smaller and smaller.
In your case, I reckon some processors are doing work on as little as 10 cells, instead of 1,000,000 cells.
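
The scaling argument above can be written down as a small sketch. It assumes, as in the explanation, that the mesh is refined by the same factor in every direction and that the number of timesteps grows linearly with that factor because the timestep is Courant-limited; the function name refinement_scaling is made up for this example, and the result is a rough estimate, not a prediction of actual runtime.

```python
def refinement_scaling(factor: int, dims: int) -> dict:
    """Estimate how the workload grows when refining by `factor` per direction.

    cells      : work per timestep grows like factor**dims
    timesteps  : Courant-limited timestep shrinks like 1/factor
    total_work : product of the two
    """
    cells = factor ** dims
    timesteps = factor
    return {"cells": cells, "timesteps": timesteps, "total_work": cells * timesteps}


if __name__ == "__main__":
    print("2D:", refinement_scaling(10, 2))
    print("3D:", refinement_scaling(10, 3))
```

The ratio of "cells" to "timesteps" is the quantity that matters for parallel runs: it is the growth in work done between two communication steps.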

Old   December 7, 2016, 10:01
Default
  #12
New Member
 
Join Date: Nov 2016
Posts: 7
Rep Power: 5
ZhouYoung is on a distinguished road
Quote:
Originally Posted by floquation View Post
I didn't say that.
I said the workload per timestep is too small, meaning too much communication is required for a parallel computation to be worth it.
OK, you mean that I only changed the tolerance, but the computing task and its scale are small, so the communication even costs more CPU time than the computation itself!


Quote:
Originally Posted by floquation View Post
It does if you make the case sufficiently big. Your case runs within a minute - that's far too small a workload.

Increase the number of cells by a big factor, say 10 in each direction, while using the controlDict settings for Δt I mentioned in my last post. Then the total workload should increase by a factor of 1000 for a 2D simulation, which consists of a factor of ~10 more timesteps and a factor of ~100 more work per timestep. As you can see, this will increase the work that each processor has to do by a greater factor (x100) than the increase in the number of times they must communicate (x10).
If that doesn't work, use a yet bigger factor.
Take the default blockMesh as an example:
Code:
blocks
(
    hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
);
So, should I add more blocks? If I modify the cell counts as below, does that mean the number of cells increases by 10*10 = 100 times, so that I get a bigger factor?

Code:
blocks
(
    hex (0 1 5 4 12 13 17 16) (230 80 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (190 80 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (230 420 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (40 420 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (190 420 1) simpleGrading (1 1 1)
);
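
The 10*10 = 100 estimate can be checked by summing nx*ny*nz over the five blocks of both listings. The sketch below does this with a small regular expression; the helper name count_cells is made up for this example, and it assumes the plain "hex (...) (nx ny nz)" layout shown above. The 64-rank figure at the end is only an average for an even split.

```python
import re

# Captures the (nx ny nz) cell counts that follow each hex vertex list
HEX_RE = re.compile(r"hex\s*\([^)]*\)\s*\((\d+)\s+(\d+)\s+(\d+)\)")

DEFAULT_BLOCKS = """
hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
"""

REFINED_BLOCKS = """
hex (0 1 5 4 12 13 17 16) (230 80 1) simpleGrading (1 1 1)
hex (2 3 7 6 14 15 19 18) (190 80 1) simpleGrading (1 1 1)
hex (4 5 9 8 16 17 21 20) (230 420 1) simpleGrading (1 1 1)
hex (5 6 10 9 17 18 22 21) (40 420 1) simpleGrading (1 1 1)
hex (6 7 11 10 18 19 23 22) (190 420 1) simpleGrading (1 1 1)
"""


def count_cells(blocks_text: str) -> int:
    """Sum nx * ny * nz over all hex blocks found in the text."""
    return sum(int(nx) * int(ny) * int(nz)
               for nx, ny, nz in HEX_RE.findall(blocks_text))


if __name__ == "__main__":
    coarse = count_cells(DEFAULT_BLOCKS)
    fine = count_cells(REFINED_BLOCKS)
    print("default cells:", coarse)
    print("refined cells:", fine)
    print("ratio:", fine // coarse)
    print("average cells per rank on 64 ranks (default mesh):", coarse // 64)
```

No new blocks are needed for this: the block layout stays the same and only the cell counts per block change.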
Quote:
Originally Posted by floquation View Post
In a 3D case this is even more effective, as refining your mesh by a factor of 10 means a factor of 1000 more work per communication step. If your case requires that much work, it becomes beneficial to use more processors, as the time spent communicating becomes (in a relative sense) smaller and smaller.
In your case, I reckon some processors are doing work on as little as 10 cells, instead of 1,000,000 cells.
I am still working on the tutorials/incompressible/simpleFoam/motorBike case, and I also want to get a larger factor for the motorBike case. Is the change above OK?
After I increased the cell counts in its blocks entry tenfold in x, y and z, the motorBike running time increased from 6 minutes to 70 minutes on a 4U8G VM.

Tags
error, mpirun, performance testing
