How can I run OpenFOAM to benchmark/compare the performance of two environments?

ZhouYoung · November 28, 2016, 08:44

Hello everyone, I'm a newcomer here. Thank you for your attention.

I was asked to test the performance of OpenFOAM on a cloud cluster, but I have no experience with this. As I understand it, OpenFOAM is a kind of simulation platform, a bit like Matlab. So can I run a typical model such as pitzDaily or cavity on a cloud cluster and then compare the performance by running time?

I have run into two questions:
1. If I want to run a model (a modified one from the tutorials), I need to increase the computational work so that I can put it on a cluster with hundreds of CPUs. Is there a big model under tutorials? Or how can I quickly turn one into a big enough model?

2. Currently I can run interFoam with foamJob, but I get an error with "/usr/lib64/openmpi/bin/mpirun --hostfile ~/hostfil -np 6 simpleFoam -case /root/OpenFOAM/root-3.0.0/run/pitzDaily -parallel":
--> FOAM FATAL ERROR:
Cannot read "/root/OpenFOAM/root-3.0.0/run/pitzDaily/system/decomposeParDict"
Does anyone have an idea about this? Thank you!

floquation · November 28, 2016, 09:41

Quote:

Originally Posted by ZhouYoung

I was asked to test the performance of OpenFOAM on a cloud cluster, but I have no experience with this. As I understand it, OpenFOAM is a kind of simulation platform, a bit like Matlab. So can I run a typical model such as pitzDaily or cavity on a cloud cluster and then compare the performance by running time?

For benchmarking you'll be interested in the execution time of a given simulation. Accuracy shouldn't matter, as the simulation result should be independent of the hardware, and I reckon you want to compare different hardware?

Quote:

Originally Posted by ZhouYoung

I have run into two questions:
1. If I want to run a model (a modified one from the tutorials), I need to increase the computational work so that I can put it on a cluster with hundreds of CPUs. Is there a big model under tutorials? Or how can I quickly turn one into a big enough model?

Most tutorials are designed to be small so that they run quickly. If you want to increase the runtime, the easiest method is either to force it to perform more iterations (lower the value of "tolerance" in system/fvSolution, or lower the value of "maxDeltaT" in system/controlDict to limit the timestep), or to use a finer mesh (constant/polyMesh/blockMeshDict). Evidently, 3D simulations will require considerably more time than 2D simulations.
The most effective method will be to use a finer mesh.

With such a 'sufficiently fine' mesh, any tutorial will do. For example, the "damBreak" case of the "interFoam" application.

Quote:

Originally Posted by ZhouYoung

2. Currently I can run interFoam with foamJob, but I get an error with "/usr/lib64/openmpi/bin/mpirun --hostfile ~/hostfil -np 6 simpleFoam -case /root/OpenFOAM/root-3.0.0/run/pitzDaily -parallel":
--> FOAM FATAL ERROR:
Cannot read "/root/OpenFOAM/root-3.0.0/run/pitzDaily/system/decomposeParDict"
Does anyone have an idea about this? Thank you!

You have to decompose the mesh before running a simulation on multiple processors. Have a look at the following tutorial (especially Sec. 2.3.11 "running in parallel"):
http://cfd.direct/openfoam/user-guid...x7-630002.3.11
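
For reference, the missing file can be quite small. The sketch below renders a minimal system/decomposeParDict for the "simple" decomposition method. The helper name render_decompose_par_dict is hypothetical, and the (3 2 1) split is only one assumed way to obtain the 6 subdomains that match -np 6 in the failing command. After saving the text as system/decomposeParDict, running decomposePar on the case should create the processor directories that the -parallel run expects.

```python
def render_decompose_par_dict(nx: int, ny: int, nz: int) -> str:
    """Return the text of a minimal system/decomposeParDict ('simple' method).

    nx, ny, nz are the number of pieces in each direction; their product
    must equal the -np value passed to mpirun.
    """
    n_sub = nx * ny * nz
    return f"""FoamFile
{{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}}

numberOfSubdomains {n_sub};

method          simple;

simpleCoeffs
{{
    n           ({nx} {ny} {nz});
    delta       0.001;
}}
"""


if __name__ == "__main__":
    # Assumed split for the 6-process run from the question: 3 x 2 x 1.
    print(render_decompose_par_dict(3, 2, 1))
```

The order of steps is then: write the dictionary, run decomposePar with the same -case path, and only afterwards start the solver through mpirun with -parallel.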

elvis · November 28, 2016, 15:00

http://openfoamwiki.net/index.php/Sig_HPC

What about taking test cases from the SIG Turbomachinery?
http://openfoamwiki.net/index.php/Sig_Turbomachinery

http://openfoamwiki.net/index.php/Si...ion_test_cases

ZhouYoung · November 29, 2016, 03:13

Quote:

Originally Posted by floquation

For benchmarking you'll be interested in the execution time of a given simulation. Accuracy shouldn't matter, as the simulation result should be independent of the hardware, and I reckon you want to compare different hardware?

Yes, that is exactly what I mean! I need a big OpenFOAM job so that I can easily compare the performance of two environments by the time it takes to finish.

Quote:

Originally Posted by floquation

Most tutorials are designed to be small so that they run quickly. If you want to increase the runtime, the easiest method is either to force it to perform more iterations (lower the value of "tolerance" in system/fvSolution, or lower the value of "maxDeltaT" in system/controlDict to limit the timestep), or to use a finer mesh (constant/polyMesh/blockMeshDict). Evidently, 3D simulations will require considerably more time than 2D simulations.
The most effective method will be to use a finer mesh.

With such a 'sufficiently fine' mesh, any tutorial will do. For example, the "damBreak" case of the "interFoam" application.

Yes, I just tried this (using the damBreak model, changing tolerance to 1e-15 and maxDeltaT to 0.00001), and the runtime of interFoam on 4 CPUs went up to 18 minutes. Quite useful and easy for me!

As for the finer mesh you mentioned: first, I found there is no constant/polyMesh/blockMeshDict under damBreak. Is it the same as system/blockMeshDict?
Do you mean I should add more elements in the blockMeshDict? Or is there some parameter that I can change easily?

Quote:

Originally Posted by floquation

You have to decompose the mesh before running a simulation on multiple processors. Have a look at the following tutorial (especially Sec. 2.3.11 "running in parallel"):
http://cfd.direct/openfoam/user-guid...x7-630002.3.11

OK, thank you very much! Currently I can run foamJob on two machines. I think I can read the tutorial and the foamJob script to find out what I am doing wrong.
Thank you very much again!

ZhouYoung · November 29, 2016, 03:15

Quote:

Originally Posted by elvis

http://openfoamwiki.net/index.php/Sig_HPC

What about taking test cases from the SIG Turbomachinery?
http://openfoamwiki.net/index.php/Sig_Turbomachinery

http://openfoamwiki.net/index.php/Si...ion_test_cases

These seem a little old. I'm using OpenFOAM 3.0.0 and I don't know whether they are compatible.

floquation · November 29, 2016, 03:30

Quote:

Originally Posted by ZhouYoung

As for the finer mesh you mentioned: first, I found there is no constant/polyMesh/blockMeshDict under damBreak. Is it the same as system/blockMeshDict?
Do you mean I should add more elements in the blockMeshDict? Or is there some parameter that I can change easily?

It appears that the location of that file was changed in a recent version, so yes, that is the same file.

Find lines like these:

Code:

blocks
(
    hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
);

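And increase the cell counts, i.e. the second bracketed triple on each line such as (23 8 1) (shown in bold in the original post), to use more cells = a finer mesh.
(Note, these numbers are linked: 19+4=23 and 42=42, etc. So if you change something, simply multiply all the x and y counts by 2, for example, to retain those properties. The third count stays at 1 because this case is 2D.)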
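
To make the effect of such a multiplication concrete, here is a small sketch that scales the x and y cell counts of the blocks listing above by a common factor and counts the resulting cells. The function names scale_blocks and count_cells are hypothetical helpers, not part of OpenFOAM, and the regular expression assumes the exact one-line-per-block layout shown in the post.

```python
import re

# The blocks entries quoted in the post (damBreak, 2D: one cell in z).
BLOCKS = """\
hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
"""

# Matches "hex (<vertex labels>) (nx ny nz)"; the grading part is left alone.
_BLOCK = re.compile(r"(hex \([\d ]+\) )\((\d+) (\d+) (\d+)\)")


def scale_blocks(text: str, factor: int) -> str:
    """Multiply the x and y cell counts of every block by factor; keep z."""
    def repl(m: re.Match) -> str:
        nx, ny, nz = (int(m.group(i)) for i in (2, 3, 4))
        return f"{m.group(1)}({nx * factor} {ny * factor} {nz})"
    return _BLOCK.sub(repl, text)


def count_cells(text: str) -> int:
    """Total number of cells over all blocks."""
    return sum(int(m.group(2)) * int(m.group(3)) * int(m.group(4))
               for m in _BLOCK.finditer(text))


if __name__ == "__main__":
    for factor in (1, 2, 10):
        print(factor, count_cells(scale_blocks(BLOCKS, factor)))
```

Because every block is scaled by the same factor, the linked counts (19+4=23 and so on) stay consistent, and the total number of cells grows with the square of the factor in this 2D case.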

ZhouYoung · December 4, 2016, 07:59

Quote:

Originally Posted by floquation

It appears that the location of that file was changed in a recent version, so yes, that is the same file.

Find lines like these:

Code:

blocks
(
    hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
);

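And increase the cell counts, i.e. the second bracketed triple on each line such as (23 8 1) (shown in bold in the original post), to use more cells = a finer mesh.
(Note, these numbers are linked: 19+4=23 and 42=42, etc. So if you change something, simply multiply all the x and y counts by 2, for example, to retain those properties. The third count stays at 1 because this case is 2D.)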

OK, I changed the blocks as in the user guide and used a lower tolerance, and now it runs much longer! Thank you for your patient guidance.

But I have more questions about the running time with the modified model:
1. I changed the tolerance to 1e-15 and maxDeltaT to 0.00001. First I ran with the default setting (mpirun -np 4, and simpleCoeffs 2 2 1). Running on two VMs (2 CPUs per VM), it sometimes takes 17 minutes and sometimes 25 minutes (a large spread).
Then I changed it to 12 processes (2*6*1), and it ran even longer (about 28 minutes), while 9 processes (3*3*1) also took 17 minutes!
Quite strange, there seems to be no rule to follow! So what is the benefit of multiple cores?

PS: my test environment is connected by Mellanox IB NICs, so I don't think it is a network problem!

2. I changed the blocks setting as in the user guide, and then set the tolerance to 1e-14.
With this model, it runs longer than the one above, and the results show the same pattern.

default 4 = 2*2 takes 28 minutes
9 = 3*3 takes 38 minutes
16 = 4*4 takes more than 680 minutes => I terminated it!

3. On another note, I found a 3D model called motorBike (incompressible/pisoFoam/les/motorBike). Its directory is not quite the same as cavity or damBreak: it provides an "Allrun" script and runs with 6 processes by default. From the Allrun script, I found that it uses the runParallel function defined in bin/tools/RunFunctions. So can I simply change the definition of runParallel so that I can run the motorBike model on an MPI cluster?
I added "--hostfile hostfile" to the mpirun line of the runParallel function.

floquation · December 5, 2016, 04:13

Quote:

Originally Posted by ZhouYoung

1. I changed the tolerance to 1e-15 and maxDeltaT to 0.00001. First I ran with the default setting (mpirun -np 4, and simpleCoeffs 2 2 1). Running on two VMs (2 CPUs per VM), it sometimes takes 17 minutes and sometimes 25 minutes (a large spread).
Then I changed it to 12 processes (2*6*1), and it ran even longer (about 28 minutes), while 9 processes (3*3*1) also took 17 minutes!
Quite strange, there seems to be no rule to follow! So what is the benefit of multiple cores?

PS: my test environment is connected by Mellanox IB NICs, so I don't think it is a network problem!

I don't know much about hardware.
In terms of software: you are limiting your max $\Delta t$ to a very small value. Whether this value is really small or large depends on the physics, but for most cases it will be small.
As a consequence, I think that virtually nothing changes during a single timestep. Therefore, the processors finish calculating almost instantly and then have to communicate. If this is the case, adding more processors will simply lead to more communication without any added benefit, causing the simulation to be slower.
In other words, you must give the processors more work to do to outweigh the time spent on communication.

How to fix:
Set the following in system/controlDict:

Code:

runTimeModifiable yes;
adjustTimeStep  yes;
maxCo           0.5;
maxAlphaCo      0.5;
maxDeltaT       1;

Then OpenFOAM will automatically choose the highest $\Delta t$ that the limits maxCo, maxAlphaCo and maxDeltaT allow. This will yield more work per timestep and thereby decrease the relative time spent on communication.
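
As a rough illustration of why this gives a much larger step than a fixed 0.00001, the sketch below uses the textbook one-dimensional Courant number, Co = |u| $\Delta t$ / $\Delta x$, and returns the largest step that keeps Co below maxCo, capped at maxDeltaT. This is a simplification: OpenFOAM evaluates the Courant number per cell from the face fluxes, and the cell size and velocity used here are assumed example values, not taken from the damBreak case.

```python
def courant_limited_dt(dx: float, speed: float,
                       max_co: float, max_delta_t: float) -> float:
    """Largest dt with Co = speed * dt / dx <= max_co, capped at max_delta_t."""
    if speed <= 0.0:
        # No flow: only the maxDeltaT cap applies.
        return max_delta_t
    return min(max_co * dx / speed, max_delta_t)


if __name__ == "__main__":
    dx, speed = 0.005, 1.0  # assumed cell size [m] and flow speed [m/s]
    dt = courant_limited_dt(dx, speed, max_co=0.5, max_delta_t=1.0)
    print(f"Courant-limited dt: {dt:g} s")
    print(f"Compared with a fixed dt of 1e-5 s: {dt / 1e-5:.0f} times larger")
```

Under these assumptions the step is set by the flow and the mesh rather than by an artificial cap, and halving the cell size halves the allowed step.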

Quote:

Originally Posted by ZhouYoung

3. On another note, I found a 3D model called motorBike (incompressible/pisoFoam/les/motorBike). Its directory is not quite the same as cavity or damBreak: it provides an "Allrun" script and runs with 6 processes by default. From the Allrun script, I found that it uses the runParallel function defined in bin/tools/RunFunctions. So can I simply change the definition of runParallel so that I can run the motorBike model on an MPI cluster?
I added "--hostfile hostfile" to the mpirun line of the runParallel function.

Those are the tutorial run functions. I personally never use them for my own cases - I write my own scripts - but I reckon they could be of use to you.

ZhouYoung · December 5, 2016, 07:14

Quote:

Originally Posted by floquation

I don't know much about hardware.
In terms of software: you are limiting your max $\Delta t$ to a very small value. Whether this value is really small or large depends on the physics, but for most cases it will be small.
As a consequence, I think that virtually nothing changes during a single timestep. Therefore, the processors finish calculating almost instantly and then have to communicate. If this is the case, adding more processors will simply lead to more communication without any added benefit, causing the simulation to be slower.
In other words, you must give the processors more work to do to outweigh the time spent on communication.

So the problem comes back to the model being too small again?
The amount of data to compute is not that large, so it would be better to just run it locally.

Quote:

Originally Posted by floquation

How to fix:
Set the following in system/controlDict:

Code:

runTimeModifiable yes;
adjustTimeStep  yes;
maxCo           0.5;
maxAlphaCo      0.5;
maxDeltaT       1;

Then OpenFOAM will automatically choose the highest $\Delta t$ that the limits maxCo, maxAlphaCo and maxDeltaT allow. This will yield more work per timestep and thereby decrease the relative time spent on communication.

OK, I've tried this on a public cloud:
fvSolution: tolerance changed to 1e-14
controlDict: changed as above
blockMesh: changed as in the user guide (a little bigger)

Then I ran it on a 32-core VM:
mpirun -np 16 (decomposePar set to 4*4*1) takes only 32 s!
mpirun -np 32 (decomposePar set to 4*8*1) takes 44 s!
Then I ran it on a cluster of 2 x 32-core VMs with -np 64 (8*8*1): it takes 11 minutes, more than 20 times as long as with 16 cores!

So changing $\Delta t$ does not remove this effect.
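
These three timings can be put on a common scale by computing speedup and parallel efficiency relative to the 16-process run. The sketch below does only that arithmetic; the function name relative_speedup is a hypothetical helper, and 11 minutes is taken as 660 s.

```python
def relative_speedup(base_procs: int, base_time: float,
                     procs: int, time: float) -> tuple[float, float]:
    """Speedup and parallel efficiency relative to a baseline run.

    speedup    = base_time / time
    efficiency = speedup * base_procs / procs   (1.0 means ideal scaling)
    """
    speedup = base_time / time
    efficiency = speedup * base_procs / procs
    return speedup, efficiency


# (processes, wall time in seconds) as reported in the post above.
RUNS = [(16, 32.0), (32, 44.0), (64, 11 * 60.0)]

if __name__ == "__main__":
    base_procs, base_time = RUNS[0]
    for procs, time in RUNS:
        s, e = relative_speedup(base_procs, base_time, procs, time)
        print(f"np={procs:3d}  time={time:6.0f} s  "
              f"speedup={s:.3f}  efficiency={e:.3f}")
```

A speedup below 1 means the run with more processes is slower than the baseline, which is the case for both the 32- and the 64-process runs here.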

Quote:

Originally Posted by floquation

Those are the tutorial run functions. I personally never use them for my own cases - I write my own scripts - but I reckon they could be of use to you.

OK, I will test this model to check whether the effect is still there!

floquation · December 5, 2016, 08:36

Quote:

Originally Posted by ZhouYoung

So the problem comes back to the model being too small again?

I didn't say that.
I said the workload per timestep is too small, meaning too much communication is required for a parallel computation to be worth it.

Quote:

Originally Posted by ZhouYoung

Then I ran it on a 32-core VM:
mpirun -np 16 (decomposePar set to 4*4*1) takes only 32 s!
mpirun -np 32 (decomposePar set to 4*8*1) takes 44 s!
Then I ran it on a cluster of 2 x 32-core VMs with -np 64 (8*8*1): it takes 11 minutes, more than 20 times as long as with 16 cores!

So changing $\Delta t$ does not remove this effect.

It does if you'd make the case sufficiently big. Your case runs within a minute - that's way too little of a workload.

Increase the number of cells in each direction by a big factor, say 10, while using the controlDict settings for $\Delta t$ I mentioned in my last post. Then the total workload should increase by a factor 1000 for a 2D simulation, which consists of a factor ~10 more timesteps (the Courant limit shrinks $\Delta t$ in proportion to the cell size) and a factor ~100 more work per timestep. As you can see, this will increase the work that each processor has to do by a greater factor (x100) than the increase in the number of times they must communicate (x10).
If that doesn't work, use a yet bigger factor.

In a 3D case this is yet more effective, as refining your mesh by a factor 10 will mean a factor 1000 more work per communication step. If your case requires that much work, it becomes beneficial to use more processors, as the time spent communicating becomes (in a relative sense) smaller and smaller.
In your case, I reckon some processors are doing work on as little as 10 cells, instead of 1,000,000 cells.
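
The scaling argument can be written down in a few lines. The sketch below assumes, as in the reasoning above, that refining by a factor r in each of d directions gives r^d times more work per timestep and about r times more timesteps. The function names are hypothetical, and 2268 is the cell count of the default damBreak blocks listing quoted earlier in the thread (23*8 + 19*8 + 23*42 + 4*42 + 19*42).

```python
def refinement_scaling(factor: int, dims: int) -> tuple[int, int, int]:
    """Return (work per timestep, number of timesteps, total work) multipliers
    for refining the mesh by 'factor' in each of 'dims' directions."""
    work_per_step = factor ** dims   # more cells to update every step
    n_steps = factor                 # Courant limit: dt shrinks with cell size
    return work_per_step, n_steps, work_per_step * n_steps


def cells_per_process(total_cells: int, n_procs: int) -> float:
    """Average number of cells each MPI process works on."""
    return total_cells / n_procs


if __name__ == "__main__":
    print("2D, factor 10:", refinement_scaling(10, 2))
    print("3D, factor 10:", refinement_scaling(10, 3))
    for cells in (2268, 226800):  # default mesh and the 10x-per-direction mesh
        print(cells, "cells on 64 processes:",
              cells_per_process(cells, 64), "cells each")
```

With only a few dozen cells per process, almost all of the wall time goes into communication, which is consistent with the 64-process run being far slower than the 16-process one.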

ZhouYoung · December 7, 2016, 10:01

Quote:

Originally Posted by floquation

I didn't say that.
I said the workload per timestep is too small, meaning too much communication is required for a parallel computation to be worth it.

OK, you mean that I only changed the tolerance in my model, but the computing task and its scale are still small, so the communication costs even more CPU time than the computation itself!

Quote:

Originally Posted by floquation

It does if you'd make the case sufficiently big. Your case runs within a minute - that's way too little of a workload.

Increase the number of cells in each direction by a big factor, say 10, while using the controlDict settings for $\Delta t$ I mentioned in my last post. Then the total workload should increase by a factor 1000 for a 2D simulation, which consists of a factor ~10 more timesteps (the Courant limit shrinks $\Delta t$ in proportion to the cell size) and a factor ~100 more work per timestep. As you can see, this will increase the work that each processor has to do by a greater factor (x100) than the increase in the number of times they must communicate (x10).
If that doesn't work, use a yet bigger factor.

Take the default blockMesh as an example:

Code:

blocks
(
    hex (0 1 5 4 12 13 17 16) (23 8 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (19 8 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (23 42 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (4 42 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (19 42 1) simpleGrading (1 1 1)
);

So, should I add more blocks? Or, if I modify the cell counts as below, does that mean the number of cells increases by 10*10 = 100 times? Do I get a bigger factor that way?

Code:

blocks
(
    hex (0 1 5 4 12 13 17 16) (230 80 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (190 80 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (230 420 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (40 420 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (190 420 1) simpleGrading (1 1 1)
);

Quote:

Originally Posted by floquation

In a 3D case this is yet more effective, as refining your mesh by a factor 10 will mean a factor 1000 more work per communication step. If your case requires that much work, it becomes beneficial to use more processors, as the time spent communicating becomes (in a relative sense) smaller and smaller.
In your case, I reckon some processors are doing work on as little as 10 cells, instead of 1,000,000 cells.

I am still working on the tutorials/incompressible/simpleFoam/motorBike model. I also want to get a larger factor with the motorBike model. Is the change above OK?
After I modified the blocks entry by a factor of 10 (ux, uy, uz), the motorBike running time increased from 6 minutes to 70 minutes on a 4U8G VM.
