How can I run OpenFOAM to benchmark and compare the performance of two environments?
Hello everyone, I'm a newcomer here. Thank you for your attention.
I was asked to test the performance of OpenFOAM on a cloud cluster, but I have no experience with it. As I understand it, OpenFOAM is a simulation platform, somewhat like Matlab. So my plan is to run a typical case such as pitzDaily or cavity on the cloud cluster and compare the performance by running time. Is that reasonable? I have run into two questions:
1. To run a case (modified from one under tutorials) on a cluster with hundreds of CPUs, I need to increase the computational workload. Is there a big case under tutorials? Or how can I quickly modify a case so that it is big enough?
2. Currently I can run interFoam with foamJob, but I get an error with "/usr/lib64/openmpi/bin/mpirun --hostfile ~/hostfil -np 6 simpleFoam -case /root/OpenFOAM/root-3.0.0/run/pitzDaily -parallel":
--> FOAM FATAL ERROR: Cannot read "/root/OpenFOAM/root-3.0.0/run/pitzDaily/system/decomposeParDict"
Does anyone have an idea about this? Thank you! |
The most effective method is to use a finer mesh. With a sufficiently fine mesh, any tutorial will do, for example the "damBreak" case of the "interFoam" application.
http://cfd.direct/openfoam/user-guid...x7-630002.3.11 |
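The fatal error quoted in the question says that system/decomposeParDict is missing from the pitzDaily case. A parallel run needs that file, and decomposePar has to be run on the case before mpirun. As a minimal sketch (the helper name `make_decompose_par_dict` is made up for this example, and the `delta` value is simply the one commonly seen in the tutorials), the following Python generates the text of such a file for the "simple" method:

```python
def make_decompose_par_dict(nx: int, ny: int, nz: int) -> str:
    """Return the text of a minimal decomposeParDict (method 'simple').

    nx, ny, nz are the number of pieces in each direction; their product
    must match the -np value passed to mpirun.
    """
    if min(nx, ny, nz) < 1:
        raise ValueError("each split count must be at least 1")
    n_sub = nx * ny * nz
    # Doubled braces are literal braces inside an f-string.
    return f"""FoamFile
{{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}}

numberOfSubdomains {n_sub};

method          simple;

simpleCoeffs
{{
    n           ({nx} {ny} {nz});
    delta       0.001;
}}
"""


if __name__ == "__main__":
    # 3 x 2 x 1 = 6 subdomains, matching "-np 6" in the command above.
    print(make_decompose_par_dict(3, 2, 1))
```

Save the output as system/decomposeParDict inside the case directory, run decomposePar on the case, and then start the solver with mpirun and -parallel.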
http://openfoamwiki.net/index.php/Sig_HPC
What about taking test cases from the SIG Turbomachinery? http://openfoamwiki.net/index.php/Sig_Turbomachinery http://openfoamwiki.net/index.php/Si...ion_test_cases |
Regarding the finer mesh you mentioned: first, I found there is no constant/polyMesh/blockMeshDict under damBreak. Is system/blockMeshDict the same file? Do you mean I should add more elements to the blockMeshDict, or is there some parameter that I could change easily?
Thank you very much again! |
Find lines like these: Code:
blocks (Note, these numbers are linked: 19+4=23 and 42=42, etc. So if you change something, simply multiply all numbers by 2 for example to retain those properties.) |
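To illustrate the "multiply all numbers by the same factor" advice, here is a small Python sketch. The block list is purely illustrative (it only reuses the numbers 19, 4, 23 and 42 mentioned in the note, not the actual tutorial file), and the function name `scale_cells` is made up for this example. It scales the x and y cell counts and leaves z alone, which is an assumption that fits a 2D case with one cell in z.

```python
def scale_cells(blocks, factor):
    """Multiply the x and y cell counts of every block by the same factor.

    blocks is a list of (nx, ny, nz) tuples; nz is left unchanged because
    a 2D case is assumed to keep a single cell in the z direction.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [(nx * factor, ny * factor, nz) for nx, ny, nz in blocks]


if __name__ == "__main__":
    # Illustrative counts only: 19 + 4 = 23 in x, and 42 = 42 in y.
    original = [(19, 42, 1), (4, 42, 1), (23, 42, 1)]
    refined = scale_cells(original, 2)
    print(refined)
    # The linked properties survive: 38 + 8 = 46 and 84 = 84.
    assert refined[0][0] + refined[1][0] == refined[2][0]
```

Because every block is scaled by the same integer, neighbouring blocks still have matching cell counts on their shared faces.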
I have some more questions about the running time with the modified case:
1. I changed the tolerance to 1e-15 and maxDeltaT to 0.00001. First I ran with the default setting (mpirun -np 4, simpleCoeffs 2 2 1) on two VMs (2 CPUs per VM): it sometimes takes 17 min and sometimes 25 min, which is a large spread. Then I changed it to 12 processes (2*6*1) and it ran even longer (about 28 min), while 9 processes (3*3*1) also took 17 min. Quite strange, there is no pattern to follow! So what is the benefit of more cores? PS: my test environment is connected by Mellanox IB NICs, so I don't think it is a network problem.
2. I changed the blocks setting following the user guide and then set the tolerance to 1e-14. This case runs longer than the one above, and I am still testing it with different numbers of processors.
3. Separately, I found a 3D case called motorBike (incompressible/pisoFoam/les/motorBike). Its directory layout is not quite the same as cavity or damBreak: it provides an "Allrun" script and runs with 6 processes by default. From the Allrun script I found that it uses the runParallel function defined in bin/tools/RunFunctions. So can I simply change the definition of runParallel to run the motorBike case on an MPI cluster? I added "--hostfile hostfile" to the mpirun line of the runParallel function. |
Strange test results
I have some more questions about the running time with the modified case:
1. I changed the tolerance to 1e-15 and maxDeltaT to 0.00001. First I ran with the default setting (mpirun -np 4, simpleCoeffs 2 2 1) on two VMs (2 CPUs per VM): it sometimes takes 17 min and sometimes 25 min, which is a large spread. Then I changed it to 12 processes (2*6*1) and it ran even longer (about 28 min), while 9 processes (3*3*1) also took 17 min. Quite strange, there is no pattern to follow! So what is the benefit of more cores? PS: my test environment is connected by Mellanox IB NICs, so I don't think it is a network problem.
2. I changed the blocks setting following the user guide and then set the tolerance to 1e-14. This case runs longer than the one above, and the results show the same trend: the default 4 = 2*2 takes 28 min, 9 = 3*3 takes 38 min, and 16 = 4*4 took more than 680 min before I terminated it.
3. Separately, I found a 3D case called motorBike (incompressible/pisoFoam/les/motorBike). Its directory layout is not quite the same as cavity or damBreak: it provides an "Allrun" script and runs with 6 processes by default. From the Allrun script I found that it uses the runParallel function defined in bin/tools/RunFunctions. So can I simply change the definition of runParallel to run the motorBike case on an MPI cluster? I added "--hostfile hostfile" to the mpirun line of the runParallel function. |
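Since the same configuration was reported at both 17 min and 25 min, a single run per setting is hard to interpret. One possible approach, sketched below in Python, is to repeat each run and compare medians. The helper names `build_mpirun_cmd` and `time_command` are made up for this example; the mpirun options are the ones already used in this thread, and the hostfile, solver and case names in the demo are placeholders.

```python
import statistics
import subprocess
import sys
import time


def build_mpirun_cmd(hostfile, nprocs, solver, case_dir, mpirun="mpirun"):
    """Assemble the mpirun command line used in this thread as a list."""
    return [mpirun, "--hostfile", hostfile, "-np", str(nprocs),
            solver, "-case", case_dir, "-parallel"]


def time_command(cmd, repeats=3):
    """Run cmd several times and return the wall-clock time of each run."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises an error if the command fails.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Placeholder arguments; adapt them to your own cluster and case.
    print(" ".join(build_mpirun_cmd("hostfile", 4, "interFoam", "damBreak")))
    # Demo with a trivial command so the sketch runs without OpenFOAM.
    demo = time_command([sys.executable, "-c", "pass"], repeats=3)
    print("median seconds:", statistics.median(demo))
```

Reporting the median together with the minimum and maximum of the repeats makes the run-to-run spread on shared cloud VMs visible.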
Still the same, results get worse with more cores
So the computational workload is not large, and it would be better to just run it locally.
My settings:
fvSolution: tolerance changed to 1e-14
controlDict: changed as above
blockMesh: changed as in the user guide (a little bigger)
Then I ran it on a 32-core VM:
mpirun -np 16 (decomposePar set to 4*4*1) takes only 32 s!
mpirun -np 32 (decomposePar set to 4*8*1) takes 44 s!
Then I ran it on a cluster of two 32-core VMs with -np 64 (8*8*1): it takes 11 min, more than 20 times as long as with 16 cores! So the changes do not fix this behaviour.
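The three timings above can be turned into relative speedup and parallel efficiency, using the 16-core run as the baseline. This is only a sketch of the usual definitions (speedup = baseline time / time, efficiency = speedup x baseline cores / cores); the function name `relative_scaling` is made up for this example.

```python
def relative_scaling(timings, base_cores):
    """Return {cores: (speedup, efficiency)} relative to the base_cores run.

    timings maps a core count to a wall-clock time in seconds.
    """
    base_time = timings[base_cores]
    result = {}
    for cores, seconds in sorted(timings.items()):
        speedup = base_time / seconds
        efficiency = speedup * base_cores / cores
        result[cores] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Timings reported above: 32 s, 44 s and 11 min.
    timings = {16: 32.0, 32: 44.0, 64: 11 * 60.0}
    for cores, (speedup, eff) in relative_scaling(timings, 16).items():
        print(f"{cores:3d} cores: speedup {speedup:.3f}, efficiency {eff:.3f}")
```

A speedup below 1 means the run was slower than the baseline, which is the case for both the 32-core and the 64-core runs here.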
|
I said the workload per timestep is too small, meaning too much communication is required for a parallel computation to be worth it.
Increase the number of cells by a big factor, say 10 in each direction, while using the controlDict settings I mentioned in my last post. Then the total workload should increase by a factor of 1000 for a 2D simulation, which consists of a factor of ~10 more timesteps and a factor of ~100 more work per timestep. As you can see, this increases the work that each processor has to do by a greater factor (x100) than the number of times they must communicate (x10). If that doesn't work, use a yet bigger factor. In a 3D case this is even more effective, as refining your mesh by a factor of 10 means 1000 times more work per communication step. If your case requires that much work, it becomes beneficial to use more processors, as the time spent communicating becomes (in a relative sense) smaller and smaller. In your case, I reckon some processors are doing work on as little as 10 cells, instead of 1,000,000 cells. |
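The arithmetic in this reply can be written out as a short Python sketch. It assumes, as the reply does, that refining by a factor f in every direction multiplies the cell count by f^d in d dimensions and the number of timesteps by roughly f. The function names are made up for this example.

```python
def refinement_scaling(factor, dims):
    """Estimate how the workload grows when refining by factor per direction.

    Assumes work per timestep scales with the cell count (factor ** dims)
    and the number of timesteps scales with factor.
    """
    work_per_step = factor ** dims
    timesteps = factor
    return {
        "work_per_step": work_per_step,
        "timesteps": timesteps,
        "total": work_per_step * timesteps,
    }


def cells_per_process(total_cells, nprocs):
    """Average number of cells each MPI process has to work on."""
    return total_cells / nprocs


if __name__ == "__main__":
    print("2D:", refinement_scaling(10, 2))
    print("3D:", refinement_scaling(10, 3))
    # Hypothetical mesh size, only to show the effect of -np on the load.
    print(cells_per_process(10_000, 64), "cells per process")
```

The fewer cells each process owns, the larger the share of time it spends exchanging data with its neighbours instead of computing.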
After I increased the values in blocks by 10 times (ux, uy, uz), the motorBike running time increased from 6 min to 70 min on a 4U8G VM. |