interFoam - Not getting better performance on parallel run

ctvanrees · August 1, 2021, 15:14

Dear Foamers,

I am relatively new to OpenFOAM and am running a validation test case with interFoam. My objective is to measure the fluid elevation at a specific point in space and compare it with experimental results. I am using a dynamic mesh approach, since I want to refine the regions where alpha.water is between 0.01 and 0.99 to get a better representation of the free surface. My problem is very similar to a dam break problem, but the fluid is a low-concentration slurry modeled as non-Newtonian (Bingham plastic, yield stress 2.5 Pa and plastic viscosity 0.15 Pa·s).

My first question is: I am not getting better performance by increasing the core count for my mesh. I have tried 6, 8 and 16 cores and get similar run times, which are very slow (about 20 days to finish 15 seconds of simulation). My initial cell count is around 350,000 cells. Is there a way I can improve the parallel performance of interFoam? Am I using the wrong schemes or solvers? My case files are attached.

Also, how can I improve the quality of my results? I am already using a maximum Co number of 0.05. I am also following the recommendations for interFoam given in this paper: https://www.tandfonline.com/doi/full...0.2019.1609713

Attached is a graph of my preliminary results versus the experimental data (the run took about 10 days to complete). The green line is the OpenFOAM result; the blue dots are the experimental measurements.

Also attached is a snapshot of the model.

Thanks again for any insight and/or recommendation.

Regards,
CVP

geth03 · August 2, 2021, 07:39

When I start a new simulation with interFoam, I begin with the easy basic stuff before going into complexity.

1. Before using a dynamic mesh, can't you use a static one to see whether your case even runs? For 350k cells, you should be able to achieve a speed-up with 6 or more cores relatively easily.

2. Use a higher Courant number than 0.05; this value is really low. With 0.25 the run would be about 5 times faster. It is not as accurate, but it is a good value, too.
Do you want to wait 5 times longer, or get a result faster with a few % deviation?
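The factor of 5 follows from the definition of the Courant number. As a rough sketch, assuming the velocity field and the mesh stay the same and each time step costs about the same wall-clock time:

$$
\mathrm{Co} = \frac{|\mathbf{U}|\,\Delta t}{\Delta x}
\;\Rightarrow\;
\Delta t = \mathrm{Co}\,\frac{\Delta x}{|\mathbf{U}|},
\qquad
N_\text{steps} = \frac{T_\text{end}}{\Delta t} \propto \frac{1}{\mathrm{Co}}
$$

$$
\frac{N_\text{steps}(\mathrm{Co}=0.05)}{N_\text{steps}(\mathrm{Co}=0.25)} = \frac{0.25}{0.05} = 5
$$

In practice the gain may be somewhat smaller, because a larger time step can require more linear-solver iterations per step.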

ctvanrees · August 2, 2021, 16:20

geth03,

Thank you for your reply. I have tried a static mesh, and I am still not able to get better performance with more cores. Could it be because my Co number is set too low? I will check whether I get scaling with Co = 0.25 and the 350K-cell case (the simplest case). Roughly how much time should I expect with these settings?

I should find a good trade-off between time and quality, in order to achieve "reasonable" results in a "reasonable" amount of time (no more than 3 days of computing).

My case runs and is stable; the problem is that it does not scale and is very slow. I was thinking that one of the numerical schemes I am using might be one of my "bottlenecks".

Let me know what you think. I will get back with my results using higher Co number and static mesh.

Regards,
CVP

geth03 · August 3, 2021, 03:08

Speed-up with core count does not depend on the Courant number.
The Co just sets the time step, whereas the number of equations to be solved, the complexity of the matrix, and therefore the number of iterations needed depend on the number of cells, the mesh quality and the residuals.

You can easily check the cell quality of a static mesh with checkMesh.
Could you show the output from the terminal?

I can't tell you offhand how to set the Co to get results within 3 days.
Check the terminal output for the time step and for how much time one time step (iteration) takes, then calculate how many time steps you would need to finish your simulation, and adjust your Co accordingly to get a faster result. You should definitely not go above Co = 1; the lower the Co, the more accurate your result.
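The estimate described above can be written as follows. Here $\overline{\Delta t}$ is an average time step read from the log (it varies when the time step is adjusted automatically), and $t_\text{step}$ is the wall-clock time per time step, for example the difference between successive ExecutionTime or ClockTime values printed in the log:

$$
t_\text{wall} \approx \frac{T_\text{end}}{\overline{\Delta t}}\; t_\text{step}
$$

With purely hypothetical numbers, $\overline{\Delta t} = 10^{-4}$ s, $t_\text{step} = 0.5$ s and $T_\text{end} = 15$ s:

$$
t_\text{wall} \approx \frac{15}{10^{-4}} \times 0.5\ \text{s} = 1.5\times10^{5} \times 0.5\ \text{s} = 7.5\times10^{4}\ \text{s} \approx 20.8\ \text{h}
$$

Since $\overline{\Delta t}$ scales roughly with Co, the Co needed to meet a target wall time (e.g. 3 days) can be scaled from a short test run in the same way.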

Yann · August 3, 2021, 03:25

Hi CVP,

I see you are using the yPlus function object with a writeControl set to timeStep but no writeInterval. This should lead to writing the yPlus field on every timeStep and it could significantly slow down your simulation.

Try switching your yPlus writeControl from timeStep to writeTime and see whether it improves your calculation time.
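Here is a sketch of the suggested change. It assumes the function object sits in the functions dictionary of system/controlDict. The entry name yPlus and the libs line are assumptions, and the libs syntax differs slightly between OpenFOAM versions. Keep your existing entry and change only writeControl:

Code:

```
functions
{
    yPlus
    {
        type            yPlus;
        libs            ("libfieldFunctionObjects.so");

        // Write the yPlus field only when the solution fields are written,
        // instead of at every time step
        writeControl    writeTime;
    }
}
```

Alternatively, writeControl timeStep combined with a writeInterval (for example writeInterval 100;) writes the field every 100 time steps instead.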

Yann

ctvanrees · August 3, 2021, 13:08

geth03,

Attached is my checkMesh log. You are right, the scaling does not depend on the Co number. It looks like Co = 0.25 gives a reasonable computing time; I still have to check whether this value gives me reasonable results.

Also, I did a test run of 0.45 s of simulated time: it took 1139 s of clock time on 4 cores and 1322 s on 8 cores (so it actually took less time with fewer cores...). Could it be the way I have set up the solver for alpha.water? (See the fvSolution attached to my first post.)

Yann,

Thanks for your suggestion. I made the modification you described and obtained the following results (for 350K cells):
- 4 cores, 0.45 s of simulation: 1146 s
- 8 cores, 0.45 s of simulation: 1113 s

So I did not get a significant improvement, and I still do not get considerably better performance on more cores.
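These timings can be quantified as the speed-up $S$ and parallel efficiency $E$ of the 8-core run relative to the 4-core run. Here $t_4$ and $t_8$ are the wall-clock times for the same 0.45 s of simulated time:

$$
S = \frac{t_4}{t_8}, \qquad E = \frac{S}{8/4}
$$

Ideal scaling would give $S = 2$ and $E = 1$. For the two tests reported above, before and after the yPlus change:

$$
S_\text{before} = \frac{1139}{1322} \approx 0.86,\quad E \approx 0.43;
\qquad
S_\text{after} = \frac{1146}{1113} \approx 1.03,\quad E \approx 0.51
$$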

Regards,
CVP

randolph · August 7, 2021, 18:58

Chris,

What is your time step size?

From the snapshot, I would recommend that you mute the surface tension calculation (sigma in transportProperties). Let me know if this helps.

Thanks,
Rdf

ctvanrees · August 8, 2021, 13:54

randolph,

My time step is controlled by the Co number; it is adjusted automatically so that Co < 0.25. My initial time step is 1e-5 s.
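This kind of Courant-number-controlled time stepping is set in system/controlDict. The sketch below uses the values from this post. maxAlphaCo is the separate interface Courant limit used by interFoam; its value here and the maxDeltaT value are assumptions:

Code:

```
deltaT          1e-05;    // initial time step
adjustTimeStep  yes;      // let the solver adapt deltaT during the run
maxCo           0.25;     // limit on the flow Courant number
maxAlphaCo      0.25;     // limit on the interface Courant number (assumed)
maxDeltaT       1;        // upper bound on deltaT (assumed)
```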

How could I mute the surface tension calculation? From my understanding, it is not possible to do this. Maybe I could set the surface tension to 0?

Thanks for your reply,
CVP

randolph · August 8, 2021, 18:43

Chris,

Try this and see whether it speeds up your simulation.

Code:

sigma           0;

Thanks,
Rdf

ctvanrees · August 10, 2021, 22:50

randolph,

I followed your suggestion, and these are the results I got for 0.45 s of simulation:

- 4 CPU cores: 1545 s
- 8 CPU cores: 1480 s

So for some reason it took longer, haha. I am still not sure why I cannot get proper scaling. I have also attached the decomposePar log for both cases. I noticed that the maximum number of faces between processors seems too high for the 8-processor case (I am using the scotch method).
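For comparison, the relevant part of system/decomposeParDict for this setup might look like the sketch below (FoamFile header omitted). The hierarchical alternative in the comments, including its 4 x 2 x 1 split, is a hypothetical example. It could be used to check whether a different decomposition lowers the "Max number of faces between processors" reported by decomposePar:

Code:

```
numberOfSubdomains  8;

method              scotch;    // graph-based partitioning, no coefficients required

// Hypothetical alternative to compare processor-face counts:
// method           hierarchical;
// hierarchicalCoeffs
// {
//     n            (4 2 1);   // subdomains in x, y, z (4*2*1 = 8)
//     delta        0.001;
//     order        xyz;
// }
```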

Let me know what you think, will appreciate your insights.

Regards,
CVP

randolph · August 11, 2021, 08:27

Chris,

That's interesting. My experience with the existing surface tension calculation in OpenFOAM is that it sometimes generates spurious oscillations at the water surface and slows down the calculation. It is bizarre that muting the surface tension slows down the computation. Nevertheless, with dynamic refinement at the water surface, I would expect some additional computational effort.

As for parallel scaling, I typically would not expect linear scaling from OpenFOAM, for many reasons. My interFoam simulations typically use meshes of 8M to 20M cells, and I rarely use more than 64 cores in this range. I have been able to simulate 70 minutes of physical time on an 8M-cell mesh with 4 Xeon E5-2698 v3 processors (64 cores in total) in 7 days of wall time.
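A simple way to compare these setups is the number of cells per core, $n = N_\text{cells}/N_\text{cores}$:

$$
\frac{8\times10^{6}}{64} = 125\,000,
\qquad
\frac{350\,000}{8} = 43\,750,
\qquad
\frac{350\,000}{16} = 21\,875
$$

So the 350K-cell case on 8 or 16 cores gives each core only a fraction of the work per core in the 8M-cell run described above.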

One ugly yet effective approach for prototyping the simulation (if simulation speed is the priority) is to drop all the schemes to first order and use limited schemes for the gradient and Laplacian terms. I would recommend first getting a solution that you can afford and then gradually increasing the accuracy by using more accurate schemes.

Thanks,
Rdf

Code:

gradSchemes
{
    default            cellLimited Gauss linear 1;
}

Code:

laplacianSchemes
{
    default         Gauss linear limited 0.5;
}

interpolationSchemes
{
    default         linear;
}

snGradSchemes
{
    default         limited 0.5;
}
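The blocks above cover the gradient, Laplacian, interpolation and surface-normal-gradient terms. For the first-order part, here is a sketch of ddtSchemes and divSchemes. The entry names follow the laminar damBreak tutorial of OpenFOAM-8-era releases and are an assumption for this case, so check them against your own fvSchemes. Turbulence terms are omitted, and the alpha schemes are left as in the tutorial:

Code:

```
ddtSchemes
{
    default         Euler;            // first-order implicit time scheme
}

divSchemes
{
    default         none;
    div(rhoPhi,U)   Gauss upwind;     // first-order momentum convection
    div(phi,alpha)  Gauss vanLeer;    // unchanged from the tutorial
    div(phirb,alpha) Gauss linear;    // interface compression term, unchanged
    div(((rho*nuEff)*dev2(T(grad(U))))) Gauss linear;
}
```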

ctvanrees · August 15, 2021, 22:29

randolph,

Thanks again for your reply and suggestions; I will make some runs with your proposed schemes. Do you think the schemes I am using could be the bottleneck that prevents scaling?

I did another scaling test with surface tension = 0 and a laminar model, with a total cell count of around 900K. The results are in the attached graph.

Do you think I could get better performance with, for example, 16 cores?

Regards,
CVP

randolph · August 20, 2021, 21:54

Chris,

Apologies for the late reply.

The scaling bottleneck is complicated. But for a 0.9M-cell mesh, I would not use more than 8 processors. Typically, you need a larger mesh to get okay scaling. If you test your model on a small mesh, most of the simulation time is spent on communication instead of actual computation. I remember there is a tool in OpenFOAM to check communication time versus computation time.
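A rough way to see why small meshes scale poorly is to look at the ratio of inter-processor faces to cells in each subdomain. Assume, for illustration only, an idealized cube-shaped subdomain of $n$ hexahedral cells (real scotch decompositions are not cubes):

$$
\frac{\text{processor faces}}{\text{cells}} \approx \frac{6\,n^{2/3}}{n} = \frac{6}{n^{1/3}}
$$

For $n = 900\,000/8 = 112\,500$ this gives about $6/48.3 \approx 0.12$. For $n = 350\,000/16 = 21\,875$ it gives about $6/28.0 \approx 0.21$. As the subdomains shrink, the communication work grows relative to the computation.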

If your application is okay with the 0.9M-cell resolution, what is your purpose in testing the scaling? A 0.9M-cell model is not computationally intense.

Thanks,
Rdf

ctvanrees · August 22, 2021, 21:28

randolph,

Thanks for your reply. I am currently using a machine with 8 cores, since it does not make sense to go any further with 900K cells. My issue is that my simulations take too long to complete (about 6 days for 20 seconds of simulation).

My first attempt to speed up my simulations was to increase the core count. The second option would be to increase the Co number, but I still want decent results (maximum Co = 0.5).
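Here is a rough estimate of what the second option would gain. It assumes the 6-day run used the maxCo = 0.25 mentioned earlier in the thread, and that the cost per time step stays about the same. Under those assumptions, the wall time scales inversely with the Courant limit:

$$
t_\text{wall}(\mathrm{Co}=0.5) \approx t_\text{wall}(\mathrm{Co}=0.25)\times\frac{0.25}{0.5} = 6\ \text{days}\times 0.5 = 3\ \text{days}
$$

If the interface Courant number (maxAlphaCo) is the active limit, it has to be raised as well for this estimate to hold.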

Let me know what you think. Maybe 6 days of computation time is okay?

Regards,
CVP

randolph · August 28, 2021, 08:42

Chris,

Is the application sensitive to the surface wave resolution?

If not, I would somewhat lower the resolution at the wave surface. In my humble opinion, maintaining mass conservation (I would monitor the water mass during the simulation) is more important than resolving the waves.
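Here is a sketch of one way to monitor the water inventory, assuming a recent OpenFOAM version that provides the volFieldValue function object. The entry name waterVolume and the output interval are hypothetical. The volume integral of alpha.water is the water volume; multiplying it by the water density gives the mass:

Code:

```
functions
{
    waterVolume
    {
        type            volFieldValue;
        libs            ("libfieldFunctionObjects.so");
        fields          (alpha.water);
        operation       volIntegrate;    // integral of alpha.water over the domain
        regionType      all;
        writeFields     false;           // write only the integrated value to postProcessing/
        writeControl    timeStep;
        writeInterval   10;              // every 10 time steps (assumed)
    }
}
```

In a closed domain, a drift in this value over time would indicate a mass-conservation error.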

I could also be wrong; maybe resolving the (shock) wave front is important in this type of dam break problem. Nevertheless, the CFL number may not be what constrains the resolution of the wave surface. Often, the spatial resolution is the constraint, in which case a tight CFL criterion does not really improve your resolution. I would run some tests and compare the influence of the time step size (i.e., the CFL number).

An 8-core simulation for 6 days sounds okay to me. Of course, this depends entirely on the computational resources one has available as well as the expected model accuracy.

Thanks,
Rdf

ctvanrees · August 30, 2021, 12:11

randolph,

Thanks again for your reply. Yes, the aim of my validation exercise is to reproduce the experimentally measured slurry depth at a certain point. I will relax my CFL condition and use a finer mesh to see what I get in this case.

What Co number would you use, both for the time stepping and for the interface Co? Currently I am limiting my Co to 0.3.

Let me know what you think.

Regards,
CVP

randolph · August 31, 2021, 08:30

Chris,

I would set both the convective CFL and the wave (interface) CFL as large as I can without the simulation blowing up (e.g., 0.9), and I typically just use PISO (or PIMPLE with one outer corrector loop). You will typically find that the wave CFL is the one that actually limits your time step. In that case, having a diffuse interface would accelerate and stabilize the computation, but at the cost of accuracy.
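In fvSolution, the single-outer-corrector (PISO-like) mode would look roughly like the sketch below. The two CFL limits correspond to maxCo and maxAlphaCo in controlDict. The nCorrectors and nNonOrthogonalCorrectors values are assumptions, not values from this thread:

Code:

```
PIMPLE
{
    momentumPredictor        no;
    nOuterCorrectors         1;    // one outer loop per time step: PISO-like operation
    nCorrectors              3;    // pressure corrections per time step (assumed)
    nNonOrthogonalCorrectors 0;    // assumed; raise it for non-orthogonal meshes
}
```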

Thanks,
Rdf
