CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Meshing & Mesh Conversion (https://www.cfd-online.com/Forums/openfoam-meshing/)
-   -   [snappyHexMesh] decomposition options - please help (https://www.cfd-online.com/Forums/openfoam-meshing/224002-decomposotion-options-please-help.html)

Morlind February 4, 2020 11:40

decomposition options - please help
 
I currently run cases on separate servers but I am attempting to restructure into a proper cluster. Unfortunately my scaling has been poor and I suspect decomposition is partly to blame.



I have previously attached 2 servers together and achieved a ~1.2:1 speed-up, which I was very happy with. When clustering 6 machines, I only complete a solve in roughly half the time of a single machine. When using 3 machines, iteration times are only slightly faster than on a single machine (31 s/iteration vs 36 s/iteration).



I would like to try different methods, but my case is not uniform in cell distribution, so the simple and hierarchical decompositions don't seem like a good fit at first glance.



Also, minimizing cell interfaces is great, but what I really need is to minimize interfaces between the larger portions of each machine to reduce network utilization. Is there a method for this?
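[Editor's note: OpenFOAM's multiLevel decomposition is one way to address exactly this: it first splits the mesh across machines, then subdivides within each machine, so the largest interfaces tend to stay inside a node rather than crossing the network. A sketch of a hypothetical system/decomposeParDict for 3 nodes of 40 cores each (the counts here are assumptions matching this thread, not a recommendation):]

```cpp
// system/decomposeParDict -- sketch for 3 nodes x 40 cores = 120 ranks
numberOfSubdomains  120;
method              multiLevel;

multiLevelCoeffs
{
    nodes   // level 0: split across the 3 machines first
    {
        numberOfSubdomains  3;
        method              scotch;
    }
    cores   // level 1: then split within each machine
    {
        numberOfSubdomains  40;
        method              scotch;
    }
}
```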

Antimony February 4, 2020 23:58

Hi,

Have you tried scotch as your decomposition method? That is supposed to give you a very load balanced mesh distribution...
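[Editor's note: for reference, selecting scotch is a minimal change in system/decomposeParDict; scotch is graph-based and needs no geometric coefficients. The subdomain count below is illustrative:]

```cpp
// system/decomposeParDict -- minimal sketch
numberOfSubdomains  120;    // total MPI ranks (illustrative)
method              scotch; // graph-based, load-balanced, no geometry input
```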

Cheers,
Antimony

Morlind February 14, 2020 11:10

As an update:
Yes, scotch is where I started and is all I have used previously.

I attempted various hierarchical methods and all were slower.

I decided to try using the 1 Gb network rather than the 10 Gb network (the idea being to see whether it was slower and by how much; if the speed were the same, my 10 Gb network simply isn't functioning at peak speed). Surprise: the 1 Gb network was faster, achieving a reasonable speed-up (a ~30% reduction in s/iteration using 3 nodes). However, this is not consistent. It seems I have some network issues, I believe with the OS or drivers, that need fixing, so I am approaching the problem from there.

HPE February 14, 2020 15:32

What is the order of magnitude of cells per processor? OpenFOAM is known to scale well (up to 1.5 billion cells with virtually ideal weak scaling). If you have very few cells per processor, you will very likely see a speed decrease due to communication overhead.
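[Editor's note: a quick back-of-the-envelope check using the figures from this thread (80M cells, 40 cores per node). The ~50k cells/rank threshold below is a common rule of thumb, an assumption rather than an official OpenFOAM number:]

```python
# Average cells per MPI rank for the node counts discussed in this thread.
# Threshold of ~50k cells/rank is a rough rule of thumb (assumption).

def cells_per_rank(total_cells, nodes, cores_per_node):
    """Average cell count handled by each MPI rank."""
    return total_cells / (nodes * cores_per_node)

for nodes in (1, 2, 3, 6):
    cpr = cells_per_rank(80_000_000, nodes, 40)
    flag = "ok" if cpr >= 50_000 else "communication-bound?"
    print(f"{nodes} node(s): {cpr:,.0f} cells/rank  [{flag}]")
```

Even at 6 nodes (240 ranks) this case sits well above the threshold, which is consistent with the eventual diagnosis that the bottleneck was the network, not the decomposition.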

You may run 'mpirun -np <procNo> renumberMesh -overwrite -parallel' to gain some speed.

Also, turn off run-time modification and, if possible, all function objects (FOs). Selecting parallel-consistent solvers, particularly for pressure, is very important as well.

Compiling with the highest level of optimisation might also help.

me3840 February 16, 2020 15:59

Quote:

Originally Posted by HPE (Post 758204)
Selecting parallel-consistent solvers, particularly for pressure, is very important as well.


Just curious as to what exactly you mean by this?

Morlind February 18, 2020 16:11

My meshes are between 60m and 80m cells. I have previously seen much better scaling than I currently do; hence, the real solution:


- I was getting very inconsistent iteration times when using multiple nodes, so I tinkered with which NIC was being used. Using the 1 Gb NIC (not the 10 Gb) I actually got faster times. However, this was not consistently repeatable. In some cases it was better if I switched which subnet was being used twice, coming back to the original to find it working faster than before. A friend commented, "eh, probably a driver issue."


- On rebooting some machines, the NIC was no longer accessible: the firmware was missing. This sent me down a variety of paths, but in the end I was able to manually download and provide the correct firmware that Ubuntu was already looking for (it had updated the firmware entry but not placed the actual firmware file in the location pointed to).



- Yay, now my NICs work. Hey, let's try a case. Surprise: everything is working as intended and scaling well. Using 80 cores (2 nodes), my iteration time is almost exactly half that of a 40-core solve (23 s/iter for 79m cells vs 42 s/iter for 78m cells).



I realize this no longer really fits in this subforum, but this was the solution to the real issue that caused me to investigate decomposition options.

HPE February 18, 2020 16:21

So, problem solved? No longer related to OpenFOAM?

Maybe an idiot was using the parent node without batching simulations, slowing down the entire system at the same time your simulations were running on the child nodes?

HPE February 18, 2020 16:26

Hi me3840,

e.g. GAMG is inherently not parallel consistent, but PCG is. Please do have a look at their differences; I usually try not to answer engineering-judgement questions, only OpenFOAM-usage questions.
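[Editor's note: for reference, switching the pressure solver from GAMG to PCG is a small change in system/fvSolution. A sketch; the preconditioner and tolerances below are illustrative values, not recommendations:]

```cpp
// system/fvSolution -- pressure solver excerpt (illustrative values)
solvers
{
    p
    {
        solver          PCG;    // Krylov solver, parallel consistent
        preconditioner  DIC;
        tolerance       1e-06;
        relTol          0.01;
    }
}
```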

Hope you understand.

Morlind February 18, 2020 16:26

Quote:

Originally Posted by HPE (Post 758590)
So, problem solved? No longer related to OpenFOAM?

May be an idiot was using the parent node without batching simulations and slowing down the entire system at the same time you do run your simulations in child nodes?


HA! The only idiot on this system is me... I do plenty of dumb things but the nodes are definitely only used for the cases at hand. I use TSP to keep things in line.



I'm reasonably sure this is exclusively related to the NIC drivers and was never related to decomposition in any way.

HPE February 18, 2020 16:28

ahaha :D

Sorry!

You don't learn fast if you don't make mistakes, do you? :) Thanks for the info. :)

Morlind February 18, 2020 16:30

Thanks for the solver tip! I will investigate this further.

