CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   Parallel Performance of Large Case (https://www.cfd-online.com/Forums/openfoam-solving/122998-parallel-performance-large-case.html)

andrea.pasquali February 16, 2015 09:19

Hi,
I did not see different running times when running on nodes on a single switch.
My test was mesh generation with the refinement stage in snappyHexMesh.
As I said, I have not investigated it in detail yet. I only tried recompiling MPI and OpenFOAM once with Intel 12, but I still got the same (bad) performance with InfiniBand...

Andrea

arnaud6 February 18, 2015 05:09

Hello,

So I have tried renumberMesh before solving, and it looks like it has improved the performance a bit on both single and multiple switches, reducing the running time by ~10%.

But I still can't see why the running times are so slow across multiple switches. renumberMesh or not, we should get roughly the same running time whichever nodes are selected, right?
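
For reference, the renumbering step referred to here is typically run on the case before solving; a minimal sketch (the core count of 48 is only an example):

Code:

# serial: renumber the mesh before decomposing
renumberMesh -overwrite

# or on an already-decomposed case (example core count)
mpirun -np 48 renumberMesh -parallel -overwrite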

wyldckat February 22, 2015 14:12

Greetings to all!

@arnaud6:
Quote:

Originally Posted by arnaud6 (Post 532291)
renumberMesh or not, we should get roughly the same running time whichever nodes are selected, right?

InfiniBand uses a special addressing mechanism that is not used by MPI over Ethernet; AFAIK, InfiniBand shares memory directly between nodes, mapping out as much as possible both the RAM of the machines and the "pathways of least resistance" for communicating between each machine. This is to say that an InfiniBand switch is far more complex than an Ethernet switch, because as many paths as possible are mapped out between each pair of ports on that switch.

The problem is that when 3 switches are used, the tree becomes a lot larger and is sectioned into 3 parts, making it harder to map out the communications.

Commercial CFD software might already take these kinds of configurations into account, either by asking the InfiniBand controls to adjust accordingly, or by balancing this out on its own: placing sub-domains close to each other on machines that share a switch and keeping communication with machines connected to other switches to a minimum. But when you use OpenFOAM, you're probably not taking this into account.
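
One way to encourage that kind of placement with plain OpenFOAM is to control how the MPI ranks map onto the hosts. Here is a minimal Open MPI hostfile sketch, assuming hypothetical host names grouped by switch and 16 slots per node (the exact placement depends on the MPI version and its default mapping policy):

Code:

# hosts: listed in switch order, so that with the default by-slot mapping
# ranks 0-31 land on switch A and ranks 32-63 on switch B, keeping
# neighbouring sub-domains (processor0, processor1, ...) on the same switch
nodeA01 slots=16
nodeA02 slots=16
nodeB01 slots=16
nodeB02 slots=16

# example launch (the solver name is only a placeholder)
# mpirun -np 64 -hostfile hosts pimpleFoam -parallel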

I haven't had to deal with this myself, so I have no idea how this is properly configured, but there are at least a few things I can imagine that could work:
  • Have you tried PCG yet? If not, it's worth trying as well.
  • Try multi-level decomposition: http://www.cfd-online.com/Forums/ope...tml#post367979 post #8 - the idea is that the first level should be divided by switch group (see the decomposeParDict sketch after this list).
    • Note: if you have 3 switches, either you have one master switch that only connects the other 2 switches and has no machines attached directly, or you have 1 switch per group of machines in a daisy chain. Keep this in mind when using multi-level decomposition.
  • Contact your InfiniBand support line about how to configure mpirun to map out the communications properly.
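
A minimal system/decomposeParDict sketch of the multi-level idea, assuming 3 switch groups of 16 cores each (the counts and the scotch method are placeholders, not a recommendation):

Code:

numberOfSubdomains 48;
method          multiLevel;

multiLevelCoeffs
{
    // first level: one block per switch group, so that inter-switch
    // communication happens between only 3 large blocks
    level0
    {
        numberOfSubdomains 3;
        method          scotch;
    }
    // second level: split each switch-group block across its own cores
    level1
    {
        numberOfSubdomains 16;
        method          scotch;
    }
}

Whether the resulting sub-domains actually end up on the matching switch groups still depends on how the MPI ranks are mapped to hosts (hostfile order, rank files and so on), so this would need to be combined with an appropriate rank mapping.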
Best regards,
Bruno

arnaud6 February 26, 2015 05:15

Hi Bruno,

Thanks for your ideas !

I am looking at the PCG solvers.
Would you advise using the combination of PCG for p and PBiCG for the other variables, or PCG for p while keeping the other variables on a smoothSolver/Gauss-Seidel? In my cases it looks like p is the most difficult to solve (at least it is the variable that takes the longest to solve at each iteration).
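
For reference, a minimal fvSolution sketch of the two combinations being compared (the preconditioners and tolerances shown are common defaults, not tuned values):

Code:

solvers
{
    // option 1: PCG for p, PBiCG for the other (asymmetric) equations
    p
    {
        solver          PCG;
        preconditioner  DIC;
        tolerance       1e-06;
        relTol          0.01;
    }
    U
    {
        solver          PBiCG;
        preconditioner  DILU;
        tolerance       1e-06;
        relTol          0.1;
    }

    // option 2: keep PCG for p, but use a Gauss-Seidel smooth solver for U
    //U
    //{
    //    solver          smoothSolver;
    //    smoother        GaussSeidel;
    //    nSweeps         1;
    //    tolerance       1e-06;
    //    relTol          0.1;
    //}
}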

The difficulty is that I don't have much control over the nodes, and thus the switches, that will be selected when I submit my parallel job...
I will see what I can do with the IB support.

wyldckat October 24, 2015 15:37

Hi arnaud6,

Quote:

Originally Posted by arnaud6 (Post 533476)
I am looking at the PCG solvers.
Would you advise using the combination of PCG for p and PBiCG for the other variables, or PCG for p while keeping the other variables on a smoothSolver/Gauss-Seidel? In my cases it looks like p is the most difficult to solve (at least it is the variable that takes the longest to solve at each iteration).

Sorry for the really late reply; I've had this on my to-do list for a long time and only now did I take a quick look into it. Unfortunately, I still don't have a specific answer/solution for this.
The best I could tell you back then, and can tell you now, is to run a few iterations yourself with each configuration.
Even the GAMG matrix solver can sometimes be improved if you fine-tune its parameters and do some trial-and-error sessions with your case, because these parameters depend on the case size and on how the sub-domains in the case are structured.
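
To give an idea of the kind of parameters that can be tuned, a GAMG entry for p in fvSolution might look like the following sketch (the values are common starting points, not recommendations for this case):

Code:

p
{
    solver          GAMG;
    smoother        GaussSeidel;
    tolerance       1e-06;
    relTol          0.01;

    // settings typically worth experimenting with
    nCellsInCoarsestLevel 100;   // larger values give fewer coarse levels
    cacheAgglomeration    true;
    agglomerator          faceAreaPair;
    mergeLevels           1;
    nPreSweeps            0;
    nPostSweeps           2;
}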

Either way, I hope you managed to figure this out on your own.

Best regards,
Bruno

mgg November 4, 2015 10:49

Hi Bruno,

Indeed. In my experience, how the sub-domains are structured has a strong impact on performance, so I choose to decompose manually.

My problem now is as follows:

I am running a DNS case (22 million cells) using buoyantPimpleFoam (OpenFOAM 2.4). The case is a long pipe with an inlet and an outlet. The fluid is air, and the inlet Re is about 5400.

For better scalability, I use PCG for the pressure equation. If I use the perfect gas equation of state, the number of iterations is around 100, which is acceptable. If I use icoPolynomial or rhoConst to describe the density, the number of iterations is around 4000! If I use GAMG for the p equation, the number of iterations is under 5, but the scalability is poor above 500 cores. Does anyone have any opinion?

How can I improve the PCG solver to decrease the number of iterations? Thank you.

Quote:

Originally Posted by wyldckat (Post 570125)
Hi arnaud6,


Sorry for the really late reply; I've had this on my to-do list for a long time and only now did I take a quick look into it. Unfortunately, I still don't have a specific answer/solution for this.
The best I could tell you back then, and can tell you now, is to run a few iterations yourself with each configuration.
Even the GAMG matrix solver can sometimes be improved if you fine-tune its parameters and do some trial-and-error sessions with your case, because these parameters depend on the case size and on how the sub-domains in the case are structured.

Either way, I hope you managed to figure this out on your own.

Best regards,
Bruno


wyldckat November 7, 2015 11:52

Quote:

Originally Posted by mgg (Post 571875)
If I use icoPolynomial or rhoConst to describe the density, the number of iterations is around 4000! If I use GAMG for the p equation, the number of iterations is under 5, but the scalability is poor above 500 cores. Does anyone have any opinion?

How can I improve the PCG solver to decrease the number of iterations? Thank you.

Quick questions/answers:
  • I don't know how to improve the PCG solver... perhaps you need to use another preconditioner? I can't remember right now, but can't GAMG be used as a preconditioner? (See the sketch after this list.)
  • If GAMG can do it in 5 iterations, do those 5 iterations take a lot longer than the 4000 of PCG?
  • I'm not familiar enough with DNS to know this, but isn't it possible to solve the same pressure equation a few times, with relaxation steps in between, the way PIMPLE and SIMPLE can?
  • GAMG is very configurable. Are you simply using a standard set of settings, or have you tried to find the optimum settings for GAMG? GAMG can only scale well if you configure it correctly. I know there was a thread about this somewhere...
  • After a quick search:
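
If GAMG as a preconditioner is the route taken, a minimal fvSolution sketch of a GAMG-preconditioned PCG entry for p might look like this (the inner settings are illustrative defaults, not tuned values):

Code:

p
{
    solver          PCG;
    preconditioner
    {
        preconditioner        GAMG;
        smoother              GaussSeidel;
        nCellsInCoarsestLevel 100;
        cacheAgglomeration    true;
        agglomerator          faceAreaPair;
        mergeLevels           1;
        tolerance             1e-05;
        relTol                0;
    }
    tolerance       1e-06;
    relTol          0.01;
}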

arnaud6 March 25, 2016 05:03

Sorry for getting back so late on this one. The problem was mpirun 1.6.5 (Open MPI); as soon as I switched to mpirun 1.8.3, the slowness disappeared!

