massively parallel AMI: decomposePar with singleProcessorFaceSets constraint

April 9, 2020, 12:47
#1
Senior Member
I'm not able to get a balanced distribution with the automatic algorithms (scotch, kahip) when I have many patch pairs that are constrained to remain on the same processor.
I have a single AMI zone, but it has more cells than a balanced decomposition would give me: both AMI patches together touch 44k cells, but I would like to have 10k-14k cells per processor. What I do is slice the cylinder into more AMI patch pairs, convert them to sets, and then give those as constrained sets in decomposeParDict. When I leave the processor as -1, every set gets assigned to the same processor and I am back to an unbalanced decomposition. I think it has to do with these lines of code: https://www.openfoam.com/documentati...ce.html#l00284
When I impose different processors myself, for example with 5 or 50 sub-slices of my AMI cylinder, I still end up unbalanced. Some processors even end up with 0 cells! Is my only option to go to manual decomposition?
As an alternative, I'm thinking of constraining each AMI to at least remain on the same CPU socket, using multilevel or multiregion decomposition, but I have the feeling that means a lot of hand tuning and wasted time.
Does anyone have experience with AMI interfaces that have more cells than an optimal parallel simulation can take? Comments/advice would be appreciated.
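A rough way to size the slices, as a sketch: the helper below picks the smallest number of slice pairs such that the cells touched by each slice stay under a chosen fraction of the target cells per processor. The function name is hypothetical, and the 50% fraction is an assumed safety margin, not an OpenFOAM setting. With the numbers above, 5 slices still leave about 8.8k cells per slice, close to the 10k target, while 50 slices give roughly 880.

```python
import math

def slices_needed(ami_cells: int, target_cells_per_proc: int, max_fraction: float = 0.5) -> int:
    """Smallest number of AMI slice pairs so that each slice touches at most
    max_fraction * target_cells_per_proc cells.

    max_fraction is an assumed margin (not an OpenFOAM parameter); it leaves
    the decomposition method room to add unconstrained cells to each processor.
    """
    if ami_cells <= 0 or target_cells_per_proc <= 0:
        raise ValueError("cell counts must be positive")
    if not 0.0 < max_fraction <= 1.0:
        raise ValueError("max_fraction must be in (0, 1]")
    cap = max_fraction * target_cells_per_proc
    return math.ceil(ami_cells / cap)

# Numbers from the post: 44k AMI-adjacent cells, 10k-14k cells per processor.
for target in (10_000, 14_000):
    n = slices_needed(44_000, target)
    print(target, n, 44_000 / n)   # target, slice count, cells per slice
```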

December 1, 2020, 02:17
#2
New Member
Alexandra H
Join Date: May 2020
Posts: 4
Rep Power: 5
Hey Louis, did you ever find an effective solution to this issue? I am having the same problem.
All I can think of is splitting the large cyclic patches into multiple smaller ones and grouping them into multiple faceSets.
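A minimal sketch of the grouping step, assuming the AMI is a cylinder whose axis is the z axis through the origin and that the face centres are already available as (x, y, z) tuples. Exporting them from OpenFOAM and writing the groups back as faceSets or patches is not shown, and the function names are hypothetical. Faces are binned into equal angular sectors.

```python
import math

def angular_slice(x: float, y: float, n_slices: int) -> int:
    """Sector index (0..n_slices-1) of the point (x, y), measured around an
    assumed rotation axis along z through the origin."""
    if n_slices < 1:
        raise ValueError("n_slices must be >= 1")
    theta = math.atan2(y, x) % (2.0 * math.pi)   # angle in [0, 2*pi)
    idx = int(theta / (2.0 * math.pi) * n_slices)
    return min(idx, n_slices - 1)                # guard against round-off near 2*pi

def group_faces(face_centres, n_slices):
    """Map sector index -> list of face indices, one group per intended
    faceSet (or AMI patch pair). End-cap faces are binned by angle too."""
    groups = {i: [] for i in range(n_slices)}
    for face_id, (x, y, _z) in enumerate(face_centres):
        groups[angular_slice(x, y, n_slices)].append(face_id)
    return groups

centres = [(1.0, 0.5, 0.0), (-1.0, 0.5, 0.0), (-1.0, -0.5, 0.0), (1.0, -0.5, 0.0), (1.0, 0.4, 2.0)]
print(group_faces(centres, 4))
```

Equal-angle sectors only give similar face and cell counts per slice if the mesh is roughly uniform around the circumference.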

December 1, 2020, 03:32
#3
Senior Member
Hi Alexandra,
I did try what you mention (making "slices" of the AMIs). It works to some extent, but I usually still end up with a struggling decomposition algorithm that gives a quite unbalanced decomposition, even away from the AMI patches, and it did not help the parallel speedup.
For my last attempts, I ended up doing a multi-region decomposition:
1. Split the domain into regions: outer, AMI faces, and inner.
2. Run a modified* decomposePar with -cellDist, region by region, using for example scotch on the outer and inner regions and a simple, manual, or hierarchical decomposition on the AMI-face region.
3. Use the resulting cell distribution to manually decompose your original mesh (the one that was not split into regions).
*: I had to write my own decomposePar utility for this, because otherwise each region is decomposed onto the same processors, whereas here you want, for example, region0 on procs[00-05], region1 on procs[06-07], region2 on procs[08-16], etc. Doing this with Python on the created cellDist files takes forever, but with the modified decomposePar utility it's quite fast.
Just as a heads-up: in the end I do not gain parallel speedup by doing so. The AMI algorithm is apparently not well parallelized, so even when all the AMI cells are on one or a few processors, the run is still slow. I gave a splash talk about this last summer (only images here), but the plot on the last slide is self-explanatory ("free decomp" is when I did not do anything to keep the AMI faces/cells on a single processor or set of processors)...
Eugene de Villiers also hinted to me that precomputing the AMI weights would help and that someone had apparently already done it, but I never found the related code.
Kind regards,
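The remapping in step 3 boils down to giving each region a consecutive block of processor numbers and shifting its region-local cell distribution by the block offset. A plain-Python sketch, assuming you already have, for each region, the list mapping region cell index to original cell index and the per-region processor assignment. Reading and writing the OpenFOAM cellDist files is not shown, and all names are hypothetical.

```python
def merge_cell_distributions(regions):
    """Combine per-region decompositions into one cellDist for the original mesh.

    regions: list of (cell_map, local_dist, n_procs), one entry per region, where
      cell_map[i]   -> index of region cell i in the original (unsplit) mesh
      local_dist[i] -> processor (0..n_procs-1) that the region-only
                       decomposition gave to region cell i
      n_procs       -> number of processors reserved for that region
    Regions receive consecutive processor blocks in list order.
    """
    n_cells = sum(len(cell_map) for cell_map, _, _ in regions)
    global_dist = [-1] * n_cells
    offset = 0
    for cell_map, local_dist, n_procs in regions:
        if len(cell_map) != len(local_dist):
            raise ValueError("cell_map and local_dist must have the same length")
        for cell, proc in zip(cell_map, local_dist):
            if not 0 <= proc < n_procs:
                raise ValueError(f"processor {proc} outside 0..{n_procs - 1}")
            global_dist[cell] = offset + proc   # shift into this region's block
        offset += n_procs
    if -1 in global_dist:
        raise ValueError("some original cells are not covered by any region")
    return global_dist

# Toy usage: outer, AMI-face and inner regions on 6 + 2 + 9 = 17 processors,
# i.e. procs 0-5, 6-7 and 8-16 as in the post.
regions = [
    ([0, 1, 2], [0, 3, 5], 6),  # outer region
    ([3, 4], [0, 1], 2),        # AMI-face region
    ([5, 6], [0, 8], 9),        # inner region
]
print(merge_cell_distributions(regions))
```

The result is one processor number per original cell, the kind of per-cell list that a manual decomposition of the unsplit mesh needs.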

September 6, 2021, 13:31
AMI Parallelization
#4
New Member
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
I am finding that OpenFOAM is not scaling with the number of processors (on Rescale). Based on the comments above, this may be due to the AMI that I am using.
Is there anything more concrete available on how OpenFOAM scales when using an AMI? Is there a better alternative that scales better with the number of processors? I would like to use > 100 processors.

September 7, 2021, 03:01
#5
Senior Member
AMI is probably your best bet. Try to make the interface as clean as possible and avoid imposing specific rules on your decomposition method. Lately I've been able to get much better scaling than before, also with higher numbers of processors than what you mention. I think, however, that the architecture of the supercomputer you're using also plays a major role...
|
|
September 15, 2021, 08:42
AMI and crossflow turbines
#6
New Member
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
Hi Louis
Since you appear to be in Stuttgart, I imagine that you are familiar with the crossflow turbine problem. I am modelling a crossflow hydrokinetic turbine in OpenFOAM. Can you provide some more details on how you are able to scale up successfully to a large number of processors? All I can get to is 60 cores.
Best Regards
Jarlath

September 15, 2021, 08:54
#7
Senior Member
No, I am not familiar with it. I do cycloidal rotors: https://www.youtube.com/watch?v=q0pafX63_x0
Give me more details about your case and I'll see if I can pinpoint something, but scaling is, in my experience, very sensitive... you need to do a lot of test runs...
Kind regards

September 15, 2021, 09:08
AMI Parallelization
#8
New Member
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
Hi Louis
Based on your video, I think we are working on exactly the same problem. In general we use fixed-pitch foils, but we are also looking at variable pitch: https://orpc.co/our-solutions/turbine-generator-unit
So the problem setup is likely very similar. I usually set up a cylindrical AMI around the rotor zone, and generally there is only one AMI. In principle I'd like to model with y+ around 1 on the blades, which leads to large meshes. I usually use Scotch as the decomposition method, with no constraints.

September 16, 2021, 03:11
#9
Senior Member
How many cells per processor do you have?
How many cells overall, and how many AMI faces? Are the AMI faces similarly sized on both sides of your interface? Which solver do you use? And what Courant number? (I use pimple and a very high Courant number to avoid "wasting" time with AMI.)

September 20, 2021, 13:49
#10
Member
Join Date: May 2017
Posts: 31
Rep Power: 8
Hello
I've just had (and solved!) a similar issue. I already had my AMI patches split into lots of smaller patches, and each patch pair had a faceSet, used in singleProcessorFaceSets. But the constraint forces every cell that shares a *point* with a face in a singleProcessorFaceSet onto the same processor, and since my split AMI patches were next to each other, they all shared points and were all forced onto the same processor.
So for patches with names
Code:
AMI1_M, AMI1_S, AMI2_M, AMI2_S, AMI3_M, AMI3_S, etc.
I was creating one faceSet per patch pair with
Code:
{ name AMI1Set; type faceSet; action new; source patchToFace; sourceInfo { name "AMI1_[MS]"; } }
The solution was to get topoSet to remove, from each set, any faces that share points with another patch pair's set, going from
Code:
/     AMI1_M   \ /     AMI2_M   \
o---o---o---o===o===o---o---o---o
\___AMI1Set____/ \___AMI2Set____/
to
Code:
/     AMI1_M   \ /     AMI2_M   \
o---o---o---o===o===o---o---o---o
\__AMI1Set__/       \__AMI2Set__/
Before, the cells represented by === were part of both singleProcessorFaceSets constraints, because they shared the middle point with both faceSets; after, each cell is only in one constraint.
I did this in system/topoSetDict with something like
Code:
actions
(
    // one faceSet per AMI patch pair, plus the points of each set
    { name AMI1Set; type faceSet;  action new; source patchToFace; sourceInfo { name "AMI1_[MS]"; } }
    { name points1; type pointSet; action new; source faceToPoint; sourceInfo { set AMI1Set; option all; } }
    { name AMI2Set; type faceSet;  action new; source patchToFace; sourceInfo { name "AMI2_[MS]"; } }
    { name points2; type pointSet; action new; source faceToPoint; sourceInfo { set AMI2Set; option all; } }

    // drop the faces of each set that touch any point of the neighbouring set
    { name AMI1Set; type faceSet; action delete; source pointToFace; sourceInfo { set points2; option any; } }
    { name AMI2Set; type faceSet; action delete; source pointToFace; sourceInfo { set points1; option any; } }

    // with more slices: build all point sets first, then repeat the deletions for each neighbouring pair
);
This way the constraints become much less severe, and the decomposition gets much better, as long as you split the cyclicAMI into small enough patches (i.e. with far fewer adjacent cells per patch than your target/average cells per processor).
Hope this helps!
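To see why the pruning helps, here is a toy 1-D model of the diagrams above (plain Python, not OpenFOAM code): points 0..8 along the patch, with each face, and the cell sitting on it, represented only by its two end points. It mimics the topoSet sequence (delete from each set the faces touching any point of another original set) and then checks which cells share a point with each set, which is the rule the post describes for singleProcessorFaceSets. The function names are hypothetical.

```python
def points_of(face_ids, faces):
    """All point labels used by the given faces (faces[i] is a tuple of point labels)."""
    return {p for f in face_ids for p in faces[f]}

def prune_shared_points(face_sets, faces):
    """For each set, drop faces touching any point of another *original* set,
    like the pointToFace (option any) 'delete' actions in the topoSetDict."""
    point_sets = {name: points_of(ids, faces) for name, ids in face_sets.items()}
    pruned = {}
    for name, ids in face_sets.items():
        others = set().union(*(pts for other, pts in point_sets.items() if other != name))
        pruned[name] = {f for f in ids if not (set(faces[f]) & others)}
    return pruned

def constrained_cells(face_ids, faces, cells):
    """Cells sharing at least one point with the face set."""
    pts = points_of(face_ids, faces)
    return {c for c, cell_pts in enumerate(cells) if set(cell_pts) & pts}

# Toy version of the diagram: points 0..8, faces (segments) 0..7.
# Faces 3 and 4 are the '===' segments next to the shared middle point 4.
faces = [(i, i + 1) for i in range(8)]
cells = faces  # each cell represented only by the points of its patch face
before = {"AMI1Set": {0, 1, 2, 3}, "AMI2Set": {4, 5, 6, 7}}
after = prune_shared_points(before, faces)

for label, sets in (("before", before), ("after", after)):
    shared = constrained_cells(sets["AMI1Set"], faces, cells) & constrained_cells(sets["AMI2Set"], faces, cells)
    print(label, sets, "cells in both constraints:", sorted(shared))
```

In the toy model the two '===' cells sit in both constraints before pruning and in exactly one each afterwards, matching the description in the post.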

September 21, 2021, 16:56
#11
New Member
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
60 processors
11.5 million cells
Number of processor faces = 574347
Max number of cells = 191344 (0.999487% above average 189450)
Max number of processor patches = 22 (204.147% above average 7.23333)
Max number of faces between processors = 35445 (85.1407% above average 19144.9)
One AMI, using two patches (AMI_inner and AMI_outer). The meshes on these two patches should have been exactly the same, but I just noticed that the log file reports different numbers of faces on the source and target sides:
"AMI: Creating addressing and weights between 27332 source faces and 28772 target faces"
The AMI has three surfaces: the cylinder surface and the two end caps. The mesh size on all three surfaces is similar.
Using pimpleFoam.
"Courant Number mean: 0.0945093 max: 1315.62"
I am using a "time step" of 1 degree. If I increase the time step to 4 degrees, the results are still good, but a little different from the 1 degree run. As the time step increases to 6 and 8 degrees, the results start to differ from the 1 degree case.
Cheers!
Jarlath
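The imbalance figures in the decomposePar summary above can be reproduced as (max / average - 1) x 100, where the average number of faces between processors is twice the number of processor faces divided by the number of processors, since each processor face is counted on both sides. A small sketch that recomputes them, plus a hypothetical helper converting a "time step in degrees" into seconds. The rotor speed is not given in the post, so the 60 rpm below is only a placeholder.

```python
def pct_above_average(max_value: float, average: float) -> float:
    """How far the maximum exceeds the average, in percent."""
    return (max_value / average - 1.0) * 100.0

def avg_faces_between_processors(n_processor_faces: int, n_procs: int) -> float:
    """Average processor-boundary faces per processor; each processor face
    belongs to two processors, hence the factor 2."""
    return 2.0 * n_processor_faces / n_procs

def delta_t_for_degrees(deg_per_step: float, rpm: float) -> float:
    """Physical time step (s) that rotates the rotor by deg_per_step degrees."""
    deg_per_second = 360.0 * rpm / 60.0
    return deg_per_step / deg_per_second

avg_faces = avg_faces_between_processors(574347, 60)
print(f"{avg_faces:.1f}")                               # average faces per processor
print(f"{pct_above_average(35445, avg_faces):.4f} %")   # faces between processors
print(f"{pct_above_average(22, 7.23333):.3f} %")        # processor patches
for deg in (1, 4, 6, 8):
    # placeholder speed of 60 rpm; 360 / deg is the number of steps per revolution
    print(deg, delta_t_for_degrees(deg, rpm=60.0), 360 / deg)
```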