massively parallel AMI: decomposePar with singleProcessorFaceSets constraint

Old   April 9, 2020, 12:47
Default massively parallel AMI: decomposePar with singleProcessorFaceSets constraint
  #1
Senior Member
 
 
Louis Gagnon
Join Date: Mar 2009
Location: Stuttgart, Germany
Posts: 338
Rep Power: 18
louisgag is on a distinguished road
I can't get a balanced distribution with the automatic algorithms (scotch, kahip) when I have many patch pairs that are constrained to remain on the same processor.


I have a single AMI zone, but it has more cells than a balanced decomposition would give each processor. Together, the two AMI patches touch 44k cells, but I would like to have 10k-14k cells per processor.


What I do is slice the cylinder into more AMI patch pairs, convert them to face sets, and then give those as constrained sets in decomposeParDict.
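As a rough sizing check for the slicing described above, the numbers in this post (44k AMI cells, 10k-14k cells per processor) give a lower bound on the number of slices. This is only a sketch: `min_slices` is a hypothetical helper, not part of OpenFOAM, and it assumes the constrained cells are spread evenly over the slices.

```python
import math


def min_slices(ami_cells: int, target_cells_per_proc: int) -> int:
    """Smallest number of equal slices so that no slice exceeds the target."""
    return math.ceil(ami_cells / target_cells_per_proc)


ami_cells = 44_000  # cells touched by both AMI patches (from the post)
for target in (10_000, 14_000):
    print(f"target {target} cells/proc -> at least {min_slices(ami_cells, target)} slices")
```

In practice more slices than this bound are probably needed, since each processor should also have room for cells away from the interface.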


When I leave the processor as -1, every set gets assigned to the same processor and I am back to having an unbalanced decomposition. I think it has to do with these lines of code: https://www.openfoam.com/documentati...ce.html#l00284


When I assign different processors myself, using for example 5 or 50 subslices of my AMI cylinder, I still end up unbalanced. Some processors even end up with 0 cells!
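For readers who want to reproduce the two variants (processor left at -1, or assigned by hand), the constraint entry could be generated along these lines. This is a sketch under assumptions: the set names `AMI1Set`, `AMI2Set`, ... and the entry name `amiSlices` are hypothetical, and the `sets` keyword follows recent openfoam.com releases as far as I know. Other versions may spell it differently, so check the annotated decomposeParDict shipped with your installation.

```python
def constraint_entry(set_names, procs=None):
    """Build a singleProcessorFaceSets constraint as dictionary text.

    procs=None leaves every set at -1, i.e. decomposePar picks the processor.
    """
    set_names = list(set_names)
    procs = [-1] * len(set_names) if procs is None else list(procs)
    if len(procs) != len(set_names):
        raise ValueError("need exactly one processor id per face set")
    pairs = " ".join(f"({name} {proc})" for name, proc in zip(set_names, procs))
    return (
        "constraints\n"
        "{\n"
        "    amiSlices   // hypothetical entry name\n"
        "    {\n"
        "        type    singleProcessorFaceSets;\n"
        f"        sets    ({pairs});   // keyword assumed, version dependent\n"
        "    }\n"
        "}\n"
    )


names = [f"AMI{i}Set" for i in range(1, 6)]  # hypothetical set names
print(constraint_entry(names))                # every set left at -1
print(constraint_entry(names, range(5)))      # one processor per slice
```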


Is my only option to go to manual decomposition? As an alternative, I'm thinking of constraining each AMI to at least remain on the same computer socket, using multilevel or multi-region decomposition, but I have the feeling that means a lot of hand tuning and wasted time.


Does anyone have experience with AMI interfaces that have more cells than an optimal parallel simulation can take? Comments and advice would be appreciated.

Old   December 1, 2020, 02:17
Default
  #2
New Member
 
Alexandra H
Join Date: May 2020
Posts: 4
Rep Power: 5
simply-alex is on a distinguished road
Hey Louis, did you ever find an effective solution to this issue? I am having the same problem.

All I can think of is splitting the large cyclic patches into multiple smaller ones and grouping them into multiple faceSets.

Old   December 1, 2020, 03:32
Default
  #3
Senior Member
 
 
Louis Gagnon
Join Date: Mar 2009
Location: Stuttgart, Germany
Posts: 338
Rep Power: 18
louisgag is on a distinguished road
Hi Alexandra,

I did try what you mention (making "slices" of the AMIs). It works to some extent, but I usually still end up with a struggling decomposition algorithm that gives a quite unbalanced decomposition, even away from the AMI patches. It did not help the parallel speedup.


For my last attempts, I ended up doing a multi-region decomposition:
1. Split the domain into outer, AMI-face, and inner regions.
2. Run a modified* decomposePar with -cellDist, region by region, using for example scotch on the outer and inner regions and a simple, manual, or hierarchical decomposition on the faces of the AMIs.
3. Use the resulting cell distribution to manually decompose your original mesh (the one that was not split into regions).
*: I had to write my own decomposePar utility to do it, because otherwise each region is decomposed onto the same processors, whereas here you want, for example, region0 on procs[00-05], region1 on procs[06-07], region2 on procs[08-16], etc. Doing this with Python on the generated cellDist files takes forever, but with the modified decomposePar utility it's quite fast.
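The processor renumbering mentioned in the footnote comes down to giving each region its own block of global processor ids. The sketch below shows only that offset logic on plain Python lists. It is not the modified decomposePar utility from the post, the helper names are hypothetical, and it does not read or write cellDist files.

```python
from itertools import accumulate


def region_offsets(procs_per_region):
    """First global processor id of each region's block."""
    return [0] + list(accumulate(procs_per_region))[:-1]


def to_global(local_ids, offset):
    """Shift a region's local processor ids (0..n-1) into its global block."""
    return [offset + p for p in local_ids]


procs_per_region = [6, 2, 9]  # region0, region1, region2, as in the post
offsets = region_offsets(procs_per_region)
for idx, (n, off) in enumerate(zip(procs_per_region, offsets)):
    print(f"region{idx}: procs[{off:02d}-{off + n - 1:02d}]")

# A cell that region1 placed on its local processor 1 ends up on global processor 7.
print(to_global([0, 1, 1], offsets[1]))
```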


Just as a heads up: in the end I did not gain any parallel speedup by doing so. The AMI algorithm is apparently not well parallelized, so even when all the cells are on one or a few processors it is still slow.
I gave a splash talk about this last summer, only images here, but the plot on the last slide is self-explanatory ("free decomp" is when I did nothing to keep the AMI faces/cells on a single processor or set of processors)...


Eugene de Villiers also hinted to me that precomputing the AMI weights would help and had apparently already been done by someone, but I never found the related code.


Kind regards,
simply-alex likes this.

Old   September 6, 2021, 13:31
Default AMI Parallelization
  #4
New Member
 
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
JMcnetee is on a distinguished road
I am finding that OpenFOAM is not scaling with the number of processors (on Rescale). Based on the comments in this thread, it may be due to the AMI that I am using.

Is there anything more concrete available on how OpenFOAM scales when using an AMI? Is there a better alternative that scales better with the number of processors? I would like to use > 100 processors.

Old   September 7, 2021, 03:01
Default
  #5
Senior Member
 
 
Louis Gagnon
Join Date: Mar 2009
Location: Stuttgart, Germany
Posts: 338
Rep Power: 18
louisgag is on a distinguished road
AMI is probably your best bet. Try to make the interface as clean as possible and avoid imposing specific rules on your decomposition method. Lately I've been able to get much better scaling than before, also with a higher number of processors than what you mention. I think, however, that the architecture of the supercomputer you're using also plays a major role...
JMcnetee likes this.

Old   September 15, 2021, 08:42
Default AMI and crossflow turbines
  #6
New Member
 
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
JMcnetee is on a distinguished road
Hi Louis

Based on the fact that you appear to be in Stuttgart, I imagine that you are familiar with the crossflow turbine problem.

I am modelling a crossflow hydrokinetic turbine in OpenFOAM. Can you provide some more details on how you are able to scale up successfully to a large number of processors?

All I can get to is 60 cores.

Best Regards
Jarlath

Old   September 15, 2021, 08:54
Default
  #7
Senior Member
 
 
Louis Gagnon
Join Date: Mar 2009
Location: Stuttgart, Germany
Posts: 338
Rep Power: 18
louisgag is on a distinguished road
No, I am not familiar with it. I do cycloidal rotors: https://www.youtube.com/watch?v=q0pafX63_x0


Give me more details about your case and I'll see if I can pinpoint something, but scaling is, in my experience, very sensitive... you need to do a lot of test runs...


kind regards

Old   September 15, 2021, 09:08
Default AMI Parallelization
  #8
New Member
 
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
JMcnetee is on a distinguished road
Hi Louis

Based on your video, I think we are working on the exact same problem. In general we use fixed-pitch foils, but we are also looking at variable pitch.

https://orpc.co/our-solutions/turbine-generator-unit

So the problem setup is likely very similar.

I usually set up a cylindrical AMI around the rotor zone, and generally there is only one AMI. In principle I'd like to model with y+ around 1 on the blades. That leads to large meshes.

I usually use Scotch as the decomposition method, with no constraints.

Old   September 16, 2021, 03:11
Default
  #9
Senior Member
 
 
Louis Gagnon
Join Date: Mar 2009
Location: Stuttgart, Germany
Posts: 338
Rep Power: 18
louisgag is on a distinguished road
How many cells per processor do you have?
How many cells overall and how many AMI faces?
Are the AMI cell faces similarly sized on both sides of your interface?
Which solver do you use? And what Courant number? (I use PIMPLE and a very high Courant number to avoid "wasting" time with AMI.)

Old   September 20, 2021, 13:49
Default
  #10
Member
 
Join Date: May 2017
Posts: 31
Rep Power: 8
sqek is on a distinguished road
Hello,
I've just had (and solved!) a similar issue.
I already had my AMI patches split into lots of smaller patches, and each patch pair had a faceSet, used in singleProcessorFaceSets.
But because the constraint forces any cells that share a *point* with a face in a singleProcessorFaceSet onto the same processor, and my split AMI patches were next to each other, they all shared points and were all forced onto the same processor.
So for patches with names
Code:
AMI1_M, AMI1_S, AMI2_M, AMI2_S, AMI3_M, AMI3_S, etc
and a topoSetDict with
Code:
{ name AMI1Set; type faceSet; action new; source patchToFace; sourceInfo { name "AMI1_[MS]"; } }
for each AMI patch pair, the point shared by AMI1_M and AMI2_M still forced them to be on the same processor.

The solution was to get topoSet to remove any faces that share points between two patches, so to get
Code:
 /   AMI1_M    \ /   AMI2_M    \
o---o---o---o===o===o---o---o---o
 \___AMI1Set___/ \___AMI2Set___/
to change to
Code:
 /   AMI1_M    \ /   AMI2_M    \
o---o---o---o===o===o---o---o---o
 \_AMI1Set_/         \_AMI2Set_/
(The -s represent faces/cells, the os represent points, and the =s are the faces/cells that are highlighted; hopefully this makes some sense.)
Before, the cells represented by === were part of both singleProcessorFaceSet constraints, because they shared the middle point with both faceSets; after, each cell is only in one constraint.
I did this using topoSet with something like
Code:
{ name AMI1Set; type faceSet; action new; source patchToFace; sourceInfo { name "AMI1_[MS]"; } }
{ name points1; type pointSet; action new; source faceToPoint; sourceInfo { option all; set AMI1Set; } }
{ name AMI2Set; type faceSet; action new; source patchToFace; sourceInfo { name "AMI2_[MS]"; } }
{ name points2; type pointSet; action new; source faceToPoint; sourceInfo { option all; set AMI2Set; } }
{ name AMI1Set; type faceSet; action delete; source pointToFace; sourceInfo { set points2; option any; } }
{ name AMI2Set; type faceSet; action delete; source pointToFace; sourceInfo { set points1; option any; } }
The faces that have been removed from the sets are still kept on the same processor between AMI1Set and AMI2Set, because they are on cells which neighbour the end points of the faces which have been kept. Basically, imagine decomposePar making a pointSet from the faceSets you give it, then making a cellSet from that with option any, then keeping the cells in that cellSet on the same processor.

This way the constraints become much less severe and the decomposition gets much better, as long as you split the cyclicAMI into small enough patches (i.e. with far fewer adjacent cells per patch than your target/average cells per processor).
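With many patch pairs, writing these topoSet actions by hand gets tedious. A small generator along the following lines could produce them. It is a sketch that reuses the naming from the example above (`AMI<i>_M`, `AMI<i>_S`, `AMI<i>Set`, `points<i>`), and it assumes every set should be trimmed against every other set, which is more than strictly needed if only neighbouring slices share points.

```python
def face_set(i):
    """faceSet collecting both patches of AMI pair i."""
    return (f'{{ name AMI{i}Set; type faceSet; action new; source patchToFace; '
            f'sourceInfo {{ name "AMI{i}_[MS]"; }} }}')


def point_set(i):
    """pointSet holding all points of the untrimmed faceSet i."""
    return (f'{{ name points{i}; type pointSet; action new; source faceToPoint; '
            f'sourceInfo {{ option all; set AMI{i}Set; }} }}')


def trim(i, j):
    """Remove from faceSet i every face that touches a point of set j."""
    return (f'{{ name AMI{i}Set; type faceSet; action delete; source pointToFace; '
            f'sourceInfo {{ set points{j}; option any; }} }}')


def topo_actions(n_pairs):
    """All actions for n_pairs AMI patch pairs, numbered from 1."""
    ids = range(1, n_pairs + 1)
    actions = []
    # Build every faceSet and pointSet first, so that the pointSets
    # describe the untrimmed sets.
    for i in ids:
        actions += [face_set(i), point_set(i)]
    # Then trim each faceSet against the points of all other sets.
    for i in ids:
        actions += [trim(i, j) for j in ids if j != i]
    return actions


print("\n".join(topo_actions(3)))
```

For two pairs this reproduces the six actions listed above, in the same order. The number of trimming actions grows with the square of the number of pairs, so for many slices it may be worth restricting `j` to the actual neighbours of slice `i`.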

Hope this helps!

Old   September 21, 2021, 16:56
Default
  #11
New Member
 
Jarlath
Join Date: Apr 2018
Posts: 12
Rep Power: 7
JMcnetee is on a distinguished road
60 processors
11.5 million cells

Number of processor faces = 574347
Max number of cells = 191344 (0.999487% above average 189450)
Max number of processor patches = 22 (204.147% above average 7.23333)
Max number of faces between processors = 35445 (85.1407% above average 19144.9)
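The percentages in this decomposePar summary can be checked by hand, which also shows where the imbalance sits: the cell counts are within about 1% of the average, while the number of processor patches and processor faces per processor are far from it. A minimal sketch, using only the numbers quoted above (the helper name is hypothetical):

```python
def pct_above_average(maximum, average):
    """Percentage by which the maximum exceeds the average."""
    return (maximum / average - 1.0) * 100.0


n_procs = 60
processor_faces = 574_347

# The reported average of 19144.9 is consistent with each processor face
# being counted once on each of its two processors.
avg_faces = 2 * processor_faces / n_procs
print(f"average processor faces per processor: {avg_faces:.1f}")

print(f"cells:   {pct_above_average(191_344, 189_450):.3f} % above average")
print(f"patches: {pct_above_average(22, 7.23333):.3f} % above average")
print(f"faces:   {pct_above_average(35_445, 19_144.9):.3f} % above average")
```

The cell percentage comes out slightly different from the logged 0.999487% because the average printed in the log is rounded.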

One AMI, using two patches (AMI_inner and AMI_outer). The meshes on these two patches should have been exactly the same, but I just noticed that the log file reports different numbers of faces on the source and target.

"AMI: Creating addressing and weights between 27332 source faces and 28772 target faces"

There are three surfaces to the AMI, the cylinder surface, and the two end caps. The mesh size on all three surfaces is similar.

Using pimpleFoam

"Courant Number mean: 0.0945093 max: 1315.62"

I am using a "time step" of 1 degree. If I increase the time step to 4 degrees, the results are still good, but a little different from the 1 degree run. As the time step increases to 6 and 8 degrees, the results start to deviate from the 1 degree case.
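To relate a time step given in degrees to seconds and to the Courant number, a quick estimate might look like the sketch below. The rotor speed is not stated in the post, so `rpm = 60.0` is a purely hypothetical value, and the Courant scaling assumes the velocity field and mesh stay the same, so that Co grows linearly with the time step.

```python
def dt_for_angle(step_deg, rpm):
    """Time step in seconds for a given rotation angle per step."""
    deg_per_second = rpm * 360.0 / 60.0
    return step_deg / deg_per_second


def scaled_courant(co_ref, step_ref_deg, step_deg):
    """Linear estimate of the Courant number at a different step size."""
    return co_ref * step_deg / step_ref_deg


rpm = 60.0            # hypothetical rotor speed, not given in the post
co_max_1deg = 1315.62  # max Courant number logged for the 1 degree step

for step in (1, 4, 6, 8):
    dt = dt_for_angle(step, rpm)
    co = scaled_courant(co_max_1deg, 1, step)
    print(f"{step} deg/step: dt = {dt:.5f} s, estimated max Co = {co:.0f}")
```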


Cheers!
Jarlath

All times are GMT -4.