CFD Online Forums > OpenFOAM Running, Solving & CFD

Performance of GGI case in parallel

May 13, 2009, 09:17 | #1
Performance of GGI case in parallel
Hannes Kröger (Rostock, Germany)
Hello everyone,

I just tried to run a case using interDyMFoam in parallel. The case consists of a stationary outer mesh and a rotating inner cylindrical mesh (with a surface-piercing propeller in it), connected via GGI. The inner mesh is polyhedral and the outer one hexahedral.
The entire case consists of approximately 1 million cells, most of them in the inner mesh.

I have run this case in parallel with different numbers of processors on an SMP machine with 8 quad-core Opteron processors (decomposition method: metis):

#Proc   time per timestep   speedup
1       360 s               1
4       155 s               2.3
8       146 s               2.4
16      130 s               2.7

So the speedup does not even reach 3. A similar case, where the whole domain is rotating and the mesh consists only of polyhedra, shows linear speedup up to 8 processors and decreasing parallel efficiency beyond that.
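To put the table in perspective, the parallel efficiency and an Amdahl-style estimate of the serial fraction can be computed from the timings above (a sketch; the serial-fraction value is fitted by eye, not measured):

```python
# Observed timings from the table above (seconds per time step).
procs = [1, 4, 8, 16]
times = [360, 155, 146, 130]

for n, t in zip(procs, times):
    speedup = times[0] / t
    print(f"{n:2d} procs: speedup {speedup:.2f}, efficiency {speedup / n:.0%}")

# Amdahl's law: with a serial fraction s, speedup(n) = 1 / (s + (1 - s) / n).
# A serial fraction of roughly one third reproduces the observed saturation:
s = 0.33
print(f"Amdahl prediction for 16 procs (s = {s}): {1 / (s + (1 - s) / 16):.2f}")
```

If the GGI interpolation is effectively serialized, a serial fraction of that size would cap the speedup near 3 no matter how many processors are added.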

I wonder whether this has to do with the GGI interface. I tried to stitch the interface and repeat the test, but unfortunately stitchMesh failed. Does anyone have an idea how to improve the parallel efficiency?

Best regards, Hannes

PS: Despite the poor parallel efficiency, the case seems to run fine. Typical output:

Courant Number mean: 0.0005296648 max: 29.60281 velocity magnitude: 56.38879
GGI pair (slider, inside_slider) : 1.694001 1.692101 Diff = 1.71989e-05 or 0.001015283 %
Time = 0.004368
Execution time for mesh.update() = 5.04 s
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.05648662 average: 0.001119383
Largest master weighting factor correction: 0.02077904 average: 0.0006102291
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
time step continuity errors : sum local = 2.139033e-14, global = -1.99808e-16, cumulative = -1.372663e-05
PCG: Solving for pcorr, Initial residual = 1, Final residual = 0.000847008, No Iterations 17
PCG: Solving for pcorr, Initial residual = 0.08500426, Final residual = 0.0003398914, No Iterations 4
time step continuity errors : sum local = 2.889153e-17, global = -1.458812e-19, cumulative = -1.372663e-05
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
MULES: Solving for gamma
Liquid phase volume fraction = 0.6001454 Min(gamma) = 0 Max(gamma) = 1
smoothSolver: Solving for Ux, Initial residual = 7.74393e-06, Final residual = 1.011977e-07, No Iterations 1
smoothSolver: Solving for Uy, Initial residual = 3.776374e-06, Final residual = 4.505313e-08, No Iterations 1
smoothSolver: Solving for Uz, Initial residual = 1.477087e-05, Final residual = 1.429332e-07, No Iterations 1
GAMG: Solving for pd, Initial residual = 4.955159e-05, Final residual = 9.252356e-07, No Iterations 2
GAMG: Solving for pd, Initial residual = 6.533451e-06, Final residual = 2.441995e-07, No Iterations 2
time step continuity errors : sum local = 1.379978e-12, global = 6.242956e-15, cumulative = -1.372663e-05
GAMG: Solving for pd, Initial residual = 3.37058e-06, Final residual = 1.664205e-07, No Iterations 2
PCG: Solving for pd, Initial residual = 1.65434e-06, Final residual = 4.308411e-09, No Iterations 5
time step continuity errors : sum local = 2.435524e-14, global = 7.917505e-17, cumulative = -1.372663e-05
ExecutionTime = 81834.14 s ClockTime = 81851 s

May 27, 2009, 08:23 | #2
Solved
Hannes Kröger (Rostock, Germany)
I just updated to SVN revision 1266 because of the performance updates for GGI.
It helped. The above case now scales linearly up to 8 processors. Thanks for this, Hrv.

Best regards, Hannes

May 31, 2009, 06:37 | #3
Hai Yu (OvgU, Magdeburg)
My experience also shows that 8 cores gives the best speed, while it slows down at 16 cores. I suspect this is because each of my computers has 8 cores, and the communication between two computers is much less efficient.

September 9, 2009, 10:28 | #4
BastiL
I have encountered similar problems with the GGI performance in parallel, and I still get no speedup for more than 8 cores. This is extremely bad, since I typically have large cases running on 32 cores with lots of interfaces in them. Running them on 8 cores is terribly slow for me. Hrv: are there plans to further improve the parallel performance of the GGI?

Regards

BastiL

October 28, 2009, 16:05 | #5
GGI in parallel
Dnyanesh Digraskar (Amherst, MA, United States)
Hello All,

I am running turbDyMFoam with GGI on a full wind turbine, so the mesh is huge (~4 million cells). I am having problems with running in parallel on 32 processors: it runs very slowly, and eventually one of the processes dies.

The same job runs perfectly fine in serial, but this case would take a long time to finish that way.

I would be very thankful if you could shed some light on improving the GGI parallel performance.

Thank you

--
Dnyanesh

October 31, 2009, 14:27 | #6
Martin Beaudoin
Hello Dnyanesh,

I need a bit more information in order to try to help you out.

  1. Which svn version of 1.5-dev are you running? Please make sure you are running with the latest svn release in order to get the best GGI implementation available.
  2. Your problem might be hardware related. Can you provide some speedup numbers you achieved with your hardware while running OpenFOAM simulations on some non-GGI test cases?
  3. Could you provide a bit more information about your hardware setup? Mostly about the interconnect between the computing nodes, and the memory available on the nodes?
  4. Are you using something like VMware machines for your simulation? I have seen that one before...
  5. I would like to see the following piece of information from your case:
    • file constant/polyMesh/boundary
    • file system/controlDict
    • file system/decomposeParDict
    • a log file of your failed parallel run
    • the exact command you are using to start your failed parallel run
    • Are you using MPI? if so, which flavor, and which version?
Please take note that this is the kind of information that you can provide up-front with your question and that can really help other people to help you out quickly.

Regards,

Martin

Quote:
Originally Posted by ddigrask (see post #5 above)

October 31, 2009, 15:56 | #7
Dnyanesh Digraskar (Amherst, MA, United States)
Hello Mr. Beaudoin,

Thank you for your reply. I realize I should have given all the details beforehand; I am sorry for that. It won't happen in the future.

1. I was earlier using the latest SVN version. But since I was facing the parallel problems, I read more and thought I should follow the ERCOFTAC page, so I reverted to revision 1238. I will upgrade to the latest one now.

2. I am quite sure that it is not a hardware-related issue. I am running my cases on our college's supercomputer cluster, and I had successfully run MRFSimpleFoam on 32 cores for quite a long time, with linear speedup up to 32 cores.
Some numbers from my case:
150K cells/processor, 12 procs: 8 s/time step
110K cells/processor, 24 procs: 6 s/time step
~75K cells/processor, 32 procs: 2.5 s/time step

After that it would be almost stable, and later start to increase.
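For what it is worth, the relative scaling implied by those timings can be checked quickly (a sketch; the numbers are taken from the list above):

```python
# Per-time-step timings reported for the non-GGI MRFSimpleFoam runs.
runs = {12: 8.0, 24: 6.0, 32: 2.5}  # processors -> seconds per time step

observed = runs[12] / runs[32]  # relative speedup from 12 to 32 procs
ideal = 32 / 12                 # ideal ratio for the same processor counts

print(f"observed {observed:.2f}x vs ideal {ideal:.2f}x")
# -> observed 3.20x vs ideal 2.67x
```

The observed 3.2x against an ideal 2.67x is superlinear, which often points to cache effects as the per-rank working set shrinks, and supports the claim that the hardware itself scales well on non-GGI cases.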

3. The cluster has 72 nodes with 8 processors per node, and each node has 4 GB of RAM. The interconnect between the nodes is Gigabit Ethernet.

4. I am not using a VMWare machine.

5. The required files are copied below. Answers to some more questions:
a. MPI version is mpich2-1.0.8
b. command used : mpiexec-pbs turbDyMFoam -parallel > outfile
c. boundary file:
FoamFile
{
    version     2.0;
    format      ascii;
    class       polyBoundaryMesh;
    location    "constant/polyMesh";
    object      boundary;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

11
(
    outerSliderOutlet
    {
        type            ggi;
        nFaces          25814;
        startFace       5565326;
        shadowPatch     innerSliderOutlet;
        zone            outerSliderOutlet_zone;
        bridgeOverlap   false;
    }
    outerSliderWall
    {
        type            ggi;
        nFaces          43870;
        startFace       5591140;
        shadowPatch     innerSliderWall;
        zone            outerSliderWall_zone;
        bridgeOverlap   false;
    }
    outerSliderInlet
    {
        type            ggi;
        nFaces          25814;
        startFace       5635010;
        shadowPatch     innerSliderInlet;
        zone            outerSliderInlet_zone;
        bridgeOverlap   false;
    }
    innerSliderOutlet
    {
        type            ggi;
        nFaces          18596;
        startFace       5660824;
        shadowPatch     outerSliderOutlet;
        zone            innerSliderOutlet_zone;
        bridgeOverlap   false;
    }
    innerSliderInlet
    {
        type            ggi;
        nFaces          1148;
        startFace       5679420;
        shadowPatch     outerSliderInlet;
        zone            innerSliderInlet_zone;
        bridgeOverlap   false;
    }
    innerSliderWall
    {
        type            ggi;
        nFaces          5424;
        startFace       5680568;
        shadowPatch     outerSliderWall;
        zone            innerSliderWall_zone;
        bridgeOverlap   false;
    }
    tower_plate
    {
        type        wall;
        nFaces      13024;
        startFace   5685992;
    }
    rotor
    {
        type        wall;
        nFaces      19180;
        startFace   5699016;
    }
    outlet
    {
        type        patch;
        nFaces      2052;
        startFace   5718196;
    }
    outer_wall
    {
        type        wall;
        nFaces      10390;
        startFace   5720248;
    }
    inlet
    {
        type        patch;
        nFaces      2052;
        startFace   5730638;
    }
)

d. CONTROLDICT:

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      controlDict;
}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

applicationClass icoTopoFoam;

startFrom startTime;

startTime 0.108562;

stopAt endTime;

endTime 5;

deltaT 0.05;

writeControl timeStep;

writeInterval 20;

cycleWrite 0;

writeFormat ascii;

writePrecision 6;

writeCompression uncompressed;

timeFormat general;

timePrecision 6;

runTimeModifiable yes;

adjustTimeStep yes;
maxCo 1;

maxDeltaT 1.0;

functions
(
    ggiCheck
    {
        // Type of functionObject
        type                ggiCheck;

        phi                 phi;

        // Where to load it from (if not already in solver)
        functionObjectLibs  ("libsampling.so");
    }
);

e. decomposeParDict:

FoamFile
{
    version     2.0;
    format      ascii;

    root        "";
    case        "";
    instance    "";
    local       "";

    class       dictionary;
    object      decomposeParDict;
}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //


numberOfSubdomains 8;

// Patches or Face Zones which need to face both cells on the same CPU
//preservePatches (innerSliderInlet outerSliderInlet innerSliderWall outerSliderWall outerSliderOutlet innerSliderOutlet);
//preserveFaceZones (innerSliderInlet_zone outerSliderInlet_zone innerSliderWall_zone outerSliderWall_zone outerSliderOutlet_zone innerSliderOutlet_zone);


// Face zones which need to be present on all CPUs in its entirety
globalFaceZones (innerSliderInlet_zone outerSliderInlet_zone innerSliderWall_zone outerSliderWall_zone outerSliderOutlet_zone innerSliderOutlet_zone);

method metis;


simpleCoeffs
{
    n       (4 2 1);
    delta   0.001;
}

hierarchicalCoeffs
{
    n       (1 1 1);
    delta   0.001;
    order   xyz;
}

metisCoeffs
{
    processorWeights
    (
        1 1 1 1 1 1 1 1
    );
}

manualCoeffs
{
    dataFile    "cellDecomposition";
}

distributed no;

roots
(
);

f. dynamicMeshDict:

dynamicFvMeshLib    "libtopoChangerFvMesh.so";
dynamicFvMesh       mixerGgiFvMesh;

mixerGgiFvMeshCoeffs
{
    coordinateSystem
    {
        type        cylindrical;
        origin      (0 0 0);
        axis        (0 0 1);
        direction   (0 1 0);
    }

    rpm     -72;

    slider
    {
        moving  ( innerSliderInlet innerSliderWall innerSliderOutlet );
        static  ( outerSliderInlet outerSliderWall outerSliderOutlet );
    }
}

NOTE: I have a doubt. Since my rotating zone has the finest mesh, the GGI patches contain the largest number of faces. When I use globalFaceZones in decomposeParDict, does it copy all the GGI faces to all processors? If that is the case, the run would be really slow, because interpolating between ~100K faces and communicating the data will take time. Please forgive me if my reasoning is wrong.
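That concern can be put in rough numbers (a sketch, assuming one 8-byte scalar per GGI face per field exchange and full replication of the zones to every rank; both are assumptions about the implementation, not measured behaviour):

```python
# nFaces of the six GGI patches, from the boundary file above.
ggi_faces = [25814, 43870, 25814, 18596, 1148, 5424]
n_procs = 32
bytes_per_face = 8  # one double-precision scalar per face (assumption)

total_faces = sum(ggi_faces)
per_field_mb = total_faces * bytes_per_face / 1e6
traffic_mb = per_field_mb * n_procs  # if every rank receives the full zones
gigabit_mb_per_s = 125               # ~1 Gbit/s Ethernet, ignoring latency

print(f"{total_faces} GGI faces, ~{traffic_mb:.0f} MB per replicated field "
      f"exchange, ~{traffic_mb / gigabit_mb_per_s:.2f} s at Gigabit speeds")
```

Roughly 31 MB and a quarter of a second per replicated exchange, repeated several times per time step, would already be a visible cost over Gigabit Ethernet.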

Thank you very much for your help. I am grateful.
Sincerely,

--
Dnyanesh Digraskar


October 31, 2009, 23:13 | #8
Martin Beaudoin
Hello Dnyanesh,

Thank you for the information, this is much more helpful.

From the information in your boundary file, I can see that your GGI interfaces are indeed composed of large sets of facets.

With the current implementation of the GGI, this will have an impact, because the GGI faceZones are shared across all the processors and communication will take its toll.
Also, one internal algorithm of the GGI is a bit slow when using very large numbers of facets for the GGI patches (my bad here, but I am working on it...).

But not to the point that a simulation should crash and burn like you are describing.

So another important piece of information I need is your simulation log file: not the PBS log file, but the log messages generated by turbDyMFoam while running your 32-processor parallel run.

This file is probably quite large for posting on the forum, so I would like to see at least the very early log messages, from line #1 (the turbDyMFoam splash header) down to, say, the 10th simulation time step.

I also need to see the log for the last 10 time steps, just before your application crashed.

As a side note: as I mentioned, I am currently working on some improvements to the GGI in order to speed up the code when using GGI patches with a large number of facets (100K and more).

My research group needs to run large GGI cases like that, so getting this nailed down asap is a priority for me. We will contribute our modifications to Hrv's dev version, so you will have access to the improvements as well.

Regards,

Martin


Quote:
Originally Posted by ddigrask (see post #7 above)

November 1, 2009, 15:42 | #9
Dnyanesh Digraskar (Amherst, MA, United States)
Dear Mr. Beaudoin,

Sorry for the somewhat late reply. The turbDyMFoam output is attached below. The code does not crash because of solver settings; it just stalls at some step during the calculation and finally dies with an MPI error.

After carefully looking at the output of each time step, I have observed that the most time-consuming part of the solution is the GGI interpolation step. That is where the solver takes about 2 to 3 minutes to post the output.

Following is the turbDyMFoam output:

/*---------------------------------------------------------------------------*\
| ========= | |
| \\ / F ield | OpenFOAM: The Open Source CFD Toolbox |
| \\ / O peration | Version: 1.5-dev |
| \\ / A nd | Revision: 1388 |
| \\/ M anipulation | Web: http://www.OpenFOAM.org |
\*---------------------------------------------------------------------------*/
Exec : turbDyMFoam -parallel
Date : Nov 01 2009
Time : 14:13:26
Host : node76
PID : 26246
Case : /home/ddigrask/OpenFOAM/ddigrask-1.5-dev/run/fall2009/ggi/turbineGgi_bigMesh
nProcs : 32
Slaves :
31
(
node76.26247
node76.26248
node76.26249
node76.26250
node76.26251
node76.26252
node76.26253
node23.22676
node23.22677
node23.22678
node23.22679
node23.22680
node23.22681
node23.22682
node23.22683
node42.22800
node42.22801
node42.22802
node42.22803
node42.22804
node42.22805
node42.22806
node42.22807
node31.31933
node31.31934
node31.31935
node31.31936
node31.31937
node31.31938
node31.31939
node31.31940
)
Pstream initialized with:
floatTransfer : 0
nProcsSimpleSum : 0
commsType : nonBlocking

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create dynamic mesh for time = 0

Selecting dynamicFvMesh mixerGgiFvMesh
void mixerGgiFvMesh::addZonesAndModifiers() : Zones and modifiers already present. Skipping.
Mixer mesh:
origin: (0 0 0)
axis : (0 0 1)
rpm : -72
Reading field p

Reading field U

Reading/calculating face flux field phi

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00102394 average: 1.66438e-06
Largest master weighting factor correction: 0.0922472 average: 0.000376558

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104841 average: 0.000148099
Largest master weighting factor correction: 2.79095e-06 average: 4.34772e-09

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.51821e-05 average: 3.4379e-07
Largest master weighting factor correction: 0.176367 average: 0.00114207

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Selecting incompressible transport model Newtonian
Selecting RAS turbulence model SpalartAllmaras
Reading field rAU if present


Starting time loop

Courant Number mean: 0.00864601 max: 1.0153 velocity magnitude: 10
deltaT = 0.000492466
--> FOAM Warning :
From function dlLibraryTable::open(const dictionary& dict, const word& libsEntry, const TablePtr tablePtr)
in file lnInclude/dlLibraryTableTemplates.C at line 68
library "libsampling.so" did not introduce any new entries

Creating ggi check
Time = 0.000492466

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.000966753 average: 1.68023e-06
Largest master weighting factor correction: 0.0926134 average: 0.000376611

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104722 average: 0.000148213
Largest master weighting factor correction: 2.90604e-06 average: 4.45196e-09

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.56139e-05 average: 3.5192e-07
Largest master weighting factor correction: 0.179572 average: 0.0011419

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
PBiCG: Solving for Ux, Initial residual = 1, Final residual = 1.15144e-06, No Iterations 9
PBiCG: Solving for Uy, Initial residual = 1, Final residual = 1.69491e-06, No Iterations 9
PBiCG: Solving for Uz, Initial residual = 1, Final residual = 5.0914e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 1, Final residual = 0.0214318, No Iterations 6
time step continuity errors : sum local = 1.50927e-07, global = -1.79063e-08, cumulative = -1.79063e-08
GAMG: Solving for p, Initial residual = 0.284648, Final residual = 0.00377362, No Iterations 2
time step continuity errors : sum local = 5.39845e-07, global = 3.49001e-09, cumulative = -1.44163e-08


It is not even one complete time step; this is all the code has run. After this, the code quits with an MPI error:

mpiexec-pbs: Warning: tasks 0-29,31 died with signal 15 (Terminated).
mpiexec-pbs: Warning: task 30 died with signal 9 (Killed).

Thank you again for your help.

I am also trying to run the same case with just 2 GGI interfaces (instead of 6), i.e. ggiInside and ggiOutside, but even this does not make it run faster.

Sincerely,

--
Dnyanesh Digraskar


November 1, 2009, 17:57 | #10
Martin Beaudoin
Hello,

Some comments:

1: It would be useful to see a stack trace in your log file when your run aborts. Could you set the environment variable FOAM_ABORT=1 and make sure every parallel task gets this variable as well? That way, we could see from the stack trace in the log file where the parallel tasks are crashing.

2: You said your cluster has 72 nodes, 8 processors per node and each node has 4 GB RAM.

3: From your log file, we can see that you have 8 parallel tasks running on each node. Overall, your parallel run is using only 4 nodes on your cluster (node76, node23, node42 and node31).

4: So basically, for a ~4 million cells mesh, you are using only 4 computers, each with only 4 GB of RAM, and 8 tasks per node fighting simultaneously for access to this amount of RAM.

Am I right?

If so, because of your large mesh, your 4 nodes probably don't have enough memory available and could be swapping to virtual memory on the hard drive, which is quite slow.

And depending on your memory bus architecture, your 8 tasks will have to compete for access to the memory bus, which will slow you down as well.

Did you mean 4 GB of RAM per processor instead, which would give you 32 GB of RAM per node or computer?

Could you just double-check that your cluster information is accurate?
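For scale, a back-of-the-envelope memory estimate (assuming roughly 1 KB per cell for an incompressible RANS solver; that figure is a rule of thumb, not a measurement, and actual usage depends on the solver, the number of fields, and the replicated GGI zones):

```python
cells = 4_000_000   # mesh size reported above
kb_per_cell = 1.0   # rough rule of thumb (assumption)

total_gb = cells * kb_per_cell / 1e6  # ~4 GB across the whole decomposed case
per_node_gb = total_gb / 4            # spread over the 4 nodes actually in use

print(f"~{total_gb:.0f} GB total, ~{per_node_gb:.0f} GB per 4 GB node, "
      f"before solver workspace, replicated faceZones and the OS")
```

Even at this modest estimate the headroom on a 4 GB node shared by 8 ranks is thin; a heavier solver or the replicated zones could easily push the nodes into swap.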

Martin



Quote:
Originally Posted by ddigrask View Post
Dear Mr. Beaudoin,

Sorry for a bit late reply. The turbDyMFoam output is attached below. The code doesnot crash because of solver settings, it just waits on some step during the calculation and finally dies giving MPI error.

After carefully looking at each time step output, I have observed that the maximum time consuming part of the solution is the GGI Interpolation step. That is where the solver takes about 2 - 3 minutes to post the output.

Following is the turbDyMFoam output:

/*---------------------------------------------------------------------------*\
| ========= | |
| \\ / F ield | OpenFOAM: The Open Source CFD Toolbox |
| \\ / O peration | Version: 1.5-dev |
| \\ / A nd | Revision: 1388 |
| \\/ M anipulation | Web: http://www.OpenFOAM.org |
\*---------------------------------------------------------------------------*/
Exec : turbDyMFoam -parallel
Date : Nov 01 2009
Time : 14:13:26
Host : node76
PID : 26246
Case : /home/ddigrask/OpenFOAM/ddigrask-1.5-dev/run/fall2009/ggi/turbineGgi_bigMesh
nProcs : 32
Slaves :
31
(
node76.26247
node76.26248
node76.26249
node76.26250
node76.26251
node76.26252
node76.26253
node23.22676
node23.22677
node23.22678
node23.22679
node23.22680
node23.22681
node23.22682
node23.22683
node42.22800
node42.22801
node42.22802
node42.22803
node42.22804
node42.22805
node42.22806
node42.22807
node31.31933
node31.31934
node31.31935
node31.31936
node31.31937
node31.31938
node31.31939
node31.31940
)
Pstream initialized with:
floatTransfer : 0
nProcsSimpleSum : 0
commsType : nonBlocking

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create dynamic mesh for time = 0

Selecting dynamicFvMesh mixerGgiFvMesh
void mixerGgiFvMesh::addZonesAndModifiers() : Zones and modifiers already present. Skipping.
Mixer mesh:
origin: (0 0 0)
axis : (0 0 1)
rpm : -72
Reading field p

Reading field U

Reading/calculating face flux field phi

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00102394 average: 1.66438e-06
Largest master weighting factor correction: 0.0922472 average: 0.000376558

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104841 average: 0.000148099
Largest master weighting factor correction: 2.79095e-06 average: 4.34772e-09

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.51821e-05 average: 3.4379e-07
Largest master weighting factor correction: 0.176367 average: 0.00114207

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Selecting incompressible transport model Newtonian
Selecting RAS turbulence model SpalartAllmaras
Reading field rAU if present


Starting time loop

Courant Number mean: 0.00864601 max: 1.0153 velocity magnitude: 10
deltaT = 0.000492466
--> FOAM Warning :
From function dlLibraryTable::open(const dictionary& dict, const word& libsEntry, const TablePtr tablePtr)
in file lnInclude/dlLibraryTableTemplates.C at line 68
library "libsampling.so" did not introduce any new entries

Creating ggi check
Time = 0.000492466

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.000966753 average: 1.68023e-06
Largest master weighting factor correction: 0.0926134 average: 0.000376611

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104722 average: 0.000148213
Largest master weighting factor correction: 2.90604e-06 average: 4.45196e-09

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.56139e-05 average: 3.5192e-07
Largest master weighting factor correction: 0.179572 average: 0.0011419

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
PBiCG: Solving for Ux, Initial residual = 1, Final residual = 1.15144e-06, No Iterations 9
PBiCG: Solving for Uy, Initial residual = 1, Final residual = 1.69491e-06, No Iterations 9
PBiCG: Solving for Uz, Initial residual = 1, Final residual = 5.0914e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 1, Final residual = 0.0214318, No Iterations 6
time step continuity errors : sum local = 1.50927e-07, global = -1.79063e-08, cumulative = -1.79063e-08
GAMG: Solving for p, Initial residual = 0.284648, Final residual = 0.00377362, No Iterations 2
time step continuity errors : sum local = 5.39845e-07, global = 3.49001e-09, cumulative = -1.44163e-08


This is not even one complete time step; it is as far as the code gets before quitting with an MPI error.

mpiexec-pbs: Warning: tasks 0-29,31 died with signal 15 (Terminated).
mpiexec-pbs: Warning: task 30 died with signal 9 (Killed).

Thank you again for your help.

I am also trying to run the same case with just 2 GGI faces (instead of 6), i.e. ggiInside and ggiOutside, but even this does not make it run faster.

Sincerely,

--
Dnyanesh Digraskar

Old   November 1, 2009, 20:59
Default
  #11
New Member
 
Dnyanesh Digraskar
Join Date: Mar 2009
Location: Amherst, MA, United States
Posts: 10
Hello Mr. Beaudoin,

Thank you for your reply. I was a little bit confused between cores and processors. The cluster is

72 nodes --- 8 cores per node --- 4 GB RAM per node.

My information about memory per node (computer) is correct.

I had also tried running the same job on 32 nodes with one process per node. That takes more time than this.

I will post the stack trace log soon.
I will also try running the case on more cores (maybe 48 or 56) in order to avoid the memory bottleneck.

Thank you again for help.

Sincerely,

--
Dnyanesh Digraskar

Old   November 2, 2009, 17:22
Default
  #12
Senior Member
 
BastiL
Join Date: Mar 2009
Posts: 471
Martin,

This sounds great to me, since I have similar problems with large models containing many GGI pairs. I am really looking forward to it.

Regards BastiL

Old   November 2, 2009, 18:23
Default
  #13
New Member
 
Dnyanesh Digraskar
Join Date: Mar 2009
Location: Amherst, MA, United States
Posts: 10
Hello Mr. Beaudoin,

Some updates to my previous post. After enabling FOAM_ABORT=1, I get a more detailed MPI error message than the earlier one:

Create time

Create dynamic mesh for time = 0

Selecting dynamicFvMesh mixerGgiFvMesh
void mixerGgiFvMesh::addZonesAndModifiers() : Zones and modifiers already present. Skipping.
Mixer mesh:
origin: (0 0 0)
axis : (0 0 1)
rpm : -72
Reading field p

Reading field U

Reading/calculating face flux field phi

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00102394 average: 1.66438e-06
Largest master weighting factor correction: 0.0922472 average: 0.000376558

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104841 average: 0.000148099
Largest master weighting factor correction: 2.79095e-06 average: 4.34772e-09

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.51821e-05 average: 3.4379e-07
Largest master weighting factor correction: 0.176367 average: 0.00114207

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Selecting incompressible transport model Newtonian
Selecting RAS turbulence model SpalartAllmaras
Reading field rAU if present
Working directory is /home/ddigrask/OpenFOAM/ddigrask-1.5-dev/run/fall2009/ggi/turbineGgi_bigMesh
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Selecting incompressible transport model Newtonian
Selecting RAS turbulence model SpalartAllmaras
Reading field rAU if present


Starting time loop

Courant Number mean: 0.00865431 max: 1.0153 velocity magnitude: 10
deltaT = 0.000492466
--> FOAM Warning :
From function dlLibraryTable::open(const dictionary& dict, const word& libsEntry, const TablePtr tablePtr)
in file lnInclude/dlLibraryTableTemplates.C at line 68
library "libsampling.so" did not introduce any new entries

Creating ggi check
Time = 0.000492466

Initializing the GGI interpolator between master/shadow patches: outerSliderOutlet/innerSliderOutlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.000966753 average: 1.68023e-06
Largest master weighting factor correction: 0.0926134 average: 0.000376611

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderWall/innerSliderWall
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 0.00104722 average: 0.000148213
Largest master weighting factor correction: 2.90604e-06 average: 4.45196e-09

--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
--> FOAM Warning :
From function min(const UList<Type>&)
in file lnInclude/FieldFunctions.C at line 342
empty field, returning zero
Initializing the GGI interpolator between master/shadow patches: outerSliderInlet/innerSliderInlet
Evaluation of GGI weighting factors:
Largest slave weighting factor correction : 1.56139e-05 average: 3.5192e-07
Largest master weighting factor correction: 0.179572 average: 0.0011419

Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173).............................: MPI_Send(buf=0x7f4487e25010, count=1052888, MPI_PACKED, dest=0, tag=1, MPI_COMM_WORLD) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)[cli_8]: aborting job:
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173).............................: MPI_Send(buf=0x7f4487e25010, count=1052888, MPI_PACKED, dest=0, tag=1, MPI_COMM_WORLD) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)


Again, this is not even one complete time step, and the code quits after almost 20 minutes.

2. I have also tried running the same case on 56 and 64 cores. (i.e 7 and 8 nodes respectively). There was no change in the output.

3. Mr. Oliver Petit suggested that I manually create the movingCells cellZone. I will try that to see if it helps.
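For reference, one way to build such a cellZone by hand in 1.5-dev is the cellSet utility followed by setsToZones. This is only a sketch: the cylinder axis points and radius below are placeholders that must be adapted to the actual inner slider region of the mesh.

```
// system/cellSetDict -- select the cells of the rotating region
name            movingCells;
action          new;

topoSetSources
(
    cylinderToCell
    {
        p1      (0 0 -0.1);   // placeholder axis end points, adapt to your mesh
        p2      (0 0  0.1);
        radius  0.5;          // placeholder radius of the inner slider
    }
);
```

Running cellSet and then setsToZones -noFlipMap should then turn the set into a cellZone of the same name.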

Thank you.

Sincerely.

--
Dnyanesh Digraskar

Old   November 2, 2009, 20:31
Default
  #14
Senior Member
 
Martin Beaudoin
Join Date: Mar 2009
Posts: 330
Hello,

Quote:
Originally Posted by ddigrask View Post

Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173).............................: MPI_Send(buf=0x7f4487e25010, count=1052888, MPI_PACKED, dest=0, tag=1, MPI_COMM_WORLD) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)[cli_8]: aborting job:
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173).............................: MPI_Send(buf=0x7f4487e25010, count=1052888, MPI_PACKED, dest=0, tag=1, MPI_COMM_WORLD) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
You mean this is the complete stack trace? Nothing more?

It only says that the run crashed in an MPI operation. We don't know where: it could be in the GGI code, in the solver, or anywhere else MPI is used. So unfortunately, this stack trace is useless.

I don't have enough information to help you much more.
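For what it's worth, one way to coax a fuller trace out of a crashing parallel run (a sketch, assuming a bash-like shell; the solver name, process count and PID format are taken from earlier in this thread) is to let the solver dump core at the failure point and read the core file back with gdb:

```shell
# Make OpenFOAM call abort() on a FatalError, so a core file is written
# at the failure point instead of just the MPI abort message.
export FOAM_ABORT=1
ulimit -c unlimited      # permit core dumps

# Relaunch as before, e.g.:
#   mpirun -np 32 turbDyMFoam -parallel > log 2>&1
# then read the backtrace from one of the core files (name is an example):
#   gdb $(which turbDyMFoam) core.26246 -batch -ex bt
```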

Try logging on to your compute nodes to check whether you have enough memory while the parallel job runs; 20 minutes gives you plenty of time to catch this.

Also check that your nodes are not swapping to disk for virtual memory.
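A minimal sketch of such a check (assuming Linux nodes with the usual free and awk; run it on each compute node, e.g. over ssh, while the job is running):

```shell
# Report memory and swap usage on this node; rising swap usage during the
# run points to the memory bottleneck suspected above.
free -m | awk '/Mem:/  {printf "mem used: %d MB of %d MB\n", $3, $2}
               /Swap:/ {printf "swap used: %d MB\n", $3}'
```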

I hope to be able to contribute some improvements to the GGI soon. I do not know if this will help you. Let's hope for the best.

Regards,

Martin

Old   November 2, 2009, 20:34
Default
  #15
Senior Member
 
Martin Beaudoin
Join Date: Mar 2009
Posts: 330
Hello BastiL,

Out of curiosity, how large is large?

I mean how many GGIs, and how many faces per GGI pairs?

Martin

Quote:
Originally Posted by bastil View Post
Martin,

this sounds great to me since I have similar problems with large models with many ggi-pairs. I am really looking forward to this.

Regards BastiL

Old   November 3, 2009, 04:15
Default
  #16
Senior Member
 
BastiL
Join Date: Mar 2009
Posts: 471
Quote:
Originally Posted by mbeaudoin View Post
Hello BastiL,

Out of curiosity, how large is large?

I mean how many GGIs, and how many faces per GGI pairs?

Martin
About 40 million cells and about 15 GGI pairs, with very different numbers of faces per GGI.

Old   November 26, 2009, 05:45
Default
  #17
Senior Member
 
BastiL
Join Date: Mar 2009
Posts: 471
Martin,

I am wondering how your work is going. Is there some way I can support it, e.g. by testing improvements with our models? Please let me know. Thanks.

Regards BastiL

Old   November 27, 2009, 10:54
Default
  #18
Senior Member
 
Martin Beaudoin
Join Date: Mar 2009
Posts: 330
Hey BastiL,

I am actively working on that one.
Thanks for the offer. I will keep you posted.

Regards,

Martin

Quote:
Originally Posted by bastil View Post
Martin,

I am wondering how your work is going on? Is there some way I can support your work, e.g. by testing improvements with our models so please let me know. Thanks.

Regards BastiL

Old   March 16, 2010, 04:17
Default
  #19
Senior Member
 
BastiL
Join Date: Mar 2009
Posts: 471
Rep Power: 11
bastil is on a distinguished road
Quote:
Originally Posted by mbeaudoin View Post
I am actively working on that one.
Thanks for the offer. I will keep you posted.
Martin,

I am wondering how the work on the GGI implementation is proceeding. Thanks.

Regards Bastian

Old   March 17, 2010, 09:43
Default
  #20
Senior Member
 
Hrvoje Jasak
Join Date: Mar 2009
Location: London, England
Posts: 1,758
Actually, I've got an update for you. There is a new layer of optimisation code built into the GGI interpolation, aimed at sorting out the loss of performance in parallel for a large number of CPUs. In short, each GGI will recognise whether or not it is located on a single CPU and, based on this, will adjust the communications pattern in parallel.

This has shown good improvement on a bunch of cases I have tried, but you need to be careful about the parallel decomposition you choose. There are two further optimisation steps we can do, but they are much more intrusive. I am delaying this until we start doing projects with real multi-stage compressors (lots of GGIs) and until we get the mixing plane code rocking (friends involved here).

Further updates are likely to follow, isn't that right, Martin?
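For readers who want to try this: in 1.5-dev, one way to make the decomposition GGI-friendly is to list the GGI face zones in decomposeParDict so their faces remain globally addressable. A sketch only: the zone names below are assumptions modelled on this thread's patch names, while numberOfSubdomains and the metis method are taken from the posts above.

```
// system/decomposeParDict (sketch)
numberOfSubdomains  32;

method              metis;

// keep the GGI face zones globally addressable across processors
globalFaceZones
(
    outerSliderWallZone
    innerSliderWallZone
);
```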

Hrv
__________________
Hrvoje Jasak
Providing commercial FOAM/OpenFOAM and CFD Consulting: http://wikki.co.uk
