CFD Online Discussion Forums

stop when I run in parallel (https://www.cfd-online.com/Forums/openfoam/75760-stop-when-i-run-parallel.html)

Nolwenn May 4, 2010 13:06

stop when I run in parallel
 
Hello everyone,

When I run a parallel case it stops (or sometimes it succeeds) without any error message. It seems to be busy (all CPUs at 100%) but there is no progress. It happens at the beginning or later on, a kind of random error.
I'm using OpenFOAM 1.6.x on Ubuntu 9.10, with gcc 4.4.1 as the compiler.
I have no problem when I run a case on a single processor.

Does anyone have an idea of what is happening?

Here is a case which runs and then stops. I only modified the number of processors from the tutorial case.

Thank you for your help.

Nolwenn

OpenFOAM sourced
mecaflu@monarch01:~$ cd OpenFOAM/mecaflu-1.6.x/run/damBreak/
mecaflu@monarch01:~/OpenFOAM/mecaflu-1.6.x/run/damBreak$ mpirun -np 6 interFoam -parallel
/*---------------------------------------------------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  1.6.x                                |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
Build : 1.6.x-605bfc578b21
Exec : interFoam -parallel
Date : May 04 2010
Time : 18:46:25
Host : monarch01
PID : 23017
Case : /media/teradrive01/mecaflu-1.6.x/run/damBreak
nProcs : 6
Slaves :
5
(
monarch01.23018
monarch01.23019
monarch01.23020
monarch01.23021
monarch01.23022
)

Pstream initialized with:
floatTransfer : 0
nProcsSimpleSum : 0
commsType : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0


Reading g
Reading field p

Reading field alpha1

Reading field U

Reading/calculating face flux field phi

Reading transportProperties

Selecting incompressible transport model Newtonian
Selecting incompressible transport model Newtonian
Selecting turbulence model type laminar
time step continuity errors : sum local = 0, global = 0, cumulative = 0
DICPCG: Solving for pcorr, Initial residual = 0, Final residual = 0, No Iterations 0
time step continuity errors : sum local = 0, global = 0, cumulative = 0
Courant Number mean: 0 max: 0

Starting time loop

jayrup May 6, 2010 00:19

Hi Nolwenn
I want to ask you: are you using distributed parallelization, or are you running "mpirun -np 6 ....." on a single machine?
Regards
Jay

Nolwenn May 6, 2010 03:36

Hi Jay,

I am using a single machine with mpirun -np 6 interFoam -parallel. When I run with 2 processors, it seems to run more iterations than with 4 or more...

Regards

Nolwenn

wyldckat May 6, 2010 20:40

Greetings Nolwenn,

It could be a memory issue. OpenFOAM is known to crash and/or freeze Linux boxes when there isn't enough memory. Check this post (or the whole thread it's on) for more on it: mpirun problems post #3
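One quick way to check is to watch memory usage while the case runs, e.g.:

Code:

watch -n 1 free -m

If free memory and swap head towards zero just before the freeze, memory is the likely culprit.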

Also, try using the parallelTest utility - information available on this post: OpenFOAM updates post #19
The parallelTest utility (it's part of OpenFOAM's test utilities) can help you sort out the more basic MPI problems, like communication problems, missing environment settings or libraries not found, without running any particular solver functionality. For example: for some weird reason, there might be something missing in the mpirun command to allow the 6 cores to work properly together!
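If you need to build it first, something along these lines should work (the path is the one used in 1.6.x; adjust it to your installation):

Code:

cd $WM_PROJECT_DIR/applications/test/parallel
wmake
# then run it from a decomposed case, like any parallel application:
cd /path/to/case
mpirun -np 6 parallelTest -parallel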

Best regards,
Bruno

Nolwenn May 7, 2010 04:33

Hello Bruno,

Thank you for your answer. I ran parallelTest and obtained this:


Code:

Executing: mpirun -np 6 /home/mecaflu/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
[0]
Starting transfers
[0]
[0] master receiving from slave 1
[0] (0 1 2)
[0] master receiving from slave 2
[0] (0 1 2)
[0] master receiving from slave 3
[0] (0 1 2)
[0] master receiving from slave 4
[0] (0 1 2)
[0] master receiving from slave 5
[0] (0 1 2)
[0] master sending to slave 1
[0] master sending to slave 2
[0] master sending to slave 3
[0] master sending to slave 4
[0] master sending to slave 5
[1]
Starting transfers
[1]
[1] slave sending to master 0
[1] slave receiving from master 0
[1] (0 1 2)
[2]
Starting transfers
[2]
[2] slave sending to master 0
[2] slave receiving from master 0
[2] (0 1 2)
[3]
Starting transfers
[3]
[3] slave sending to master 0
[3] slave receiving from master 0
[3] (0 1 2)
[4]
Starting transfers
[4]
[4] slave sending to master 0
[4] slave receiving from master 0
[4] (0 1 2)
/*---------------------------------------------------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  1.6.x                                |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-605bfc578b21
Exec  : parallelTest -parallel
Date  : May 07 2010
Time  : 10:09:41
Host  : monarch01
PID    : 4344
Case  : /media/teradrive01/mecaflu-1.6.x/run/mine/7
nProcs : 6
Slaves :
5
(
monarch01.4345
monarch01.4346
monarch01.4347
monarch01.4348
monarch01.4349
)

Pstream initialized with:
    floatTransfer    : 0
    nProcsSimpleSum  : 0
    commsType        : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run
[5]
Starting transfers
[5]
[5] slave sending to master 0
[5] slave receiving from master 0
[5] (0 1 2)

The output from processor 5 comes after the "End"; I don't know if that could be a reason for the stopping...
I have 8 GiB of memory and 3 GiB of swap, so memory seems to be OK!

Best regards

Nolwenn

wyldckat May 7, 2010 14:34

Greetings Nolwenn,

That is a strange output... it seems a bit out of sync :( It has happened to me once some time ago, but the OpenFOAM header always came first!

Doesn't the script foamJob work for you? Or does it output the exact same thing?
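For reference, the invocation for this case would be something like this ("-s" mirrors the output to the screen, "-p" runs in parallel):

Code:

foamJob -s -p interFoam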

Another possibility is that it could actually reveal a bug in OpenFOAM! So, how did you decompose the domain for each processor?

Best regards,
Bruno

Nolwenn May 10, 2010 03:57

Hello Bruno,

Here is the result of foamJob; I can't find a lot of information in it!

Code:

/*---------------------------------------------------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  1.6.x                                |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-605bfc578b21
Exec  : parallelTest -parallel
Date  : May 07 2010
Time  : 10:09:41
Host  : monarch01
PID    : 4344
Case  : /media/teradrive01/mecaflu-1.6.x/run/mine/7
nProcs : 6
Slaves :
5
(
monarch01.4345
monarch01.4346
monarch01.4347
monarch01.4348
monarch01.4349
)

Pstream initialized with:
    floatTransfer    : 0
    nProcsSimpleSum  : 0
    commsType        : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run

For the decomposition I didn't create anything; it is the one from a tutorial.

Code:

// The FOAM Project // File: decomposeParDict
/*
-------------------------------------------------------------------------------
 =========        | dictionary
 \\      /        |
  \\    /          | Name:  decomposeParDict
  \\  /          | Family: FoamX configuration file
    \\/            |
    F ield        | FOAM version: 2.1
    O peration    | Product of Nabla Ltd.
    A nd           |
    M anipulation  | Email: Enquiries@Nabla.co.uk
-------------------------------------------------------------------------------
*/
// FoamX Case Dictionary.

FoamFile
{
    version        2.0;
    format          ascii;
    class          dictionary;
    object          decomposeParDict;
}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //


numberOfSubdomains 6;

method          hierarchical;
//method          metis;
//method          parMetis;

simpleCoeffs
{
    n              (2 1 2);
    delta          0.001;
}

hierarchicalCoeffs
{
    n              (3 1 2);
    delta          0.001;
    order          xyz;
}

manualCoeffs
{
    dataFile        "cellDecomposition";
}

metisCoeffs
{
    //n                  (5 1 1);
    //cellWeightsFile    "constant/cellWeightsFile";
}


// ************************************************************************* //

When I first had this problem I re-installed OF 1.6.x, but the problem was the same.

I use the gcc compiler; is it possible that another compiler would solve this?

Thank you for your help Bruno!

Best regards,

Nolwenn

scott May 10, 2010 22:58

How many processors or cores does your machine have?

I would presume that if you have 8 GB you probably only have a quad-core machine, hence I would only partition the domain into 4 volumes.
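For example, a 4-subdomain variant of the dictionary posted above might look like this (just a sketch; the split directions are illustrative):

Code:

numberOfSubdomains 4;

method          hierarchical;

hierarchicalCoeffs
{
    n               (2 2 1);
    delta           0.001;
    order           xyz;
}

Remember to rerun decomposePar after changing it.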

If you have a dual-core machine, that would explain why it is OK with 2 processors, because that's all you have.

Please post your machine specs so that we can try to be more helpful.

Cheers,

Scott

Nolwenn May 11, 2010 04:28

Hello Scott!

I have 8 processors on my machine; I tried to find the specs:

Code:

r3@monarch01:~$ cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3599.34
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 1
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 0
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 1
initial apicid    : 1
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.11
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 2
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 1
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 2
initial apicid    : 2
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 3
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 1
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 3
initial apicid    : 3
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.11
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 4
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 2
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 4
initial apicid    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 5
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 2
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 5
initial apicid    : 5
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 6
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 3
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 6
initial apicid    : 6
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 7
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 3
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 7
initial apicid    : 7
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.11
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

When I run the parallel test with 4 processors I have the same problem: the OpenFOAM header comes almost at the end of the test...

Best regards

Nolwenn

scott May 11, 2010 17:34

Have you tried with all 8 processors?

I don't have this problem on mine when I use all of the processors. Make sure you use decomposePar to get 8 partitions before you try; see the sketch below.
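Something along these lines (the (2 2 2) split is just an illustration):

Code:

# in system/decomposeParDict set: numberOfSubdomains 8; and n (2 2 2);
decomposePar
mpirun -np 8 interFoam -parallel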

Scott

scott May 11, 2010 17:42

Also, are these 8 processes all on the same machine or are they on different machines? I.e., is it a small cluster?

I haven't done this on a cluster setup before, so I can't be of any help with that. I was assuming that you had two quad-core processors on a single motherboard, but I just went through it again and it's either 8 dual-core processors, or 4 dual-core processors reporting a process for each core.

Can you confirm exactly what it is, and maybe someone else can help you.

If it's a cluster then you may have load issues, interconnect problems, or questionable installations on other machines.

Cheers,

Scott

Nolwenn May 12, 2010 04:34

Sorry, I am not very familiar with machine specs!
It is a single machine with 4 dual-core processors.

When I run with all processors the problem is the same:

Code:

Parallel processing using OPENMPI with 8 processors
Executing: mpirun -np 8 /home/mecaflu/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
[0]
Starting transfers
[0]
[0] master receiving from slave 1
[0] (0 1 2)
[0] master receiving from slave 2
[0] (0 1 2)
[0] master receiving from slave 3
[0] (0 1 2)
[0] master receiving from slave 4
[0] (0 1 2)
[0] master receiving from slave 5
[0] (0 1 2)
[0] master receiving from slave 6
[0] (0 1 2)
[0] master receiving from slave 7
[0] (0 1 2)
[0] master sending to slave 1
[0] master sending to slave 2
[0] master sending to slave 3
[0] master sending to slave 4
[0] master sending to slave 5
[0] master sending to slave 6
[0] master sending to slave 7
[1]
Starting transfers
[1]
[1] slave sending to master 0
[1] slave receiving from master 0
[1] (0 1 2)
[2]
Starting transfers
[2]
[2] slave sending to master 0
[2] slave receiving from master 0
[2] (0 1 2)
[3]
Starting transfers
[3]
[3] slave sending to master 0
[3] slave receiving from master 0
[3] (0 1 2)
[4]
Starting transfers
[4]
[4] slave sending to master 0
[4] slave receiving from master 0
[4] (0 1 2)
[5]
Starting transfers
[5]
[5] slave sending to master 0
[5] slave receiving from master 0
[5] (0 1 2)
[6]
Starting transfers
[6]
[6] slave sending to master 0
[6] slave receiving from master 0
[6] (0 1 2)
[7]
Starting transfers
[7]
[7] slave sending to master 0
[7] slave receiving from master 0
[7] (0 1 2)
/*---------------------------------------------------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  1.6.x                                |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-605bfc578b21
Exec  : parallelTest -parallel
Date  : May 12 2010
Time  : 10:22:27
Host  : monarch01
PID    : 4894
Case  : /media/teradrive01/mecaflu-1.6.x/run/mine/9
nProcs : 8
Slaves :
7
(
monarch01.4895
monarch01.4896
monarch01.4899
monarch01.4919
monarch01.4922
monarch01.4966
monarch01.4980
)

Pstream initialized with:
    floatTransfer    : 0
    nProcsSimpleSum  : 0
    commsType        : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run

And in my log file I have just the end (from the OpenFOAM header onward).

Best regards

Nolwenn

bfa May 12, 2010 05:03

I encounter the same problem as Nolwenn. I use 12 cores on a single machine. parallelTest works fine and prints out results in a reasonable order. But when I run foamJob, the computation hangs on solving the first UEqn. All cores are at 100% but nothing is happening.
solver: simpleFoam
case: pitzDaily
decomposition: simple
OpenFOAM 1.6.x

Here is the output from mpirun -H localhost -np 12 simpleFoam -parallel
Code:

/*---------------------------------------------------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  1.6.x                                |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-1d1db32a12b0
Exec  : simpleFoam -parallel
Date  : May 12 2010
Time  : 09:31:45
Host  : brahms
PID    : 11694
Case  : /home/fabritius/OpenFOAM/OpenFOAM-1.6.x/tutorials/incompressible/simpleFoam/pitzDaily
nProcs : 12
Slaves :
11
(
brahms.11695
brahms.11696
brahms.11697
brahms.11698
brahms.11699
brahms.11700
brahms.11701
brahms.11702
brahms.11703
brahms.11704
brahms.11705
)

Pstream initialized with:
    floatTransfer    : 0
    nProcsSimpleSum  : 0
    commsType        : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

Reading field p

Reading field U

Reading/calculating face flux field phi

Selecting incompressible transport model Newtonian
Selecting RAS turbulence model kEpsilon
kEpsilonCoeffs
{
    Cmu            0.09;
    C1              1.44;
    C2              1.92;
    sigmaEps        1.3;
}


Starting time loop

Time = 1

...and there it stops!

I tried a verbose mode of mpirun but that delivered no useful information either. Unfortunately I have no profiling tools at hand for parallel code. If any of you has Vampir or something similar and could try this out, that would be great.

wyldckat May 15, 2010 12:12

Greetings to all,

Well, this is quite odd. The only solutions that come to mind are to test the same working conditions with other build scenarios, namely:
  • use the system's OpenMPI, which is a valid option in $WM_PROJECT_DIR/etc/bashrc in OpenFOAM 1.6.x (see the sketch below);
  • try using the pre-built OpenFOAM 1.6 available on www.openfoam.com;
  • build OpenFOAM 1.6.x with gcc 4.3.3, which comes in the ThirdParty folder.
If you manage to get it running with one of the above or with some other solution, please tell us about it.

Because the only reason that comes to mind for the solvers to just jam up and not do anything productive is that something didn't get built the way it is supposed to be.
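Regarding the first option, the switch is a one-line change (the variable name is the one I recall from 1.6.x's etc/bashrc; verify it in your copy):

Code:

# in $WM_PROJECT_DIR/etc/bashrc
#WM_MPLIB=OPENMPI          # the ThirdParty build (default)
WM_MPLIB=SYSTEMOPENMPI     # use the system's OpenMPI instead

After changing it, source the environment in a new terminal; if I remember correctly, the Pstream library also needs to be rebuilt.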


As for the output from parallelTest coming out with the order swapped, it could be an stdout buffering issue, where mpirun outputs the text from the slaves prior to the master's output because the master's output didn't fill up fast enough to trigger the character limit that forces a flush.


Best regards,
Bruno

Nolwenn May 17, 2010 03:50

Hello everyone,

Now everything seems to be OK for me! I went back to OF 1.6 (prebuilt) on Ubuntu 8.04 and I no longer have the problem.

Thank you again for your help Bruno!

Cheers,

Nolwenn

gtampier May 27, 2010 15:29

Hello everyone,

I'm experiencing the very same problem with openSUSE. I've tried the pre-compiled 1.6 version and it worked! My problem arises again when I recompile OpenMPI. I do this in order to add the Torque (batch system) and OFED options. Since we have a small cluster, these options are necessary for running cases on more than one node. Even if I recompile OpenMPI without these options (and just recompile it, nothing else), I get the same problem! (Calculations stop, sometimes earlier, sometimes later and sometimes at the beginning, without any error message and keeping all CPUs at 100%.) This is quite strange - I would be glad if someone has further ideas... I'll keep you informed if I make some progress.

regards
Gonzalo

wyldckat May 27, 2010 20:41

Greetings Gonzalo,

Let's see... here are my questions for you:
  • How did you rebuild OpenMPI? Did you add the build options to the Allwmake script available in the folder $HOME/OpenFOAM/ThirdParty-1.6? Or did you rebuild the library by hand (./configure, then make)?
  • What version of gcc did you use to rebuild OpenMPI? OpenFOAM's gcc 4.3.3 or openSUSE's version?
  • Then, after rebuilding the OpenMPI library, did you rebuild OpenFOAM as well? If not, something may be mis-linked somehow.
  • Do you know if your installed openSUSE version has a version of OpenMPI available in YaST (the Software Package Manager part of it) that has the characteristics you need? Because if it does, OpenFOAM 1.6.x has an option to use the system's OpenMPI version instead of the one that comes with OpenFOAM (the WM_MPLIB switch sketched in my previous post)! And even if you need to stick to OpenFOAM 1.6, it should be as easy as copying bashrc and settings.sh from the OpenFOAM-1.6.x/etc folder to the OpenFOAM-1.6/etc folder!
I personally haven't had the time to try to reproduce these odd MPI freezing issues, but I also think they won't be very easy to reproduce :(

The easiest way to avoid these issues would be to use the same distro versions the pre-built binaries came from (namely, if I'm not mistaken, Ubuntu 9.04 and openSUSE 11.0 or 11.1), because they have gcc 4.3.3 as their system compiler.

Best regards,
Bruno

gtampier May 28, 2010 03:09

Hello Bruno, hello all,

thanks for your comments. I compiled OpenMPI again and now it worked! I was first trying to compile it with the system's gcc (4.4.1) of openSUSE 11.2, which apparently caused the problems. Now I've tried it again with the ThirdParty gcc (4.3.3) and it works!
In both cases I compiled it with Allwmake from the ThirdParty-1.6 directory, after uncommenting the openib and openib-libdir options and adding the --with-tm option for Torque. Then I deleted the openmpi-1.3.3/platform dir and executed Allwmake in ThirdParty-1.6. After this, it wasn't necessary to recompile OpenFOAM again.
I have now run first tests with 2 nodes and a total of 16 processes (the finer damBreak tutorial) and it seems to work fine!
It still remains strange to me, since I did the same for 1.6.x and it didn't work! I'll try the system's compiler for both OpenFOAM-1.6.x and ThirdParty when I have more time.
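A condensed sketch of those steps, in case it helps someone (the exact configure-option lines depend on your copy of the Allwmake script):

Code:

cd $HOME/OpenFOAM/ThirdParty-1.6
# edit Allwmake: uncomment the openib / openib-libdir configure options
# and add --with-tm=<torque-prefix> for the batch system
rm -rf openmpi-1.3.3/platform    # force a clean reconfigure of OpenMPI
./Allwmake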
Thanks again!
Gonzalo

bunni May 28, 2010 15:39

parallel problem
 
Hi,

I've got a problem running a case in parallel (one machine, quad-core). I'm using the OpenFOAM 1.6 prebuilt binaries on Fedora 12.

The error I get is:

/*---------------------------------------------------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  1.6                                  |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
Build : 1.6-f802ff2d6c5a
Exec : interFoam -parallel
Date : May 28 2010
Time : 12:27:10
Host : blue
PID : 23136
Case : /home/bunni/OpenFOAM/OpenFOAM-1.6/tutorials/quartcyl
nProcs : 2
Slaves :
1
(
blue.23137
)

Pstream initialized with:
floatTransfer : 0
nProcsSimpleSum : 0
commsType : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

[blue:23137] *** An error occurred in MPI_Bsend
[blue:23137] *** on communicator MPI_COMM_WORLD
[blue:23137] *** MPI_ERR_BUFFER: invalid buffer pointer
[blue:23137] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 23137 on
node blue exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[blue:23135] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[blue:23135] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

- so I take it the program is crashing in the mesh part? It seems to run fine on a single proc (and another geometry I had ran fine in parallel jobs). I've meshed a quarter of a cylinder, with the cylinder aligned on the z-axis. I've done a simple decomposition along the z-axis, thinking that the circular geometry might be causing the problem.

Above, Bruno mentioned the scripts runParallel and parallelTest. Where are those scripts?

Cheers

wyldckat May 28, 2010 21:41

Greetings bunni,

Quote:

Originally Posted by bunni (Post 260763)
- so I take it the program is crashing in the mesh part? It seems to run fine on a single proc (and another geometry I had ran fine in parallel jobs). I've meshed a quarter of a cylinder, with the cylinder aligned on the z-axis. I've done a simple decomposition along the z-axis, thinking that the circular geometry might be causing the problem.

You might be hitting an existing bug in OpenFOAM 1.6 that may already be solved in OpenFOAM 1.6.x. For building OpenFOAM 1.6.x on Fedora 12, check this post: Problem Installing OF 1.6 Ubuntu 9.10 (64 bit) - How to use GCC 4.4.1 post #11

Quote:

Originally Posted by bunni (Post 260763)
Above, Bruno mentioned the scripts runParallel and parallelTest. Where are those scripts?

Check my post #4 in this current thread.

Best regards,
Bruno

marval June 3, 2010 13:30

DamBreak tutorial
 
Hi all!

I'm running through the tutorials and have problems with parallel running in the dam break tutorial.

This is the error I get;

Quote:

marco@marco-laptop:~/OpenFOAM/marco-1.6.x/run/tutorials/multiphase/interFoam/laminar/damBreakFine/system$ mpirun -np 4 interFoam -parallel > log &
[1] 27989
marco@marco-laptop:~/OpenFOAM/marco-1.6.x/run/tutorials/multiphase/interFoam/laminar/damBreakFine/system$ [0]
[0]
[0] --> FOAM FATAL ERROR:
[0] Cannot read "/home/marco/OpenFOAM/marco-1.6.x/run/tutorials/multiphase/interFoam/laminar/damBreakFine/system/system/decomposeParDict"
[0]
FOAM parallel run exiting
[0]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 27990 on
node marco-laptop exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

[1]+ Exit 1 mpirun -np 4 interFoam -parallel > log
And as I understand it, I should change the 'decomposeParDict' file; currently it looks like this:

Quote:

/*--------------------------------*- C++ -*----------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  1.6                                  |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
FoamFile
{
version 2.0;
format ascii;
class dictionary;
location "system";
object decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 4;

method simple;

simpleCoeffs
{
n ( 2 2 1 );
delta 0.001;
}

hierarchicalCoeffs
{
n ( 1 1 1 );
delta 0.001;
order xyz;
}

metisCoeffs
{
processorWeights ( 1 1 1 1 );
}

manualCoeffs
{
dataFile "";
}

distributed no;

roots ( );


// ************************************************************************* //
I think I'm trying to run with more processors (4 processors in the manual) than I have available (2 processors), but I don't know exactly how to fix it (probably in the file).

Regards
Marco

wyldckat June 3, 2010 14:55

Hi Marco!

This has happened to me more than once :rolleyes:
Quote:

Originally Posted by marval (Post 261575)
Cannot read "/home/marco/OpenFOAM/marco-1.6.x/run/tutorials/multiphase/interFoam/laminar/damBreakFine/system/system/decomposeParDict"

Just do:
Code:

cd ..
and try again ;)


By the way, I've personally grown used to using the script foamJob, so in your case I would use:
Code:

foamJob -s -p interFoam
The advantage of foamJob is that it will launch foamExec prior to running the desired solver/utility, thus activating the OpenFOAM environment on the remote machine/node and then going on with the usual OpenFOAM business :)

Best regards,
Bruno

bunni June 4, 2010 16:24

1 Attachment(s)
OK, I'm back. After having installed OpenFOAM 1.6.x, I'm having exactly the same problem running in parallel as I had before. It runs fine on a single processor.

I've tried to attach the output from the screen, which is what I posted above.
I'll try to post the details of the case in another message.

bunni June 4, 2010 16:41

tgz with the file stuff
 
1 Attachment(s)
Here should be the data to recreate the case. You'll need to run blockMesh on it, but hopefully the rest of the files are there. I've saved it as quart.tgz.gz because the uploader would not take quart.tgz. Therefore:
step 1 $ mv quart.tgz.gz quart.tgz
step 2 $ tar xvfz quart.tgz

and a directory tree called qcyl should be created. You can descend into it to run blockMesh, etc.

It has been running for days with 1 proc, but crashes immediately with 2 or more. I've got a simple decomposition through the z-plane with 2 procs in the decomposeParDict.

Anyway, thanks for any ideas.

wyldckat June 6, 2010 06:56

Hi Bunni,

Well, the same things that happened with you have happened with me too. I've confirmed that my OpenMPI is working with OpenFOAM, by testing parallelTest and the interFoam/laminar/damBreak case with dual-core parallel execution.
edit: I forgot to mention that I used Ubuntu 8.04 i686 and OpenFOAM 1.6.x

I've managed to partly solve the error you get. Just edit the file "OpenFOAM-1.6.x/etc/settings.sh", find the variable minBufferSize and increase the buffer size:
Code:

# Set the minimum MPI buffer size (used by all platforms except SGI MPI)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
minBufferSize=150000000

And don't forget to start a new terminal and use foamJob to ensure that the proper value is used.
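To confirm that the new value is being picked up in the new terminal (settings.sh exports it as MPI_BUFFER_SIZE, if I remember correctly):

Code:

echo $MPI_BUFFER_SIZE    # should print the increased value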
But from the tests I've made with increasing the buffer size, it only seems to postpone the crash further in time. interFoam always seems to crash during the preparation part of the case. Now I know why other users report that it freezes... in fact, it just takes really long playing around with meshes and memory and MPI messages, at least AFAIK, and sooner or later it will just crash without doing anything useful :(
edit2: yep, 400 MB of buffer and it still crashes...

So, Bunni, I suggest that you try increasing that buffer variable in an attempt to avoid the crashing. But my best bet is what I said to you previously about OpenFOAM 1.6: this seems to be a bug in OpenFOAM which apparently is yet to be fixed! So please post a bug report on the bug-report part of the OpenFOAM forum: http://www.cfd-online.com/Forums/openfoam-bugs/
If you want to save some time, you can refer to your post #23 from here and onward!

Best regards,
Bruno

bunni June 6, 2010 20:56

thanks
 
Thanks for checking that. I will post a bug report. I'm running on Fedora and CentOS. I'll check out that variable change. As for right now, it's been running on a single processor without crashing for 4 days, so the run itself is stable.

PerryLJohnson September 20, 2011 12:35

Hello,

I have recently encountered the same problem as Nolwenn and Gonzalo: a solver stopping at the first time loop without any error message and without the job exiting the queue (procs still occupied at 100%). I am on OF 1.7.1 (with gcc 4.5.1) on a cluster with RHEL 5.4. The issue only occurs when running large cases; smaller cases work perfectly fine, and there is plenty of memory per node even for the large cases (~50 GB). The parallelTest utility reports fine, as suggested above. Is there any known fix for this issue besides switching compilers? If not, which compiler should I switch to for OF 1.7.1, since there is no default compiler?

Thanks in advance for any helpful advice,
Perry

wyldckat September 20, 2011 16:36

Greetings Perry,

Before I answer you, I just want to wrap up the solution to bunni's predicament; the thread with the solution is this one: http://www.cfd-online.com/Forums/ope...-parallel.html

Now back to you, Perry: OK, when it comes to the issue of compiler version, there are two or three other libraries whose versions are also important, namely GMP, MPFR and MPC. For example, from my experience, MPFR 3.0.0 doesn't work very well, so I still hang on to the older 2.4.2 version.
As for gcc 4.5.1, it should work just fine with OpenFOAM 1.7.1. You might, on the other hand, be triggering a couple of old bugs that have been solved since then. As I vaguely remember, they were related to some issues with cyclic or wedge or some other special type of patch that would crash the solver when used in parallel. Aside from such old bugs, one still needs to use (if I'm not mistaken) the "preservePatches" parameter in decomposeParDict.
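In decomposeParDict that would look something like this (the patch name is illustrative):

Code:

// keep all faces of the listed patches within a single processor
preservePatches (myCyclic);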

Either way, I've got a blog post where I'm gathering information on how to run in parallel with OpenFOAM (it's accessible from the link on my signature): Notes about running OpenFOAM in parallel
The ones that might interest you:
Knowing a bit more about the large case might help in trying to isolate the problem, namely:
  • Which decomposition method are you using?
  • What's the solver being used, or if it's a customized version, on which solver(s) is it based on?
  • What kinds of patches are involved? Any cyclic, wedge, baffle, etc...
  • What kind of turbulence models are being used, if any? RAS, LES, laminar or something else?
  • Have you tried gradually scaling up the size of your case? If so, did you take into account the respective calibration of the parameters in controlDict?
Last but not least, any chance of also trying OpenFOAM 2.0.1 or 2.0.x? Because if you are triggering a bug, it'll be easier to get help on this problem on the dedicated bug tracker.

Best regards,
Bruno

PerryLJohnson September 20, 2011 17:56

Quote:

Originally Posted by wyldckat (Post 324934)
Greetings Perry,

Now back to you Perry: OK, when it comes to the issue of compiler version, there are two/three other libraries whose versions are also important, namely: GMP, MPFR and MPC. For example, from my experience, MPFR 3.0.0 doesn't work very well, so I still hang on to the older 2.4.2 version.

Can you elaborate concerning what you mean by MPFR 3.0.0 does not work well?

Quote:

Originally Posted by wyldckat (Post 324934)
As for gcc 4.5.1, it should work just fine with OpenFOAM 1.7.1. You might, on the other hand, be triggering a couple of old bugs that have been solved since then. As I vaguely remember, they were related to some issues with cyclic or wedge or some other special type of patch that would crash the solver when used in parallel. Aside from such old bugs, one still needs to use (if I'm not mistaken) the "preservePatches" parameter in decomposeParDict.

There is a cyclic patch in my case; however, the same domain and boundary conditions run fine on a coarser mesh, and I have never run into cyclic problems in past cases with 1.7.1.

Quote:

Originally Posted by wyldckat (Post 324934)
Either way, I've got a blog post where I'm gathering information on how to run in parallel with OpenFOAM (it's accessible from the link on my signature): Notes about running OpenFOAM in parallel
The ones that might interest you:

Thanks for the links!

Quote:

Originally Posted by wyldckat (Post 324934)
Knowing a bit more about the large case might help in trying to isolate the problem, namely:
  • Which decomposition method are you using?
  • What's the solver being used, or if it's a customized version, on which solver(s) is it based on?
  • What kinds of patches are involved? Any cyclic, wedge, baffle, etc...
  • What kind of turbulence models are being used, if any? RAS, LES, laminar or something else?
  • Have you tried gradually scaling up the size of your case? If so, did you take into account the respective calibration of the parameters in controlDict?

1) I've tried both simple and metis, sometimes metis stops at "Creating mesh for time = 0".

2) I'm using a custom solver based on simpleFoam (with an extra equation for passive scalar transport), but have also tested on simpleFoam itself with no difference.

3) I have one cyclic patch, 3 directMapped patches, and a number of inlets, outlets, and walls.

4) Right now, I'm using RAS, k-w SST.

5) I have not tried scaling the geometry, but I have run the same geometry on a coarser mesh successfully with the same boundary conditions. I only experience this problem on my fine mesh.

Quote:

Originally Posted by wyldckat (Post 324934)
Last but not least, any chance of also trying OpenFOAM 2.0.1 or 2.0.x? Because if you are triggering a bug, it'll be easier to get help on this problem on the dedicated bug tracker.

Best regards,
Bruno

This is a possibility if no simpler solutions exist.

Thanks very much for your help,
Perry

wyldckat September 21, 2011 15:11

Hi Perry,

Quote:

Originally Posted by PerryLJohnson (Post 324938)
Can you elaborate concerning what you mean by MPFR 3.0.0 does not work well?

See here as an example: https://github.com/OpenCFD/OpenFOAM-...ttings.sh#L119 - there you can see reference to the gcc+mpfr+gmp+mpc versions defined by default for OpenFOAM 1.7.x, which were defined and tested when 1.7.1 was released. If the Gcc build you are using happens to be linked to MPFR 3.0.0/1, then this might be one of the reasons, since there are some problems with mathematical operations, if I remember correctly.
Oh, here is the link to the makeGcc file on ThirdParty 2.0.x: https://github.com/OpenFOAM/ThirdPar...master/makeGcc - as you can see, Gcc 4.5.x needs MPFR, GMP and MPC to build properly.


Quote:

Originally Posted by PerryLJohnson (Post 324938)
There is a cyclic patch in my case; however, the same domain and boundary conditions run fine on a coarser mesh, and I have never run into cyclic problems in past cases with 1.7.1.

When the domain is decomposed, one might get lucky and not get the cyclic patch split in half between sub-domains. When luck runs out, the preserve patches parameter is a must.


Quote:

Originally Posted by PerryLJohnson (Post 324938)
1) I've tried both simple and metis, sometimes metis stops at "Creating mesh for time = 0".

2) I'm using a custom solver based on simpleFoam (with an extra equation for passive scalar transport), but have also tested on simpleFoam itself with no difference.

3) I have one cyclic patch, 3 directMapped patches, and a number of inlets, outlets, and walls.

4) Right now, I'm using RAS, k-w SST.

5) I have not tried scaling the geometry, but I have run the same geometry on a coarser mesh successfully with the same boundary conditions. I only experience this problem on my fine mesh.

1) Please remind me: then what happens with simple decomposition? When does it stop?
2) OK, then the problem must be elsewhere...
3) Are the directMapped patches also protected by the preserve patches parameter?
4) OK, seems pretty standard...
5) Have you tried visualizing the sub-domains in ParaView, to check where things are being split?

Have you executed checkMesh on the fine resolution mesh before decomposing, to verify if the mesh is OK?

There is an environment variable that OpenMPI uses that is defined in settings.sh... ah, line 347: https://github.com/OpenCFD/OpenFOAM-...ttings.sh#L347 - try increasing that value, perhaps 10x. Although this is only a valid solution in some cases.

And I know I've seen more reports like this before... and if I'm not mistaken, most were related to the patches being split between sub-domains, but my memory hasn't been very trustworthy lately :(
If my memory gets better, I'll search for what I've read in the past and post here.

...Wait... maybe it's the nonBlocking flag: https://github.com/OpenCFD/OpenFOAM-...ntrolDict#L875 - have you tried with other possibilities for the parameter commsType? I know there was a bug report a while back that was fixed in 2.0.x... here we go, it's in fact related to "directMappedPatch", although it might not affect your case: http://www.openfoam.com/mantisbt/view.php?id=280

Best regard and good luck!
Bruno

PerryLJohnson September 21, 2011 20:05

Quote:

Originally Posted by wyldckat (Post 325121)
Hi Perry,

See here as an example: https://github.com/OpenCFD/OpenFOAM-...ttings.sh#L119 - there you can see reference to the gcc+mpfr+gmp+mpc versions defined by default for OpenFOAM 1.7.x, which were defined and tested when 1.7.1 was released. If the Gcc build you are using happens to be linked to MPFR 3.0.0/1, then this might be one of the reasons, since there are some problems with mathematical operations, if I remember correctly.
Oh, here is the link to the makeGcc file on ThirdParty 2.0.x: https://github.com/OpenFOAM/ThirdPar...master/makeGcc - as you can see, Gcc 4.5.x needs MPFR, GMP and MPC to build properly.

This is a possibility, since the current setup uses MPFR 3.0.0 with Gcc 4.5.

Quote:

Originally Posted by wyldckat (Post 325121)
When the domain is decomposed, one might get lucky and not get the cyclic patch split in half between sub-domains. When luck runs out, the preserve patches parameter is a must.


1) Please remind me: then what happens with simple decomposition? When does it stop?

metis stops while building the mesh, which could be well explained by the lack of patch-preservation...thanks for that tip...

simple stops after building the mesh and fields, while starting the first time loop: "Time = 1", as if it is taking hours to complete the U-eqn; it also stops here when running serially (just tested)

Quote:

Originally Posted by wyldckat (Post 325121)
2) OK, then the problem must be elsewhere...
3) Are the directMapped patches also protected by the preserve patches parameter?
4) OK, seems pretty standard...
5) Have you tried visualizing the sub-domains in ParaView, to check where things are being split?

As for the preserve patches on the cyclic, I think that may be the issue with metis (it stops while building the mesh), but not for simple decomposition or serial runs (stops while performing U-Eqn at first time step).

Preserving patches for directMapped does not make sense to me, since the directMapped patch is not a shared boundary situation, but rather a case where the inlet looks to the nearest interior cell to a given offset location and finds the value there.

Is there a good way to visualize all of the sub-domains in one paraview session?

Quote:

Originally Posted by wyldckat (Post 325121)
Have you executed checkMesh on the fine resolution mesh before decomposing, to verify if the mesh is OK?

Three notifications from check mesh:
1) Two regions not connected by any faces (which is purposeful for my simulation, e.g. one region feeds the other via directMappedPatch).

2) 156 Non-orthogonal faces, but still says OK (max 86.4).

3) 3 skew faces, says that mesh fails, but this has not stopped me in the past. Would you think this problem could be related to 3 skewed faces (max skewness 4.66)? Doesn't seem like skew cells could prevent the solver from running, but I could be wrong...?

Quote:

Originally Posted by wyldckat (Post 325121)
There is an environment variable that OpenMPI uses that is defined in settings.sh... ah, line 347: https://github.com/OpenCFD/OpenFOAM-...ttings.sh#L347 - try increasing that value, perhaps 10x. Although this is only a valid solution in some cases.

Already did that :)

Quote:

Originally Posted by wyldckat (Post 325121)
And I know I've seen more reports like this before... and if I'm not mistaken, most were related to the patches being split between sub-domains, but my memory hasn't been very trustworthy lately :(
If my memory gets better, I'll search for what I've read in the past and post here.

Setting the cyclic patch to be preserved does not fix for simple decomposition (just attempted today).

Quote:

Originally Posted by wyldckat (Post 325121)
...Wait... maybe it's the nonBlocking flag: https://github.com/OpenCFD/OpenFOAM-...ntrolDict#L875 - have you tried with other possibilities for the parameter commsType? I know there was a bug report a while back that was fixed in 2.0.x... here we go, it's in fact related to "directMappedPatch", although it might not affect your case: http://www.openfoam.com/mantisbt/view.php?id=280

Best regard and good luck!
Bruno

The bug report states that the 'blocking' option is faulty but that the other two are ok. I have 'nonBlocking' enabled.

Thanks for your continued ideas,
Perry

wyldckat September 22, 2011 14:41

Hi Perry,

Quote:

Originally Posted by PerryLJohnson (Post 325146)
This is a possibility, since the current setup uses MPFR 3.0.0 with Gcc 4.5.

Well, AFAIK that isn't a supported combination of versions for building OpenFOAM, so my first bet would be to play it on the safe side. Any chance there is a gcc 4.4.x or 4.3.x lying around in the systems you have access to?


Quote:

Originally Posted by PerryLJohnson (Post 325146)
metis stops while building the mesh, which could be well explained by the lack of patch-preservation...thanks for that tip...

You're welcome :)
Quote:

Originally Posted by PerryLJohnson (Post 325146)
simple stops after building the mesh and fields, while starting the first time loop: "Time = 1", as if it is taking hours to complete the U-eqn; it also stops here when running serially (just tested)

Ah, now we are getting somewhere! If it stops when running in serial/single process, don't expect it to run in parallel! I had forgotten that this was one of the reasons why I asked about the turbulence model... I don't have much experience with this, but I do know that improper definitions for certain characteristic parameters, or setting the boundary conditions wrongly, will lead to simulations not running at all or crashing sooner or later. An example would be bad initial values for the turbulence models, or for the parameters themselves.

Quote:

Originally Posted by PerryLJohnson (Post 325146)
As for the preserve patches on the cyclic, I think that may be the issue with metis (it stops while building the mesh), but not for simple decomposition or serial runs (stops while performing U-Eqn at first time step).

Preserving patches for directMapped does not make sense to me, since the directMapped patch is not a shared boundary situation, but rather a case where the inlet looks to the nearest interior cell to a given offset location and finds the value there.

It's always good to test things, just in case...

Quote:

Originally Posted by PerryLJohnson (Post 325146)
Is there a good way to visualize all of the sub-domains in one paraview session?

There are at least two ways of doing this:
  1. Using the internal reader in ParaView 3.8.x or 3.10.x. The internal reader uses the file extension ".foam". Run:
    Code:

    touch case.foam
    and open this file with ParaView. There should be an option on the object inspector to see the decomposed mesh.
    The decomposed mesh will appear as a single mesh volume, as if it were the serial case, except that it will show the processor boundary surfaces. Using the filters:
    • Extract Surface
    • Extract Cells by Region
    you can then see only the surfaces between processors.
  2. Using the official reader, which uses the file extension ".OpenFOAM", you'll have to create a file for each processor and open each one manually. You can generate the files for each processor like this:
    Code:

    for a in processor*; do paraFoam -touch -case $a; done
Quote:

Originally Posted by PerryLJohnson (Post 325146)
Three notifications from check mesh:
1) Two regions not connected by any faces (which is purposeful for my simulation, e.g. one region feeds the other via directMappedPatch).

2) 156 Non-orthogonal faces, but still says OK (max 86.4).

3) 3 skew faces, says that mesh fails, but this has not stopped me in the past. Would you think this problem could be related to 3 skewed faces (max skewness 4.66)? Doesn't seem like skew cells could prevent the solver from running, but I could be wrong...?

You can try using setSet to remove the damaged cells associated with those faces: http://openfoamwiki.net/index.php/SetSet - the mesh will be missing a few cells, but at least you can verify whether these are the guilty party or not.
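A minimal sketch of such a session (the cell labels are invented for illustration):

Code:

# collect the suspect cells into a cellSet, running setSet in batch mode
echo "cellSet badCells new labelToCell (10432 88211 90125)" > badCells.setSet
setSet -batch badCells.setSet

The complement of that set can then be given to subsetMesh to carve the cells out of the mesh.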

Quote:

Originally Posted by PerryLJohnson (Post 325146)
Setting the cyclic patch to be preserved does not fix for simple decomposition (just attempted today).
(...)
The bug report states that the 'blocking' option is faulty but that the other two are ok. I have 'nonBlocking' enabled.

As for these two points, my guess is that if it's not working in serial, then it's unlikely it will work in parallel...

Best regards and good luck!
Bruno

PerryLJohnson September 22, 2011 22:40

Bruno,

Quote:

Originally Posted by wyldckat (Post 325272)
Well, AFAIK that isn't a supported combination of versions for building OpenFOAM, so my first bet would be to play it on the safe side. Any chance there is a gcc 4.4.x or 4.3.x lying around in the systems you have access to?

Tested on a machine with gcc 4.4; the problem remains...

Quote:

Originally Posted by wyldckat (Post 325272)
Ah, now we are getting somewhere! If it stops when running in serial/single process, don't expect it to run in parallel! I had forgotten that this was one of the reasons why I asked about the turbulence model... I don't have much experience with this, but I do know that improper definitions for certain characteristic parameters, or setting the boundary conditions wrongly, will lead to simulations not running at all or crashing sooner or later. An example would be bad initial values for the turbulence models, or for the parameters themselves.

Runs perfectly fine with the exact same setup on the coarser mesh (I literally replace the polyMesh directory and it changes from working to not working).

I have narrowed it down to one of the directMapped B.C.s (the other two are fine). When I switch it to fixedValue, everything works fine. Switch it back to directMapped, and it stalls at the first time loop. The checkMesh utility gives me a cellToRegion file, suggesting that I should use the splitMeshRegions utility and use two different regions. The problematic directMapped boundary pulls its data from a separate domain of the flow that is not connected geometrically. Could this explain the problems I am having? (I am completely unfamiliar with multiple regions in OF, other than the little reading I have done today. :D)

Quote:

*Number of regions: 2
The mesh has multiple regions which are not connected by any face.
<<Writing region information to "0/cellToRegion"
As always, thanks for the insight you provide,

Regards,
Perry

wyldckat September 23, 2011 14:36

Hi Perry,

Mmm... well, at least MPFR doesn't seem to be the one to blame... for now :)

And this is getting further into details that I'm not very familiar with either. Did checkMesh on the coarser mesh give you the same information, that it should divide the mesh into two separate regions?

For multiple regions, I only know about two solvers that should support this (I don't know whether all the other solvers support this or not); they are (and respective tutorials):
If I'm not mistaken, a boundary condition of type "Fan" was introduced in OpenFOAM 2.0.x, which might be the type of feature you're looking for, although I'm not so sure... Here's a thread that asks about it, although no specific information is posted there: http://www.cfd-online.com/Forums/ope...enfoam200.html

Best regards and Good luck!
Bruno

PerryLJohnson September 24, 2011 14:51

Bruno,

I really appreciate all your help with this issue, and the side-tips along the way. I narrowed it down to the influence of the directMapped boundary condition with the fine mesh I was using, so I played around with meshing until I got one to work. I seem to have resolved the issue just by using a different mesh.

Sincere regards,
Perry

elham usefi May 26, 2017 17:36

non-interactive parallel run
 
greetings all!
I have installed OF 2.4.0 (with gcc-4.8.1, gmp-5.1.2, mpc-1.0.1, mpfr-3.1.2) on a cluster running CentOS 6.5, following these instructions
HTML Code:

https://openfoamwiki.net/index.php/Installation/Linux/OpenFOAM-2.3.0/CentOS_SL_RHEL
I'm running the pitzDaily tutorial. Both serial and parallel runs go perfectly. The problem comes when I want to run non-interactively: serial runs continue after I close the PuTTY window, but parallel runs don't (no matter how many processors I use)!
I use this command
Code:

$ nohup foamJob -s -p simpleFoam &
The system OpenMPI is 1.6.2, and I've installed OF with both OpenMPI 1.8.5 and 1.6.2, but the problem is the same!
nohup.out looks like:
Code:

3 total processes killed (some possibly by mpirun during cleanup)".
Has anybody any idea what's happening?

febriyan91 March 21, 2021 04:56

2 Attachment(s)
Quote:

Originally Posted by wyldckat (Post 261815)
Hi Bunni,

Well, the same things that happened with you have happened with me too. I've confirmed that my OpenMPI is working with OpenFOAM, by testing parallelTest and the interFoam/laminar/damBreak case with dual-core parallel execution.
edit: I forgot to mention that I used Ubuntu 8.04 i686 and OpenFOAM 1.6.x

I've managed to partly solve the error you get. Just edit the file "OpenFOAM-1.6.x/etc/settings.sh", find the variable minBufferSize and increase the buffer size:
Code:

# Set the minimum MPI buffer size (used by all platforms except SGI MPI)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
minBufferSize=150000000

And don't forget to start a new terminal and use foamJob to ensure that the proper value is used.
But from the tests I've made with increasing the buffer size, it only seems to postpone the crash further in time. interFoam always seems to crash during the preparation part of the case. Now I know why other users report that it freezes... in fact, it just takes really long playing around with meshes and memory and MPI messages, at least AFAIK, and sooner or later it will just crash without doing anything useful :(
edit2: yep, 400 MB of buffer and it still crashes...

So, Bunni, I suggest that you try increasing that buffer variable in an attempt to avoid the crashing. But my best bet is what I said to you previously about OpenFOAM 1.6: this seems to be a bug in OpenFOAM which apparently is yet to be fixed! So please post a bug report on the bug-report part of the OpenFOAM forum: http://www.cfd-online.com/Forums/openfoam-bugs/
If you want to save some time, you can refer to your post #23 from here and onward!

Best regards,
Bruno


Hi Bruno, I am using OF v2012 on Ubuntu 20.04 Focal Fossa.
I could not find the file "settings.sh" in the "OpenFoam-v2012/etc" folder. I found a "settings" file inside the "OpenFoam-v2012/etc/config.sh" and "OpenFoam-v2012/etc/config.csh" folders, but unfortunately I could not find the string "minBufferSize" in them.
I wonder whether my OpenFoam installation is correct or not. I followed these instructions: http://openfoamwiki.net/index.php/In...M-v1806/Ubuntu
I changed the version string v1806 to v2012. I have attached the settings files.

I would much appreciate it if you could give me some hints.

Thank you in advance..

