
stop when I run in parallel

Old   May 4, 2010, 14:06
Default stop when I run in parallel
  #1
New Member
 
Nolwenn
Join Date: Apr 2010
Posts: 26
Hello everyone,

When I run a parallel case, it stops (though sometimes it succeeds) without any error message. It seems to be busy (all CPUs at 100%) but there is no progress. It happens at the beginning or later on; it looks like a random error.
I'm using OpenFOAM 1.6.x on Ubuntu 9.10 with gcc 4.4.1 as the compiler.
I have no problem when I run a case on a single processor.

Does anyone have an idea of what is happening?

Here is a case which runs and then stops. I only changed the number of processors from the tutorial case.
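For completeness, the steps are the standard damBreak tutorial sequence, only with the number of subdomains set to 6 (a sketch of the usual run, from the case directory):

Code:
cd ~/OpenFOAM/mecaflu-1.6.x/run/damBreak
blockMesh                          # mesh the tutorial geometry
setFields                          # initialise the alpha1 (water) region
decomposePar                       # split the case per system/decomposeParDict
mpirun -np 6 interFoam -parallel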

Thank you for your help.

Nolwenn

With the OpenFOAM environment sourced:
mecaflu@monarch01:~$ cd OpenFOAM/mecaflu-1.6.x/run/damBreak/
mecaflu@monarch01:~/OpenFOAM/mecaflu-1.6.x/run/damBreak$ mpirun -np 6 interFoam -parallel
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build : 1.6.x-605bfc578b21
Exec : interFoam -parallel
Date : May 04 2010
Time : 18:46:25
Host : monarch01
PID : 23017
Case : /media/teradrive01/mecaflu-1.6.x/run/damBreak
nProcs : 6
Slaves :
5
(
monarch01.23018
monarch01.23019
monarch01.23020
monarch01.23021
monarch01.23022
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0


Reading g
Reading field p

Reading field alpha1

Reading field U

Reading/calculating face flux field phi

Reading transportProperties

Selecting incompressible transport model Newtonian
Selecting incompressible transport model Newtonian
Selecting turbulence model type laminar
time step continuity errors : sum local = 0, global = 0, cumulative = 0
DICPCG: Solving for pcorr, Initial residual = 0, Final residual = 0, No Iterations 0
time step continuity errors : sum local = 0, global = 0, cumulative = 0
Courant Number mean: 0 max: 0

Starting time loop

Old   May 6, 2010, 01:19
Default
  #2
New Member
 
Jay
Join Date: Feb 2010
Posts: 15
Hi Nolwenn
I want to ask: are you using distributed parallelization, or are you running mpirun -np 6 ..... on a single machine?
Regards
Jay

Old   May 6, 2010, 04:36
Default
  #3
New Member
 
Nolwenn
Join Date: Apr 2010
Posts: 26
Hi Jay,

I am using a single machine with mpirun -np 6 interFoam -parallel. When I run with 2 processors, it seems to get through more iterations than with 4 or more...

Regards

Nolwenn

Old   May 6, 2010, 21:40
Default
  #4
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,974
Blog Entries: 45
Greetings Nolwenn,

It could be a memory issue. OpenFOAM is known to crash and/or freeze Linux boxes when there isn't enough memory. Check this post (or the whole thread it's on) for more on it: mpirun problems post #3
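A quick way to check is to watch memory and swap while the case runs (plain Linux tools, nothing OpenFOAM-specific):

Code:
free -m      # RAM and swap usage, in MiB
vmstat 2     # refreshes every 2 s; non-zero si/so columns mean the box is swapping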

Also, try using the parallelTest utility - information available on this post: OpenFOAM updates post #19
The parallelTest utility (part of OpenFOAM's test applications) can help you sort out the more basic MPI problems - communication failures, missing environment settings, libraries not found - without running any particular solver functionality. For example: for some weird reason, there might be something missing in the mpirun command to allow the 6 cores to work properly together!
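A sketch of building and running it, assuming the test sources sit under applications/test/parallel as in 1.6.x:

Code:
cd $WM_PROJECT_DIR/applications/test/parallel
wmake                                # builds the parallelTest binary
cd $FOAM_RUN/damBreak                # any decomposed case directory will do
mpirun -np 6 parallelTest -parallel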

Best regards,
Bruno

Old   May 7, 2010, 05:33
Default
  #5
New Member
 
Nolwenn
Join Date: Apr 2010
Posts: 26
Hello Bruno,

Thank you for your answer. I ran parallelTest and obtained this:


Code:
Executing: mpirun -np 6 /home/mecaflu/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
[0] 
Starting transfers
[0] 
[0] master receiving from slave 1
[0] (0 1 2)
[0] master receiving from slave 2
[0] (0 1 2)
[0] master receiving from slave 3
[0] (0 1 2)
[0] master receiving from slave 4
[0] (0 1 2)
[0] master receiving from slave 5
[0] (0 1 2)
[0] master sending to slave 1
[0] master sending to slave 2
[0] master sending to slave 3
[0] master sending to slave 4
[0] master sending to slave 5
[1] 
Starting transfers
[1] 
[1] slave sending to master 0
[1] slave receiving from master 0
[1] (0 1 2)
[2] 
Starting transfers
[2] 
[2] slave sending to master 0
[2] slave receiving from master 0
[2] (0 1 2)
[3] 
Starting transfers
[3] 
[3] slave sending to master 0
[3] slave receiving from master 0
[3] (0 1 2)
[4] 
Starting transfers
[4] 
[4] slave sending to master 0
[4] slave receiving from master 0
[4] (0 1 2)
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-605bfc578b21
Exec   : parallelTest -parallel
Date   : May 07 2010
Time   : 10:09:41
Host   : monarch01
PID    : 4344
Case   : /media/teradrive01/mecaflu-1.6.x/run/mine/7
nProcs : 6
Slaves : 
5
(
monarch01.4345
monarch01.4346
monarch01.4347
monarch01.4348
monarch01.4349
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run
[5] 
Starting transfers
[5] 
[5] slave sending to master 0
[5] slave receiving from master 0
[5] (0 1 2)
The output from processor 5 comes after the "End"; I don't know if that could be a reason for the stopping...
I have 8 GiB of memory and 3 GiB of swap, so memory seems to be OK!

Best regards

Nolwenn

Old   May 7, 2010, 15:34
Default
  #6
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,974
Blog Entries: 45
Greetings Nolwenn,

That is a strange output... it seems a bit out of sync. It has happened to me once some time ago, but the OpenFOAM header always came first!

Doesn't the script foamJob work for you? Or does it output the exact same thing?
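For reference, foamJob just wraps the mpirun call; a minimal sketch of its use from the case directory (-p runs on the decomposed case, -s also echoes to the screen, and the output goes to a file named log):

Code:
foamJob -p -s interFoam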

Another possibility is that it could actually reveal a bug in OpenFOAM! So, how did you decompose the domain for each processor?

Best regards,
Bruno

Old   May 10, 2010, 04:57
Default
  #7
New Member
 
Nolwenn
Join Date: Apr 2010
Posts: 26
Hello Bruno,

Here is the result of foamJob; I can't find a lot of information in it!

Code:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-605bfc578b21
Exec   : parallelTest -parallel
Date   : May 07 2010
Time   : 10:09:41
Host   : monarch01
PID    : 4344
Case   : /media/teradrive01/mecaflu-1.6.x/run/mine/7
nProcs : 6
Slaves : 
5
(
monarch01.4345
monarch01.4346
monarch01.4347
monarch01.4348
monarch01.4349
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run
For the decomposition I didn't create anything; it is the one from the tutorial.

Code:
// The FOAM Project // File: decomposeParDict
/*
-------------------------------------------------------------------------------
 =========         | dictionary
 \\      /         | 
  \\    /          | Name:   decomposeParDict
   \\  /           | Family: FoamX configuration file
    \\/            | 
    F ield         | FOAM version: 2.1
    O peration     | Product of Nabla Ltd.
    A nd           | 
    M anipulation  | Email: Enquiries@Nabla.co.uk
-------------------------------------------------------------------------------
*/
// FoamX Case Dictionary.

FoamFile
{
    version         2.0;
    format          ascii;
    class           dictionary;
    object          decomposeParDict;
}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //


numberOfSubdomains 6;

method          hierarchical;
//method          metis;
//method          parMetis;

simpleCoeffs
{
    n               (2 1 2);
    delta           0.001;
}

hierarchicalCoeffs
{
    n               (3 1 2);
    delta           0.001;
    order           xyz;
}

manualCoeffs
{
    dataFile        "cellDecomposition";
}

metisCoeffs
{
    //n                   (5 1 1);
    //cellWeightsFile     "constant/cellWeightsFile";
}


// ************************************************************************* //
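With method hierarchical and n (3 1 2), that is 3 x 1 x 2 = 6 subdomains, which matches numberOfSubdomains. For completeness, the decomposition has to be regenerated after any change to this dictionary (a sketch, run from the case directory):

Code:
rm -rf processor*                  # discard the old processor directories
decomposePar                       # re-split mesh and fields
mpirun -np 6 interFoam -parallel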
When I first had this problem I re-installed OpenFOAM 1.6.x, but the problem stayed the same.

I use the gcc compiler; is it possible that another compiler would solve this?

Thank you for your help Bruno!

Best regards,

Nolwenn

Old   May 10, 2010, 23:58
Default
  #8
Member
 
Scott
Join Date: Sep 2009
Posts: 44
How many processors or cores does your machine have?

I would presume, if you have 8 GB, that you probably only have a quad-core machine, hence I would only partition the domain into 4 volumes.

If you have a dual-core machine then that would explain why it is OK with 2 processors, because that's all you have.

Please post your machine specs so that we can try to be more helpful.
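A compact way to pull the relevant numbers out on Linux, instead of pasting all of /proc/cpuinfo (plain shell):

Code:
grep -c '^processor' /proc/cpuinfo            # logical CPUs the kernel sees
grep 'physical id' /proc/cpuinfo | sort -u    # distinct physical sockets
grep 'cpu cores' /proc/cpuinfo | sort -u      # cores per socket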

Cheers,

Scott

Old   May 11, 2010, 05:28
Default
  #9
New Member
 
Nolwenn
Join Date: Apr 2010
Posts: 26
Hello Scott!

I have 8 processors on my machine; I tried to find the specs:

Code:
r3@monarch01:~$ cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3599.34
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 1
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 0
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 1
initial apicid    : 1
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.11
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 2
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 1
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 2
initial apicid    : 2
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 3
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 1
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 3
initial apicid    : 3
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.11
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 4
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 2
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 4
initial apicid    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 5
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 2
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 5
initial apicid    : 5
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 6
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 3
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 6
initial apicid    : 6
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.10
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor    : 7
vendor_id    : AuthenticAMD
cpu family    : 15
model        : 33
model name    : Dual Core AMD Opteron(tm) Processor 865
stepping    : 0
cpu MHz        : 1799.670
cache size    : 1024 KB
physical id    : 3
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 7
initial apicid    : 7
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good extd_apicid pni lahf_lm cmp_legacy
bogomips    : 3600.11
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
When I run parallelTest with 4 processes I have the same problem: the OpenFOAM header comes almost at the end of the test...

Best regards

Nolwenn

Old   May 11, 2010, 18:34
Default
  #10
Member
 
Scott
Join Date: Sep 2009
Posts: 44
Have you tried with all 8 processors?

I don't have this problem on mine when I use all of the processors. Make sure you use decomposePar to get 8 partitions before you try.

Scott

Old   May 11, 2010, 18:42
Default
  #11
Member
 
Scott
Join Date: Sep 2009
Posts: 44
Also, are these 8 processors all on the same machine, or are they on different machines, i.e. is it a small cluster?

I haven't done this on a cluster setup before, so I can't be of any help with that. I was assuming that you had two quad-core processors on a single motherboard, but I just went through it again and it's either 8 dual-core processors, or 4 dual-core processors reporting an entry for each core.

Can you confirm exactly what it is? Then maybe someone else can help you.

If it's a cluster then you may have load issues, interconnect problems, or questionable installations on the other machines.

Cheers,

Scott

Old   May 12, 2010, 05:34
Default
  #12
New Member
 
Nolwenn
Join Date: Apr 2010
Posts: 26
Sorry, I am not very familiar with machine specs!
It is a single machine with 4 dual-core processors.

When I run with all the processors the problem is the same:

Code:
Parallel processing using OPENMPI with 8 processors
Executing: mpirun -np 8 /home/mecaflu/OpenFOAM/OpenFOAM-1.6.x/bin/foamExec parallelTest -parallel | tee log
[0] 
Starting transfers
[0] 
[0] master receiving from slave 1
[0] (0 1 2)
[0] master receiving from slave 2
[0] (0 1 2)
[0] master receiving from slave 3
[0] (0 1 2)
[0] master receiving from slave 4
[0] (0 1 2)
[0] master receiving from slave 5
[0] (0 1 2)
[0] master receiving from slave 6
[0] (0 1 2)
[0] master receiving from slave 7
[0] (0 1 2)
[0] master sending to slave 1
[0] master sending to slave 2
[0] master sending to slave 3
[0] master sending to slave 4
[0] master sending to slave 5
[0] master sending to slave 6
[0] master sending to slave 7
[1] 
Starting transfers
[1] 
[1] slave sending to master 0
[1] slave receiving from master 0
[1] (0 1 2)
[2] 
Starting transfers
[2] 
[2] slave sending to master 0
[2] slave receiving from master 0
[2] (0 1 2)
[3] 
Starting transfers
[3] 
[3] slave sending to master 0
[3] slave receiving from master 0
[3] (0 1 2)
[4] 
Starting transfers
[4] 
[4] slave sending to master 0
[4] slave receiving from master 0
[4] (0 1 2)
[5] 
Starting transfers
[5] 
[5] slave sending to master 0
[5] slave receiving from master 0
[5] (0 1 2)
[6] 
Starting transfers
[6] 
[6] slave sending to master 0
[6] slave receiving from master 0
[6] (0 1 2)
[7] 
Starting transfers
[7] 
[7] slave sending to master 0
[7] slave receiving from master 0
[7] (0 1 2)
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-605bfc578b21
Exec   : parallelTest -parallel
Date   : May 12 2010
Time   : 10:22:27
Host   : monarch01
PID    : 4894
Case   : /media/teradrive01/mecaflu-1.6.x/run/mine/9
nProcs : 8
Slaves : 
7
(
monarch01.4895
monarch01.4896
monarch01.4899
monarch01.4919
monarch01.4922
monarch01.4966
monarch01.4980
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

End

Finalising parallel run
And in my log file I have just the end (from the OpenFOAM header onwards).

Best regards

Nolwenn

Old   May 12, 2010, 06:03
Default
  #13
bfa
Member
 
Björn Fabritius
Join Date: Mar 2009
Location: Freiberg, Germany
Posts: 31
I am encountering the same problem as Nolwenn. I use 12 cores on a single machine. parallelTest works fine and prints out its results in a reasonable order, but when I run foamJob the computation hangs on solving the first UEqn. All cores are at 100% but nothing is happening.
solver: simpleFoam
case: pitzDaily
decomposition: simple
OpenFOAM 1.6.x

Here is the output from mpirun -H localhost -np 12 simpleFoam -parallel
Code:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6.x-1d1db32a12b0
Exec   : simpleFoam -parallel
Date   : May 12 2010
Time   : 09:31:45
Host   : brahms
PID    : 11694
Case   : /home/fabritius/OpenFOAM/OpenFOAM-1.6.x/tutorials/incompressible/simpleFoam/pitzDaily
nProcs : 12
Slaves : 
11
(
brahms.11695
brahms.11696
brahms.11697
brahms.11698
brahms.11699
brahms.11700
brahms.11701
brahms.11702
brahms.11703
brahms.11704
brahms.11705
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

Reading field p

Reading field U

Reading/calculating face flux field phi

Selecting incompressible transport model Newtonian
Selecting RAS turbulence model kEpsilon
kEpsilonCoeffs
{
    Cmu             0.09;
    C1              1.44;
    C2              1.92;
    sigmaEps        1.3;
}


Starting time loop

Time = 1
...and there it stops!

I tried a verbose mode of mpirun, but that delivered no useful information either. Unfortunately I have no profiling tools at hand for parallel code. If any of you has Vampir or something similar and could try this out, that would be great.
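Two low-tech checks that might still narrow down where it hangs (assuming Open MPI, whose --output-filename option writes one log file per rank; the PID below is whichever simpleFoam process is spinning):

Code:
mpirun -np 12 --output-filename log.rank simpleFoam -parallel
# in a second terminal, attach to one busy process and get a backtrace:
gdb -p <PID>     # then type: bt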

Old   May 15, 2010, 13:12
Default
  #14
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,974
Blog Entries: 45
Greetings to all,

Well, this is quite odd. The only solutions that come to mind are to test the same working conditions with other build scenarios, namely:
  • use the system's OpenMPI, which is a valid option in $WM_PROJECT_DIR/etc/bashrc in OpenFOAM 1.6.x (see the sketch after this list);
  • try using the pre-built OpenFOAM 1.6 available on www.openfoam.com;
  • build OpenFOAM 1.6.x with gcc 4.3.3, which comes in the ThirdParty folder.
If you manage to get it running with one of the above or with some other solution, please tell us about it.
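For the first option, the change is roughly the following (a sketch from memory; the exact value accepted by the 1.6.x etc/bashrc may differ):

Code:
# in $WM_PROJECT_DIR/etc/bashrc, change the MPI selection, e.g.
#   : ${WM_MPLIB:=OPENMPI}   ->   : ${WM_MPLIB:=SYSTEMOPENMPI}
# then re-source the environment and rebuild the MPI-dependent layer:
source $WM_PROJECT_DIR/etc/bashrc
cd $WM_PROJECT_DIR/src/Pstream && ./Allwmake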

Because the only reason that comes to mind for the solvers to just jam up without doing anything productive is that something didn't get built the way it is supposed to be.


As for the output from parallelTest coming out with the parts swapped: it could be a stdout buffering issue, where mpirun prints the text from the slaves before the master's output because the master's buffer didn't fill up fast enough to reach the character limit that triggers a flush.


Best regards,
Bruno

Old   May 17, 2010, 04:50
Default
  #15
New Member
 
Nolwenn
Join Date: Apr 2010
Posts: 26
Hello everyone,

Now everything seems to be OK for me! I went back to the pre-built OpenFOAM 1.6 on Ubuntu 8.04, and I no longer have the problem.

Thank you again for your help Bruno!

Cheers,

Nolwenn

Old   May 27, 2010, 16:29
Default
  #16
New Member
 
Gonzalo Tampier
Join Date: Apr 2009
Location: Berlin, Germany
Posts: 9
Hello everyone,

I'm experiencing the very same problem with openSUSE. I've tried the pre-compiled 1.6 version and it worked! My problem arises again when I recompile OpenMPI, which I do in order to add the Torque (batch system) and OFED options; since we have a small cluster, these options are necessary for running cases on more than one node. Even if I recompile OpenMPI without these options (just a plain recompile, nothing else), I get the same problem: calculations stop, sometimes earlier, sometimes later, sometimes right at the beginning, without any error message and with all CPUs kept at 100%. This is quite strange; I would be glad if someone has further ideas... I'll keep you informed if I make some progress.

regards
Gonzalo

Old   May 27, 2010, 21:41
Default
  #17
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,974
Blog Entries: 45
Greetings Gonzalo,

Let's see... here are my questions for you:
  • How did you rebuild OpenMPI? Did you add the build options to the Allwmake script available in the folder $HOME/OpenFOAM/ThirdParty-1.6, or did you rebuild the library by hand (./configure, then make)? (A few quick checks are sketched after this list.)
  • What version of gcc did you use to rebuild OpenMPI: OpenFOAM's gcc 4.3.3 or openSUSE's version?
  • Then, after rebuilding the OpenMPI library, did you rebuild OpenFOAM as well? If not, something may be mis-linked somehow.
  • Do you know if your installed openSUSE version has a version of OpenMPI available in YaST (its software package manager) that has the characteristics you need? Because if it does, OpenFOAM 1.6.x has an option to use the system's OpenMPI version instead of the one that comes with OpenFOAM! And even if you need to stick with OpenFOAM 1.6, it should be as easy as copying bashrc and settings.sh from the OpenFOAM-1.6.x/etc folder to the OpenFOAM-1.6/etc folder!
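The quick checks mentioned in the first two points (plain shell; ldd on the solver shows which MPI and Pstream libraries actually get resolved at run time):

Code:
which mpirun && mpirun --version                  # which OpenMPI the shell finds
which gcc && gcc --version                        # which compiler is first in PATH
ldd $(which interFoam) | grep -iE 'mpi|pstream'   # libraries the solver links against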
I personally haven't had the time to try to reproduce these odd MPI freezing issues, but I also think they won't be very easy to reproduce.

The easiest way to avoid these issues would be to use the same distro versions the pre-built binaries came from, namely, if I'm not mistaken, Ubuntu 9.04 and openSUSE 11.0 or 11.1, because they have gcc 4.3.3 as their system compiler.

Best regards,
Bruno

Old   May 28, 2010, 04:09
Default
  #18
New Member
 
Gonzalo Tampier
Join Date: Apr 2009
Location: Berlin, Germany
Posts: 9
Hello Bruno, hello all,

thanks for your comments. I have now compiled OpenMPI again and it worked! I was first trying to compile it with openSUSE 11.2's system gcc (4.4.1), which apparently caused the problems. Now I've tried it again with the ThirdParty gcc (4.3.3) and it works!
In both cases I compiled it with Allwmake from the ThirdParty-1.6 directory, after uncommenting the openib and openib-libdir options and adding the --with-tm option for Torque (the options involved are sketched below). Then I deleted the openmpi-1.3.3/platform dir and executed Allwmake in ThirdParty-1.6. After this, it wasn't necessary to recompile OpenFOAM itself.
I have now run first tests with 2 nodes and a total of 16 processes (the finer damBreak tutorial) and it seems to work fine!
It still seems strange to me, since I did the same for 1.6.x and it didn't work! I'll try the system's compiler for both OpenFOAM-1.6.x and ThirdParty when I have more time.
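For anyone repeating this, the configure options involved look roughly like this (a sketch; the paths are placeholders and the real lines live inside ThirdParty-1.6/Allwmake):

Code:
./configure --prefix=$MPI_ARCH_PATH \
    --with-openib=/usr --with-openib-libdir=/usr/lib64 \
    --with-tm=/var/spool/torque    # Torque/PBS batch-system support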
Thanks again!
Gonzalo

Old   May 28, 2010, 16:39
Default parallel problem
  #19
Member
 
Join Date: Mar 2010
Posts: 31
Hi,

I've got a problem running a case in parallel (one machine, quad-core). I'm using the OpenFOAM 1.6 pre-built binaries on Fedora 12.

The error I get is:

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build : 1.6-f802ff2d6c5a
Exec : interFoam -parallel
Date : May 28 2010
Time : 12:27:10
Host : blue
PID : 23136
Case : /home/bunni/OpenFOAM/OpenFOAM-1.6/tutorials/quartcyl
nProcs : 2
Slaves :
1
(
blue.23137
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

[blue:23137] *** An error occurred in MPI_Bsend
[blue:23137] *** on communicator MPI_COMM_WORLD
[blue:23137] *** MPI_ERR_BUFFER: invalid buffer pointer
[blue:23137] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 23137 on
node blue exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[blue:23135] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[blue:23135] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

So I take it the program is crashing in the mesh part? It seems to run fine on a single processor (and another geometry I had ran fine in parallel jobs). I've meshed a quarter of a cylinder, with the cylinder aligned on the z-axis, and done a simple decomposition along the z-axis, thinking that the circular geometry might be causing the problem.

Above, Bruno mentioned the runParallel and parallelTest scripts. Where are those scripts?

Cheers

Old   May 28, 2010, 22:41
Default
  #20
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,974
Blog Entries: 45
Greetings bunni,

Quote:
Originally Posted by bunni
- so I take it the program is crashing in the mesh part? It seems to run fine on a single proc. (and another geometry I had ran fine for parallel jobs). I've meshed a quarter of a cylinder, with the cylinder aligned on the z-axis. I've done simple decomposition along the z-axis, thinking that the circular geometry might be causing the problem.
You might be hitting an existing bug in OpenFOAM 1.6 that could already be solved in OpenFOAM 1.6.x. For building OpenFOAM 1.6.x on Fedora 12, check this post: Problem Installing OF 1.6 Ubuntu 9.10 (64 bit) - How to use GCC 4.4.1 post #11
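Another thing that sometimes helps with MPI_ERR_BUFFER aborts: OpenFOAM attaches a buffer for MPI_Bsend whose size comes from the MPI_BUFFER_SIZE environment variable (set in OpenFOAM's etc/settings.sh), so enlarging it is worth a try. A sketch, not a confirmed fix for your case:

Code:
export MPI_BUFFER_SIZE=200000000   # bytes; the default set in etc/settings.sh is smaller
mpirun -np 2 interFoam -parallel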

Quote:
Originally Posted by bunni
Above, bruno mentioned the scripts: runParallel, parallelTest. Where are those scripts?
Check my post #4 in this current thread.

Best regards,
Bruno


