CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (http://www.cfd-online.com/Forums/openfoam-solving/)
-   -   Parallel Run on dynamically mounted partition (http://www.cfd-online.com/Forums/openfoam-solving/59032-parallel-run-dynamically-mounted-partition.html)

braennstroem September 27, 2007 03:54

Hi,

I would like to run a case in parallel which has its root on a dynamically mounted partition
'/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest'
I decomposed the case in that directory and tried to run it, but somehow it looks for the information in a non-existent 'home' directory...

ceplx049/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest> mpirun --hostfile Klimakruemmer/machines -np 4 interFoam . damBreak -parallel > log &
[1] 26003
ceplx049/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest> [2]
[2]
[2] --> FOAM FATAL IO ERROR : cannot open file
[2]
[2] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor2/system/controlDict at line 0.
[2]
[2] From function regIOobject::readStream(const word&)
[2] in file db/regIOobject/regIOobjectRead.C at line 66.
[2]
FOAM parallel run exiting
[2]
[ceplx050:20277] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 1
[ceplx049][0,1,0][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
[3]
[3]
[3] --> FOAM FATAL IO ERROR : cannot open file
[3]
[3] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor3/system/controlDict at line 0.
[3]
[3] From function regIOobject::readStream(const word&)
[3] in file db/regIOobject/regIOobjectRead.C at line 66.
[3]
FOAM parallel run exiting
[3]
[ceplx050:20278] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 26007 on node ceplx049 exited on signal 15 (Terminated).



A parallel run with its root in my 'home' directory works fine, but I have limited space there :-(
It would be nice if anybody has an idea!?

Regards!
Fabian

olesen September 27, 2007 04:47

Hmm,

You started from the directory
'/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest'

And MPI is reporting that it can't find the file

Quote:

[2] --> FOAM FATAL IO ERROR : cannot open file
[2] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor2/system/controlDict
This looks more like NFS confusion than anything else. Can you ssh onto the remote machine and see the '/v/caenfs05/...' directory?

Check what 'mount -v' is showing and what the host is exporting (Linux: /usr/sbin/showmount -e HOST).

Depending on the configuration, you might need some form of directory mapping. For some directories we use the GridEngine sge_aliases, which lets you specify stuff like this:

#subm_dir subm_host exec_host path_replacement
/tmp_mnt/ * * /
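A quick way to compare what each side sees, along the lines of the commands above (a sketch only; the hostnames and the case path are the examples from this thread, adjust to your cluster):

```shell
# Check that the case root resolves to the same directory on every node.
CASE=/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest
for host in ceplx049 ceplx050; do
    echo "== $host =="
    ssh "$host" "ls -d $CASE && mount -v | grep -i nfs"
done

# What is the file server actually exporting?
/usr/sbin/showmount -e ceplx049
```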

braennstroem September 27, 2007 05:08

Hi Mark,

yes, you are right, I started from '/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest' and the case is located in that directory too, but I was wondering why it asks for the 'home' path, which obviously does not exist...

Sorry, it works now when I run it with the complete path for the root rather than just '.'.
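For reference, the working invocation would then look something like this (a sketch: the absolute case root is passed instead of '.'; hostfile and paths are the ones from the earlier post):

```shell
# Pass the absolute root directory, not '.', so every MPI rank resolves
# the same path regardless of its working directory or automount state.
mpirun --hostfile Klimakruemmer/machines -np 4 \
    interFoam /scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest damBreak -parallel > log &
```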
Thanks! Fabian

braennstroem October 2, 2007 03:49

Hi,

as I mentioned before, it actually works now, but somehow I get the error message below after the first write to disk:

Time = 50

DILUPBiCG: Solving for Ux, Initial residual = 0.0224561, Final residual = 0.000388077, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 0.0595427, Final residual = 0.00106835, No Iterations 1
DILUPBiCG: Solving for Uz, Initial residual = 0.0407827, Final residual = 0.000722178, No Iterations 1
DICPCG: Solving for p, Initial residual = 0.758773, Final residual = 0.00727437, No Iterations 269
time step continuity errors : sum local = 0.00111858, global = 9.43786e-05, cumulative = -0.00822085
DILUPBiCG: Solving for epsilon, Initial residual = 0.0116643, Final residual = 0.000240804, No Iterations 1
DILUPBiCG: Solving for k, Initial residual = 0.0600995, Final residual = 0.000742773, No Iterations 1
ExecutionTime = 4908.29 s ClockTime = 5530 s

Time = 51

[2] --> FOAM Warning :
[5] --> FOAM Warning :
[5] From function Time::readModifiedObjects()
[5] in file db/Time/TimeIO.C at line 222
[5] Delaying reading objects due to inconsistent file time-stamps between processors
[6] --> FOAM Warning :
[8] --> FOAM Warning :
[9] --> FOAM Warning :
[9] From function Time::readModifiedObjects()
[9] in file db/Time/TimeIO.C at line 222
[9] Delaying reading objects due to inconsistent file time-stamps between processors
[2] From function Time::readModifiedObjects()
[2] in file db/Time/TimeIO.C at line 222
[2] Delaying reading objects due to inconsistent file time-stamps between processors
[3] --> FOAM Warning :
[3] From function Time::readModifiedObjects()
[3] in file db/Time/TimeIO.C at line 222
[3] Delaying reading objects due to inconsistent file time-stamps between processors
[4] --> FOAM Warning :
[4] From function Time::readModifiedObjects()
[4] in file db/Time/TimeIO.C at line 222
[4] Delaying reading objects due to inconsistent file time-stamps between processors
[6] From function Time::readModifiedObjects()
[6] in file db/Time/TimeIO.C at line 222
[6] Delaying reading objects due to inconsistent file time-stamps between processors
[7] --> FOAM Warning :
[7] From function Time::readModifiedObjects()
[7] in file db/Time/TimeIO.C at line 222
[7] Delaying reading objects due to inconsistent file time-stamps between processors
[8] From function Time::readModifiedObjects()
[8] in file db/Time/TimeIO.C at line 222
[8] Delaying reading objects due to inconsistent file time-stamps between processors

This message then appears at every time step, but the reconstruction and VTK export work fine at the end. Does anyone know what kind of problem I'm facing!? I run these calculations over Ethernet...

Regards!
Fabian

hjasak October 2, 2007 04:03

Yup, the time daemon is out of sync on your machines. Either set up a time slave to work properly or play around with:

~/.OpenFOAM-1.4.1-dev/controlDict

OptimisationSwitches
{
    fileModificationSkew 10;
}

Enjoy,

Hrv

braennstroem October 2, 2007 10:34

Hi Hrvoje,

thanks! I assume the given switch accepts a sync mismatch of up to 10 msec!?
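For intuition, here is a minimal sketch of what such a skew tolerance does (hypothetical Python, not OpenFOAM's actual code, and the time units are an assumption; the thread doesn't confirm them): a file only counts as modified once its time-stamp has moved by more than the allowed skew, so small clock offsets between nodes are ignored.

```python
def is_modified(stored_mtime: float, current_mtime: float, skew: float = 10.0) -> bool:
    """Treat a file as modified only if its time-stamp moved by more
    than the allowed skew, so small clock offsets between NFS client
    and server (or between compute nodes) are ignored."""
    return (current_mtime - stored_mtime) > skew

# A 3-unit clock offset is absorbed by a skew of 10; a 15-unit jump is not.
print(is_modified(1000.0, 1003.0))  # False
print(is_modified(1000.0, 1015.0))  # True
```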

Fabian

mrangitschdowcom February 28, 2008 12:43

Hi Hrvoje,
I've encountered the time-stamp problem as well, but it's a bit more mysterious. I'm running Xoodles on 8 cores of a single processor, so it really can't be a time daemon problem. I get the time-stamp error when reading/writing files -- not all the time, but enough to make things unpleasant. Sometimes it shows up as an inability to read a file (and OpenFOAM crashes); other times it just doesn't write one of the files on one of the processors (and I get a zero-length file for whatever variable was being written). reconstructPar then fails. It's very inconsistent and will not reproduce at the same point in the execution.

Where exactly does the fileModificationSkew entry go -- just in the controlDict in the system directory of my case, or elsewhere?

Thanks in advance!

Mike

hjasak February 29, 2008 03:53

Look at:

~/.OpenFOAM-1.4.1-dev/controlDict

(the path may be adjusted for your version) and search for:


OptimisationSwitches
{
    fileModificationSkew 10;
}


If you haven't got this, the equivalent bit in your OpenFOAM installation should be read instead (I haven't checked):

/home/hjasak/OpenFOAM/OpenFOAM-1.4.1-dev/.OpenFOAM-1.4.1-dev/controlDict

Enjoy,

Hrv

mer March 7, 2008 05:25

Hi all!
I started to run a parallel OF 1.4.1 case on a small network (4 PCs). In past versions I used LAM/MPI without problems. Now, when I decompose the case, I can't find the corresponding files on the other nodes, and when I run mpirun (Open MPI) it fails.
What should I indicate in the last lines of decomposeParDict? What is the problem, what is missing?
N.B. SSH works well between the different nodes.
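In case it helps: when the case data is not on a shared filesystem, decomposeParDict can declare distributed data in its last entries, roughly like this (a sketch only; the paths are placeholders, with one root per slave processor):

```
distributed     yes;

roots
(
    "/home/djemai/OpenFOAM/cases"   // root on node 2 (processor 1)
    "/home/djemai/OpenFOAM/cases"   // root on node 3 (processor 2)
    "/home/djemai/OpenFOAM/cases"   // root on node 4 (processor 3)
);
```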

Djemai

matteo_gautero March 14, 2008 07:33

Hi to all,
I have the same error message as Fabian:

[27] --> FOAM Warning :
[27] From function Time::readModifiedObjects()
[27] in file db/Time/TimeIO.C at line 222
[27] Delaying reading objects due to inconsistent file time-stamps between processors
[36] --> FOAM Warning :
[48] --> FOAM Warning :
[48] From function Time::readModifiedObjects()
[48] in file db/Time/TimeIO.C at line 222
[48] Delaying reading objects due to inconsistent file time-stamps between processors
[29] --> FOAM Warning :
[29] From function Time::readModifiedObjects()
[29] in file db/Time/TimeIO.C at line 222
[29] Delaying reading objects due to inconsistent file time-stamps between processors
[37] --> FOAM Warning :
[37] From function Time::readModifiedObjects()
[37] in file db/Time/TimeIO.C at line 222
[37] Delaying reading objects due to inconsistent file time-stamps between processors
[49] --> FOAM Warning :
[30] --> FOAM Warning :
[38] --> FOAM Warning :
[38] From function Time::readModifiedObjects()
[38] in file db/Time/TimeIO.C at line 222
[38] Delaying reading objects due to inconsistent file time-stamps between processors


I checked the file ~/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/controlDict and found the section

OptimisationSwitches
{
    fileModificationSkew 10;
}

which was already set to 10. So I tried changing this value to 20, but it didn't work; I get the same error message. The machine I use is a cluster with 8 nodes, each with 2 Intel Xeon quad-core processors. Any suggestion?

Thanks,
Matteo.

matteo_gautero March 14, 2008 11:37

Hi to all,
sorry, I forgot to tell you that I'm working on a mesh with 15,000,000 cells. The same case with a coarser mesh (4,000,000 cells) doesn't give me any problems.

Thanks,
Matteo

srikara October 4, 2010 05:58

Error while running in parallel on multiple CPUs across nodes
 
Hi All,
While running a case in parallel I get the following error:

Code:

msd@mshcln2:~/cae_bench/fluent/small/openfoam/with-case> mpirun -np 8 -hostfile machines interFoam -parallel

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/

Build  : 1.6-f802ff2d6c5a
Exec   : interFoam -parallel
Date   : Aug 12 2010
Time   : 10:03:07
Host   : mshccn51
PID    : 23091
Case   : /user/msd/cae_bench/fluent/small/openfoam/with-case
nProcs : 8
Slaves :
7
(
mshccn51.23092
mshccn51.23093
mshccn51.23094
mshccn53.25769
mshccn53.25770
mshccn53.25771
mshccn53.25772
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

Create time

Create mesh for time = 0

Reading g
[0] --> FOAM FATAL IO ERROR : cannot open file
[0] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor0/constant/g at line 0.
[0]     From function regIOobject::readStream()
[0]     in file db/regIOobject/regIOobjectRead.C at line 62.
[0]
FOAM parallel run exiting

(ranks 1-7 print the same "cannot open file .../processorN/constant/g" error, interleaved)

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 23094 on
node mshccn51 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[mshcln2:02570] 6 more processes have sent help message help-mpi-api.txt / mpi-abort
[mshcln2:02570] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Could anyone please help me figure out what the problem could be? The same case runs on a single CPU without any errors.

Thank you in advance,
Srikara

braennstroem October 4, 2010 13:56

Hi,

I have seen this quite frequently in the last months as well, and assume that it is an NFS error, as Mark mentioned a long time ago. Unfortunately, I have no idea how to get rid of it... :-(
It would be great if you have an idea!
Fabian

wyldckat October 5, 2010 07:41

Greetings to all!

It's not the first time I've seen reports about this issue with NFS, but I've never been able to reproduce the error myself to figure out the proper solution. I did post an idea a while back on how to fix the issue with NFS, but didn't get a reply about it specifically:
Quote:

Originally Posted by wyldckat (Post 265884)
If you can, try to mount with these options:
Code:

sync,dirsync,atime,exec,rw
Source: http://www.toucheatout.net/informati...tuning-options
The idea is to force the NFS system to refresh more actively: the default options are usually meant for a small access footprint, while these options should enforce a stricter policy, and if my theory is correct, that will hopefully fix the issue you are having.

The other theory is that NFS needs some reminding before you use OpenFOAM directly. In other words, when launching the parallel run, tell it to run a script that lists the contents of the folder, before actually running the solver! This way the NFS client-server system should be forced to explicitly check what is on the server wire :)

So, if you guys can test these theories, perhaps we can get to the bottom of this problem!
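The second theory above could be sketched as a small wrapper (hypothetical script; paths, hostfile, and process count are placeholders):

```shell
# Warm up the NFS caches: list the case contents on every rank first,
# then launch the solver. The listing forces each NFS client to
# revalidate its view of the server before OpenFOAM reads anything.
CASE=/path/to/case
mpirun --hostfile machines -np 8 sh -c "ls -lR $CASE/processor* > /dev/null"
mpirun --hostfile machines -np 8 interFoam -parallel
```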

Best regards,
Bruno

braennstroem October 5, 2010 14:43

Hello Bruno,

thanks for the advice! I will check our settings again... it would be great if this works. As it occurs only occasionally, it might take some days to give you feedback.

Thanks!
Fabian

Quote:

Originally Posted by wyldckat (Post 277871)


