Parallel Run on dynamically mounted partition

#1 | Fabian Braennstroem | September 27, 2007, 03:54
Hi,

I would like to run a case in parallel which has its root on a dynamically mounted partition
'/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest'
I decomposed the case in that directory and tried to run it, but somehow it looks for the information in a non-existing 'home'-directory...

ceplx049/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest> mpirun --hostfile Klimakruemmer/machines -np 4 interFoam . damBreak -parallel > log &
[1] 26003
ceplx049/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest> [2]
[2]
[2] --> FOAM FATAL IO ERROR : cannot open file
[2]
[2] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor2/system/controlDict at line 0.
[2]
[2] From function regIOobject::readStream(const word&)
[2] in file db/regIOobject/regIOobjectRead.C at line 66.
[2]
FOAM parallel run exiting
[2]
[ceplx050:20277] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 1
[ceplx049][0,1,0][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
[3]
[3]
[3] --> FOAM FATAL IO ERROR : cannot open file
[3]
[3] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor3/system/controlDict at line 0.
[3]
[3] From function regIOobject::readStream(const word&)
[3] in file db/regIOobject/regIOobjectRead.C at line 66.
[3]
FOAM parallel run exiting
[3]
[ceplx050:20278] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 26007 on node ceplx049 exited on signal 15 (Terminated).



A parallel run with its root in my 'home' directory works fine, but I have limited space there :-(
It would be nice if anybody has an idea!

Regards!
Fabian

#2 | Mark Olesen | September 27, 2007, 04:47
Hmm,

You started from the directory
'/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest'

And MPI is reporting that it can't find the file

Quote:
[2] --> FOAM FATAL IO ERROR : cannot open file
[2] file: /v/caenfs05/egb_user5/home/gcae504/damBreak/processor2/system/controlDict
This looks more like an NFS confusion than anything else. Can you ssh onto the remote machine and see the '/v/caenfs05/...' directory?

Check what 'mount -v' is showing and what the host is exporting (Linux: /usr/sbin/showmount -e HOST).

Depending on the configuration, you might need some form of directory mapping. For some directories we use the GridEngine sge_aliases, which lets you specify stuff like this:

#subm_dir   subm_host   exec_host   path_replacement
/tmp_mnt/   *           *           /
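
The "can you see the directory from the remote machine" check can be run against every host in the machines file at once. This is only a minimal sketch: the function name `check_dir_on_hosts` and the `REMOTE_SH` variable are made up for illustration, it assumes password-less ssh, and it assumes the host name is the first field of each line in the machines file.

```shell
#!/bin/sh
# Hypothetical helper: report, for every host in a machines file, whether
# a directory is visible there. REMOTE_SH is the command used to reach a
# host (assumed here to be ssh with key-based login).
REMOTE_SH=${REMOTE_SH:-ssh}

check_dir_on_hosts() {
    machines=$1; dir=$2; status=0
    # Take the first field of each non-empty, non-comment line as the host.
    for host in $(awk '!/^#/ && NF {print $1}' "$machines" | sort -u); do
        if $REMOTE_SH "$host" test -d "$dir" </dev/null; then
            echo "$host: ok"
        else
            echo "$host: MISSING $dir"
            status=1
        fi
    done
    return $status
}
```

For example, `check_dir_on_hosts Klimakruemmer/machines "$PWD"` run from the case root prints one line per host. A host reported as MISSING is a candidate for the automount or path-mapping problem described above. Paths containing spaces are not handled by this sketch.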

#3 | Fabian Braennstroem | September 27, 2007, 05:08
Hi Mark,

yes, you are right, I started from '/scr/ceplx049/scr1/gcae504/OpenFOAM_Berechnung/ParallelTest' and the case is located in that directory too, but I was puzzled that it asked for the 'home' path, which obviously does not exist...

Sorry, it works now when I run it with the complete path for the root and not just with '.'.
Thanks! Fabian
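
The working invocation can be scripted so that the root is never passed as '.'. This is a minimal sketch, assuming the OpenFOAM 1.4.x argument order (`<solver> <root> <case> -parallel`) used in the first post; the helper name `parallel_cmd` is made up for illustration. A plausible explanation, not confirmed in this thread, is that the remote ranks do not start in the same working directory, so a relative root resolves to a different place on each node.

```shell
#!/bin/sh
# Hypothetical helper: print an mpirun command line that uses an
# absolute case root instead of '.'.
# Arguments: <hostfile> <nprocs> <solver> <root> <case>
parallel_cmd() {
    hostfile=$1; nprocs=$2; solver=$3; root=$4; case_name=$5
    # Resolve the root to an absolute path on the launching host.
    # The cd happens inside the command substitution (a subshell),
    # so the caller's working directory is left unchanged.
    abs_root=$(cd "$root" && pwd) || return 1
    printf 'mpirun --hostfile %s -np %s %s %s %s -parallel\n' \
        "$hostfile" "$nprocs" "$solver" "$abs_root" "$case_name"
}
```

Calling `parallel_cmd Klimakruemmer/machines 4 interFoam . damBreak` from the case root only prints the command, so it can be inspected before it is run. Note that on an automounted partition `pwd` may return the mount point as seen locally, which still has to be valid on the other nodes.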

#4 | Fabian Braennstroem | October 2, 2007, 03:49
Hi,

as I mentioned before, it actually works now, but I get the warning below after the first write to disk:

Time = 50

DILUPBiCG: Solving for Ux, Initial residual = 0.0224561, Final residual = 0.000388077, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 0.0595427, Final residual = 0.00106835, No Iterations 1
DILUPBiCG: Solving for Uz, Initial residual = 0.0407827, Final residual = 0.000722178, No Iterations 1
DICPCG: Solving for p, Initial residual = 0.758773, Final residual = 0.00727437, No Iterations 269
time step continuity errors : sum local = 0.00111858, global = 9.43786e-05, cumulative = -0.00822085
DILUPBiCG: Solving for epsilon, Initial residual = 0.0116643, Final residual = 0.000240804, No Iterations 1
DILUPBiCG: Solving for k, Initial residual = 0.0600995, Final residual = 0.000742773, No Iterations 1
ExecutionTime = 4908.29 s ClockTime = 5530 s

Time = 51

[2] --> FOAM Warning :
[5] --> FOAM Warning :
[5] From function Time::readModifiedObjects()
[5] in file db/Time/TimeIO.C at line 222
[5] Delaying reading objects due to inconsistent file time-stamps between processors
[6] --> FOAM Warning :
[8] --> FOAM Warning :
[9] --> FOAM Warning :
[9] From function Time::readModifiedObjects()
[9] in file db/Time/TimeIO.C at line 222
[9] Delaying reading objects due to inconsistent file time-stamps between processors
[2] From function Time::readModifiedObjects()
[2] in file db/Time/TimeIO.C at line 222
[2] Delaying reading objects due to inconsistent file time-stamps between processors
[3] --> FOAM Warning :
[3] From function Time::readModifiedObjects()
[3] in file db/Time/TimeIO.C at line 222
[3] Delaying reading objects due to inconsistent file time-stamps between processors
[4] --> FOAM Warning :
[4] From function Time::readModifiedObjects()
[4] in file db/Time/TimeIO.C at line 222
[4] Delaying reading objects due to inconsistent file time-stamps between processors
[6] From function Time::readModifiedObjects()
[6] in file db/Time/TimeIO.C at line 222
[6] Delaying reading objects due to inconsistent file time-stamps between processors
[7] --> FOAM Warning :
[7] From function Time::readModifiedObjects()
[7] in file db/Time/TimeIO.C at line 222
[7] Delaying reading objects due to inconsistent file time-stamps between processors
[8] From function Time::readModifiedObjects()
[8] in file db/Time/TimeIO.C at line 222
[8] Delaying reading objects due to inconsistent file time-stamps between processors

This message then appears at every time step, but the reconstruction and VTK export work fine at the end. Does anyone know what kind of problem I am facing? I run these calculations over Ethernet...

Regards!
Fabian

#5 | Hrvoje Jasak | October 2, 2007, 04:03
Yup, the time daemon is out of sync on your machines. Either set up a timeslave to work properly or play around with:

~/.OpenFOAM-1.4.1-dev/controlDict

OptimisationSwitches
{
    fileModificationSkew 10;
    ...
}
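
Before changing the switch, it may help to measure how far the node clocks actually differ. This is a rough sketch only: `clock_offsets` and `REMOTE_SH` are made-up names, password-less ssh is assumed, and the ssh round trip itself adds up to a second or so of error. As far as I know, fileModificationSkew is given in seconds, so offsets of a few seconds are the range that matters.

```shell
#!/bin/sh
# Hypothetical helper: print each host's clock offset (in whole seconds)
# relative to the local machine. REMOTE_SH is assumed to be ssh.
REMOTE_SH=${REMOTE_SH:-ssh}

clock_offsets() {
    machines=$1
    # Take the first field of each non-empty, non-comment line as the host.
    for host in $(awk '!/^#/ && NF {print $1}' "$machines" | sort -u); do
        local_t=$(date +%s)
        # Skip hosts that cannot be reached.
        remote_t=$($REMOTE_SH "$host" date +%s </dev/null) || continue
        echo "$host $((remote_t - local_t))"
    done
}
```

Running `clock_offsets machines` prints one `host offset` pair per node. If any offset is comparable to the configured skew, fixing the time synchronisation is probably the cleaner solution than raising the switch.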


Enjoy,

Hrv
__________________
Hrvoje Jasak
Providing commercial FOAM/OpenFOAM and CFD Consulting: http://wikki.co.uk

#6 | Fabian Braennstroem | October 2, 2007, 10:34
Hi Hrvoje,

Thanks! I assume the given switch tolerates a sync offset of 10 msec!?

Fabian

#7 | Michael Rangitsch | February 28, 2008, 11:43
Hi Hrvoje,
I've encountered the time-stamp problem as well, but it's a bit more mysterious. I'm running Xoodles on 8 cores of a single processor, so it really can't be a time daemon problem. I get the time-stamp error when reading/writing files: not all the time, but often enough to make things unpleasant. Sometimes it shows up as an inability to read a file (and OpenFOAM crashes); other times one of the files on one of the processors simply isn't written (and I get a zero-length file for whatever variable was being written). reconstructPar then fails. It's very inconsistent, and it does not reproduce at the same point in the execution.

Where exactly does the fileModificationSkew entry go: in the controlDict in the system directory of my case, or elsewhere?

Thanks in advance!

Mike

#8 | Hrvoje Jasak | February 29, 2008, 02:53
Look at:

~/.OpenFOAM-1.4.1-dev/controlDict

(the path may be adjusted for your version) and search for:


OptimisationSwitches
{
    fileModificationSkew 10;
    ...
}


If you haven't got this, the equivalent bit in your OpenFOAM installation should be read instead (haven't checked):

/home/hjasak/OpenFOAM/OpenFOAM-1.4.1-dev/.OpenFOAM-1.4.1-dev/controlDict

Enjoy,

Hrv
__________________
Hrvoje Jasak
Providing commercial FOAM/OpenFOAM and CFD Consulting: http://wikki.co.uk

#9 | merrouche djemai (mer) | March 7, 2008, 04:25
Hi all!
I started to run a parallel OF 1.4.1 case on a small network (4 PCs). With past versions, I used LAM/MPI without problems. Now, when I decompose the case, I can't find the corresponding files on the other nodes, and when I run mpirun (Open MPI) it fails.
What should I specify in the last lines of decomposeParDict? What is the problem, and what is missing?
N.B. SSH works well between the different nodes.

Djemai

#10 | matteo_gautero (Guest) | March 14, 2008, 06:33
Hi to all,
I get the same message as Fabian:

[27] --> FOAM Warning :
[27] From function Time::readModifiedObjects()
[27] in file db/Time/TimeIO.C at line 222
[27] Delaying reading objects due to inconsistent file time-stamps between processors
[36] --> FOAM Warning :
[48] --> FOAM Warning :
[48] From function Time::readModifiedObjects()
[48] in file db/Time/TimeIO.C at line 222
[48] Delaying reading objects due to inconsistent file time-stamps between processors
[29] --> FOAM Warning :
[29] From function Time::readModifiedObjects()
[29] in file db/Time/TimeIO.C at line 222
[29] Delaying reading objects due to inconsistent file time-stamps between processors
[37] --> FOAM Warning :
[37] From function Time::readModifiedObjects()
[37] in file db/Time/TimeIO.C at line 222
[37] Delaying reading objects due to inconsistent file time-stamps between processors
[49] --> FOAM Warning :
[30] --> FOAM Warning :
[38] --> FOAM Warning :
[38] From function Time::readModifiedObjects()
[38] in file db/Time/TimeIO.C at line 222
[38] Delaying reading objects due to inconsistent file time-stamps between processors


I checked the file ~/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/controlDict and found the section

"OptimisationSwitches
{
fileModificationSkew 10;
"
which was already set to 10. I changed the value to 20, but it didn't help; I still get the same message. The machine I use is a cluster of 8 nodes, each with two quad-core Intel Xeon processors. Any suggestions?

Thanks,
Matteo.

#11 | matteo_gautero (Guest) | March 14, 2008, 10:37
Hi to all,
Sorry, I forgot to mention that I'm working on a mesh with 15,000,000 cells. The same case with a coarser mesh (4,000,000 cells) doesn't give me any problems.

Thanks,
Matteo

#12 | Srikara Mahishi | October 4, 2010, 05:58
error while running in parallel on multi-cpus across nodes
Hi All,
While running a case in parallel I get the following error:

Code:
msd@mshcln2:~/cae_bench/fluent/small/openfoam/with-case> mpirun -np 8 -hostfile machines interFoam -parallel
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6-f802ff2d6c5a
Exec   : interFoam -parallel
Date   : Aug 12 2010
Time   : 10:03:07
Host   : mshccn51
PID    : 23091
Case   : /user/msd/cae_bench/fluent/small/openfoam/with-case
nProcs : 8
Slaves :
7
(
mshccn51.23092
mshccn51.23093
mshccn51.23094
mshccn53.25769
mshccn53.25770
mshccn53.25771
mshccn53.25772
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

Reading g

[0]
[0]
[0] cannot open file
[0]
[0] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor0/constant/g at line 0.
[0]
[0]     From function regIOobject::readStream()
[0]     in file db/regIOobject/regIOobjectRead.C at line 62.
[0]
FOAM parallel run exiting
[0]
[3]
[3]
[3] cannot open file
[3]
[3] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor3/constant/g at line 0.
[3]
[3]     From function regIOobject::readStream()
[3]     in file db/regIOobject/regIOobjectRead.C at line 62.
[3]
FOAM parallel run exiting
[3]
[5]
[5]
[5] cannot open file
[5]
[5] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor5/constant/g at line 0.
[5]
[5]     From function regIOobject::readStream()
[5]     in file db/regIOobject/regIOobjectRead.C at line 62.
[5]
FOAM parallel run exiting
[5]
[4] [7]
[7]
[7] cannot open file
[4]
[4] cannot open file
[4]
[4] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor4/constant/g at line 0.
[7]
[7] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor7/constant/g at line 0.
[7]
[7]     From function regIOobject::readStream()
[7]     in file db/regIOobject/regIOobjectRead.C at line 62.
[4]
[4]     From function regIOobject::readStream()
[4]     in file db/regIOobject/regIOobjectRead.C at line 62.
[4]
FOAM parallel run exiting
[4]
[7]
FOAM parallel run exiting
[7]
[6]
[6] --------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

[2]
[2]
[2] cannot open file
[2]
[6] [2] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor2/constant/g at line cannot open file0.
[2]
[2]     From function regIOobject::readStream()
[2]     in file db/regIOobject/regIOobjectRead.C at line 62.
[2]
FOAM parallel run exiting
[2]

[6]
[6] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor6/constant/g at line 0.
[6]
[6]     From function regIOobject::readStream()
[6]     in file db/regIOobject/regIOobjectRead.C at line 62.
[6]
FOAM parallel run exiting
[6]
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 23094 on
node mshccn51 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[1]
[1]
[1] cannot open file
[1]
[1] file: /user/msd/cae_bench/fluent/small/openfoam/with-case/processor1/constant/g at line 0.
[1]
[1]     From function regIOobject::readStream()
[1]     in file db/regIOobject/regIOobjectRead.C at line 62.
[1]
FOAM parallel run exiting
[1]
[mshcln2:02570] 6 more processes have sent help message help-mpi-api.txt mpi-abort
[mshcln2:02570] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help error messages

Could anyone please help me figure out what the problem could be? The same case runs on a single CPU without any errors.

Thank you in advance,
Srikara
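
One thing worth ruling out before suspecting NFS is whether every processor directory is actually present on the node that reports the error. This is a small sketch under a few assumptions: the helper name `check_decomposition` is made up, the case was decomposed with decomposePar, and the mesh is stored under constant/polyMesh in each processor directory.

```shell
#!/bin/sh
# Hypothetical helper: confirm that processor0 .. processor<N-1> each
# contain a mesh directory, as seen from the host where this runs.
# Arguments: <case_dir> <nprocs>
check_decomposition() {
    case_dir=$1; nprocs=$2; missing=0; i=0
    while [ "$i" -lt "$nprocs" ]; do
        mesh_dir="$case_dir/processor$i/constant/polyMesh"
        if [ ! -d "$mesh_dir" ]; then
            echo "missing: $mesh_dir"
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    # Succeed only if every processor directory was found.
    [ "$missing" -eq 0 ]
}
```

Running `check_decomposition /user/msd/cae_bench/fluent/small/openfoam/with-case 8` on each compute node (mshccn51 and mshccn53 in the log above) shows whether the decomposed case is visible there at all. If it is visible but the solver still cannot open files, the caching explanation becomes more likely.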

#13 | Fabian Braennstroem | October 4, 2010, 13:56
Hi,

I have seen this quite frequently in the last month as well, and I assume that it is an NFS error, as Mark mentioned a long time ago. Unfortunately, I have no idea how to get rid of it... :-(
It would be great if you have an idea!
Fabian

#14 | Bruno Santos (wyldckat) | October 5, 2010, 07:41
Greetings to all!

It's not the first time I've seen reports about this issue with NFS, but I've never been able to reproduce the error myself to try and figure out the proper solution. I did post an idea a while back on how to fix this issue with NFS, but didn't get a reply about it specifically:
Quote:
Originally Posted by wyldckat View Post
If you can, try to mount with these options:
Code:
sync,dirsync,atime,exec,rw
Source: http://www.toucheatout.net/informati...tuning-options
The idea is to force the NFS system to refresh more actively, because the default options are usually meant for a small access footprint, while these options (the bold ones) should enforce a stricter policy, and if my theory is correct, it will hopefully fix the issue you are having.
The other theory is that NFS needs some reminding before you use OpenFOAM directly. In other words, when launching the parallel run, tell it to run a script that lists the contents of the folder, before actually running the solver! This way the NFS client-server system should be forced to explicitly check what is on the server wire
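
The second theory could be tried with a small wrapper along these lines. It is a sketch only and untested against the actual problem: the name `refresh_then_run` is made up, and whether a directory listing really forces the NFS client to refresh is exactly the assumption being tested.

```shell
#!/bin/sh
# Hypothetical wrapper: list the case directory tree on this node to
# nudge the NFS client into refreshing its cached view, then run the
# command given as the remaining arguments.
# Usage: refresh_then_run <case_dir> <command> [args...]
refresh_then_run() {
    case_dir=$1; shift
    # Recursive listing; the output is discarded, only the lookups matter.
    ls -lR "$case_dir" >/dev/null 2>&1
    # Run the real command; its exit status becomes the function's status.
    "$@"
}
```

Saved as a script that ends with `refresh_then_run "$@"`, it would be started by mpirun in place of the solver, with the case directory as its first argument followed by the usual solver command, so that every rank lists the directory on its own node first.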

So, if you guys can test these theories, perhaps we can get to the bottom of this problem!

Best regards,
Bruno

#15 | Fabian Braennstroem | October 5, 2010, 14:43
Hello Bruno,

thanks for the advice! I will check our settings again... it would be great if this works. As it occurs only occasionally, it might take a few days before I can give you feedback.

Thanks!
Fabian
