CFD Online Discussion Forums

Running Foam on multiple nodes (small cluster) (http://www.cfd-online.com/Forums/openfoam-solving/103050-running-foam-multiple-nodes-small-cluster.html)

Hisham June 9, 2012 08:43

Running Foam on multiple nodes (small cluster)
 
Hello Foamers,

I need to run OpenFOAM across two PCs. Here is my setup:
1. I work on one (master) and have password-based SSH access to the other (slave).
2. I have installed version 2.1.0 from OpenCFD on both.
3. I have edited ~/.bashrc on the slave so that OpenFOAM is sourced before the script exits early for non-interactive sessions.
4. I have decomposed the damBreak case into 12 subdomains with decomposePar (scotch method) on both machines, using the same path.
5. The case runs both serially and in parallel on each PC individually.
6. I have a machines file:
Code:

localhost
username@slaveIP cpu=4

7. Running
Code:

foamJob -p -s interFoam
yields this error:
Code:

$ foamJob -p -s interFoam
Parallel processing using SYSTEMOPENMPI with 12 processors
Executing: /usr/bin/mpirun -np 12 -hostfile machines /opt/openfoam210/bin/foamExec -prefix /opt interFoam -parallel | tee log
username@IP password:
[1] /*---------------------------------------------------------------------------*\
| =========                |                                                |
| \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
|  \\    /  O peration    | Version:  2.1.0                                |
|  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
|    \\/    M anipulation  |                                                |
\*---------------------------------------------------------------------------*/
Build  : 2.1.0-0bc225064152
Exec  : interFoam -parallel
Date  : Jun 09 2012
Time  : 14:40:07
Host  : "numubuntu-System-Product-Name"
PID    : 4865

[1]
[1] --> FOAM FATAL IO ERROR:
[1] Expected a ')' or a '}' while reading List, found on line 0 an error
[1]
[1] file: IOstream at line 0.
[1]
[1]    From function Istream::readEndList(const char*)
[1]    in file db/IOstreams/IOstreams/Istream.C at line 159.
[2]
[2]
[2] [1] --> FOAM FATAL IO ERROR:
[2] Expected a ')' or a '}' while reading List, found on line 0 an error
[2]
[2] file: IOstream at line 0.
[2]
[2]    From function Istream::readEndList(const char*)
[2]    in file db/IOstreams/IOstreams/Istream.C at line 159.
[2]
FOAM parallel run exiting
[2]

FOAM parallel run exiting
[1] --------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 2105 on
node SLAVE-IP exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[3]
[3] [4]
[4]
[4] --> FOAM FATAL IO ERROR:
[4] Expected a ')' or a '}' while reading List, found on line 0 an error
[4]
[4] file: IOstream at line 0.
[4]
[4]    From function Istream::readEndList(const char*)
[4]    in file db/IOstreams/IOstreams/Istream.C at line 159.
[4]
FOAM parallel run exiting
[4]

[3] [numubuntu-System-Product-Name:04861] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[numubuntu-System-Product-Name:04861] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I appreciate any help!

Best regards
Hisham El Safti
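Steps 4 and 5 can be double-checked on each machine before launching across both. The following is a minimal sketch, assuming a plain `numberOfSubdomains N;` line in `system/decomposeParDict`; `check_decomposition` is a hypothetical helper, not an OpenFOAM utility. It counts the `processor*` directories in the case and compares the count with the requested number of subdomains.

```shell
# Hypothetical helper (not an OpenFOAM utility): check that a decomposed case
# contains exactly as many processor* directories as decomposeParDict requests.
# Assumes a plain "numberOfSubdomains N;" line in system/decomposeParDict.
check_decomposition() {
    case_dir=${1:-.}
    # Read N from the first "numberOfSubdomains N;" entry
    want=$(awk '$1 == "numberOfSubdomains" { gsub(/;/, "", $2); print $2; exit }' \
        "$case_dir/system/decomposeParDict")
    # Count processor0, processor1, ... directly under the case directory;
    # $(( )) strips the padding some wc implementations add
    have=$(( $(find "$case_dir" -mindepth 1 -maxdepth 1 -type d -name 'processor*' | wc -l) ))
    echo "processor dirs: $have, numberOfSubdomains: $want"
    [ -n "$want" ] && [ "$have" -eq "$want" ]
}
```

Running `check_decomposition .` from the case directory on the master, and again on the slave at the same path, should report 12 on both machines for this case.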

wyldckat June 10, 2012 14:23

Greetings Hisham,

Indeed, there seems to be some strange detail escaping us here...

OK, let's try to debug this in parts:
  1. Is the user name the same on both machines? If not, you might want to try using the multiple roots definition in "decomposeParDict". I wrote a blog post about it some time ago: Running OpenFOAM in parallel with different locations for each process
  2. Let's skip the need for a password. Follow these instructions to set up passwordless access for your own user between machines: http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html
    It is only passwordless if you don't set a passphrase on your key. If yours is a closed internal network where an attack from inside is unlikely, you don't need a passphrase for this private/public key pair.
  3. Specify the number of CPUs for the local machine as well, just in case.
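Point 2 above, together with step 3 of Hisham's setup (sourcing OpenFOAM in ~/.bashrc on the slave), can be checked from the master without involving mpirun. A minimal sketch follows; `check_node` is a hypothetical helper. It assumes ssh's `BatchMode=yes` option, which makes ssh fail instead of prompting for a password, and that OpenFOAM's bashrc sets `WM_PROJECT_VERSION`. The remote command runs in a non-interactive shell, which is close to the situation of processes started remotely by mpirun over ssh, so a failure here would likely show up in the parallel run as well.

```shell
# Hypothetical helper: verify key-based SSH to a node and that a
# non-interactive shell there has the OpenFOAM environment loaded.
check_node() {
    host=$1
    # BatchMode=yes: fail instead of asking for a password or passphrase
    if ! ssh -o BatchMode=yes "$host" true; then
        echo "$host: passwordless SSH is not working" >&2
        return 1
    fi
    # Single quotes: $WM_PROJECT_VERSION is expanded on the remote side
    version=$(ssh -o BatchMode=yes "$host" 'echo "$WM_PROJECT_VERSION"')
    if [ -z "$version" ]; then
        echo "$host: OpenFOAM is not sourced in non-interactive shells" >&2
        return 1
    fi
    echo "$host: OpenFOAM $version"
}
```

`check_node username@slaveIP` should print the OpenFOAM version seen by the slave; an error message points either at the SSH keys or at ~/.bashrc on that machine.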
I can't think of any other hypothesis for now.

Best regards,
Bruno

Hisham June 10, 2012 15:21

Hello Bruno

Thanks for your reply. I have the same user name, with "sudo" privileges, on both machines. I have also specified the CPU count as suggested, but I still get the same error. I suspect a version incompatibility. However, the error does not say where the expected ')' or '}' is missing... Also, how can I view the so-called "help messages"?

Thanks
Hisham

Edit: I have Open MPI 1.4.3 on both machines.
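To test the version-mismatch hypothesis, one option is to run the same command on both machines and compare the output. This is a minimal sketch; `same_on_both` is a hypothetical helper, and it assumes passwordless SSH is already working (ssh's `BatchMode=yes` option makes it fail rather than stop at a password prompt).

```shell
# Hypothetical helper: run one command locally and on a remote host
# (non-interactively, over ssh) and report whether the outputs match.
same_on_both() {
    host=$1
    shift
    local_out=$(sh -c "$*" 2>&1)
    remote_out=$(ssh -o BatchMode=yes "$host" "$*" 2>&1)
    if [ "$local_out" = "$remote_out" ]; then
        echo "same: $local_out"
    else
        printf 'DIFFERENT\n  local: %s\n  %s: %s\n' "$local_out" "$host" "$remote_out"
        return 1
    fi
}
```

For example, `same_on_both username@slaveIP 'mpirun --version 2>&1 | head -n 1'` compares the Open MPI versions, `same_on_both username@slaveIP 'echo $WM_PROJECT_VERSION $WM_ARCH'` compares the OpenFOAM version and architecture settings, and `same_on_both username@slaveIP 'uname -m'` compares the hardware architecture. The single quotes matter: each variable is expanded by the shell on the machine that runs the command.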

wyldckat June 10, 2012 15:59

Hi Hisham,

OK, try running this command:
Code:

/usr/bin/mpirun -np 12 -hostfile machines /opt/openfoam210/bin/foamExec interFoam -parallel
I removed the "-prefix /opt" from the previous command.

As for the help messages: the error output says to set the MCA parameter "orte_base_help_aggregate" to 0. That should be something like this:
Code:

/usr/bin/mpirun -np 12 -mca orte_base_help_aggregate 0 -hostfile machines /opt/openfoam210/bin/foamExec interFoam -parallel
For the full mpirun documentation: http://www.open-mpi.org/doc/v1.4/man1/mpirun.1.php

In the machines file, define things like this:
Code:

localhost slots=8
slaveIP slots=4

Source: http://www.open-mpi.org/faq/?categor...imple-spmd-run
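With this layout the slots add up to 8 + 4 = 12, which matches the 12 subdomains in Hisham's case. A mismatch between the two is easy to catch before launching; the following is a minimal sketch with a hypothetical helper `check_slots`. It only recognises the `slots=N` keyword and, as an assumption of this sketch, counts a host line without it as one slot.

```shell
# Hypothetical helper: check that the slots in an Open MPI hostfile add up to
# numberOfSubdomains in decomposeParDict. Only "slots=N" is recognised; a host
# line without it is counted as 1 slot (an assumption of this sketch).
check_slots() {
    hostfile=$1
    dict=$2
    total=$(awk '
        NF == 0 || $1 ~ /^#/ { next }        # skip blank and comment lines
        {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=/) n = substr($i, 7) + 0
            sum += n
        }
        END { print sum + 0 }' "$hostfile")
    want=$(awk '$1 == "numberOfSubdomains" { gsub(/;/, "", $2); print $2; exit }' "$dict")
    echo "hostfile slots: $total, numberOfSubdomains: $want"
    [ -n "$want" ] && [ "$total" -eq "$want" ]
}
```

From the case directory this would be run as `check_slots machines system/decomposeParDict`.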

edit:
Quote:

A nice how-to summary: OpenFOAM on deux pc post #13
Source: Notes about running OpenFOAM in parallel

Best regards,
Bruno

Hisham June 11, 2012 13:44

Thanks a lot, Bruno. I had the chance to use a third PC today, and it turned out the problem was with one of the PCs. I have changed the machines file as you suggested, though.

Thanks again :)


All times are GMT -4.