CFD Online Discussion Forums


-mAx- June 5, 2009 03:13

Parallel & hostfile
 
Hello,
I am trying to set up a parallel calculation, but I am running into one issue.
I have 2 PCs connected via Ethernet (master at 192.168.0.1 and node at 192.168.0.14).
ssh works without asking for a password:
from master --> "ssh node ls" is OK
from node --> "ssh node ls" is OK
On master I export /home/user/OpenFOAM,
and on node I mount it without problems.
I ran the helloworld example successfully.
OF is installed on master (under /home/user), and it runs successfully in serial mode.
I can decompose a model into 2 subdomains without problems.
I created the "machines" file as described in the doc, and this is where the trouble starts.
"machines" looks like this:
192.168.0.1
192.168.0.14
If I run my model in parallel with
>mpirun --hostfile machines simpleFoam -parallel > log &
I get the error message "connect() failed with errno=113".
Now, if I switch the order of the IP addresses in machines to
192.168.0.14
192.168.0.1
it runs... (the log shows that the host is node and the slave is master)
Any idea?
Thanks a lot in advance
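
On Linux, errno 113 is EHOSTUNREACH ("No route to host"), so the message concerns a TCP connection between the two machines rather than OpenFOAM itself. The post shows ssh being tested towards node, but not from node back to master, so a check of every host pair may be worth running. The following is a minimal sketch, not a diagnosis: the function names hostfile_hosts and check_ssh_mesh are hypothetical, and it assumes password-less ssh and a hostfile with one host per line, as in the post.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not part of OpenFOAM or Open MPI.

# Print one host per line from a hostfile, skipping blank lines and
# '#' comments and dropping anything after the host name.
hostfile_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Try a password-less ssh from every listed host to every listed host.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_mesh() {
    hosts=$(hostfile_hosts "$1")
    for src in $hosts; do
        for dst in $hosts; do
            if ssh -o BatchMode=yes -o ConnectTimeout=5 "$src" \
                   ssh -o BatchMode=yes -o ConnectTimeout=5 "$dst" true
            then
                echo "ok   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
            fi
        done
    done
}

# Usage, from the directory that holds the hostfile:
#   check_ssh_mesh machines
```

Note that ssh only exercises port 22. MPI opens its own TCP connections, so a firewall or a wrong network interface could still block those even if every pair reports ok.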

shamantic June 18, 2009 15:54

Hello, I have a similar problem using OpenMPI with the lesCavitatingFoam tutorial.

I have two different machines for OpenFOAM 1.5:
1. foam-8: Suse 10.3, 64-bit, 4 GB, gcc 4.2.1, OpenFOAM installed from the binary dp64 distribution

2. foam-9: Ubuntu Studio 8.04, 32-bit, 4 GB, gcc 4.3.1, OpenFOAM installed from the binary dp distribution

Both installations pass the foamInstallationTest (foam-8 has a gcc issue; never mind?). Maybe you could check this too, -mAx-?

From each machine, ssh commands can be issued to both machines without entering a password.

/home/rae/OpenFOAM/rae-1.5/ is an NFS share exported by foam-9 to foam-8.

Running:

mpirun --hostfile system/machines -np 4 lesCavitatingFoam -case /home/rae/OpenFOAM/rae-1.5/tutorials/lesCavitatingFoam/throttle3D -parallel

gives the following results, depending on the machines file:

1. system/machines contains the submitting machine name only: 4 processes run on 1 host (2 cores) successfully.

2. system/machines contains:

foam-8 cpu=2
foam-9 cpu=2

where foam-8 is the submitting host: orted starts up immediately on both hosts. After a very long time, two lesCavitatingFoam processes are running on each machine, but with no CPU load, and finally the error report appears:

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.5                                   |
|   \\  /    A nd           | Web:      http://www.OpenFOAM.org               |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Exec : lesCavitatingFoam -case /home/rae/OpenFOAM/rae-1.5/tutorials/lesCavitatingFoam/throttle3D -parallel
Date : Jun 18 2009
Time : 18:27:33
Host : foam-8
PID : 11518
[1]
[1]
[1] Expected a ')' or a '}' while reading List, found on line 0 the word 'o'
[1]
[1] file: IOstream at line [3] 0.
[1]
[1] From function Istream::readEndList(const char*)
[1] in file db/IOstreams/IOstreams/Istream.C
[3]
[3] at line 159.
[1]
FOAM parallel run exiting
[1]
Expected a ')' or a '}' while reading List, found on line 0 the word 'o'
[3]
[foam-9:10614] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
[3] file: IOstream at line 0.
[3]
[3] From function Istream::readEndList(const char*)
[3] in file db/IOstreams/IOstreams/Istream.C at line 159.
[3]
FOAM parallel run exiting
[3]
[foam-9:10615] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 11518 on node foam-8 exited on signal 15 (Terminated).
1 additional process aborted (not shown)
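
The interleaved output above is easier to read once the aborting ranks are pulled out. Here is a small sketch, assuming the mpirun output was saved to a file, that prints the host and rank for each MPI_ABORT line. The function name aborted_ranks and the file name log are hypothetical.

```shell
#!/bin/sh
# Sketch only: aborted_ranks is a hypothetical helper.

# Print "host rank" for every line of the form
#   [host:pid] MPI_ABORT invoked on rank N in communicator ...
aborted_ranks() {
    sed -n 's/^\[\([^]:]*\):[0-9]*] MPI_ABORT invoked on rank \([0-9]*\) .*/\1 \2/p' "$1"
}

# Usage:
#   aborted_ranks log
```

For the output above, this lists ranks 1 and 3, both on foam-9; rank 0 on foam-8 is reported separately as having exited on signal 15.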

---------
If I omit "-parallel", 4 processes run as expected, but I guess they all run the same thing. So mpirun itself is doing its job correctly?

Does this description match your experience? Any ideas? Thanks
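
The thread does not establish the cause of this error. One difference the post itself reports is that foam-8 is a 64-bit system with the dp64 binaries, while foam-9 is a 32-bit system with the dp binaries. Whether that mix is responsible is an assumption, but it is cheap to check whether all hosts report the same architecture. A minimal sketch follows; collect_arch and same_arch are hypothetical helper names, and it assumes password-less ssh and that uname is available on every host.

```shell
#!/bin/sh
# Sketch only: collect_arch and same_arch are hypothetical helpers.

# Print "host arch" for each host name given as an argument.
collect_arch() {
    for h in "$@"; do
        printf '%s %s\n' "$h" "$(ssh -o BatchMode=yes "$h" uname -m)"
    done
}

# Read "host arch" lines on stdin; return 0 only if exactly one
# distinct architecture string appears in the second column.
same_arch() {
    awk '{ seen[$2] = 1 }
         END { n = 0; for (a in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Usage:
#   collect_arch foam-8 foam-9 | same_arch || echo "hosts report different architectures"
```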

-mAx- June 19, 2009 00:43

My problem is "solved".
I don't know why I had this problem, but now it runs perfectly. Tests are running on 8 machines without problems.
For info, I don't install OF on each machine; I share the OF installation over NFS. Since you mentioned that both machines passed the foamInstallationTest: you may run it on the master only (as you use NFS too).

