CFD Online Forums > OpenFOAM Running, Solving & CFD

Parallel & hostfile

June 5, 2009, 03:13   #1
Maxime Perelli (-mAx-)
Super Moderator
Join Date: Mar 2009
Location: Switzerland
Posts: 2,961
Hello,
I am trying to set up a parallel calculation, but I am running into one issue.
I have two PCs connected via Ethernet (master at 192.168.0.1, node at 192.168.0.14).
ssh works without a password prompt:
from the master, "ssh node ls" is OK
from the node, "ssh node ls" is OK
From the master I export /home/user/OpenFOAM, and from the node I mount it without problems. I verified the setup with the MPI hello-world example.
OpenFOAM is installed on the master (under /home/user) and runs successfully in serial mode, and I can decompose a model into 2 subdomains without problems.
I created the "machines" file as described in the documentation, and this is where the trouble starts. "machines" looks like:

192.168.0.1
192.168.0.14

If I run my model in parallel:

mpirun --hostfile machines simpleFoam -parallel > log &

I get the error message "connect() failed with errno=113".
Now, if I switch the order of the IP addresses in machines:

192.168.0.14
192.168.0.1

it runs. (The log shows that the host is node and the slave is master.)
Any idea?
Thanks a lot in advance
__________________
In memory of my friend Hervé: CFD engineer & freerider
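For reference, the numeric code in the connect() error can be decoded with Python's standard library; a minimal sketch, taking the value 113 from the log above:

```python
import errno
import os

# On Linux, errno 113 is EHOSTUNREACH ("No route to host"), which usually
# points at a routing or firewall problem between the two hosts rather
# than at OpenFOAM or MPI itself.
code = 113
print(errno.errorcode[code])  # -> 'EHOSTUNREACH' on Linux
print(os.strerror(code))      # -> 'No route to host' on Linux
```

If the symbolic name is EHOSTUNREACH, checking that each host can reach the other (ping, firewall rules on the MPI port range) is a reasonable first step.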

June 18, 2009, 15:54   #2
Klaus Rädecke (shamantic)
New Member
Join Date: Jun 2009
Location: Rüsselsheim, Germany
Posts: 4
Hello, I have a similar problem using OpenMPI for the lesCavitatingFoam tutorial.

I have two different machines for OpenFOAM 1.5:
1. foam-8: SuSE 10.3, 64-bit, 4 GB, gcc 4.2.1, OpenFOAM installed from the binary dp64 distribution
2. foam-9: Ubuntu Studio 8.04, 32-bit, 4 GB, gcc 4.3.1, OpenFOAM installed from the binary dp distribution

Both installations pass foamInstallationTest (foam-8 has a gcc issue; never mind?). Maybe you could check this too, -mAx-?

From each machine, ssh commands can be issued to both machines without entering a password.

/home/rae/OpenFOAM/rae-1.5/ is an NFS share provided by foam-9 to foam-8.

Running:

mpirun --hostfile system/machines -np 4 lesCavitatingFoam -case /home/rae/OpenFOAM/rae-1.5/tutorials/lesCavitatingFoam/throttle3D -parallel

gives the following results, depending on the machines file:

1. If system/machines contains only the submitting machine name: 4 processes run successfully on 1 host (2 cores).

2. If system/machines contains:

foam-8 cpu=2
foam-9 cpu=2

where foam-8 is the submitting host: orted starts up immediately on both hosts. After a very long time, two lesCavitatingFoam processes execute on each machine, but with no CPU load, and finally this error report appears:

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.5                                   |
|   \\  /    A nd           | Web:      http://www.OpenFOAM.org               |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Exec : lesCavitatingFoam -case /home/rae/OpenFOAM/rae-1.5/tutorials/lesCavitatingFoam/throttle3D -parallel
Date : Jun 18 2009
Time : 18:27:33
Host : foam-8
PID : 11518
[1] Expected a ')' or a '}' while reading List, found on line 0 the word 'o'
[1] file: IOstream at line 0.
[1] From function Istream::readEndList(const char*)
[1] in file db/IOstreams/IOstreams/Istream.C at line 159.
[1] FOAM parallel run exiting
[foam-9:10614] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
[3] Expected a ')' or a '}' while reading List, found on line 0 the word 'o'
[3] file: IOstream at line 0.
[3] From function Istream::readEndList(const char*)
[3] in file db/IOstreams/IOstreams/Istream.C at line 159.
[3] FOAM parallel run exiting
[foam-9:10615] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 11518 on node foam-8 exited on signal 15 (Terminated).
1 additional process aborted (not shown)

---------
If I omit "-parallel", 4 processes run as expected, but I guess they all run the same computation. So mpirun itself seems to do its job correctly?

Does this description fit your experience? Any ideas? Thanks
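As an aside, Open MPI hostfiles use the `slots=` keyword to give the process count per host; the `cpu=` form comes from the older LAM/MPI boot schema, if I remember correctly. A sketch of the machines file for the setup above, using the hostnames from the post:

```shell
# Write an Open MPI hostfile: one host per line, "slots=N" sets how many
# processes may be placed on that host ("cpu=N" is LAM/MPI syntax).
cat > machines <<'EOF'
foam-8 slots=2
foam-9 slots=2
EOF
cat machines
```

With 4 slots total, `mpirun --hostfile machines -np 4 …` would place two processes on each host.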

June 19, 2009, 00:43   #3
Maxime Perelli (-mAx-)
Super Moderator
Join Date: Mar 2009
Location: Switzerland
Posts: 2,961
My problem is "solved". I don't know why I had the problem, but now it runs perfectly; tests are running across 8 machines without problems.
For the record, I don't install OpenFOAM on each machine; I share the OpenFOAM installation via NFS. Since both your machines pass foamInstallationTest, you may only need to run it on the master (as you use NFS too).
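A minimal sketch of sharing one OpenFOAM installation over NFS, assuming the paths from the first post (master 192.168.0.1 exports /home/user/OpenFOAM, node 192.168.0.14 mounts it); the export options shown are illustrative, not prescriptive:

```shell
# On the master (192.168.0.1): add an /etc/exports entry for the node,
# then reload the export table. Case directories are written during the
# run, so the share is exported read-write.
#   /etc/exports:
#     /home/user/OpenFOAM  192.168.0.14(rw,sync,no_subtree_check)
#   exportfs -ra
#
# On the node (192.168.0.14): mount the share at the SAME path as on the
# master, so binaries and OpenFOAM environment variables resolve
# identically on both machines:
#   mount -t nfs 192.168.0.1:/home/user/OpenFOAM /home/user/OpenFOAM
```

Mounting at an identical path on every host is the key point; otherwise mpirun cannot start the same executable on the remote nodes.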
