Parallel & hostfile
Hello,

I am trying to set up a parallel calculation, but I am running into one issue. I have two PCs connected via Ethernet (the master at 192.168.0.1 and the node at 192.168.0.14).

- ssh runs without asking for a password: from the master, "ssh node ls" is OK; from the node, "ssh node ls" is OK as well.
- From the master I export /home/user/OpenFOAM; on the node I mount it without problems.
- I ran the "hello world" example successfully.
- OpenFOAM is installed on the master (under /home/user) and runs successfully in serial mode.
- I can decompose a model into 2 subdomains without problems.
- I created the "machines" file as described in the documentation, and from here on I get into trouble.

"machines" looks like:

192.168.0.1
192.168.0.14

If I run my model in parallel with

mpirun --hostfile machines simpleFoam -parallel > log &

I get the error message "connect() failed with errno=113".

Now, if I switch the order of the IP addresses in "machines", i.e.

192.168.0.14
192.168.0.1

it runs... (the log shows that the host is the node and the slave is the master).

Any idea? Thanks a lot in advance.
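P.S. For reference, errno 113 on Linux is EHOSTUNREACH ("No route to host"). A minimal sketch of what I mean by the "machines" file and the launch, assuming Open MPI and a case already decomposed into 2 subdomains (the host entries and process counts are just examples from my setup):

# machines: one host per line; the OpenFOAM user guide writes cpu=N,
# Open MPI's own documentation uses slots=N for the processes per host
192.168.0.1 cpu=1
192.168.0.14 cpu=1

# decompose the case, then launch from the case directory
decomposePar
mpirun --hostfile machines -np 2 simpleFoam -parallel > log 2>&1 &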
Hello, I have a similar problem using OpenMPI for the lesCavitatingFoam tutorial.

I have two different machines running OpenFOAM 1.5:

1. foam-8: SuSE 10.3, 64-bit, 4 GB, gcc 4.2.1, OpenFOAM installed from the binary dp64 distribution
2. foam-9: Ubuntu Studio 8.04, 32-bit, 4 GB, gcc 4.3.1, OpenFOAM installed from the binary dp distribution

Both installations pass foamInstallationTest (foam-8 has a gcc issue, never mind?). Maybe you should check this too, -mAx-?

From both machines it is possible to issue ssh commands to the other machine without entering a password. /home/rae/OpenFOAM/rae-1.5/ is an NFS share provided by foam-9 to foam-8.

Running:

mpirun --hostfile system/machines -np 4 lesCavitatingFoam -case /home/rae/OpenFOAM/rae-1.5/tutorials/lesCavitatingFoam/throttle3D -parallel

gives the following results, depending on the machines file:

1. system/machines contains the submitting machine name only: 4 processes run successfully on 1 host (2 cores).

2. system/machines contains:

foam-8 cpu=2
foam-9 cpu=2

where foam-8 is the submitting host. Then orted starts up immediately on both hosts. After a very long time, two lesCavitatingFoam processes are running on each machine, but with no CPU load, and finally this error report appears:

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.5                                   |
|   \\  /    A nd           | Web:      http://www.OpenFOAM.org               |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Exec   : lesCavitatingFoam -case /home/rae/OpenFOAM/rae-1.5/tutorials/lesCavitatingFoam/throttle3D -parallel
Date   : Jun 18 2009
Time   : 18:27:33
Host   : foam-8
PID    : 11518
[1] Expected a ')' or a '}' while reading List, found on line 0 the word 'o'
[1] file: IOstream at line 0.
[1] From function Istream::readEndList(const char*)
[1] in file db/IOstreams/IOstreams/Istream.C at line 159.
[1] FOAM parallel run exiting
[foam-9:10614] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
[3] Expected a ')' or a '}' while reading List, found on line 0 the word 'o'
[3] file: IOstream at line 0.
[3] From function Istream::readEndList(const char*)
[3] in file db/IOstreams/IOstreams/Istream.C at line 159.
[3] FOAM parallel run exiting
[foam-9:10615] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 11518 on node foam-8 exited on signal 15 (Terminated).
1 additional process aborted (not shown)

If I omit "-parallel", 4 processes run as expected, but I guess they all run the same thing. So mpirun itself seems to do its job correctly?

Does this description fit your experience? Any ideas? Thanks.
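Edit: in case it helps to narrow this down, here is a quick sanity check I would run from foam-8 (the submitting host), nothing OpenFOAM-specific, just ssh and ls, using the paths from my setup above:

CASE=/home/rae/OpenFOAM/rae-1.5/tutorials/lesCavitatingFoam/throttle3D
# the decomposed case should look identical from both hosts
ls $CASE                                               # expect processor0 .. processor3
ssh foam-9 "ls $CASE"                                  # expect the same directories
ssh foam-9 "ls $CASE/processor1/constant/polyMesh"     # mesh files readable from the remote ranks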
My problem is "solved".

I don't know why I had this problem, but now it runs perfectly. Tests are running on 8 machines without problems. For info, I don't install OpenFOAM on each machine; I share the OpenFOAM installation via NFS. Since you mentioned that both machines passed foamInstallationTest, you may only need to do that on the master (as you use NFS too).
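For completeness, the NFS sharing is nothing special; roughly something like the following (paths, network range, and export options are just an example from my setup, adapt them to yours):

# on the master (192.168.0.1): /etc/exports
/home/user/OpenFOAM  192.168.0.0/24(rw,sync,no_subtree_check)

# re-export after editing /etc/exports
exportfs -ra

# on each node: mount the installation at the same path as on the master
mkdir -p /home/user/OpenFOAM
mount -t nfs 192.168.0.1:/home/user/OpenFOAM /home/user/OpenFOAM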