CFD Online Discussion Forums

OpenFOAM on cluster (http://www.cfd-online.com/Forums/openfoam-installation/57301-openfoam-cluster.html)

markh83 October 17, 2008 07:50

Hi..

I'm relatively new to OpenFOAM, and especially to running it on the cluster that has been put at my disposal. It's a rather small cluster, consisting of 5 nodes with 2 processors each. I've read in the user manual about decomposing the domain, and I have done so using decomposePar with this dict file:

// Mesh decomposition control dictionary

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

FoamFile
{
    version 0.5;
    format ascii;

    root "ROOT";
    case "CASE";
    instance "system";
    local "";

    class dictionary;

    object decompositionDict;
}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 4;

//method simple;
method hierarchical;
//method metis;
//method manual;

simpleCoeffs
{
    n (2 2 1);
    delta 0.001;
}

hierarchicalCoeffs
{
    n (2 2 1);
    delta 0.001;
    order xyz;
}

manualCoeffs
{
    dataFile "decompositionData";
}
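A quick sanity check on a dictionary like this is that the product of the three n entries for the chosen method equals numberOfSubdomains (here 2 x 2 x 1 = 4), which in turn should match the process count given to mpirun. The sketch below is one way to automate that check. The function name check_decomp is hypothetical and not part of OpenFOAM, and the parsing assumes the simple one-entry-per-line layout shown above.

```shell
#!/bin/sh
# check_decomp DICT METHOD
# Hypothetical helper (not an OpenFOAM tool): compares numberOfSubdomains
# with the product of the "n" vector inside the <METHOD>Coeffs block.
# Assumes one dictionary entry per line, as in the file above.
check_decomp() {
    dict=$1
    method=$2
    awk -v block="${method}Coeffs" '
        { sub(/\/\/.*/, "") }                     # drop // comments
        $1 == "numberOfSubdomains" { nsub = $2 + 0; have_nsub = 1 }
        $1 == block { inblock = 1 }               # entering e.g. hierarchicalCoeffs
        inblock && $1 == "n" {
            line = $0
            gsub(/[();]/, " ", line)              # "n (2 2 1);" -> "n  2 2 1  "
            split(line, f, " ")
            prod = f[2] * f[3] * f[4]
            have_n = 1
        }
        inblock && index($0, "}") { inblock = 0 } # leaving the block
        END {
            if (!have_nsub || !have_n) { print "could not parse " block; exit 2 }
            if (nsub == prod) { print "OK: " nsub " subdomains"; exit 0 }
            print "MISMATCH: numberOfSubdomains=" nsub ", n product=" prod
            exit 1
        }
    ' "$dict"
}
```

It could be called as `check_decomp mesh2/system/decomposeParDict hierarchical`, where the path is an assumption about where the dictionary lives in the case. For the values posted above the two numbers agree, so the decomposition itself does not look like the source of the trouble.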




After that I made a hostfile (called 'machines'), listing the two nodes I would like to test on:

i14cluster-21 cpu=2
i14cluster-22 cpu=2
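With two entries of cpu=2 each, this hostfile offers 2 + 2 = 4 process slots, which is what a four-way decomposition needs. A minimal sketch for totalling the slots in such a file follows. The function name count_slots is hypothetical, and the parsing assumes each line has the form `hostname cpu=N` (or `slots=N`, the spelling used in Open MPI's own documentation), with a line that gives no count treated as one slot.

```shell
#!/bin/sh
# count_slots HOSTFILE
# Hypothetical helper (not part of Open MPI): sums the per-host process
# counts in a hostfile. Assumes "host cpu=N" or "host slots=N" lines;
# a host without an explicit count is assumed to provide one slot.
count_slots() {
    awk '
        { sub(/#.*/, "") }           # drop comments
        NF == 0 { next }             # skip blank lines
        {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(cpu|slots)=[0-9]+$/) {
                    split($i, kv, "=")
                    n = kv[2]
                }
            total += n
        }
        END { print total + 0 }
    ' "$1"
}
```

A guard such as `[ "$(count_slots machines)" -ge 4 ] || echo "fewer than 4 slots in hostfile"` could then be run before starting the job.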

I've copied the case and the hostfile to the cluster master, and I log into it via ssh. From there I run mpirun like this:

mpirun --hostfile machines -np 4 turbFoam_vt . mesh2 -parallel

(I've edited turbFoam for variable time steps and renamed it, but that part works smoothly.)

But when I do this, I get an error:

[tepe3-1@i14cluster Cases]$ mpirun --hostfile machines -np 4 turbFoam_vt . mesh2 -parallel
tepe3-1@i14cluster-21's password: tepe3-1@i14cluster-22's password:
Could not chdir to home directory /home/tepe3-1: No such file or directory
bash: orted: command not found
[i14cluster:31446] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[i14cluster:31446] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[i14cluster:31446] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[i14cluster:31446] ERROR: A daemon on node i14cluster-21 failed to start as expected.
[i14cluster:31446] ERROR: There may be more information available from
[i14cluster:31446] ERROR: the remote shell (see above).
[i14cluster:31446] ERROR: The daemon exited unexpectedly with status 127.
[i14cluster:31446] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[i14cluster:31446] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.




Can anybody give me a hint of what I'm doing wrong?

jens_klostermann October 17, 2008 19:09

Hi Mark,

1. Try running in parallel on ONE node first! If that works, try several nodes.

2. Check whether you have set up passwordless ssh to the nodes.

3. Check whether your home directory with your case is available on every node (usually a shared RAID, e.g. through NFS).

4. Check whether you compiled and/or linked against your MPI library, e.g. openmpi-1.2.6 or the vendor's MPI.

Jens
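Points 2 to 4 map directly onto the posted log: the password prompts suggest ssh is not yet passwordless, "Could not chdir to home directory" suggests the home directory is not mounted on the node, and "bash: orted: command not found" together with exit status 127 (the shell's "command not found" status) suggests Open MPI's orted daemon is not on the PATH of the non-interactive shell that ssh starts there. A minimal sketch that runs these three checks for every host in the hostfile could look like the following. The function name check_nodes is hypothetical, and it assumes the first field of each hostfile line is the host name and that the remote login shell is a POSIX shell such as the bash seen in the log.

```shell
#!/bin/sh
# check_nodes HOSTFILE [CASE_DIR]
# Hypothetical helper (not part of Open MPI or OpenFOAM). For each host it
# checks: passwordless ssh, visibility of CASE_DIR, and orted on the PATH.
# Assumes a POSIX login shell on the nodes and host name in field 1.
check_nodes() {
    hostfile=$1
    case_dir=${2:-$PWD}
    status=0
    for host in $(awk 'NF && $1 !~ /^#/ { print $1 }' "$hostfile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        if ! ssh -o BatchMode=yes "$host" true </dev/null 2>/dev/null; then
            echo "$host: passwordless ssh not working"
            status=1
            continue
        fi
        # The case directory must exist under the same path on the node
        ssh -o BatchMode=yes "$host" "test -d '$case_dir'" </dev/null ||
            { echo "$host: $case_dir not visible"; status=1; }
        # mpirun starts orted through a non-interactive shell like this one
        ssh -o BatchMode=yes "$host" "command -v orted" </dev/null >/dev/null ||
            { echo "$host: orted not on PATH for non-interactive shells"; status=1; }
    done
    return $status
}
```

Run from the directory that holds the case, for example `check_nodes machines "$PWD/mesh2"`, it prints one line per failed check and returns non-zero if anything failed. For point 1, leaving out `--hostfile machines` should keep all four processes on the machine where mpirun is started, which takes ssh and the shared file system out of the picture.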


All times are GMT -4.