CFD Online forums (www.cfd-online.com) > OpenFOAM > OpenFOAM Running, Solving & CFD

Problem running in parallel on a cluster

#1  July 23, 2014, 01:35  Zhiwei Zheng (killsecond)
Hi all,
Recently I have been trying to run a case on a cluster and have run into a problem. Note: I launch the case only on the master node; the other nodes are slaves.
When I run "mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log", I get:

[zhengzw@manager pitzDailyMapped]$ mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log
Warning: Permanently added 'n01,172.16.1.1' (RSA) to the list of known hosts.
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 16793) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

When I run "foamJob -s -p pisoFoam", I get:

[zhengzw@manager pitzDailyMapped]$ foamJob -s -p pisoFoam
Parallel processing using SYSTEMOPENMPI with 8 processors
Executing: /usr/lib64/openmpi/bin/mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log
Warning: Permanently added 'n01,172.16.1.1' (RSA) to the list of known hosts.
bash: /usr/lib64/openmpi/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 15934) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

When I run "/usr/lib64/openmpi/bin/mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log", I get:

[zhengzw@manager pitzDailyMapped]$ /usr/lib64/openmpi/bin/mpirun -np 8 -hostfile machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec -prefix /home/zhengzw/OpenFOAM pisoFoam -parallel | tee log
Warning: Permanently added 'n01,172.16.1.1' (RSA) to the list of known hosts.
bash: /usr/lib64/openmpi/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 16822) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

However, orted does exist in "/usr/lib64/openmpi/bin/", as the output of "ls /usr/lib64/openmpi/bin/" shows:

[zhengzw@manager cylinder2D1]$ ls /usr/lib64/openmpi/bin/
mpic++ mpitests-IMB-MPI1 ompi-profiler otfconfig
mpicc mpitests-osu_acc_latency ompi-ps otfdecompress
mpiCC mpitests-osu_alltoall ompi-server otfdump
mpicc-vt mpitests-osu_bcast ompi-top otfinfo
mpiCC-vt mpitests-osu_bibw opal_wrapper otfmerge
mpic++-vt mpitests-osu_bw opari otfprofile
mpicxx mpitests-osu_get_bw orte-bootproxy.sh otfshrink
mpicxx-vt mpitests-osu_get_latency ortec++ vtc++
mpiexec mpitests-osu_latency ortecc vtcc
mpif77 mpitests-osu_latency_mt orteCC vtCC
mpif77-vt mpitests-osu_mbw_mr orte-clean vtcxx
mpif90 mpitests-osu_multi_lat orted vtf77
mpif90-vt mpitests-osu_put_bibw orte-iof vtf90
mpirun mpitests-osu_put_bw orte-ps vtfilter
mpitests-com mpitests-osu_put_latency orterun vtunify
mpitests-glob ompi-clean orte-top vtunify-mpi
mpitests-globalop ompi_info orte_wrapper_script vtwrapper
mpitests-IMB-EXT ompi-iof otfaux
mpitests-IMB-IO ompi-probe otfcompress

I have no idea what is going wrong. Any help will be appreciated!

#2  July 23, 2014, 14:50  cdm (Canada)
It's difficult to help with parallel case troubleshooting because there are many potential sources of error, but I can try to help point you toward things to investigate.

Make sure that OpenFOAM is installed in exactly the same location on all computers. It looks like you have it installed in /home/zhengzw/OpenFOAM/ on your workstation. The same foamExec path and -prefix are used for every process, so when the slaves attempt to launch OpenFOAM, they look in their own filesystems under /home/zhengzw/OpenFOAM/ for the OpenFOAM libraries and executables. This is why, when running across multiple machines, it is preferable to install OpenFOAM in a standard location (such as /opt/ or similar) rather than in a local user directory.
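A minimal sketch of how this can be checked from the master node, assuming the "machines" hostfile uses the usual Open MPI layout (host name first, optional "slots=N") and that password-less SSH already works. Note that the "ls" in the first post only inspected the master node ("manager"), while the "No such file or directory" message is produced on the remote node ("n01"), so that is where the path has to exist. The function names are hypothetical, not part of OpenFOAM or Open MPI:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustrative names, not part of OpenFOAM or Open MPI).

# Print each distinct host name from an Open MPI-style hostfile:
# comments (# ...) and blank lines are ignored, and only the first
# field is kept, so "n01 slots=4" yields "n01".
hostfile_hosts() {
    awk '{ sub(/#.*/, "") } NF && !seen[$1]++ { print $1 }' "$1"
}

# For every host in the hostfile, check over SSH that an executable
# exists at exactly the same path. BatchMode=yes makes ssh fail instead
# of prompting for a password. Returns 1 if any host fails the check.
check_path_on_hosts() {
    local hostfile=$1 path=$2 host status=0
    for host in $(hostfile_hosts "$hostfile"); do
        if ssh -o BatchMode=yes "$host" test -x "$path" < /dev/null; then
            echo "$host: found $path"
        else
            echo "$host: MISSING $path (or ssh failed)"
            status=1
        fi
    done
    return "$status"
}
```

For example, "check_path_on_hosts machines /usr/lib64/openmpi/bin/orted" and "check_path_on_hosts machines /home/zhengzw/OpenFOAM/OpenFOAM-2.2.2/bin/foamExec" should both report "found" for every host before mpirun is tried again.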

Also, if you haven't already, make sure password-less SSH login works between all your machines by copying your SSH keys to each of them. Otherwise, mpirun won't be able to start the OpenFOAM processes on the other nodes.
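A sketch of the usual key setup plus a non-interactive check, assuming OpenSSH with ssh-keygen and ssh-copy-id available. The helper names are hypothetical, and "n01" is simply the node name from the log above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for password-less SSH between cluster nodes.

# One-time setup: create an RSA key pair without a passphrase (only if
# none exists yet) and install the public key on each given host.
# ssh-copy-id asks for the account password once per host.
setup_passwordless_ssh() {
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    local host
    for host in "$@"; do
        ssh-copy-id "$host"
    done
}

# Verification: with BatchMode=yes, ssh never prompts for a password, so
# the login succeeds only if key-based authentication works.
# Prints one line per host and returns 1 if any host fails.
check_passwordless_ssh() {
    local host status=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true < /dev/null; then
            echo "$host: ok"
        else
            echo "$host: password-less login FAILED"
            status=1
        fi
    done
    return "$status"
}
```

Running "setup_passwordless_ssh n01" once and then "check_passwordless_ssh n01" should print "n01: ok".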

Finally, if you're trying to use a custom library, you'll have to make it accessible across all nodes as well.
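If the nodes do not share the home directory, one way to do this is to mirror the user library directory to each node. A sketch, assuming password-less SSH, rsync on every node, and that the OpenFOAM environment (which defines FOAM_USER_LIBBIN) has been sourced; "sync_user_libs" is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy user-compiled OpenFOAM libraries to other nodes.
# Only needed when the nodes do NOT share the home directory; assumes
# password-less SSH and rsync on every node.
sync_user_libs() {
    if [ -z "${FOAM_USER_LIBBIN:-}" ]; then
        echo "FOAM_USER_LIBBIN is not set; source OpenFOAM's etc/bashrc first" >&2
        return 1
    fi
    local libdir=$FOAM_USER_LIBBIN host
    for host in "$@"; do
        # Create the same directory on the remote node, then mirror its contents.
        if ! { ssh "$host" mkdir -p "$libdir" < /dev/null &&
               rsync -a "$libdir/" "$host:$libdir/"; }; then
            echo "$host: copy failed" >&2
            return 1
        fi
    done
}
```

For example, "sync_user_libs n01" copies the libraries to the same path on n01.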

#3  July 23, 2014, 20:36  Zhiwei Zheng (killsecond)
Hi cdm,

Thanks for your reply!

Quote (cdm): "Make sure that OpenFOAM is installed in the exact same location on all computers."
The administrator asked me to create my own user account on the master node and run my cases under it, so I installed OpenFOAM only in my user directory on the master node, not on all the computers. Would it be better to install OpenFOAM as the root user?

#4  July 23, 2014, 21:13  cdm (Canada)
It really depends on your cluster setup; I'd discuss with the admin how the cluster is organized. OpenFOAM has to be installed on each computer that you expect to use for running in parallel. The admin should be able to get you set up properly, as this is likely not something you can do yourself as a local user.

All times are GMT -4.