CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   OpenMPI bash: orted: comand not found error (https://www.cfd-online.com/Forums/openfoam-solving/74806-openmpi-bash-orted-comand-not-found-error.html)

wyldckat July 2, 2010 07:05

Hi Stephane,

OK, a few more possibilities:
  • Do you have permission to install OpenMPI system-wide on the machine? That would make it part of the system and reduce the chances of it not being detected! Preferably install OpenMPI through the system's software/package management... and don't forget to install the -dev part of the OpenMPI package too!
    Then go to OpenFOAM's bashrc file and change from OpenMPI to SYSTEMOPENMPI, if I'm not mistaken... and you might need to rebuild libPstream.so.
  • Try not to use the machines file for defining what machines to use. Let's try running locally only for now!
  • Let's try debugging the environment accessible by the mpirun:
    1. create a file with this in it, e.g. test.sh:
      Code:

      #!/bin/bash
      # Dump this process's environment into a file named after its PID
      var=$$
      env > "$var.log"

    2. save file and run:
      Code:

      chmod +x test.sh
    3. try launching mpirun with our new file, but using foamExec for launching it:
      Code:

      mpirun -n 2 `which foamExec` ./test.sh
      Or 3 or 4 processes, it's up to you, since this is only a test!
    4. this should have created files named pid_number.log, one for each successfully launched test.sh.
    5. now, even if it's only one that has been launched successfully, does its content have references to OpenFOAM's environment? In other words, does it look like your local OpenFOAM environment?
  • Another possibility is to try mpiexec or orterun instead of mpirun... although I doubt it will make any difference.
  • Is the folder /shared on a physical mount, a user mount or a system-wide mount? I ask because if it is a user mount, it might, for some strange reason, not be visible to the remote mpirun executable.
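
Step 5 of the environment test above can be automated. The following is only a minimal sketch: it assumes the PID-named logs written by test.sh are in one directory, and it treats the presence of WM_PROJECT_DIR (a variable set by OpenFOAM's etc/bashrc) as the sign that the OpenFOAM environment reached the process. The function name check_env_logs is made up for this illustration.

```shell
# check_env_logs [DIR]
# Look at every PID-named log (e.g. 6630.log) in DIR and report whether
# it contains an OpenFOAM variable. Returns 0 only if all logs have it.
check_env_logs() {
    local dir="${1:-.}" log status=0
    for log in "$dir"/[0-9]*.log; do
        # If the glob matched nothing, $log is the literal pattern
        [ -e "$log" ] || { echo "no logs found in $dir"; return 2; }
        if grep -q '^WM_PROJECT_DIR=' "$log"; then
            echo "$log: OpenFOAM environment present"
        else
            echo "$log: OpenFOAM environment MISSING"
            status=1
        fi
    done
    return $status
}
```

Run it as `check_env_logs .` in the folder where mpirun was launched. A "MISSING" line would suggest that the launched process did not inherit the OpenFOAM settings.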
Best regards,
Bruno

openfoam_user July 2, 2010 09:09

Bruno,

with the below command nothing happens. No file is created.

mpirun -n 2 `which foamExec` ./test.sh

But with the below command 2 files (6630.log and 6631.log) are created.
mpirun -n 2 ./test.sh

I have also written another application (a hello test) to test mpirun. Maybe you know it.

With the below command I obtain an error message
mpirun --hostfile myhostfile hello

error message:
orted: Command not found.
--------------------------------------------------------------------------
A daemon (pid 6648) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
cfs6 - daemon did not report back when launched
cfs7 - daemon did not report back when launched
cfs8 - daemon did not report back when launched
cfs9 - daemon did not report back when launched
cfs11 - daemon did not report back when launched
[117]cfs10-sanchi /home/sanchi/test_openmpi % orted: Command not found.
orted: Command not found.
orted: Command not found.
orted: Command not found.

With the below command (full path to mpirun) it works:
/shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun --hostfile myhostfile hello

Hello World! from process 10 out of 12 on cfs11
Hello World! from process 11 out of 12 on cfs11
Hello World! from process 9 out of 12 on cfs10
Hello World! from process 8 out of 12 on cfs10
Hello World! from process 0 out of 12 on cfs6
Hello World! from process 2 out of 12 on cfs7
Hello World! from process 1 out of 12 on cfs6
Hello World! from process 3 out of 12 on cfs7
Hello World! from process 6 out of 12 on cfs9
Hello World! from process 7 out of 12 on cfs9
Hello World! from process 4 out of 12 on cfs8
Hello World! from process 5 out of 12 on cfs8

Something is going wrong because:

[106]cfs10-sanchi /home/sanchi/test_openmpi % which mpirun
/shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun

Regards,

Stephane.

wyldckat July 2, 2010 09:54

Hi Stephane,

"Hello World!" is a great testing application :D
Quote:

Originally Posted by openfoam_user (Post 265469)
Something is going wrong because:

[106]cfs10-sanchi /home/sanchi/test_openmpi % which mpirun
/shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun

Uhm... remember the post I made a while back:
Quote:

Originally Posted by wyldckat (Post 261037)
You better edit the foamJob script:
Code:

which foamJob
and go to the last lines, where it says:
Code:

echo "Executing: mpirun
change to
Code:

echo  "Executing: $mpirun
Run foamJob like you have done before and now you will know which mpirun foamJob is really using!

Which mpirun is OpenFOAM's 1.7 foamJob trying to use?



Also, try using the -x option for launching mpirun. For example:
Code:

mpirun -n 2 -x PATH -x LD_LIBRARY_PATH ./test.sh
If this doesn't work, then the only possible solution should be the next possibility!
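
Before moving on, it may help to check what PATH a remote shell actually gets. The sketch below is an illustration only: in_path is a made-up helper that tests whether an executable (such as orted) can be found in a given colon-separated PATH string.

```shell
# in_path NAME PATHSTRING
# Succeeds if an executable file called NAME exists in one of the
# colon-separated directories of PATHSTRING.
in_path() {
    local name="$1" dir
    local IFS=:
    for dir in $2; do
        # An empty PATH entry means the current directory
        if [ -f "${dir:-.}/$name" ] && [ -x "${dir:-.}/$name" ]; then
            return 0
        fi
    done
    return 1
}
```

Assuming password-less ssh to a node such as cfs6, something like `in_path orted "$(ssh cfs6 'echo $PATH')" && echo found` would show whether the non-interactive shell on that node can see orted. The local PATH in an interactive terminal can differ from that one.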



By the way, what method are you using for sharing the folder /shared between machines? NFS, sshfs, samba or something else?
My guess is that for some reason, the way that the folder is mounted is only activated on demand. For example:
  • if we just try to simply launch mpirun remotely, even if the path to it is in PATH, the mounting system assumes that the mpirun file should be already visible;
  • but if we say that mpirun is located at /shared/right_here/mpirun, the mounting mechanism responsible for the folder /shared wakes up and really checks if the file really exists!
This is the only valid explanation that I can theorize based on the available clues! That's why I'm asking how you are mounting the folder /shared!

Ah, there is also another possibility: does the folder /shared exist before mounting or is it only created when it's mounted? I've had this particular problem with MSys and Cygwin, but never with Linux... but it's a possibility!
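
One way to answer the mounting question is to look the folder up in the kernel's mount table. This is a sketch that assumes a Linux system with /proc/mounts; fstype_of is a made-up helper name, and the second argument exists only so the function can be tried on a saved copy of the table.

```shell
# fstype_of MOUNTPOINT [MOUNTS_FILE]
# Print the filesystem type recorded for MOUNTPOINT (field 3 of the
# mount table). If it is listed more than once, the last entry wins.
# Returns 1 if MOUNTPOINT is not a mount point at all.
fstype_of() {
    awk -v mp="$1" '
        $2 == mp { t = $3 }
        END { if (t == "") exit 1; print t }
    ' "${2:-/proc/mounts}"
}
```

For example, `fstype_of /shared` would print something like nfs or fuse.sshfs when /shared is mounted, and print nothing (with a non-zero exit status) when it is just a plain folder.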

Best regards,
Bruno

openfoam_user July 5, 2010 04:47

Bruno,

the folder /shared exists before mounting.

I can't understand it, because OF-1.6 and 1.6.x were running fine in parallel in the past.

Regards,

Stephane.

openfoam_user July 5, 2010 07:23

Bruno,

I have noticed that the mpirun of version 1.7.0 is not a symbolic link!

[102]cfs10-sanchi /home/sanchi % ls -l `which mpirun`
-rwx------ 1 sanchi cfs 106795 2010-07-01 14:47 /shared/OpenFOAM/ThirdParty-1.7.0/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun

For the previous version 1.6.x the link was:
sanchi@cfs10:~> ls -l `which mpirun`
lrwxrwxrwx 1 sanchi cfs 7 2010-05-31 12:15 /shared/OpenFOAM/ThirdParty-1.6.x/openmpi-1.3.3/platforms/linux64GccDPOpt/bin/mpirun -> orterun

Any comments on that?

Regards,

Stephane.

wyldckat July 5, 2010 18:13

Hi Stephane,

Sorry for the late reply, but I couldn't answer earlier.

OK, as for the link: I believe you no longer have the link because you copied the orterun file to mpirun, which was one of my instructions to try to isolate/fix the issue.
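
To compare the two installations without reading ls -l output by eye, a small check like the following could be used. It is only a sketch, and describe_launcher is a made-up name.

```shell
# describe_launcher FILE
# Report whether FILE is a symbolic link (and where it points) or a
# regular file. Returns 1 if FILE does not exist.
describe_launcher() {
    if [ -L "$1" ]; then
        echo "$1 -> $(readlink "$1")"
    elif [ -f "$1" ]; then
        echo "$1 is a regular file"
    else
        echo "$1 not found"
        return 1
    fi
}
```

Calling `describe_launcher "$(which mpirun)"` in each environment should print a "-> orterun" line for the 1.6.x build shown above and "is a regular file" for the copied 1.7.0 one.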

As for the /shared folder: you didn't say which method you use for mounting it.

As for OpenFOAM 1.6.x working in parallel before: as far as I can tell, you are still getting nearly the same problem you were getting then, but this time it's even worse! I remember you posted some time ago that you weren't able to use mpirun successfully, and that it did work with foamJob, albeit rather slowly. This time, even foamJob doesn't work.

The only thing that works with both OpenFOAM versions is stating the full path to mpirun when launching the parallel run. And that's why I suspect the mounting mechanism is to blame! Otherwise, there is a bug in OpenMPI... which got worse from OpenMPI 1.3.3 to 1.4.1!!

So, three possibilities remain:
  1. the mounting system used (NFS, sshfs, samba, etc.) is to blame or isn't properly configured;
  2. you can try building OpenFOAM 1.7.0 or 1.7.x with the OpenMPI 1.3.3 from the 1.6.x version;
  3. read and follow the instructions in OpenMPI's FAQ: Where should I install Open MPI? - the more efficient option would be to install OpenMPI locally on each machine, but I suppose that's not possible on the machines you use :(

Best regards,
Bruno

openfoam_user July 6, 2010 03:32

Bruno,

the /shared folder is mounted using NFS.

The story is a bit curious.
- At the beginning OF-1.6 and OF-1.6.x were running fine in parallel.
- Then OF-1.6.x no longer ran in parallel with mpirun, but foamJob still worked.
- Then foamJob no longer ran in parallel either.
- Now I have installed OF-1.7.x. It is impossible to launch a case in parallel, even if I use the full path to mpirun.

But, our own flow solver NSMB runs in parallel using /opt/mpich/bin/mpirun.

Stephane.

wyldckat July 6, 2010 04:27

Hi Stephane,

Have you tried using full paths for mpirun and foamExec? Something like this:
Code:

/shared/OpenFOAM/ThirdParty-1.7.x/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun -np 4 -hostfile machines /shared/OpenFOAM/OpenFOAM-1.7.x/bin/foamExec  interDyMFoam -parallel | tee log
OK, as for NFS - if you can, try to mount with these options:
Code:

sync,dirsync,atime,exec,rw
Source: http://www.toucheatout.net/informati...tuning-options
The idea is to force NFS to refresh more actively: the default options are usually meant for a small access footprint, while these options (the bold ones) should enforce a stricter policy. If my theory is correct, this will hopefully fix the issue you are having.


As for "one day it was working, the next it wasn't": it seems that the master node may have been updated/upgraded while the other nodes weren't... or maybe all of them did get updated, which could have tampered with your previous settings...

Good luck!
Bruno

openfoam_user July 6, 2010 04:51

Hi Bruno,

Now it works again. I don't know why, but it works again with the 2 following commands:

1.
/shared/OpenFOAM/ThirdParty-1.7.x/platforms/linux64Gcc/openmpi-1.4.1/bin/mpirun -np 8 -hostfile machines /shared/OpenFOAM/OpenFOAM-1.7.x/bin/foamExec simpleFoam -parallel | tee log

2.
foamJob -s -p simpleFoam

Yesterday I installed OF-1.7.x, and this morning I ran git pull and ./Allwmake.

This is the only change between yesterday and today.

Thanks again for all your messages !!!

Best regards,

Stephane.

CFDUser_ January 27, 2015 08:54

Quote:

Originally Posted by fijinx (Post 254017)
Ok, I definitely got it now! I just added the ..../etc/bashrc as the FIRST line in the .bashrc file (before the check that does nothing for non-interactive shells) and it works!

Dear James Baker,

Thank you. Everything is working fine :).

Thanks & Regards,
CFDUser_
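
For readers who want to try the same fix, the ordering described in the quoted post could look roughly like this at the top of ~/.bashrc. This is a sketch under assumptions: FOAM_BASHRC is a hypothetical placeholder (the real path was not given in the quote), and source_if_present is a made-up helper.

```shell
# Sketch of the top of ~/.bashrc.
# FOAM_BASHRC is a hypothetical placeholder: point it at your own
# installation's etc/bashrc.
FOAM_BASHRC="${FOAM_BASHRC:-/path/to/OpenFOAM/etc/bashrc}"

# Source a file only if it exists, so a missing installation does not
# break logins. Returns 1 if the file is absent.
source_if_present() {
    if [ -f "$1" ]; then
        . "$1"
    else
        return 1
    fi
}

# 1. Load the OpenFOAM environment first, for every kind of shell.
source_if_present "$FOAM_BASHRC" || true

# 2. Only afterwards separate interactive from non-interactive shells.
case $- in
    *i*) : ;;  # interactive: prompt, aliases, etc. would follow here
    *)   : ;;  # non-interactive (e.g. started by ssh): nothing more to do
esac
```

The point of the ordering is that shells started remotely are non-interactive, so anything placed after the usual "do nothing if non-interactive" check is never reached by them.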

chandra shekhar pant March 6, 2020 09:47

Hello All,


I am also facing the same issue. The error says:
FIPS integrity verification test failed.
orted: Command not found.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
when running on a cluster of 2 nodes using

Code:

mpirun/orterun --host n217:16,n219:16 -np 32 --use-hwthread-cpus snappyHexMesh -parallel -overwrite > log.snappyHexMesh
Could anyone suggest anything in this regard? It would be a great help. Thanks a lot!
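
The first cause listed in the error message concerns PATH and LD_LIBRARY_PATH on the remote nodes. Open MPI's mpirun has a --prefix option that tells the remote side where the installation lives, and earlier in this thread calling mpirun by its full path was what worked. The helper below is only a sketch (mpi_prefix is a made-up name) that derives the installation prefix from the location of mpirun, assuming the usual PREFIX/bin/mpirun layout.

```shell
# mpi_prefix MPIRUN_PATH
# Print the installation prefix, i.e. the parent of the bin/ directory
# that contains mpirun (assumes the layout PREFIX/bin/mpirun).
mpi_prefix() {
    dirname "$(dirname "$1")"
}
```

A possible use, assuming the same installation path exists on every node: `mpirun --prefix "$(mpi_prefix "$(command -v mpirun)")" --host n217:16,n219:16 -np 32 snappyHexMesh -parallel`. Whether this cures the error here is not certain, since the message lists several other possible causes.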


All times are GMT -4.