Running OpenFOAM on a Cluster
Hi,

I'm new to OpenFOAM. We have a cluster with one master node and one client node, with passwordless SSH enabled, OpenFOAM v6 installed on both machines, and the case directories present on both nodes. I get the following error when I try to run the case with foamJob. I read another post which said this is an environment problem, so how do I set the environment properly? Your help will be highly appreciated. This is the error:

mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client1
Executable: /opt/openfoam6/bin/foamJob

Regards,
Rishab
Have you tried to run the case separately on each machine? Can you specify the command you have typed?
Hi,

Yes, I've tried running the case individually on each machine and it runs perfectly. The command is:

mpirun --hostfile machines -np 6 foamJob simpleFoam -parallel

Regards,
Rishab
You don't need foamJob in this command. Just type "mpirun --hostfile machines -np 6 simpleFoam -parallel".
Yes, I've done that too... I get the same error!
Can you type "which simpleFoam" on both computers and show us the output?
This is what I get when I type "which simpleFoam" on both the master and the client:

/opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam
What error does it show when you type "mpirun --hostfile machines -np 6 simpleFoam -parallel"? Can you paste it here?
This is the error I get when I type "mpirun --hostfile machines -np 6 simpleFoam -parallel":

mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client1
Executable: /opt/openfoam6/bin/simpleFoam
I can see that it is trying to look for simpleFoam at /opt/openfoam6/bin/simpleFoam, but the actual location on client1 is /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam, so it says it cannot find the executable file.

How did you install OpenFOAM? Did you use the same method on both computers?
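A quick way to confirm a mismatch like this (a sketch, not from the thread; "client1" is the node name taken from the error above) is to check what a non-interactive shell on the client can see, because mpirun launches the remote ranks over a non-interactive ssh session, not an interactive login:

```shell
# Non-interactive ssh: the same kind of shell mpirun uses to start remote
# ranks. If the OpenFOAM environment is only set up for interactive logins,
# this prints nothing or a stale path:
ssh client1 'which simpleFoam'

# Login shell for comparison: this usually finds the solver because the
# full profile chain gets sourced:
ssh client1 'bash -lc "which simpleFoam"'
```

If the two commands disagree, the remote environment setup is the problem, not the installation itself.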
Yes, I used the same method to install OpenFOAM on both machines, described on this website:
https://openfoam.org/download/6-ubuntu/
Hi,

When I type "mpirun --hostfile machines -np 6 simpleFoam -parallel" I get the following error:

[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] usock_peer_send_blocking: send() to socket 39 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0]-[[57312,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client1
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam

When I type "mpirun --hostfile machines -np 6 foamJob simpleFoam -parallel" I get the following error:

Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client1
Executable: /opt/openfoam6/bin/foamJob
Let's first make sure you can run on one node. What do you get when you run just this:

runParallel simpleFoam -parallel
mpirun -np 3 simpleFoam -parallel

And what do you get when you run:

mpirun --hostfile machines -np 6 hostname
Hi,

When I run "runParallel simpleFoam -parallel" this is what I get:

runParallel: command not found

When I run "mpirun -np 3 simpleFoam -parallel" the solver starts running. This is what I get:

/*---------------------------------------------------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  6
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
Build  : 6-1a0c91b3baa8
Exec   : simpleFoam -parallel
Date   : Aug 23 2018
Time   : 21:54:51
Host   : "mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141"
PID    : 9239
I/O    : uncollated
Case   : /home/mpiuser/OpenFOAM/mpiuser-6/run/tutorials/incompressible/simpleFoam/24-30-8.50
nProcs : 2
Slaves : 1("rishabghombal-HP-15-Notebook-PC.9240")
Pstream initialized with:
    floatTransfer      : 0
    nProcsSimpleSum    : 0
    commsType          : nonBlocking
    polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 10)
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time
Overriding OptimisationSwitches according to controlDict
    maxThreadFileBufferSize 2e+09;
    maxMasterFileBufferSize 2e+09;
Create mesh for time = 0

SIMPLE: Convergence criteria found
        p: tolerance 1e-05
        U: tolerance 1e-05
        "(k|epsilon|)": tolerance 1e-05

Reading field p
Reading field U
Reading/calculating face flux field phi
Selecting incompressible transport model Newtonian
Selecting turbulence model type RAS
Selecting RAS turbulence model kEpsilon
RAS
{
    RASModel        kEpsilon;
    turbulence      on;
    printCoeffs     on;
    Cmu             0.09;
    C1              1.44;
    C2              1.92;
    C3              0;
    sigmak          1;
    sigmaEps        1.3;
}
No MRF models present
No finite volume options present

Starting time loop

Time = 1

smoothSolver:  Solving for Ux, Initial residual = 1, Final residual = 0.0205988, No Iterations 4
smoothSolver:  Solving for Uy, Initial residual = 1, Final residual = 0.0245561, No Iterations 4
smoothSolver:  Solving for Uz, Initial residual = 1, Final residual = 0.0245738, No Iterations 4
GAMG:  Solving for p, Initial residual = 1, Final residual = 0.0767341, No Iterations 5
time step continuity errors : sum local = 0.027811, global = 0.0131447, cumulative = 0.0131447
smoothSolver:  Solving for epsilon, Initial residual = 0.0859358, Final residual = 0.00617931, No Iterations 3
bounding epsilon, min: -1031.63 max: 13226.2 average: 851.478
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0955549, No Iterations 6
ExecutionTime = 60.07 s  ClockTime = 60 s

Time = 2

When I run "mpirun --hostfile machines -np 6 hostname" this is what I get:

mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141 (which is the master)
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141 (which is the client)
These are the contents of my machines file:

master slots=3
client slots=3

When I type "mpirun -V" I get:

mpirun (Open MPI) 2.1.1
Report bugs to http://www.open-mpi.org/community/help/
Try this:

mpirun -np 6 -hostfile machines hostname

It should output:

master
master
master
client
client
client
When I type "mpirun -np 6 -hostfile machines hostname" this is what I get:

mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604
mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604

It is just like master/master/master/client/client/client, except that instead of "master" it prints mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141 (which is the master node) and instead of "client" it prints mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:7604 (which is the client node).
OK, that looks correct. What you showed earlier was only:

master
master

Now try sourcing the OpenFOAM environment:

source /path/to/openfoam/installation/etc/bashrc

Then cd to the job folder and run:

mpirun -np 6 -hostfile machines simpleFoam -parallel
When I type "source /opt/openfoam6/etc/bashrc" nothing happens; I don't get any output.

When I type "mpirun -np 6 -hostfile machines simpleFoam -parallel" this is what I get:

[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] usock_peer_send_blocking: send() to socket 39 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0]-[[57312,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam
Hi,

When I type "mpirun -np 6 -hostfile machines simpleFoam -parallel" this is what I get:

[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20848] [[33493,0],0] usock_peer_send_blocking: send() to socket 41 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20848] [[33493,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20848] [[33493,0],0]-[[33493,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client1
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam
--------------------------------------------------------------------------
3 total processes failed to start

When I type "mpirun -np 6 -hostfile machines foamJob simpleFoam -parallel" this is what I get:

Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
Application : simpleFoam
Application : simpleFoam
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
Executing: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam -parallel > log 2>&1 &
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client1
Executable: /opt/openfoam6/bin/foamJob
--------------------------------------------------------------------------
3 total processes failed to start

Along with a log file which says:

--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:20554] Local abort before MPI_INIT completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Yes, I did customise it, and when I type "source /opt/openfoam6/etc/bashrc" nothing happens; I don't get any output.

Then when I type "mpirun -np 6 -hostfile machines simpleFoam -parallel" this is what I get:

[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] usock_peer_send_blocking: send() to socket 39 failed: Broken pipe (32)
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[mpiuser-HP-ProDesk-400-G2-MT-TPM-DP:03141] [[57312,0],0]-[[57312,1],0] usock_peer_accept: usock_peer_send_connect_ack failed
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 3; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).
Node: client
Executable: /opt/openfoam6/platforms/linux64GccDPInt32Opt/bin/simpleFoam
For example, here's what the relevant line in my bashrc file looks like:

source /opt/apps/OpenFOAM/OpenFOAM-v1712/etc/bashrc
Hi,

Yes, the line is there in the bash file on both nodes, i.e. master and client. In my case the line is:

source /opt/openfoam6/etc/bashrc
Rank 3 is the process that starts on the slave, so probably some paths are different on the slave node. Is /opt an NFS share, or did you install it separately on both machines?

At this point I would remove the installation from the slave and just make /opt an NFS share.
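A minimal sketch of that NFS setup, assuming the master's address is 192.168.1.10 and the client mounts the master's /opt in place of its local copy (the address, network range, and options here are illustrative, not taken from the thread):

```shell
# On the master (NFS server): export /opt read-only to the cluster LAN.
# Append the export rule, then reload the export table:
echo '/opt 192.168.1.0/24(ro,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# On the client: mount the master's /opt so both nodes see byte-identical
# OpenFOAM binaries at identical paths:
sudo mount -t nfs 192.168.1.10:/opt /opt

# To make the mount survive reboots, add a matching /etc/fstab entry:
#   192.168.1.10:/opt  /opt  nfs  defaults  0  0
```

This removes the whole class of "the path exists on the master but not on the client" errors, because there is only one installation to keep consistent.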
Quote:
|
Hi feacluster,

/opt is not an NFS share... I have not installed NFS; I've installed OpenFOAM separately inside /opt on each machine.
Hi hokhay,

Yes, I placed the source line on the first line of the bash file, and also in /etc/profile and in ~/.profile. This did the trick. Thanks a lot for your support, feacluster and hokhay! You guys are my heroes!
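For anyone hitting the same wall: Ubuntu's default ~/.bashrc starts with a guard that makes non-interactive shells (exactly what mpirun and ssh spawn on the remote node) stop reading the file before they reach anything sourced further down, which is why the source line only works when placed above it. A small sketch reproducing the effect with a throwaway file (the file name and variable are made up for illustration; the export stands in for "source /opt/openfoam6/etc/bashrc"):

```shell
# Simulate Ubuntu's default ~/.bashrc: an interactivity guard, with the
# OpenFOAM-style source line placed AFTER it.
cat > /tmp/demo_bashrc <<'EOF'
case $- in
    *i*) ;;        # interactive shell: fall through
      *) return;;  # non-interactive shell: stop reading the file here
esac
export OPENFOAM_ENV=loaded   # stand-in for the OpenFOAM source line
EOF

# A non-interactive shell (like the one launching remote MPI ranks)
# never reaches the export:
bash -c 'source /tmp/demo_bashrc; echo "${OPENFOAM_ENV:-unset}"'   # prints "unset"

# Moving the line ABOVE the guard (the fix from this thread) makes it
# take effect for non-interactive shells too:
sed -i '1i export OPENFOAM_ENV=loaded' /tmp/demo_bashrc
bash -c 'source /tmp/demo_bashrc; echo "${OPENFOAM_ENV:-unset}"'   # prints "loaded"
```

Putting the line in ~/.profile as well covers login shells, but it is the placement before the ~/.bashrc guard that fixes the mpirun/ssh case.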
Hi,

The solver is running, but I ran out of memory in just 10 iterations! Can anyone explain why this is happening? My case has approximately 7 million cells, and my computer configuration is: 4 GB RAM, 500 GB HDD, and a 4-core i5 processor running at 3.8 GHz (four physical cores and two logical cores). I get a dialogue box after 10 iterations saying simpleFoam stopped unexpectedly because it ran out of memory! If I have to increase memory, what should I upgrade in my computers? My decomposeParDict file has the distributed option set to "no"; is this causing the problem?
I think it's simply too little RAM; 7 million cells is not a small number. My rough guess is that with 4 GB of RAM you may be able to run a 2-million-cell simulation.

To be honest, your PC is not up to the job for practical CFD. A 7-million-cell simulation would take at least 2 days to complete, depending on the convergence rate. You need a new PC with at least 16 GB of RAM.
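Using the rough figure above (4 GB handling about 2 million cells, i.e. roughly 2 GB per million cells), a quick back-of-the-envelope check is possible. Note that this ratio is a ballpark taken from the reply, not a measured OpenFOAM constant; actual usage varies with the solver, schemes, and decomposition:

```shell
# Ballpark RAM estimate: ~2 GB per million cells (assumed ratio from
# this thread, not a measured constant).
cells=7000000
awk -v cells="$cells" 'BEGIN {
    gb_per_mcell = 2    # assumed GB of RAM per million cells
    printf "%.0f GB needed for %d cells\n", cells / 1e6 * gb_per_mcell, cells
}'
```

For the 7-million-cell case this estimates around 14 GB, so a 4 GB machine overshoots several times over, which is consistent with the out-of-memory crash after a few iterations.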
Hi,

Can you please explain a way to calculate these things, maybe not exactly but roughly? This is just a coarse mesh, and I will significantly refine it for grid independence, which will further increase the number of cells; I will also do some fluid-structure interaction simulations. So if I need to upgrade to a new computer I need to decide the specs, and I'm open to buying a server or setting up another cluster, whichever gives better performance. All I know is that if the cell count drops below about 50k cells per processor, parallel performance in OpenFOAM degrades. My target is to solve 30 million cells in less than 2 hours. The university I'm studying at is willing to fund the setup, so please advise; your inputs and suggestions are highly appreciated.

Regards,
Rishab
I don't think there is any way to calculate this exactly; it is just from my experience.

For your reference, I am running a steady-state car aerodynamics simulation with 35 million cells on 12 server computers with a total of 192 cores, and this configuration takes about 18 hours to run 10,000 iterations. They are 6-year-old servers with E5-2650 CPUs; the new AMD EPYC CPUs could easily double the performance. To finish a 30-million-cell simulation in 2 hours, I guess you may need more powerful servers than what I have, and your simulation would need to converge in fewer iterations. It is really case dependent.
When you say server PC, what exactly do you mean? Do you literally mean a server, or is it a PC? Can you please tell me the specs: how many memory slots for a 16-core CPU, and so on?
I mean a server. The one I am using is a PowerEdge R620. It is a dual-CPU computer with a total of 16 RAM slots; you can find the spec on Dell's website. Also, OpenFOAM is a memory-bandwidth-intensive code, which means memory bandwidth has a larger impact on performance than raw CPU speed.

I suggest you read the following paper:
https://www.researchgate.net/publica...and_Don'ts
source /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/mpivars.sh
source /opt/apps/OpenFOAM/OpenFOAM-v1712/etc/bashrc
export LD_LIBRARY_PATH=/opt/apps/intel:$LD_LIBRARY_PATH
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-ib0
export I_MPI_DYNAMIC_CONNECTION=0