Running in parallel on multiple nodes
I was trying to run a case using the resources of two computers with the following command:
mpirun --hostfile <machines> -np <nprocs> snappyHexMesh -parallel
When I run the script without the hostfile on one node with 8 processors, I don't get any errors. But when I run the same script on two nodes, using the machine names in the machines hostfile, I get an error saying
cannot find points in directory polymesh from 0 down to constant.
I checked whether the constant directory contains the polyMesh directory with the points file in it, and it does.
Can someone please help me? Where am I going wrong?
You need to decompose your model in order to run in parallel. Run decomposePar to do this.
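Before launching mpirun it can help to confirm that the decomposition really produced one processorN directory per MPI rank. Below is a minimal sketch in plain sh. The function name count_processor_dirs is made up for illustration, and it assumes decomposePar wrote the usual processor0, processor1, ... directories into the case directory.

```shell
#!/bin/sh
# count_processor_dirs CASE_DIR   (hypothetical helper, not part of OpenFOAM)
# Prints how many processorN directories exist directly under CASE_DIR.
count_processor_dirs() {
    n=0
    for d in "$1"/processor[0-9]*; do
        # If nothing matches, the glob stays literal and the -d test fails.
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# Usage, run from the case directory (nprocs is whatever you pass to -np):
#   nprocs=16
#   [ "$(count_processor_dirs .)" -eq "$nprocs" ] || echo "re-run decomposePar" >&2
```

If the count does not match the value given to -np, the numberOfSubdomains entry in system/decomposeParDict is the first thing to look at.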
I did do that. The following are the steps I followed.
3. mpirun --hostfile machines -np <nprocs> snappyHexMesh -parallel
The problem I mentioned occurs at the third step.
Do you have the files for the "simulation calculation" on both machines?
And do you have the same filesystem structure on both machines?
As I understand it, using MPI with a machine file doesn't require the working folder to be present on both systems.
To answer your question: no, I don't have the files on the slave nodes. But I will give it a shot now and see if it works.
Update - Elvis, I put the folder on both nodes and tried running again. I get the same error.
In fact, you do need the working folder on both systems (usually via an NFS shared file system).
snappyHexMesh also has to be runnable on both systems. So, as with the working folder, you need OpenFOAM either on NFS or installed in the same directory on each machine.
You also need to source your bashrc on each node. One way to do this is to use foamExec.
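As a rough sketch of what that looks like on the command line: the snippet below writes an example hostfile, adds up the slots, and prints (rather than runs) the launch line with foamExec placed in front of snappyHexMesh. The host names node1 and node2, the slot counts, and the file name machines.example are assumptions, and the hostfile syntax shown is Open MPI's.

```shell
#!/bin/sh
# Hypothetical host names and slot counts; Open MPI hostfile syntax assumed.
hostfile=${TMPDIR:-/tmp}/machines.example
cat > "$hostfile" <<'EOF'
node1 slots=8
node2 slots=8
EOF

# Sum the slots= entries: the largest -np that does not oversubscribe the nodes.
nprocs=$(awk '{ for (i = 2; i <= NF; i++) if ($i ~ /^slots=/) n += substr($i, 7) }
              END { print n + 0 }' "$hostfile")

# Dry run: print the launch line instead of executing it.
# foamExec sets up the OpenFOAM environment on each node before starting the utility.
echo "mpirun --hostfile $hostfile -np $nprocs foamExec snappyHexMesh -parallel"
```

If foamExec itself is not found on the remote nodes, giving its full path inside the OpenFOAM installation is the usual workaround. Whether foamExec ships with your OpenFOAM version is worth checking first.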
Also take a look at SSH access with a shared key, so that no password is needed for each node.
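The points above (case folder visible on every node, snappyHexMesh runnable there, password-free SSH) can be checked per node with something like the following. This is only a sketch: check_node is a hypothetical helper, the node names and case path in the usage comment are placeholders, and it assumes an OpenSSH client.

```shell
#!/bin/sh
# check_node NODE CASE_DIR   (hypothetical helper)
# Succeeds only if NODE accepts key-based login, sees CASE_DIR at the same path,
# and finds snappyHexMesh on the PATH of a non-interactive shell.
check_node() {
    node=$1
    case_dir=$2
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    ssh -o BatchMode=yes "$node" "test -f '$case_dir/system/controlDict'" || {
        echo "$node: no password-free login, or $case_dir is not visible there" >&2
        return 1
    }
    ssh -o BatchMode=yes "$node" "command -v snappyHexMesh >/dev/null" || {
        echo "$node: snappyHexMesh not on PATH (OpenFOAM environment not sourced?)" >&2
        return 1
    }
    echo "$node: OK"
}

# Usage (placeholder node names and case path):
#   for n in node1 node2; do check_node "$n" "$HOME/OpenFOAM/run/myCase"; done
```

A failure of the second check on a node where OpenFOAM is installed usually points at the environment not being sourced for non-interactive logins, which is the situation foamExec is meant to handle.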