
CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Pre-Processing (http://www.cfd-online.com/Forums/openfoam-pre-processing/)
-   -   Parallel run with distributed data: How? (http://www.cfd-online.com/Forums/openfoam-pre-processing/107962-parallel-run-distributed-data-how.html)

eskila October 10, 2012 10:22

Parallel run with distributed data: How?
 
I have two multi-core computers available on the network and want to parallelize a case using cores on both machines.
I tried it with "distributed no" in decomposeParDict, and that worked fine.
Now I want to try the "distributed yes" feature, but I find the documentation a bit lacking.

I am starting with something simple: Using one core on each machine.
Let's call the machines:
#1: The one I'm launching the job from.
#2: The other one.

I have a file "machines" with the following content (IP addresses hidden by x's):
Code:

xxx.xxx.xxx.xx1 cpu=1
xxx.xxx.xxx.xx2 cpu=1

I tried a decomposeParDict like this, though I'm not sure whether this is how it's supposed to be done:

Code:

numberOfSubdomains 2;

method          scotch;

distributed    yes;

roots
1
(
"<absolute path to empty directory on machine 2 which my user owns>"
);

I then do (on machine 1, in the case directory):
Code:

blockMesh
decomposePar
mpirun --hostfile machines -np 2 buoyantBoussinesqPimpleFoam -parallel

It asks for my password on machine 2. After that I get the error:
Code:

[1] --> FOAM FATAL ERROR:
[1] Cannot find file "points" in directory "polyMesh" in times 0 down to constant
[1]
[1]    From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&)
[1]    in file db/Time/findInstance.C at line 188.
[1]
FOAM parallel run exiting

What am I doing wrong?
I'm guessing it has something to do with the "roots" entry in decomposeParDict. Should it be an absolute or a relative path? If the latter, relative to what? Should the entries be empty directories? And what should it look like if I want to, e.g., run on 10 cores on each machine?

Any help will be greatly appreciated.

kev4573 October 10, 2012 14:11

Do you need to specify both roots when using the distributed option, even when one of the nodes is the host?

My guess is that they are absolute paths to the case directory for each of the nodes.

eskila October 10, 2012 14:26

Quote:

Originally Posted by kev4573 (Post 385998)
Do you need to specify both roots when using the distributed option, even when one of the nodes is the host?

No, because if I try to give two roots with this setup, I get a message that the number of roots MUST be one less than the number of processes.

But if, for example, I want to use 10 cores on each machine, do I really have to list 19 roots, where 9 of them and 10 of them are identical? And how is a root in decomposeParDict coupled to a machine listed in the "machines" file? A path can be valid on both machines.

kev4573 October 10, 2012 15:13

These instructions worked for me - http://www.cfd-online.com/Forums/blo...h-process.html .

In any case, I'd agree the official documentation could be clearer.

eskila October 10, 2012 16:02

Quote:

Originally Posted by kev4573 (Post 386014)
These instructions worked for me - http://www.cfd-online.com/Forums/blo...h-process.html .

In any case, I'd agree the official documentation could be clearer.

Thanks for the tip.
I am not able to test it myself right now, but I think I see one problem: in that case, all the processes seem to run on the same machine, only in different directories. Where does the information about the IP addresses and the distribution of processes between machines go?

kev4573 October 10, 2012 16:58

I'm not sure there is an explicit map of processes to node/CPU locations; it may just be implied by the ordering of the nodes in your hosts file: the first X processes are computed on the first node, the next X processes on the next node, and so on. I think the key is to just copy the entire decomposed case to each node and point the root directories in decomposeParDict to the parent of the case directory.
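
As a sketch of that scheme (all paths and counts are invented for illustration, not taken from this thread), a decomposeParDict for 4 processes split 2+2 across two hosts might look like:

```
// Hypothetical example: 4 subdomains, 2 per machine
// (hostfile: machine1 cpu=2, machine2 cpu=2).
// mpirun places the first 2 ranks on machine #1 and the
// next 2 on machine #2, so the roots are assumed to follow
// that ordering.

numberOfSubdomains 4;

method          scotch;

distributed     yes;

roots
3
(
    "/data1/runs"   // process 1 -> machine #1
    "/data2/runs"   // process 2 -> machine #2
    "/data2/runs"   // process 3 -> machine #2
);
```

Only numberOfSubdomains - 1 roots are listed because process 0 presumably uses the directory the job is launched from on the master; the decomposed case (including its processor* directories) would then be copied under each root path on the corresponding machine.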

eskila October 11, 2012 03:44

It worked, I think. I don't know how I can be sure that both computers used only their local disk instead of "crossing over", but at least it ran without error.

The "machines" file contained:
Code:

XXX.XXX.XXX.XX1 cpu=10
XXX.XXX.XXX.XX2 cpu=10

decomposeParDict contained a listing of 19 roots: 9 of them (nodeX, X=1-9) were paths on machine #1, and 10 of them (nodeX, X=10-19) were paths on machine #2.
Each nodeX directory got a complete copy of the case as a subdirectory, after running decomposePar on the master node.
Then
Code:

mpirun --hostfile machines -np 20 buoyantBoussinesqPimpleFoam -parallel
was run on the master node, and 10 processes showed up in "top" on each machine.
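
For reference, a sketch of a decomposeParDict matching this 10+10 setup (the paths are hypothetical; with cpu=10 per host, ranks 0-9 land on machine #1 and ranks 10-19 on machine #2, so the 19 roots cover processes 1 through 19):

```
// Hypothetical reconstruction of the setup described above:
// 20 subdomains, 10 per machine. Process 0 runs in the launch
// directory on the master, so only 19 roots are needed.
// All paths are invented examples.

numberOfSubdomains 20;

method          scotch;

distributed     yes;

roots
19
(
    // processes 1-9: nodeX directories on machine #1
    "/home/user/node1"
    "/home/user/node2"
    "/home/user/node3"
    "/home/user/node4"
    "/home/user/node5"
    "/home/user/node6"
    "/home/user/node7"
    "/home/user/node8"
    "/home/user/node9"
    // processes 10-19: nodeX directories on machine #2
    "/home/user/node10"
    "/home/user/node11"
    "/home/user/node12"
    "/home/user/node13"
    "/home/user/node14"
    "/home/user/node15"
    "/home/user/node16"
    "/home/user/node17"
    "/home/user/node18"
    "/home/user/node19"
);
```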

kev4573 October 11, 2012 09:22

Great, I was wondering myself whether you would need to define roots for each core vs. each machine. I agree it seems too redundant to do the former, but it could be beneficial if someone wanted to distribute the processes to different places within a single machine.

