CFD Online Discussion Forums


wen April 17, 2006 14:30

Running MPI code on a multiprocessor node
 
Hi,

I'd appreciate it a lot if you could answer this question.

I'm trying to use MPI (MVAPICH) + OpenPBS to run a code on a 10-node Linux x86 64-bit system. Right now I can run the code with 10 or fewer processes in parallel.

The question is: each node on the cluster actually has 2 processors, so can I somehow request more than 10 processes? I was trying to do the following, but it failed:

mpirun -np 12 myprogram.exe > myoutput.log

Since I was trying to use 12 parallel processes to run the code, can't the computer somehow figure out every node has more than one processor and use them?

Thanks,

Wen


Tian_FB April 17, 2006 21:41

Re: Running MPI code on a multiprocessor node
 
Hi. As far as I know, you can use 'mpirun -np m **.exe' with the right setup of machines.LINUX, where m can be greater than, equal to, or less than the number of processors.

Renato. April 17, 2006 23:35

Re: Running MPI code on a multiprocessor node
 
Yes, you can do it by specifying two processes per node. I think the syntax to do it in OpenPBS is something like:

#PBS -l nodes=10:ppn=2

This directive means that you're requesting 10 nodes and 2 processors per node, in other words, 20 processes...

Hope it helps

Regards

Renato.


wen April 18, 2006 09:29

Re: Running MPI code on a multiprocessor node
 
I tried what Renato suggested, but it didn't work.

I guess there is a PBS configuration problem; I don't know how to turn on the multi-processor option (ppn=2).

Following Tian_FB's idea, I used a machines.LINUX file that consists of the names of the nodes assigned to the job, with each name repeated (it appears twice in the file), and that worked! The code runs with 20 processes. The problem is that when I use qstat or showq (maui) to check the status of the job, they report only one processor in use on each node.
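A machines file with each node name repeated can be generated instead of typed by hand. The following is only a minimal sketch: make_machinefile is a hypothetical helper (not part of MPICH, MVAPICH or PBS), and the node names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print each host name `slots` times, one per line.
# Usage: make_machinefile SLOTS HOST [HOST ...]
make_machinefile() {
    slots=$1
    shift
    for host in "$@"; do
        i=0
        while [ "$i" -lt "$slots" ]; do
            printf '%s\n' "$host"
            i=$((i + 1))
        done
    done
}

# Example (placeholder node names): two slots on each of two nodes.
#   make_machinefile 2 node1 node2 > machines.LINUX
```

Redirecting the output to machines.LINUX gives the same "each name twice" layout described above.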

Any more hints?

Wen

Tian_FB April 18, 2006 10:37

Re: Running MPI code on a multiprocessor node
 
Well, maybe you can call MPI_Get_processor_name to get the processors' names and print them out; then you'll see it. I am studying MPI these days and would like to discuss it with you via email: tfbao@mail.ustc.edu.cn. Email me if you have any good ideas or problems.
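A shell-level alternative to calling MPI_Get_processor_name inside the code is to launch `hostname` in place of the program and count how often each node answers. This is a swapped-in technique, not the MPI call itself, and only a sketch: count_per_host is a hypothetical helper, and the mpirun line is left as a comment because it needs the cluster and the same machine file setup as the real run.

```shell
#!/bin/sh
# Hypothetical helper: read host names on stdin (one per line)
# and print "count host" for each distinct host.
count_per_host() {
    sort | uniq -c | awk '{ print $1, $2 }'
}

# On the cluster, assuming mpirun is started the same way as for the real job:
#   mpirun -np 20 hostname | count_per_host
# A node that really runs two processes should then show a count of 2.
```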

Tian

Renato. April 18, 2006 12:23

Re: Running MPI code on a multiprocessor node
 
Ok, try to write your machinefile in the following manner:

NodeName1:2
NodeName2:2
NodeName3:2
...
NodeName10:2

Cheers

Renato.


wen April 18, 2006 14:56

Re: Running MPI code on a multiprocessor node
 
If I do that, i.e. use

node1:2

node2:2

...

node10:2

in the machines.LINUX file for mpirun, it says:

"getaddrinfo: Temporary failure in name resolution" and stops the job, which means the server can't find the correct nodes because of a wrong node name.

Yet, if I do:

node1

node1

node2

node2

...

node10

node10

in the machines.LINUX file (this is cheating!), it finds the nodes and gives 20 processes for the run, as requested by "mpirun -np 20 ./myprogram.exe". But OpenPBS (qstat) and maui (showq) don't report that correctly.

If I put "#PBS -l node=10:ppn=2" in the PBS job script, the job always stays in Q status, which means it has been deferred; it stays deferred forever and never runs.

Also, the "pbsnodes -a" command doesn't report the nodes the code is actually running on. For example, I can log in to node0 and node10 and use "top" to find myprogram.exe running on them, yet "pbsnodes -a" still says those two nodes are free.

What next?

Wen


wen April 19, 2006 16:35

Problem Solved
 
Problem solved. The cause was that I'm not root on the machine: after the configuration is changed, the PBS server needs to be restarted.

Basically:

1) The pbs_server/nodes file should list all nodes:

node1 np=2

node2 np=2

...

2) In the shell script file for job submission:

#PBS -l nodes=10:ppn=2

mpirun_rsh -rsh -hostfile $PBS_NODEFILE -np 20 ./myprog.exe
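The two lines above can be put together into a job script that derives the process count from the node file instead of hard-coding 20. This is a hedged sketch: it assumes PBS writes one line per granted processor into $PBS_NODEFILE, and count_slots is a hypothetical helper name.

```shell
#!/bin/sh
#PBS -l nodes=10:ppn=2

# Hypothetical helper: print the number of non-empty lines in a node file.
# Assumption: the node file holds one line per processor slot granted by PBS.
count_slots() {
    grep -c . "$1"
}

# PBS sets PBS_NODEFILE inside a job; outside a job this block is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    mpirun_rsh -rsh -hostfile "$PBS_NODEFILE" -np "$NP" ./myprog.exe
fi
```

With nodes=10:ppn=2 and that assumption holding, NP would come out as 20, matching the explicit -np 20.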

cheers



All times are GMT -4.