Hi everybody,
I am trying to run damBreakFine on four single-CPU systems plus one dual-CPU system via Open MPI. The machines file looks like this:

node1
node2
node3
node4
node5

I start the processes with:

mpirun --hostfile damBreakFine/system/machines -np 5 interFoam -case damBreakFine -parallel >& damBreakFine/log &

If I mpirun on the single-CPU systems only, everything works fine. If I mpirun on the dual-CPU system only, it works fine too. But if I mix them, mpirun spawns a process for every CPU and drives them to 100% usage, and then nothing gets any further. The log file shows only:

create mesh for time = 0

I have to kill one of the interFoam processes to bring the CPUs back to a normal usage rate. Any suggestions?
Hello again,
I also compiled a little MPI "Hello World" test and it works fine on 6 CPUs:
----------------------
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int node;          /* MPI rank of this process */
    char buf[1024];    /* host name of the machine we run on */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);

    gethostname(buf, sizeof(buf));
    buf[sizeof(buf) - 1] = '\0';    /* make sure the name is terminated */

    /* pass the host name as an argument, not as the format string */
    printf("%s says hello world with nid %d \n", buf, node);

    MPI_Finalize();
    return 0;
}
--------------------
> mpicc hello.c -o hello

This time the machines file looks like this:

node1
node2
node3
node4
node5 cpu=2

> mpirun --hostfile machines -np 10 ./hello
node1 says hello world with nid 0
node2 says hello world with nid 1
node3 says hello world with nid 2
node4 says hello world with nid 3
node5 says hello world with nid 4
node2 says hello world with nid 7
node3 says hello world with nid 8
node4 says hello world with nid 9
node5 says hello world with nid 5
node1 says hello world with nid 6

Could it be that an OpenFOAM 1.5 parallel run is only supported on up to 4 processors?
I am running turbFoam right now on 23 nodes with dual quad-core CPUs, that's 184 cores.
Did you run decomposePar for the new number of processes?
Ahh, good to hear!
Yes, I went through these steps. Editing decomposeParDict:
-------------
numberOfSubdomains 6;

simpleCoeffs
{
    n (1 6 1);
}

metisCoeffs
{
    processorWeights ( 1 1 1 1 1 1 );
}
-------------
> blockMesh
> setFields
> decomposePar -force
> mpirun --hostfile system/machines -np 6 interFoam -parallel >& log &
Did decomposePar give any messages about zero-sized blocks or faces?
Most of the time when I see OF1.5 hang on startup, it is due to decomposition errors or NIC failures. What is the topology/layout of the network for your nodes? Are you using the MPI libraries from OF1.5?
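One quick sanity check that goes with the question about decomposition: decomposePar writes one processorN directory per subdomain, so the number of those directories should equal the -np value given to mpirun. A minimal sketch in plain sh follows. The function names count_processor_dirs and check_decomposition are made up for this post, and the check does not detect zero-sized blocks or faces; it only catches a stale or incomplete decomposition.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OpenFOAM.

# count_processor_dirs CASE: print how many processorN directories CASE holds.
count_processor_dirs() {
    n=0
    for d in "$1"/processor[0-9]*; do
        # If the glob matches nothing, $d is the literal pattern and -d fails.
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# check_decomposition CASE NP: succeed only if CASE holds exactly NP
# processor directories, i.e. the decomposition matches the mpirun -np value.
check_decomposition() {
    found=$(count_processor_dirs "$1")
    if [ "$found" -eq "$2" ]; then
        echo "ok: $found processor directories"
    else
        echo "mismatch: $found processor directories, but -np $2" >&2
        return 1
    fi
}
```

For the case in this thread it would be called as check_decomposition damBreakFine 6 before starting mpirun.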
Thanks for your attention.
decomposePar did not give any messages about zero-sized blocks or faces. Every network interface is working well. The dual system has two network interfaces in the same subnet. The machines are connected to a stellar Gigabit switch, and the Open MPI libs were compiled from the OpenFOAM 1.5 gtgz's on Heron LTS. I also tried the turbFoam solver, but the results are the same. If I use only the single-processor machines everything runs smoothly, but if I mix them with the dual system the log file says "create mesh for time = 0" and the processes run at 100% CPU usage until I stop them.
Is your dual machine actually two CPUs or two cores? If it is two cores, then cpu=2 may not work.
My machines file is auto-generated by the queue manager Torque. However, I think you should have something like this in yours:

node1 //single processor
node2 //single processor
node3 //single processor
node4 //single processor
node5 //dual processor
node5 //dual processor

That matches up with -np 6. Don't add the comments to the machines file.
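To check that a machines file really provides as many slots as the -np value, the entries can be added up with a few lines of sh and awk. This is only a sketch: count_slots is a made-up name, and it assumes that a host listed twice counts twice and that cpu=N is read like slots=N, which is how the files in this thread are written.

```shell
#!/bin/sh
# Hypothetical helper, not part of Open MPI or OpenFOAM.

# count_slots FILE: print the total number of process slots in a hostfile.
# Assumptions: one slot per host line unless slots=N or cpu=N is given;
# repeated host lines add up; blank lines and '#' comment lines are skipped.
count_slots() {
    awk '
        /^[ \t]*(#|$)/ { next }               # skip comments and blank lines
        {
            n = 1                             # default: one slot per line
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(slots|cpu)=[0-9]+$/) {
                    split($i, kv, "=")
                    n = kv[2]
                }
            total += n
        }
        END { print total + 0 }               # prints 0 for an empty file
    ' "$1"
}
```

Running count_slots system/machines and comparing the result with numberOfSubdomains in decomposeParDict shows whether the hostfile, the decomposition and the -np value agree.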
After a phone call with Bernhard Gschaider I put the dual system aside and added a quad-core machine with one network interface to the single-CPU systems. This time every node is doing well and I could take some measurements.
Thanks for your help and have a nice weekend!
Hi,
I solved the problem with the dual-CPU system. After stopping the second network interface with ifdown eth1, everything works fine.
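An alternative that may avoid taking the interface down is to tell Open MPI which interface its TCP transport may use, through the btl_tcp_if_include MCA parameter. This was not tried in this thread, so treat it as an untested sketch. The interface name eth0 is an assumption (the thread only names eth1 as the interface that was stopped), and mpirun_cmd is a made-up helper that only prints the command line so it can be checked before running it.

```shell
#!/bin/sh
# Hypothetical helper: builds the command line, does not start anything.

# mpirun_cmd IFACE NP: print an mpirun command that restricts Open MPI's
# TCP transport to the interface IFACE and starts NP interFoam processes.
mpirun_cmd() {
    printf 'mpirun --mca btl_tcp_if_include %s --hostfile system/machines -np %s interFoam -parallel\n' "$1" "$2"
}
```

mpirun_cmd eth0 6 prints the command for the six-process run above. The matching parameter btl_tcp_if_exclude names the interfaces to skip instead.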