Scale-Up Study in Parallel Processing with OpenFOAM
1 Attachment(s)
Hello everyone, what's the plan? :D
Recently I ran a case with different numbers of processors to check the parallel-processing performance in OpenFOAM, but I got a strange result. I set up a simple case and used the Scotch method, which behaves much like the Metis method, then solved it on the cluster in our lab (CMTL at UIC). The cluster has 4 nodes with 8 processors on each node. The case is a 3D backward-facing step with 748k cells at Re = 500, and the solver is rhoPisoFoam. I used 2, 4, 6, 8, 16, 24, and 32 processors in turn to solve the same case, but the 32-processor run turned out slower than the 16-processor run. The attached file shows the solution speed (time steps/minute) versus the number of processors. Can anyone tell me what I did wrong, or has anyone seen this problem before? Also, can anyone tell me about that Time OpenFOAM reports in the log file? |
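On the last question: each time step the solver log prints ExecutionTime (cumulative CPU time) and ClockTime (wall-clock time, in whole seconds). A widening gap between the two usually means time spent waiting on communication or I/O rather than computing. A minimal sketch of reading them off a log line (the file name and the numbers are made up for illustration):

```shell
# Fabricated sample of the line a solver writes each time step:
printf 'ExecutionTime = 240.5 s  ClockTime = 310 s\n' > log.sample

# Rough fraction of wall time NOT spent computing:
# fields: $3 = ExecutionTime value, $7 = ClockTime value
awk '/^ExecutionTime/ {printf "wait fraction ~ %.2f\n", 1 - $3/$7}' log.sample
```

Running this on the last ExecutionTime line of each of your runs would show whether the 32-processor case is genuinely compute-bound or mostly waiting.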
Hi Sahm,
Your case is too small to stress your cluster: with 32 procs you only have about 23k cells per proc. Your cluster is probably spending all its time communicating. Try a larger mesh (or a faster interconnect). Cheers, Andrew |
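Andrew's point can be put in numbers with a quick sketch for the 748k-cell mesh (the often-quoted ~50k cells/core threshold is a community rule of thumb, not an OpenFOAM-specified limit):

```shell
# Cells per processor for an (ideally) even decomposition.
cells=748000
for np in 2 4 8 16 24 32; do
  echo "np=$np -> $((cells / np)) cells/proc"
done
```

At 16 procs the run is still near ~47k cells/proc; at 32 it drops to ~23k, where halo exchange over the interconnect can start to dominate per-step work.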
With Metis Method
1 Attachment(s)
I changed the decomposition method to Metis, and my solution speeds increased. The attached file shows the speed-up.
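For reference, the decomposition method is switched in system/decomposeParDict; a minimal fragment (the subdomain count is an example, and exact keywords can vary between OpenFOAM versions):

```
// system/decomposeParDict (fragment)
numberOfSubdomains 16;
method          metis;   // was scotch; both are graph partitioners
```

Scotch needs no extra coefficients; depending on the OpenFOAM version, Metis may accept an optional metisCoeffs sub-dictionary.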
|
Can you upload it in PDF or an open-document format? Running Linux, so no xlsx for me =(
|
New Results.
1 Attachment(s)
Sorry about that,
This file covers two cases: one is the backstep flow with the rhoPisoFoam solver, at 750k and 1.5M cells; the other is a cavity case with 1M cells. Any comments on this are appreciated. |
Is anybody going to help?
|
What is your question?
I would say you should get better scalability than this, but you will have to say more about the system: cores per node, interconnect type, etc. |
Ok,
The system I'm working with is a cluster with 4 processing nodes and 1 head node; each node has 8 processors, with an InfiniBand connection between nodes. I don't know how much RAM each node has, but I haven't had any problem with that. The problem with this scale-up study is that when I increased the number of processors, the calculation speed decreased. I also tried this case with 3M cells, but the problem persists. Do you have any idea why this is happening and how I can improve it? |
Greetings SAHM,
It seems you have an "isolate and conquer" problem! 4 machines with 8 cores each means you should first test in a ramified way: run on a single node with 1, 2, 4 and 8 cores, then on 2 nodes with the same per-node core counts, and so on, so you can tell intra-node scaling apart from inter-node scaling.
I suggest these tests because, from what I've seen in the first graph, there seems to be something of an inertia-like problem with the machines: past a certain point, adding processors slows the run down instead of speeding it up.
Additionally, don't forget that your InfiniBand interconnect might have options to configure for jumbo packets (useful for gigantic data transfers between nodes, which would be the case with 50-100 million cells :eek:) or for very low latency (useful for constant communication between nodes). Another thing that could hinder scalability is how frequently you save a case snapshot, i.e. how often the case writes results to disk.
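On the write-frequency point: every snapshot is a collective write from all processors, so frequent saving can mask the true scaling. The relevant entries live in system/controlDict (the values below are examples for benchmarking, not recommendations for production runs):

```
// system/controlDict (fragment)
writeControl    timeStep;
writeInterval   1000;   // write rarely while benchmarking
purgeWrite      2;      // keep only the two most recent time directories
```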
Sooo... to conclude... be careful what you wish for! You just might get it :cool: ... or so I would think :P

Edit: I didn't see the other three graphs before I posted, but my train of thought still applies. And I should add another thing to the test list:
Best regards, Bruno |
Wow, there is still a question.
Thanks Bruno. I forgot to erase the first sheet, since that case was not well defined, but the other cases have the same problem. About your suggestion to run them with different types of parallelism, I have a question.

Actually, I don't know how to assign a certain sub-domain (of a decomposed domain) to a certain processor. To run your tests I need to map a specific sub-domain to a specific processor, or at least define how many cores of each node I'm going to use; for example, to run on 4 processors as 2 cores/node on 2 nodes. Can you tell me how to do this?

Since this cluster is shared between people in the lab, I need to ask for their permission whenever I use more than 8 cores, so running your tests might take a long time. Besides, we use software called Lava that queues the jobs on the cluster, and I should use it to submit my jobs, otherwise other people might not see that I'm running something. Can you tell me how to define a job on certain cores on different nodes, and, if you know the software, how to do it with Lava?

I also have an idea I'd like to discuss with you. I think that if I assign sub-domains to specific cores, I can minimize the communication between nodes: if the connection between cores of a node is faster than the connection between nodes, assigning neighbouring sub-domains to cores of a single node should help, since it reduces the data transferred over the slower connection. I'd like to know what you think of this concept. Thanks again for your comments. |
Greetings SAHM,
Quote:

Code:
cat machinefile.morab

Code:
foamJob -p -s icoFoam

So, for more information about the MPI application you are using, you should consult its manual ;) You could also tweak the foamJob script (run "which foamJob" to find out where it is ;)) to better suit your needs!

Quote:

Not-so-quick answer: see --> this <-- short tutorial and the links it points to. By what I can briefly see, it seems that Lava can operate as a wrapper for mpirun, so my guess is that you could make a symbolic link from mpirun to Lava's mpirun and use foamJob as it is now!

Quote:
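On the idea of keeping neighbouring sub-domains on one node: OpenFOAM assigns the processorN directory to MPI rank N, and with Open MPI's default mapping, ranks fill the hostfile slots in order. So consecutive ranks (and hence, with a suitable decomposition, neighbouring sub-domains) land on the same node. A sketch, assuming Open MPI, example hostnames node1/node2, and a case already decomposed into 4 sub-domains:

```shell
# Two slots on each of two nodes -> 4 ranks total.
# (hostnames are examples; decomposePar must already have created
#  processor0..processor3 in the case directory)
cat > machinefile <<'EOF'
node1 slots=2
node2 slots=2
EOF

# Rank N reads processorN, so ranks 0-1 run on node1 and ranks 2-3
# on node2, keeping consecutive sub-domains together on a node.
mpirun -np 4 --hostfile machinefile rhoPisoFoam -parallel > log.np4 2>&1
```

The hierarchical decomposition method can help here too, since it lets you control the ordering of sub-domains so that neighbours get consecutive numbers.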
Best regards, Bruno |