Parallel performance of icoFoam
I get linear speed-up up to 4 processors with icoFoam; when I try to use more than 4 processors, the performance decreases. Has anyone come across this issue? I will be more than happy to report the details if need be.
Could you give us more details on your hardware, software and problem configuration?
CPU type, amount of RAM, type of interconnect, version of OpenFOAM, size of meshes, etc.
What was your problem size? Many solvers will not scale when the number of cells per CPU is less than 10,000. With so few cells per process, communication overhead dominates: more time is spent communicating than computing, so the resulting performance is bad.
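To make the rule of thumb concrete, here is a rough sketch in Python. The function names and the 100,000-cell mesh are hypothetical (the actual cell count is not given in this thread), and the 10,000-cell threshold is just the guideline quoted above, not a hard limit.

```python
def cells_per_process(total_cells: int, n_procs: int) -> float:
    """Average number of cells each process gets after decomposition."""
    return total_cells / n_procs


def max_useful_procs(total_cells: int, min_cells_per_proc: int = 10_000) -> int:
    """Largest process count that keeps every process at or above the threshold.

    The default threshold is the 10,000 cells/CPU guideline from the post above.
    """
    return max(1, total_cells // min_cells_per_proc)


if __name__ == "__main__":
    total_cells = 100_000  # hypothetical mesh size, for illustration only
    for n_procs in (1, 2, 4, 8, 16):
        per_proc = cells_per_process(total_cells, n_procs)
        flag = "" if per_proc >= 10_000 else "  <- below the 10,000-cell guideline"
        print(f"{n_procs:2d} procs: {per_proc:9.0f} cells each{flag}")
    print("max useful procs:", max_useful_procs(total_cells))
```

Plug in the cell count reported by checkMesh for your own case to see where the decomposition drops below the guideline.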
Thanks for your input. The machine I am using is a Dell with two quad-core Intel processors (Harpertown, running at 2.33 GHz), 16 GB of RAM, and no interconnect (it just uses shared memory).
Here is the output from checkMesh...
Create polyMesh for time = constant
Time = constant
internal faces: 184318
boundary patches: 10
point zones: 0
face zones: 0
cell zones: 0
Number of cells of each type:
tet wedges: 0
Boundary definition OK.
Point usage OK.
Upper triangular ordering OK.
Topological cell zip-up check OK.
Face vertices OK.
Face-face connectivity OK.
Number of regions: 1 (OK).
Checking patch topology for multiply connected surfaces ...
Patch Faces Points Surface
inlet 111 72 ok (not multiply connected)
out2 74 52 ok (not multiply connected)
out3 62 40 ok (not multiply connected)
out4 65 44 ok (not multiply connected)
out5 44 31 ok (not multiply connected)
out6 58 38 ok (not multiply connected)
out7 51 34 ok (not multiply connected)
out8 50 34 ok (not multiply connected)
out1 46 32 ok (not multiply connected)
w1 24139 12150 ok (not multiply connected)
Domain bounding box: (-0.0609739 -0.106362 -0.025452) (0.0609426 0.106513 0.0254047)
Boundary openness (-2.2934e-17 -4.63998e-17 7.83197e-18) OK.
Max cell openness = 3.5536e-16 OK.
Max aspect ratio = 13.8974 OK.
Minumum face area = 5.56535e-09. Maximum face area = 1.18647e-05. Face area magnitudes OK.
Min volume = 2.28107e-13. Max volume = 1.27222e-08. Total volume = 5.34576e-05. Cell volumes OK.
Mesh non-orthogonality Max: 77.0421 average: 22.9327
*Number of severely non-orthogonal faces: 17.
Non-orthogonality check OK.
<<Writing 17 non-orthogonal faces to set nonOrthoFaces
Face pyramids OK.
Max skewness = 1.44383 OK.
All angles in faces OK.
All face flatness OK.
Two things to look for here. The first is that as you decompose the domain into more and more small subdomains, the surface-area-to-volume ratio of each subdomain increases. Surface area is measured in number of faces and volume is measured in number of cells. At some point, your machine spends more time communicating than processing. I would not divide up a domain into chunks of fewer than 50K cells.
Secondly, shared-memory machines like yours will start to have memory transfer bottlenecks, where your fast processors spend much of their time waiting for accesses to main memory. In unstructured codes, pre-fetching is hard, even with the special ordering OpenFOAM uses.
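The surface-to-volume point can be illustrated with a very idealized sketch: assume each subdomain is a perfect cube of hexahedral cells, so a chunk of n cells has about 6·n^(2/3) faces on its surface. Real decompositions of an unstructured mesh will differ, physical boundary faces do not communicate, and the function name is made up for this example, so treat the numbers as a trend only.

```python
def surface_to_volume(cells_per_subdomain: float) -> float:
    """Surface faces per cell for an idealized cubic subdomain.

    Assumes a cube of side n**(1/3) cells, so the surface has
    6 * n**(2/3) faces and the ratio simplifies to 6 / n**(1/3).
    """
    side = cells_per_subdomain ** (1.0 / 3.0)
    return 6.0 * side**2 / cells_per_subdomain


if __name__ == "__main__":
    # Subdomain sizes chosen for illustration, including the 10K and 50K
    # thresholds mentioned in this thread.
    for n in (1_000_000, 100_000, 50_000, 10_000, 1_000):
        print(f"{n:9d} cells per subdomain: {surface_to_volume(n):.3f} surface faces per cell")
```

The ratio grows as the chunks shrink, which is why communication takes up a larger share of the run time at higher processor counts.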
Thanks for the valuable input. Makes sense...
This happens because you are using Harpertown.
Pay attention to the line for INTEL WHITEBOX (INTEL_X5482_HTN4, 3200, RHEL5).