CFD Online Discussion Forums


evcelica October 26, 2012 18:41

Scaling from 1 to 2 nodes shows 122% performance increase
 
Just thought I'd share the somewhat unexpected results of my 2-node "cluster". I'm using two identical i7-3930K machines overclocked to 4.4 GHz, each with 32 GB of 2133 MHz RAM. They are connected with Intel gigabit Ethernet, and I'm running ANSYS CFX v14 with Platform MPI.

The benchmark case has ~4 million nodes: steady-state thermal with multiple domains.

When comparing:
1 computer running 4 cores to
2 computers running 4 cores each

My speedup comes out to 2.22x :)!
So much for linear scaling. Has anyone else seen this? It seems a little odd to me, though I'm definitely happy about it!
This is something to consider if anyone has been thinking about adding a second node.

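As a minimal sketch of how such a comparison can be quantified from wall-clock times (the 1000 s / 450 s timings below are placeholders, not the actual results from this benchmark):

Code:

def speedup_and_efficiency(t_reference, t_parallel, core_ratio):
    """core_ratio = cores in the parallel run / cores in the reference run."""
    speedup = t_reference / t_parallel
    efficiency = speedup / core_ratio
    return speedup, efficiency

# e.g. a 1000 s run on 1 node (4 cores) dropping to 450 s on 2 nodes (8 cores)
s, e = speedup_and_efficiency(1000.0, 450.0, core_ratio=2.0)
print(f"speedup = {s:.2f}x, efficiency = {e:.0%}")  # speedup = 2.22x, efficiency = 111%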


abdul099 October 27, 2012 18:22

There are two possible explanations:

Either the method used to measure the performance is flawed for some reason; it would be interesting to see how you judged the performance gain.

Or it might be due to the increased memory bandwidth available when running on both nodes. The cell count per core is not that low, so you shouldn't run into scaling issues from communication latency, and your memory might be quite fast, depending on the timings.
Still, I haven't ever seen anything similar. Although we see nearly linear scaling on our Sandy Bridge cluster down to very low cells/core, and it performs VERY well, the speedup efficiency has never been above 1.
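
A rough back-of-the-envelope check of the cells-per-core point, assuming the ~4 million mesh nodes from the first post are split evenly across partitions:

Code:

mesh_nodes = 4_000_000  # ~4 million nodes quoted in the first post
for label, total_cores in (("1 node x 4 cores", 4), ("2 nodes x 4 cores", 8)):
    print(f"{label}: {mesh_nodes // total_cores:,} mesh nodes per partition")
# 1,000,000 and 500,000 nodes per partition - both large enough that
# communication latency should not dominate, even over gigabit Ethernet.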

CapSizer October 28, 2012 07:06

Better than 100% scaling on very small clusters is not so unusual, because you can benefit from the additional extremely fast cache memory that becomes available.
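
A hedged sketch of that cache argument for the case in this thread; the 1 kB of solver data per mesh node is a purely illustrative guess, not a CFX figure, while the 12 MB L3 is the i7-3930K's actual cache size:

Code:

mesh_nodes = 4_000_000
bytes_per_mesh_node = 1_000   # illustrative guess at solver data per mesh node
l3_per_cpu = 12 * 2**20       # i7-3930K: 12 MB of shared L3 cache

for n_machines in (1, 2):
    working_set_mib = mesh_nodes * bytes_per_mesh_node / n_machines / 2**20
    cache_mib = l3_per_cpu * n_machines / 2**20
    print(f"{n_machines} machine(s): working set ~{working_set_mib:,.0f} MiB, "
          f"total L3 = {cache_mib:.0f} MiB "
          f"({cache_mib / working_set_mib:.2%} of the working set)")
# Going from 1 to 2 machines quadruples the cache-to-working-set ratio, which
# is where better-than-linear speedups on small clusters can come from.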

abdul099 October 28, 2012 12:32

I agree, CapSizer. I just wondered why I haven't seen this before, since the Sandy Bridge-E CPUs in our cluster have 20 MB of cache each, the two sockets in a node communicate via QPI, and the nodes talk to each other over InfiniBand.
So I would still say it strongly depends on the case and the hardware used.

CapSizer October 28, 2012 13:09

I think the way the solver is parallelized may also have an effect. It's a long time since I used CFX, but from what I can remember it was less sensitive to inter-node communication than other solvers I've used. It seemed to do a lot of work per iteration before communicating between nodes, so perhaps it suffers less from inter-node latency and is more likely to benefit from the extra cache? Just a guess ...

evcelica October 31, 2012 11:58

EDIT:
After running it a few more times, I realized that during my single-node simulation I accidentally had the CPU downclocked to 3.8 GHz instead of 4.4 GHz. So the 15.6% overclock gave me the extra ~11% speed per node. Running it again with the same 4.4 GHz clock on all nodes, I got 99.5% scaling efficiency. Sorry for the misinformation.
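
For completeness, the arithmetic behind this correction, using only the numbers quoted in the thread:

Code:

apparent_speedup = 2.22       # 3.8 GHz single node vs 4.4 GHz two nodes
per_node_clock_gain = 1.11    # reported ~11% speed gain from 3.8 -> 4.4 GHz
corrected_speedup = apparent_speedup / per_node_clock_gain
efficiency = corrected_speedup / 2.0   # 8 cores vs 4 cores
print(f"corrected speedup ~ {corrected_speedup:.2f}x, efficiency ~ {efficiency:.0%}")
# ~2.00x and ~100%, consistent with the 99.5% measured after re-running.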

