CFD Online Discussion Forums

siefdi May 28, 2013 03:59

HP C3000 blade server vs i7-3930K PCs
 
Hi All,

I currently have a chance to get a used HP BladeSystem C3000 enclosure (http://h18004.www1.hp.com/products/b...c-class/c3000/) with 8 mixed ProLiant BL460c server blades, as follows:
4 blades of Xeon 5570 x 2 ; 16 GB PC3-10600R memory
2 blades of Xeon 5450 x 2 ; 8 GB PC2-5300F memory
2 blades of Xeon 5160 x 2 ; 4 GB PC2-5300F memory
So it will have 64 cores (HT off) or 96 cores (HT on). All for about $3000.

I plan to use it for at least several years, running CFD calculations (OpenFOAM; meshes under 20M cells) as well as other parallel computations with my own Fortran code.

The question is, is it worth getting?
Or would it be better to buy, let's say, several i7-3930K PCs and connect them together? (I could get 4 used PCs for about the same price.)

Any thoughts are greatly appreciated.

Regards,
siefdi

CapSizer May 28, 2013 04:53

The difficulty with this sort of thing is that unless you actually run the benchmarks, it can be quite tricky to know what you really have. To start with, I think it is a bit of a problem that the cluster is so non-uniform. As you would have seen from previous postings in this forum, memory performance is really critical for CFD, and here you have two different generations of memory being used. To be honest, you could virtually leave off the 5450 and 5160 blades; the work will be done by the 5570s with their faster memory.

As a crude heuristic, I don't think running CFD on second-hand hardware is a good idea. The old machines are much slower (often due more to the slow memory than to the CPU speed itself) and consume a lot of power. I would say that second-hand hardware might be a reasonable option for other server tasks, but in CFD, where performance counts for so much and you will be running nearly 24/7, rather get new hardware.

siefdi May 28, 2013 05:46

Hi CapSizer,
Thanks so much for the comments, I appreciate it.

Quote:

Originally Posted by CapSizer (Post 430442)
... I think it is a bit of a problem that the cluster is so non-uniform.

Ah, you are right, it seems it is not a good idea to mix different hardware in the cluster. I didn't think about that. Thanks for pointing it out.


Quote:

Originally Posted by CapSizer (Post 430442)
As you would have seen from previous postings in this forum, memory performance is really critical for CFD, and here you have two different generations of memory being used.

Let's assume that I will only use the 4 blades with 5570s for CFD.

But first of all, I would like to confirm something. If one blade consists of 2 x Xeon 5570, and each 5570 has 3 memory channels, then the two Xeon 5570s together have 6 memory channels. Is that correct?

As I have learned from this forum, the number of memory channels is very important in CFD (as is memory speed, as you mention). I know it depends on many factors, but based on your experience, for the same case, which one will be faster:
a. A higher total number of memory channels (let's say 24) with DDR3-1333
b. A smaller total number of memory channels (let's say 16) with DDR3-1600
Or maybe the general question is, which is more important, the number of memory channels or the memory speed?

Quote:

Originally Posted by CapSizer (Post 430442)
As a crude heuristic, I don't think running CFD on second-hand hardware is a good idea. The old machines are much slower (often due more to the slow memory than to the CPU speed itself) and consume a lot of power. I would say that second-hand hardware might be a reasonable option for other server tasks, but in CFD, where performance counts for so much and you will be running nearly 24/7, rather get new hardware.

Well, I was thinking that by purchasing second-hand hardware I would get more computing power, as quantity is important in the 24/7 case... but I think I get your point. Thanks a lot. :)

Regards,
siefdi

CapSizer May 28, 2013 06:01

Quote:

Originally Posted by siefdi (Post 430452)
Hi CapSizer,

As I have learned from this forum, the number of memory channels is very important in CFD (as is memory speed, as you mention). I know it depends on many factors, but based on your experience, for the same case, which one will be faster:
a. A higher total number of memory channels (let's say 24) with DDR3-1333
b. A smaller total number of memory channels (let's say 16) with DDR3-1600
Or maybe the general question is, which is more important, the number of memory channels or the memory speed?

Perhaps here you should start looking at the SPECfp_rate benchmark numbers. They don't tell the whole story, but they do contain valuable information. You could simply multiply the memory numbers, 24*1333 = 31992 and 16*1600 = 25600, but I think that is overly simplistic. It is indicative of available bandwidth, but there is more to it, perhaps, depending on how the job is partitioned. This one is not so easy to call. I suspect you will find that the blade cluster is quicker, but it will consume more power, which also has to be paid for.
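
To make the two estimates concrete, here is a minimal Python sketch. It reproduces the channels-times-speed products quoted above and, as an added assumption not stated in the thread, converts them to a theoretical peak bandwidth using the 8 bytes per transfer of a 64-bit DDR3 channel. The function names are made up for illustration, and real CFD throughput will also depend on partitioning, memory placement and the interconnect.

```python
def channel_speed_product(channels, mega_transfers_per_s):
    """Crude proxy from the post: number of channels times memory speed."""
    return channels * mega_transfers_per_s


def peak_bandwidth_gb_per_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak in GB/s, assuming 64-bit (8-byte) DDR3 channels."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


# (label, total channels, DDR3 speed in MT/s), as given in the question
options = [
    ("a: 24 channels, DDR3-1333", 24, 1333),
    ("b: 16 channels, DDR3-1600", 16, 1600),
]

for label, channels, speed in options:
    proxy = channel_speed_product(channels, speed)
    peak = peak_bandwidth_gb_per_s(channels, speed)
    print(f"{label}: proxy = {proxy}, theoretical peak = {peak:.1f} GB/s")
```

On paper, option a has roughly 25% more aggregate bandwidth, but this is only an upper bound and says nothing about how well a partitioned job can actually use it.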

CapSizer May 28, 2013 06:27

As another approximation, the SPECfp_rate number for a dual-processor X5570 system is around 190. You will have four of them, which will give you a total of 760. Your alternative system is 4 x 3930K. Each 3930K CPU is roughly equivalent to a Xeon E5-2667 (there are no ratings for the 3930K), which has a rating of 416 in a dual-socket system. So, going by these figures, four 3930Ks (equivalent to two dual-socket systems) should give you a rating of 416*2 = 832, somewhat better than the four server blades. Bear in mind that these are not real CFD benchmark numbers. The closest that SPEC has to a CFD equivalent is leslie3d, and the numbers are generated by running identical copies of the code rather than a partitioned single problem. In any event, between the memory speed and the SPEC results, you can probably say that four 3930Ks will be roughly similar in performance to four of your cluster blades.
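
The arithmetic in this estimate can be written out as a short Python sketch. The two ratings are the approximate figures quoted in the post; treating one 3930K as half of a dual E5-2667 system, and summing ratings across machines as if scaling were perfect, are simplifying assumptions rather than measured results.

```python
# Approximate SPECfp_rate figures quoted in the post
DUAL_X5570_RATE = 190     # one dual-socket X5570 blade
DUAL_E5_2667_RATE = 416   # dual-socket E5-2667, stand-in for two 3930Ks


def total_rate(rate_per_node, nodes):
    """Sum of per-node ratings; assumes perfect scaling across nodes."""
    return rate_per_node * nodes


blade_total = total_rate(DUAL_X5570_RATE, 4)
# Assumption: one 3930K is worth about half of a dual E5-2667 system
single_3930k_rate = DUAL_E5_2667_RATE / 2
i7_total = total_rate(single_3930k_rate, 4)

print(f"4 x dual X5570 blades: {blade_total}")
print(f"4 x i7-3930K PCs:      {i7_total:.0f}")
print(f"ratio i7 / blades:     {i7_total / blade_total:.2f}")
```

The ratio comes out at under 10% in favour of the i7 machines, which is well within the uncertainty of using a throughput benchmark as a proxy for a partitioned CFD run.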

siefdi May 29, 2013 00:19

Dear CapSizer,

I got valuable input and pointers from you through this discussion. Thank you so much for that. :)

Regards,
siefdi

evcelica May 29, 2013 18:05

Agreeing with CapSizer:
You could also look at the Euler3D CFD benchmark on this page: http://www.tomshardware.com/reviews/...ew,3149-9.html
It has a single 3960X slightly outperforming a dual XEON X5580.

CapSizer May 29, 2013 18:22

Quote:

Originally Posted by evcelica (Post 430847)
Agreeing with CapSizer:
You could also look at the Euler3D CFD benchmark on this page: http://www.tomshardware.com/reviews/...ew,3149-9.html
It has a single 3960X slightly outperforming a dual XEON X5580.

That certainly fits with the general picture that emerges from the new i7 vs old Xeon comparison. The only caution is that the Euler3D benchmark is very unrepresentative of typical CFD codes, but that difference only really seems to count when comparing AMD vs. Intel.

The bottom line with all these hardware configuration questions is that you can't really argue against the latest i7 and its four memory channels. It comes at a price premium, but the performance advantage justifies that premium.

siefdi June 3, 2013 02:32

Hi Evcelica & CapSizer,

Thanks so much for the insight, I really appreciate it.

After a bit more research, I totally agree with both of you that old CPUs, even though they look super on paper and are a lot cheaper than they used to be, still cannot compete with current CPUs in terms of performance, efficiency, power consumption, etc., imho. (Well, MAYBE unless I can get Xeon 5600 series CPUs and overclock them using an EVGA SR2; has anyone done that?)

So, I will just leave the Xeon 5550s behind, and now I am even tempted to get the Xeon E5-2600 series. Here, I can get a used 2 x Xeon E5-2670 workstation for about the same price. The question is, is it worth it? Or are several (3) i7-3930K PCs still a better option? I am wondering about the number of CPUs, the CPU clock and the communication between CPUs, as both kinds of system have the same 4-channel memory.

Thanks a lot.
Regards,
siefdi

CapSizer June 3, 2013 05:10

Quote:

Originally Posted by siefdi (Post 431559)
Hi Evcelica & CapSizer,
So, I will just leave the Xeon 5550s behind, and now I am even tempted to get the Xeon E5-2600 series. Here, I can get a used 2 x Xeon E5-2670 workstation for about the same price. The question is, is it worth it? Or are several (3) i7-3930K PCs still a better option?
siefdi

I would say that this is very difficult to call without doing the actual testing. On the face of it, the 3 X i7 system should be quite a bit quicker. The Xeon gives you 16 cores vs. the i7 system's 18, and 8 memory channels vs. the i7's 3*4=12. In addition, you can probably put faster overclocked memory modules in the i7. However, you will now need to deal with the networking and the latency and hassle associated with it. You will probably need to find a good managed switch and figure out how to tune it for best performance. On the other hand, the i7 system is more scalable, in the sense that you can leave one or even two machines switched off for very small problems, or add another one or two if you need more later.
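
The core and channel counts above can be tabulated with a small Python sketch. The `Setup` class and the channels-per-core figure are illustrative additions, not something taken from the thread, and the sketch deliberately ignores clock speed, memory speed and network latency between the i7 nodes.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    name: str
    nodes: int
    cores_per_node: int
    channels_per_node: int

    @property
    def total_cores(self):
        return self.nodes * self.cores_per_node

    @property
    def total_channels(self):
        return self.nodes * self.channels_per_node

    @property
    def channels_per_core(self):
        return self.total_channels / self.total_cores


# Counts as given in the post: 2 x 8 cores and 2 x 4 channels vs. 3 x (6 cores, 4 channels)
xeon = Setup("dual Xeon E5-2670 workstation", nodes=1, cores_per_node=16, channels_per_node=8)
i7 = Setup("3 x i7-3930K PCs", nodes=3, cores_per_node=6, channels_per_node=4)

for s in (xeon, i7):
    print(f"{s.name}: {s.total_cores} cores, {s.total_channels} channels, "
          f"{s.channels_per_core:.2f} channels per core")
```

The i7 option ends up with more memory channels per core, which is the main reason it looks quicker on paper; whether that survives the networking overhead is exactly what only a real test can show.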

From my experience, a good dual socket workstation is a really valuable, reliable CFD workhorse that allows you to focus on the task at hand. I've generally had at least one sitting next to my desk for most of the past 15 years, and I've never been sorry. I've also done lots of work on a variety of ad-hoc and other clusters, and those have all been very productive, but generally required the investment of quite a lot of additional hardware-related effort.

siefdi June 26, 2013 22:31

Update
 
1 Attachment(s)
Hi All,

First of all, thank you for all your insight so far. It's a very nice forum we have here. :)

Recently, I had a chance to run my OpenFOAM case (5M mesh) on several computer systems, as follows:

a. 2 * Xeon X5550 (2.66 GHz) | 2x4 cores | 2x3 memory channels
b. 1 * i7-3930K (3.2 GHz) | 6 cores | 4 memory channels
c. 2 * Xeon E5-2660 (2.2 GHz) | 2x8 cores | 2x4 memory channels

Well, it was just a simple test run with limited time, and it might not be good enough to be called a 'benchmark'... but anyway, here are the results:

http://www.cfd-online.com/Forums/att...1&d=1372300018


Some notes:
- Only 4 out of 8 memory channels are populated on system c (Xeon E5-2660).
- Hyperthreading was turned on.


From what I can see:
- Memory speed is indeed significant
- Newer processor architecture is simply better (number of cores, lithography, etc.)
- ...

Please let me know what you think...

Regards,
siefdi

CapSizer June 27, 2013 02:19

Thanks for posting that info, it is really valuable! It is a pity that the E5-2660 could not be tested with 8 memory channels populated, it would have been quite neat to see how much difference that made.

evcelica June 27, 2013 18:18

Wow thanks for the graphs, very nice to see real world data! I agree it would be great to see what the extra memory channels do for the XEON E5 since no one should ever plan on running them in only dual channel mode.

I would still like to see 2 node i7 vs. dual socket XEON. Hopefully I'll have access to a dual E5 machine in the near future and can run these tests.

I appreciate you sharing that info, Thanks again!

siefdi July 3, 2013 00:17

Update 2
 
1 Attachment(s)
Hi All,

Thanks for all the feedback.

You are right, the dual Xeon E5-2660 system should have all 8 memory channels populated for optimal performance (probably in the near future, I hope).

Anyway, for test purposes, I finally managed to populate all 8 memory channels by grabbing all available memory modules originally installed in other systems.

Notes:
  • These memory modules are a mix of different models and capacities.
  • All of them are ECC memory (one may argue that this kind of memory is considerably slower than its non-ECC counterpart).
  • These memory modules are not optimized for 4-channel operation (not a kit of 4).
  • The 1600 MHz speed was forced by overclocking the memory (the modules are originally rated at 1333 MHz).

For all the above reasons, I expect a larger performance increase if we can use suitable memory modules optimized for LGA2011 with 4 memory channels (I am not really sure, though).

Anyway, here they are:



Please let me know what you think about it.

@Evcelica: Very nice. Looking forward to your test results in the near future.

Thanks all,
Regards,
siefdi

CapSizer July 3, 2013 03:11

Thanks for posting that info, once again it is very valuable. It certainly gives a nice indication of how the older Xeon has been left behind. One might perhaps have anticipated a slightly bigger advantage for the E5 with 8 memory channels, but it is possible that the mixture of memory modules that you had to use prevented it from unleashing the full potential. Or perhaps that is just how it performs :-) At least these are real useable numbers!

siefdi July 3, 2013 04:03

Hi Capsizer,

Thanks for the comments.
About the mixed memory modules, I think the same, but I am not really sure. I hope I can repeat the test with suitable memory modules in the near future.

One other thing I am not sure about is whether the CPU clock plays a significant role in the performance or not. The Xeon E5-2660 has a max clock of only 2.2 GHz (3.0 GHz with Turbo Boost turned on), which is noticeably lower than the others. Moreover, during this test I could only see 2.2 GHz as its max speed, and it never reached a higher value than 2.2 GHz even after I had turned Turbo Boost on. I am aware that it is not (only) the CPU clock that is important, but how significant it is in the CFD world, I don't know.
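
For anyone wanting to check the clocks during a run, a minimal Python sketch is given here. It assumes a Linux machine whose /proc/cpuinfo exposes per-core "cpu MHz" lines (the usual case on x86, though the reported values depend on the kernel and frequency driver); the function names are made up for illustration, and this is not the method used for the tests in this thread.

```python
import re


def parse_core_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every logical CPU listed in the text."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, re.MULTILINE)]


def read_core_mhz(path="/proc/cpuinfo"):
    """Read the current clocks from a Linux /proc/cpuinfo file."""
    with open(path) as f:
        return parse_core_mhz(f.read())


if __name__ == "__main__":
    try:
        clocks = read_core_mhz()
    except OSError:
        clocks = []  # not a Linux system, or the file is not readable
    if clocks:
        print(f"{len(clocks)} logical CPUs, "
              f"min {min(clocks):.0f} MHz, max {max(clocks):.0f} MHz")
    else:
        print("no 'cpu MHz' lines found")
```

Running it in a second terminal while the solver is busy on all cores gives a rough idea of the clock actually sustained under load.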

Well, anyway, thanks a lot for all the discussions. It really helps my understanding.

Regards,
siefdi

evcelica July 3, 2013 16:43

Turbo Boost usually reaches its maximum only with a single core in use, and throttles down as more cores are used (more power usage). I would expect 2.2-2.3 GHz when using all available cores.

It's a little surprising that the 4 modules at 1600 were faster than 8 modules at 1333. I'm not sure if using a matched set would make any difference since you clocked them manually anyways, but like you I'm not sure.

Thanks again for these benchmarks, that is really great!

siefdi July 7, 2013 20:58

Thanks
 
Hi All,

Thanks a lot for all the comments and discussions. Really appreciate it. :)

Regards,
siefdi


All times are GMT -4.