42% gain from dual E5-2643 to i7-3930K (x2)

Post #1 | evcelica (Erik) | March 26, 2014, 13:20
I finally got around to testing this to answer the whole "2-node cluster vs. dual-CPU workstation" question. I used ANSYS CFX v15 for this benchmark and kept the case very simple so that others can compare their systems.

CASE:
Geometry: 1m x 1m x 5m long duct
Mesh: 100 x 100 x 500 "cubes" all 1x1x1cm (5M cells)
Flow: Default Water enters @ 10m/s at 300K, goes out other side at 0Pa. Walls are 400K.
High Resolution Turbulence and advection
Everything else default.
Double Precision: ON
Parallel run: Platform MPI, 8 cores (4+4 on the cluster)
20 iterations

Setup 1 is a dual XEON E5-2643 (3.3 GHz) with 128GB of 1600 MHz RAM.
Setup 2 is a two-node cluster of i7-3930Ks @ 4.2 GHz, each with 64GB of 2133 MHz RAM, connected with 20 Gbps Infiniband.
Maybe in the future I'll check the performance only using a gigabit connection, though I've heard it doesn't matter much with 2 nodes.

Solver Wall Time for dual XEON: 869.96 s
Solver Wall Time for two i7s: 612.70 s

I saw roughly the same trends in mechanical benchmarks as well (30-50% performance increase, and in some benchmarks a single i7 on 4 cores could solve faster than the dual XEON on 8!).
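
As a rough check of the 42% figure in the thread title, the two solver wall times above can be compared directly. This is only a minimal sketch and not part of the original benchmark; the function and variable names are made up for illustration.

```python
def speedup(t_baseline_s: float, t_new_s: float) -> float:
    """Ratio of baseline wall time to new wall time (> 1 means the new system is faster)."""
    return t_baseline_s / t_new_s

# Solver wall times quoted in the post above
t_xeon_s = 869.96  # dual XEON E5-2643
t_i7_s = 612.70    # two-node i7-3930K cluster

s = speedup(t_xeon_s, t_i7_s)
gain_pct = (s - 1.0) * 100.0                      # throughput gain
time_saved_pct = (1.0 - t_i7_s / t_xeon_s) * 100.0  # wall-time reduction

print(f"speedup: {s:.2f}x")
print(f"throughput gain: {gain_pct:.0f}%")
print(f"wall time reduced by: {time_saved_pct:.1f}%")
```

Note that a 42% gain in throughput corresponds to a wall-time reduction of just under 30%; both numbers describe the same measurement.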

Post #2 | siefdi (CFD) | March 27, 2014, 03:28
Wow, this is very valuable information. Thanks.

I would like to clarify something, by the way. I am not sure, but it seems to me that the memory speed of the i7-3930K system (2133 MHz) plays a more significant role in the overall performance than the CPU clock. Of course, I could be wrong. Is that so?

It would be interesting to see how the i7-3930K system performs compared to the Xeon one at the same memory speed (1600 MHz).

Post #3 | bindesboll (Kim Bindesbøll Andersen) | March 31, 2014, 05:01
As far as I can see you are actually testing four different factors in one test, so it is not possible to draw any detailed conclusions apart from that one system is faster than the other.

1) RAM speed: 1600 MHz vs. 2133 MHz (speed may not be the only difference between the RAM used)
2) CPU clock speed: 3.3 GHz vs. 4.2 GHz
3) CPU architecture: 4 cores, 10 MB cache vs. 6 cores, 12 MB cache
4) 2-node cluster with Infiniband vs. Dual socket system

So if you want to test only the effect of factor 4), you would need to keep 1), 2) and 3) constant.
Testing of my 2-node cluster with Xeon 2667 v2 and Infiniband indicates that the maximum number of cores that can be used efficiently with 1866 MHz RAM is somewhere between 4 and 8. So your Xeon system might be limited in core count. As the i7 has both more cores and faster RAM, it is no surprise that its performance is superior. I agree with the conclusion that Infiniband isn't the bottleneck in a 2-node cluster.

Best regards
Kim Bindesbøll

Post #4 | CapSizer (Charles) | March 31, 2014, 05:55
Quote:
Originally Posted by bindesboll View Post
As far as I can see you are actually testing four different factors in one test, so it is not possible to draw any detailed conclusions apart from that one system is faster than the other.
I think the point that Erik is making here is that he wants to compare two systems that are equivalent in terms of what one can realistically get for your money. You can't get the high clock speed Xeons, and the Xeon system won't support the very fast memory. But you can for the same price (?) get the high clock speed i7's with their fast memory.

Post #5 | evcelica (Erik) | March 31, 2014, 19:42
Quote:
Originally Posted by CapSizer View Post
I think the point that Erik is making here is that he wants to compare two systems that are equivalent in terms of what one can realistically get for your money. You can't get the high clock speed Xeons, and the Xeon system won't support the very fast memory. But you can for the same price (?) get the high clock speed i7's with their fast memory.
Correct, this is a $10K XEON machine vs two ~$3,000 machines. I wouldn't care to test the i7 machines at 1600 MHz. I got them because you can use much higher RAM and CPU clock frequencies than you can with the XEON.
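
To put the price argument into numbers, the wall times from the first post can be combined with the prices quoted here. This is a back-of-the-envelope sketch only: the prices are the approximate figures from this post, the cost of the Infiniband hardware and of software licenses is left out (an assumption, not something stated in the thread), and the helper name is hypothetical.

```python
# Solver wall times from the first post, in seconds
T_XEON_S = 869.96
T_I7_S = 612.70

# Approximate hardware prices quoted in this post, in USD.
# Assumption: interconnect hardware and licenses are NOT included.
COST_XEON_USD = 10_000     # one dual-XEON workstation
COST_I7_USD = 2 * 3_000    # two i7 nodes

def runs_per_hour_per_kusd(wall_time_s: float, cost_usd: float) -> float:
    """Benchmark runs per hour, per 1000 USD of hardware."""
    return (3600.0 / wall_time_s) / (cost_usd / 1000.0)

xeon_value = runs_per_hour_per_kusd(T_XEON_S, COST_XEON_USD)
i7_value = runs_per_hour_per_kusd(T_I7_S, COST_I7_USD)
value_ratio = i7_value / xeon_value

print(f"dual XEON : {xeon_value:.2f} runs/hour per 1000 USD")
print(f"i7 cluster: {i7_value:.2f} runs/hour per 1000 USD")
print(f"i7 cluster delivers about {value_ratio:.1f}x the performance per dollar")
```

Under these assumptions the i7 cluster comes out at roughly 2.4 times the performance per dollar on this particular benchmark; adding the interconnect cost would lower that figure somewhat.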

When I did some previous benchmarking, I saw that memory frequency was more of a factor than CPU clock, but each helps the other realize gains. Going from a 3.2 GHz CPU and 1600 MHz RAM to 4.2 GHz and 2133 MHz RAM on the same machine gave a 33% performance increase.
http://www.cfd-online.com/Forums/att....benchmark.jpg
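
For context on why memory frequency matters, the theoretical peak memory bandwidth per CPU socket can be estimated from the RAM speed. The sketch below assumes a fully populated quad-channel configuration with 64-bit (8-byte) channels, and treats the "MHz" ratings as transfer rates in MT/s; those assumptions and the function name are mine, not from the thread, and real sustained bandwidth is lower than this peak.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, channels: int = 4, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s.

    transfers_mt_s: DDR rating in mega-transfers per second (e.g. 1600 for DDR3-1600).
    channels and bytes_per_transfer are assumptions (quad channel, 64-bit bus).
    """
    return transfers_mt_s * 1e6 * channels * bytes_per_transfer / 1e9

bw_1600 = peak_bandwidth_gb_s(1600)
bw_2133 = peak_bandwidth_gb_s(2133)

print(f"DDR3-1600: {bw_1600:.1f} GB/s peak per socket")
print(f"DDR3-2133: {bw_2133:.1f} GB/s peak per socket")
print(f"ratio: {bw_2133 / bw_1600:.2f}")
```

The ratio of the two peak figures is about 1.33, so a memory-bandwidth-bound solver could in principle gain up to a third from the faster RAM alone; how much of that is realized depends on the case and on the CPU clock.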

Post #6 | oj.bulmer (OJ) | July 29, 2014, 05:26
Hi,

I am thinking of buying the following configuration:



I am not very experienced in hardware benchmarking. From the benchmark image in your last post, it is clear that you got better performance at 2133 MHz than at 1866 MHz, but the processor you used (i7-3930K) only supports up to 1600 MHz. How were you able to run the RAM at 2133 MHz with that processor?

If you look at my configuration, it has an i7-4820K processor that supports a maximum of 1866 MHz, but the RAM I selected is 2400 MHz. The only reason to go for 2400 MHz RAM is the benefit in computational time, but that only pays off if the processor can go beyond the 1866 MHz threshold...

Thanks

Post #7 | evcelica (Erik) | July 29, 2014, 17:11
The CPU only "officially" supports 1600 MHz, but if your motherboard is capable of running "overclocked" memory, it can run at higher memory frequencies.
It's not guaranteed to work though, and it could be unstable. If you can't get it stable at 2400 MHz, that's just too bad. The CPU is only rated for 1600 MHz, and if you want to try to push it further, that is up to you.
If you couldn't get it stable at 1600 MHz, then you would have a warranty issue and could get a replacement for whatever isn't working.

It's the same with the CPU clock: if you overclock it and it becomes unstable, that's too bad, because you are pushing it beyond what it is guaranteed to work at.

I'm not saying it won't work. It most likely will. It's just not guaranteed, and if it doesn't, you would have to run it at a lower frequency.

Post #8 | shreyasr (Shreyas Ragavan) | November 9, 2014, 23:51 | Gigabit performance?
Quote:
Originally Posted by evcelica View Post
Maybe in the future I'll check the performance only using a gigabit connection, though I've heard it doesn't matter much with 2 nodes.
Hi Erik,
The 2012 ANSYS HPC documentation indicates that the performance difference between 1 Gigabit and 10 Gigabit interconnects becomes quite pronounced beyond 24 cores.

Were you able to compare the performance with a Gigabit connection?
Would that be a speed of 1GBPS or 10GBPS ?
Attached image: Interconnect.jpg

Post #9 | evcelica (Erik) | November 11, 2014, 12:21
A gigabit connection is 1 Gbps, not 1 GBps (bit vs. byte). A byte is 8 bits, so 1 GBps would be 8 times faster.
I have not benchmarked this problem using gigabit only yet, and I believe the result is going to be problem dependent: large problems scale better than smaller ones, and single vs. double precision makes a difference. I've run some very large problems (17M nodes, double precision) using gigabit and seen 99.5% speedup on 2 nodes vs. 1. But I've also seen smaller problems that don't scale well at all (6 processors on one node was faster than 2 nodes with 3+3).
I only have licenses to run up to 16 cores.
I'm getting a third node soon, but I probably won't be testing gigabit, since I think I.T. did something to our computers that reduces network speeds. Even when I connected the computers directly with a crossover cable, bypassing the work network, MPI tests maxed out at 40-70 MBps. I took the same cable home and could get 120+ MBps consistently.
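
The bit vs. byte distinction and the MPI transfer rates mentioned above can be related with a quick conversion. This is an illustrative sketch with a hypothetical helper name; it uses decimal units (1 Gbps = 1000 Mbps) and ignores protocol overhead, so real payload rates sit a little below the raw line rate.

```python
def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to megabytes per second (8 bits per byte)."""
    return gbps * 1000.0 / 8.0

line_rate = gbps_to_mbytes_per_s(1.0)  # raw rate of a gigabit link in MB/s

# MPI transfer rates mentioned in the post, in MB/s
observed = {"work network (low)": 40.0, "work network (high)": 70.0, "at home": 120.0}

print(f"1 Gbps = {line_rate:.0f} MB/s raw line rate")
for label, rate in observed.items():
    print(f"{label}: {rate:.0f} MB/s = {rate / line_rate:.0%} of line rate")
```

Seen this way, the 120+ MBps measured at home is close to what a gigabit link can deliver, while 40-70 MBps is only about a third to a half of it.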

Post #10 | derekm (Derek Mitchell) | November 12, 2014, 04:24
One has to add the Infiniband kit on the cost side, though with careful eBaying you could get 10 Gbps for $200 + $50 per node. What would it cost new, though?

And then there is the large amount of time spent learning and fiddling around to get the Infiniband working.
If you were paying for this time, you wouldn't go there if you could avoid it.
I have the scars from putting a seven-node Infiniband cluster together. If you haven't done it before, let me say it's non-trivial but certainly doable without assistance. Just be prepared to spend a long time on it.

Post #11 | acasas (Antonio Casas) | November 24, 2014, 13:14
Hi guys, I may be experiencing similar performance issues.
Please check my post and let me know your opinion.
http://www.cfd-online.com/Forums/har...5-2650-v3.html

thanks a lot