
Any ideas on the penalty for dual CPU and Infiniband


June 29, 2018, 12:18
Any ideas on the penalty for dual CPU and Infiniband
  #1
New Member

Joshua Brickel
Join Date: Nov 2013
Posts: 26
Hi,

I am curious about the penalty involved in having a two-CPU system, or in having CPUs connected via Infiniband.

I understand that having more cores will eventually be beneficial, but if one's software limits the number of cores that can be used, then this issue becomes more relevant.

But I was wondering if anyone has an idea of how to understand the performance differences (given the same processor) for the following situations:

1. A single CPU running alone
2. Dual CPUs on the same motherboard
3. A single CPU on each of two motherboards, connected via Infiniband

For argument's sake, consider the CFD simulation to be large enough to require all the cores on offer. In the first case we have only half as many cores available as in the second and third cases, but we don't have any CPU-to-CPU or CPU-to-Infiniband-to-CPU latency issues.

This has all come up because I recently was able to run a CFX analysis with 11 million nodes on two different computers:

1. A single 8-core W-2145 CPU (7 cores were actually used during the solution run)
2. A dual-CPU system with two 8-core E5-2687W v2 CPUs (14 cores were actually used in total for the solution run)

What I found was that the wall time per iteration during the solution was fairly close. The older dual 2687W v2 system took 168 s/iteration, while the single W-2145 took 180 s/iteration.

Now, I understand the W-2145 is a newer and faster CPU. But if this scaled up, a machine with two W-2145 CPUs would give about 1.87x the performance of the dual 2687W v2 system: 168/(180/2) ≈ 1.87.
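For what it's worth, here is a quick sketch of that back-of-the-envelope estimate, using only the per-iteration times above and assuming perfect scaling from one W-2145 to two (which is exactly the assumption in question):

Code:
# Back-of-the-envelope estimate, assuming perfect (linear) scaling
# from one W-2145 to two; that assumption is the open question here.
t_dual_2687w_v2 = 168.0   # s/iteration, dual E5-2687W v2 (14 cores used)
t_single_w2145  = 180.0   # s/iteration, single W-2145 (7 cores used)

t_dual_w2145_ideal = t_single_w2145 / 2                 # 90 s/iter if it scaled perfectly
speedup_vs_v2      = t_dual_2687w_v2 / t_dual_w2145_ideal
print(f"hypothetical dual W-2145 vs dual 2687W v2: {speedup_vs_v2:.2f}x")  # ~1.87x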

But the W-2145 can't be used in a dual-CPU configuration (it has no UPI links). I could, however, configure two separate machines connected via Infiniband (which in theory would allow me to scale out even further later on).

So this got me thinking: what is the performance penalty one pays for these types of configurations? I doubt I would see a true 1.87x speedup, but what should I expect (if I devoted the same total number of cores to the solution)?

July 1, 2018, 04:17
  #2
Super Moderator

Alex (flotus1)
Join Date: Jun 2012
Location: Germany
Posts: 3,398
First things first: it seems a little odd that the two platforms you tested are so close together. I would expect the dual Xeon 2687W v2 to perform better. Usually a result like this is due to a suboptimal memory configuration.

That being said, with a low number of nodes there is no real penalty for a properly implemented MPI application, neither within a node (shared memory) nor between nodes over Infiniband, provided the problem size is large enough. Instead, you often see superlinear speedup in strong scaling due to cache effects.
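As a side note, here is a minimal sketch of what "superlinear speedup in strong scaling" means in terms of wall time. The numbers are purely illustrative, not measured on any system:

Code:
# Strong scaling: fixed problem size, increasing core count n.
# speedup(n)    = T(1) / T(n)
# efficiency(n) = speedup(n) / n   -> superlinear means efficiency > 1,
#                                     typically because more of the case fits in cache.
def strong_scaling(t1, tn, n):
    speedup = t1 / tn
    return speedup, speedup / n

# Purely illustrative numbers (not measured):
s, e = strong_scaling(t1=1000.0, tn=230.0, n=4)
print(f"speedup {s:.2f}x, efficiency {e:.0%}")  # 4.35x, 109% -> superlinear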

Before you start connecting two single-socket Xeon CPUs with Infiniband, I would recommend a dual-socket solution with Xeon Silver or Gold CPUs. They provide 50% more memory bandwidth per CPU compared to the Xeon W series.

Last edited by flotus1; July 1, 2018 at 13:33.

July 3, 2018, 12:40
Follow-up question
  #3
New Member

Joshua Brickel
Join Date: Nov 2013
Posts: 26
First of all, thanks for replying. I have a couple of follow up questions if you don't mind...

I'm not quite sure why you are so surprised. The memory bandwidth of the 2687W v2 is 59.7 GB/s, while the newer W-2145 is 85.3 GB/s (as per Intel's website).

Since you recommend a dual-CPU Gold/Silver system over two W-2145s hooked together via Infiniband, I take it your experience tells you that higher memory bandwidth generally means better performance. I base this on the W-2145 having four memory channels while the Gold 6134 has six. They are otherwise both 8-core CPUs, and the maximum frequency of the W-2145 is even a bit higher.
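To put rough numbers on those channel counts, here is a sketch assuming DDR4-2666 on every socket (about 21.3 GB/s per channel); these are theoretical peaks, not what a CFD solver actually achieves:

Code:
# Rough peak memory bandwidth from channel count, assuming DDR4-2666
# on every socket (2666 MT/s * 8 bytes ~= 21.3 GB/s per channel).
GBPS_PER_CHANNEL = 2666e6 * 8 / 1e9

configs = {
    "1x Xeon W-2145 (4 channels)":       1 * 4,
    "2x Xeon W-2145 over IB (2x4 ch)":   2 * 4,
    "2x Xeon Gold 6134 (2x6 channels)":  2 * 6,
}
for name, channels in configs.items():
    print(f"{name}: ~{channels * GBPS_PER_CHANNEL:.0f} GB/s peak")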


July 3, 2018, 13:00
  #4
Super Moderator

Alex (flotus1)
Join Date: Jun 2012
Location: Germany
Posts: 3,398
Memory bandwidth in a dual-socket system adds up: 2 CPUs means 2x the theoretical memory bandwidth.
This, plus the fact that the Xeon E5 v2 system has more raw computing power and more (and faster) cache, makes me think it should be a bit faster.
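Using the per-CPU figures quoted earlier in the thread, the aggregate (theoretical) numbers behind that statement look roughly like this:

Code:
# Aggregate theoretical memory bandwidth, using the Intel figures
# quoted earlier in this thread (GB/s per CPU).
bw_2687w_v2 = 59.7   # E5-2687W v2
bw_w2145    = 85.3   # W-2145

print(f"dual E5-2687W v2: ~{2 * bw_2687w_v2:.0f} GB/s")   # ~119 GB/s
print(f"single W-2145:    ~{bw_w2145:.0f} GB/s")          # ~85 GB/s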

You can check for yourself to what extent you are limited by memory bandwidth on your Xeon W CPU: run the same case with 1, 2, 4, 6 and 8 cores. Less-than-linear speedup in this test is very likely due to a memory bandwidth limit. Ideally the scaling benchmark is run at a fixed CPU frequency (turn off turbo boost), since higher turbo frequencies at lower active core counts can slightly skew the results.
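A minimal sketch of how such a core sweep could be evaluated; the entries in measured are placeholders for your own wall times per iteration:

Code:
# Evaluate a strong-scaling core sweep on a single machine.
# Replace the placeholder times with your own measured s/iteration
# (ideally with turbo boost disabled, as noted above).
measured = {1: None, 2: None, 4: None, 6: None, 8: None}  # cores -> s/iteration

def report(times):
    t1 = times[1]
    for n, t in sorted(times.items()):
        speedup = t1 / t
        eff = speedup / n
        print(f"{n} cores: {t:7.1f} s/iter  speedup {speedup:4.2f}x  efficiency {eff:.0%}")
    # Efficiency dropping well below 100% as cores are added usually points
    # to a memory bandwidth limit for this kind of workload.

# report(measured)  # uncomment once the placeholders are filled in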

My recommendation for a dual-socket system over two single-socket systems hooked together over Infiniband is mostly about cost-effectiveness and convenience:
- You only need one case, motherboard, PSU etc.
- No networking gear required
- No need to set up the Infiniband network
- More memory in one shared-memory system if you need it
- and last but not least, more total memory bandwidth
