Self-built InfiniBand cluster?

July 28, 2023, 05:28   #1
Freewill1 (New Member, Join Date: Aug 2014, Posts: 18)
Hi all,

We have five workstations at hand, each with
1. 1×32-core AMD EPYC 7532 CPU
2. 8×16GB DDR4 3200 RAM
3. 1×1TB Samsung 980 Pro SSD for storage

Each machine was configured to make the most of the EPYC CPU's 8-channel memory support by pairing a relatively low core count with all eight RAM slots populated. Not too outdated so far?

Now, we need to set up a small-scale cluster to run our own CFD code in parallel. The code is based on
1. a finite-volume discretization of the Navier-Stokes equations with a SIMPLE-like algorithm on a structured mesh;
2. a distributed-memory parallel strategy using the Message Passing Interface (MPI) to exchange the halo regions of the FV mesh after domain decomposition; each decomposed domain solves the equations independently on one or more CPU cores (a minimal sketch of this exchange pattern follows below).
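
To make the exchange pattern concrete, here is a minimal 1-D halo-exchange sketch in C with MPI. It is only an illustration of the idea described above: the file name, array sizes, and the 1-D decomposition are simplifications of my own, not our actual code.

Code:
/* halo.c -- minimal 1-D halo-exchange sketch (illustrative only, not the
 * actual solver): each rank owns NLOC interior cells of a scalar field plus
 * one ghost cell on each side, and swaps ghost layers with its neighbours.
 * Build with e.g.:  mpicc halo.c -o halo                                   */
#include <mpi.h>
#include <stdio.h>

enum { NLOC = 8 };                       /* interior cells per rank (toy size) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double phi[NLOC + 2];                /* phi[0] and phi[NLOC+1] are ghosts  */
    for (int i = 0; i < NLOC + 2; ++i)
        phi[i] = -1.0;                   /* mark ghosts as "not yet received"  */
    for (int i = 1; i <= NLOC; ++i)
        phi[i] = (double)rank;           /* fill interior with the rank id     */

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* Swap halos: send my first interior cell to the left neighbour while
     * receiving my right ghost from the right neighbour, then the reverse.
     * MPI_Sendrecv avoids deadlock without manual send/recv ordering.       */
    MPI_Sendrecv(&phi[1],        1, MPI_DOUBLE, left,  0,
                 &phi[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&phi[NLOC],     1, MPI_DOUBLE, right, 1,
                 &phi[0],        1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d: left ghost = %g, right ghost = %g\n",
           rank, phi[0], phi[NLOC + 1]);

    MPI_Finalize();
    return 0;
}

Something like "mpirun -np 4 ./halo" runs it on a single workstation now; later the same binary could be launched with a hostfile listing the five nodes once the IB fabric is up (the exact launch options depend on the MPI distribution).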

So far, all programming and testing of the code's MPI parallelism have been confined to a single machine of the kind above, to mimic a cluster with intra-node data exchange only.

We intend to move beyond this single-node limitation in the near future and expect good (near-linear) speedup scaling with core count up to ~100 CPU cores.

To achieve this goal, I understand that the rate of inter-node data exchange would be the bottleneck (much as memory bandwidth can be within a node), so the InfiniBand (IB) network should be as fast as possible (i.e., as low in latency and as high in bandwidth as possible).

Therefore, what we want is a cluster built from the machines above (each serving as a node) connected by a small IB network.

The budget shouldn't be too high (<$800 per node).

A 100 or 200Gbps IB network is perhaps a good choice.

Theoretically, there are several simple ways to set up the network:
Case 1: a three-node cluster with a ring topology,
Case 2: a three-node cluster with a star topology,
Case 3: a five-node cluster with a ring topology,
Case 4: a five-node cluster with a star topology.

See my illustrative image here for clarity:
https://www.cfd-online.com/Forums/at...1&d=1690535873

Note that
Cases 1 and 3, with a ring topology, avoid the use of an expensive 100/200G IB switch;
Case 1 features direct node-to-node connections;
Cases 2 and 4, with a star topology, require the IB switch and can thus be ruled out.

I have little idea of how an IB network works or how to build an efficient one on a limited budget.

Here are my questions:
1. Can Cases 1 and 3 work without an IB switch, which is too expensive for us?

2. Can a 100 or 200Gbps IB network, with or without a switch, achieve the goal of near-linear scaling to ~100 cores (assuming the parallel algorithm of the code is efficient enough)?

3. If inter-node latency matters more than bandwidth, would an older low-latency 40 or 56Gbps network be sufficient?
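
(For reference, once any of these fabrics is up, the latency and bandwidth the code actually sees are easy to measure. Below is a minimal MPI ping-pong sketch I would use for that; the file name, message sizes, and repetition count are my own choices, and established tools such as the OSU micro-benchmarks or ib_send_lat/ib_write_lat from the perftest package do the same job far more rigorously.)

Code:
/* pingpong.c -- minimal MPI ping-pong sketch (illustrative): ranks 0 and 1
 * bounce messages of increasing size and report one-way latency and
 * bandwidth.  Run with at least two ranks, placed on different nodes to
 * exercise the interconnect rather than shared memory.                     */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps    = 1000;
    const int sizes[] = { 8, 1024, 65536, 1048576 };   /* message sizes, bytes */
    const int nsizes  = sizeof sizes / sizeof sizes[0];

    for (int s = 0; s < nsizes; ++s) {
        int n = sizes[s];
        char *buf = malloc(n);
        memset(buf, 0, n);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; ++i) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;   /* total time for `reps` round trips */

        if (rank == 0)
            printf("%9d bytes: %10.2f us one-way, %10.2f MB/s\n",
                   n,
                   dt / (2.0 * reps) * 1e6,
                   (double)n * 2.0 * reps / dt / 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}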

Here are some Ansys demonstrations using AMD EPYC CPUs and 100 or 200Gbps IB networks, which show good scaling:
https://www.cfd-online.com/Forums/at...1&d=1690535832

https://www.cfd-online.com/Forums/at...1&d=1690535908

Thanks!

-----------------------------------------------
Prices of IB hardware for reference (US$, from eBay)

40G QDR:
Switch: Mellanox IS5023 18-port, US$100
NIC: Mellanox ConnectX-3 dual port, US$25

56G FDR:
Switch: Mellanox SX6036 36-port: US$100
NIC: Mellanox Connect-IB dual port: US$50

100G EDR, PCIe 4.0x16/3.0x16:
Switch: Mellanox SB7800 36-port: US$1,700
NIC: Mellanox ConnectX-5 dual port: US$300

200G HDR, PCIe 4.0x16/3.0x16:
Switch: NVidia Mellanox QM8700, US$5,000 each
NIC: NVidia Mellanox ConnectX-6 dual port, US$600

400G NDR, PCIe 5.0x16/4.0x16:
Switch: NVidia Mellanox QM9700, US$19,000
NIC: NVidia Mellanox ConnectX-7 dual port, US$800

*NIC: Network Interface Card
Attached Images: Benchmark Eypc7001.png, IB network.jpg, Benchmark Eypc7002.png


July 28, 2023, 13:10   #2
Will Kernkamp (Senior Member, Join Date: Jun 2014, Posts: 316)
Quote:
Originally Posted by Freewill1
Hi all,

We have five workstations at hand, each with
1. 1×32-core AMD EPYC 7532 CPU
2. 8×16GB DDR4 3200 RAM
3. 1×1TB Samsung 980 Pro SSD for storage

Here are my questions:
1. Can Cases 1 and 3 work without an IB switch, which is too expensive for us?

2. Can a 100 or 200Gbps IB network, with or without a switch, achieve the goal of near-linear scaling to ~100 cores (assuming the parallel algorithm of the code is efficient enough)?

3. If inter-node latency matters more than bandwidth, would an older low-latency 40 or 56Gbps network be sufficient?

Thanks!
Q1: Yes, an InfiniBand ring configuration works without a switch. I have done it.
Q2: Yes, a 100 or 200Gbps IB network can achieve near-linear scaling, because your cluster is very small.
Q3: Yes, the older 40 or 56Gbps networks will be fine. In fact, you will probably approach your performance goal even with 1Gbps Ethernet. This is because, normally, the network only has to exchange the boundary vectors between nodes, so the network bandwidth requirement is much lower than the local core-to-DRAM bandwidth needed for a solution iteration.
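
As a rough illustration (my own back-of-envelope estimate with assumed numbers, not a measurement): suppose each rank owns a cubic block of $N^3$ cells with $n_v$ double-precision variables and exchanges one halo layer per face per iteration. Then

$$\frac{\text{halo bytes per exchange}}{\text{field bytes per rank}} \approx \frac{6\,N^{2}\,n_v \cdot 8\ \text{B}}{N^{3}\,n_v \cdot 8\ \text{B}} = \frac{6}{N},$$

so with $N \approx 100$ (about a million cells per rank) the halo traffic is only a few percent of the data each node already streams through its own memory at least once per iteration.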



If you misconfigure the cluster so that each node has to re-read grid info from a single node at every iteration, the traffic would be much larger. Normally, repeated reads of the same info from a disk or shared volume end up cached in memory, so the data has to be read only once, not at every iteration.


Tags
cfd, cluster, infiniband, scaling, self-built





