
Parallel running on four T5610 workstations using an InfiniBand switch.



April 2, 2019, 01:51   #1
Parallel running on four T5610 workstations using an InfiniBand switch.

Dustin (DungPham), New Member
Join Date: Sep 2011
Location: Earth
Posts: 23
Hi everyone,

I am trying to find a way to run Fluent in parallel on four workstations using a configuration with an IB switch. I know that a member (Ghost84?) has already posted a tutorial for interconnecting two workstations directly without a switch, but now it is four workstations, so please share your solutions. So far, what I know is that we need IB cards (is DDR CX4-SFF8470 20 Gb/s good or not?), an IB switch (DDR, or any suggestion for a suitable model?), and 20 Gb/s CX4-to-CX4 cables.
Thank you for reading and sharing.

My Dell T5610 specs: dual E5-2697 v2 (12-core, 2.7 GHz), quad-channel 64 GB DDR3 (PC3-10600R).
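
For reference, once the IB fabric is up, a parallel Fluent session across the four nodes could be launched along these lines. This is only a sketch: the hosts.txt machine file, the run.jou journal file and the -pib interconnect flag are assumptions, and the exact option spelling depends on your Fluent release.

# Hypothetical machine file listing the four T5610 nodes (24 physical cores each);
# the "hostname:cores" form is what older Fluent releases accept in -cnf files
cat > hosts.txt <<'EOF'
node1:24
node2:24
node3:24
node4:24
EOF

# 96-process double-precision 3D run in batch mode over the IB interconnect
#   -t96    total number of Fluent processes
#   -cnf=   machine file
#   -pib    request the InfiniBand interconnect (some releases spell it -pinfiniband)
#   -g -i   no GUI, read commands from a journal file
fluent 3ddp -t96 -cnf=hosts.txt -pib -g -i run.jou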

May 13, 2019, 21:45   #2
Configured IB without a switch

Will Kernkamp (wkernkamp), Senior Member
Join Date: Jun 2014
Posts: 311
It is possible to run without a switch. I did it with two ConnectX-3 PCIe 3.0 cards from Mellanox, roughly $40 each on eBay. The copper IB cables also cost about $40. Your speed will be 56 Gb/s with FDR cards (40 Gb/s with QDR). Hardware cost total: about $120.

Be careful that you:
1. Get the IB cards and not the EN cards. The latter do not support IB, while the former do support Ethernet over IB.
2. Avoid the HP-branded cards; they have a non-standard shape that does not fit a "normal" PCIe slot.

The Mellanox cards typically need a firmware update. This happens automatically with the OFED install, but not when the card is a Mellanox card branded by IBM or Oracle. However, you can turn an Oracle or IBM card back to a regular configuration with the following command:

sudo flint --no_flash_verify -d /dev/mst/mt4099_pci_cr0 -i fw-ConnectX3-rel-2_42_5000-MCX354A-FCB_A2-A5-FlexBoot-3.4.752.bin --allow_psid_change burn

Replace mt4099_pci_cr0 with your applicable device, and download the appropriate firmware image ("fw-...bin") for your card.
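
If you are not sure which device name to use, a short check with the Mellanox tools before burning looks roughly like this (assuming the MFT package with mst and flint is installed):

# Start the Mellanox Software Tools service and list the devices it finds
sudo mst start
sudo mst status        # prints entries such as /dev/mst/mt4099_pci_cr0

# Query the current firmware version and PSID before deciding to reflash
sudo flint -d /dev/mst/mt4099_pci_cr0 query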

Caution: burning firmware on your own can brick your card. It did not happen to me, but I don't want to be responsible for yours.

You need opensm running on one of the machines (that is the IB subnet manager).
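
On Ubuntu, for example, getting the subnet manager running on one node can be as simple as the following (package and service names as they appear on recent Ubuntu releases; ibstat comes from the infiniband-diags package):

# Install and start the subnet manager on exactly one machine in the fabric
sudo apt install opensm
sudo systemctl enable --now opensm

# On any node, the port should now report "State: Active"
ibstat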

I ran this on Ubuntu 18.10. For my application the speed-up was better than 2x, because the memory-access load on each individual machine (the bottleneck) is reduced, while IB with remote direct memory access (RDMA) is very fast.
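
A quick way to confirm the link rate and the RDMA path is the perftest suite (assuming it is installed on both nodes; the host name node1 is a placeholder):

# Negotiated link rate: 56 for FDR, 40 for QDR, 20 for DDR
ibstat | grep Rate

# RDMA write bandwidth test: start the server on node1 ...
ib_write_bw
# ... then run the client on the other node, pointing at the server
ib_write_bw node1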

There are IB switches available on eBay for about $300. I got one because I am linking more than two machines. It works well but is very noisy.

Note that you could also build a chain or ring with the individual cards if they are dual port. Each segment requires its own setup. I did not go that route, because it makes adding or removing machines difficult, and I am not sure whether the fabric stays as fast when you chain nodes this way. If you try it, let me know!
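
If anyone does try the chained layout: since the HCAs do not switch traffic between their two ports, each point-to-point segment is its own subnet and needs its own subnet manager. opensm can be bound to a specific local port by GUID, roughly as below (the GUID value is a placeholder; check opensm's documentation for your version):

# List the port GUIDs on this host
ibstat | grep "Port GUID"

# Run one opensm instance per segment, bound to the local port on that segment
sudo opensm -g 0x0002c90300a1b2c1 -B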

Good luck.

October 2, 2019, 12:59   #3

Qin Zhang (zhangqin200000), New Member
Join Date: May 2012
Posts: 10
Quote:
Originally Posted by wkernkamp View Post
[post #2 quoted in full; see above]
Hi,
would it be possible for you to share the models of your IB card and IB switch, so that we have a reference to set up our own? Is it necessary to go for 56G? What are you currently using: 10G, 20G, 40G or 56G?

Many thanks

Qin
