Parallel running on four T5610 workstations using an InfiniBand switch.
|
April 2, 2019, 01:51 |
Parallel running on four T5610 workstations using an InfiniBand switch.
|
#1 |
New Member
Dustin
Join Date: Sep 2011
Location: Earth
Posts: 23
Rep Power: 14 |
Hi everyone,
I am now trying to run Fluent in parallel on four workstations using an InfiniBand (IB) switch. I know that a member (Ghost84?) has already posted a tutorial on interconnecting two workstations directly without a switch, but now it is four workstations, so please share your solutions. So far, what I know is that we need IB cards (is DDR CX4-SFF8470 20 Gb/s good or not?), an IB switch (DDR? or any suggestion for a suitable switch?), and 20 Gb/s CX4-to-CX4 cables. Thank you for reading and sharing.

My Dell T5610 specs are: dual E5-2697 v2 (12-core, 2.7 GHz) and 64 GB quad-channel DDR3-10600R.
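For reference, once the IB fabric is up, a four-node Fluent run is launched with a machine file and the IB interconnect flag. A minimal sketch, assuming all 24 physical cores of each of the four T5610s are used; the hostnames in hosts.txt are placeholders for your actual node names:

```shell
# hosts.txt -- one machine per line, hostname:cores (placeholder names)
#   node1:24
#   node2:24
#   node3:24
#   node4:24

# Launch a 3-D double-precision session on 96 cores over InfiniBand:
#   -t96    total number of parallel processes
#   -cnf=   machine file listing the nodes
#   -pib    select the InfiniBand interconnect
fluent 3ddp -t96 -cnf=hosts.txt -pib
```

The -pib flag tells Fluent to use the InfiniBand interconnect instead of falling back to Ethernet; passwordless SSH between the nodes must already be set up for the spawned processes to start.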
|
May 13, 2019, 21:45 |
configured IB without switch
|
#2 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 311
Rep Power: 12 |
It is possible to run without a switch. I did it with two ConnectX-3 PCIe 3.0 cards from Mellanox, about $40 each on eBay. The copper IB cables also cost about $40. Your speed will be 56 Gb/s when you have FDR cards (QDR cards top out at 40 Gb/s). Total hardware cost: about $120.
Be careful that you:
1. Get the IB cards, not the EN cards. The latter do not support IB, but the former do support Ethernet over IB.
2. Note that the HP cards have a non-standard shape that does not fit a "normal" PCIe slot.
The Mellanox cards typically need a firmware update. This happens automatically with the OFED install, but not when it is a Mellanox card rebranded by IBM or Oracle. However, you can turn an Oracle or IBM card back to a regular configuration with the following command:

sudo flint --no_flash_verify -d /dev/mst/mt4099_pci_cr0 -i fw-ConnectX3-rel-2_42_5000-MCX354A-FCB_A2-A5-FlexBoot-3.4.752.bin --allow_psid_change burn

Replace mt4099_pci_cr0 with your applicable device, and download the firmware file ("fw-...bin") appropriate for your card. Caution: flashing firmware on your own can brick your card. It did not happen to me, but I don't want to be responsible for your card.
You need opensm (the IB subnet manager) running on one of the machines. I ran this on Ubuntu 18.10.
For my application the speed-up was better than 2x, because memory access on the individual machine (the bottleneck) is reduced, while IB with remote direct memory access (RDMA) is very fast.
There are IB switches available on eBay for about $300. I got one because I am linking more than two machines. It works well but is very noisy. Note that you could also build a chain or ring with the individual cards if they are dual-port; each segment requires its own setup. I did not go that route, because it is difficult to add or remove machines, and I am also not sure whether the fabric remains as fast if you chain nodes in this way. If you try this, let me know! Good luck.
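To make the steps above concrete, here is a sketch of the setup and verification commands on Ubuntu, assuming the stock infiniband-diags, perftest, and opensm packages rather than a full Mellanox OFED install (package and service names may vary by release; node1 is a placeholder hostname):

```shell
# Install the diagnostics, the RDMA benchmark tools, and the subnet manager.
sudo apt install infiniband-diags perftest opensm

# Start opensm on ONE machine only -- the fabric needs exactly one
# active subnet manager (service name may differ by distribution).
sudo systemctl enable --now opensm

# On each node, check that the port has come up.
ibstat         # look for "State: Active" and the expected link rate
ibv_devinfo    # look for state PORT_ACTIVE and link_layer InfiniBand

# Benchmark RDMA write bandwidth between two nodes.
ib_write_bw          # run this on the server node first
ib_write_bw node1    # then on the client, pointing at the server
```

If ibstat reports the port stuck in "Initializing", the usual cause is that no subnet manager is running anywhere on the fabric.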
|
October 2, 2019, 12:59 |
|
#3 |
New Member
Qin Zhang
Join Date: May 2012
Posts: 10
Rep Power: 13 |
Quote:
Would it be possible for you to share the models of your IB card and IB switch? Then we would have a reference for setting up our own. Is it necessary to go for 56G, and what are you currently using: 10G, 20G, 40G, or 56G? Many thanks, Qin