CFD Online (www.cfd-online.com), Hardware forum

Seeking an overview and tips to Infiniband + Ansys (Windows 10)

Post #1: SLC (New Member), December 1, 2017, 10:23
Hi,

So I've got approval for the purchase of a 32-core setup for Ansys Fluent and CFX (see this thread: Hardware check: 32 core setup for CFD).

It will be a 2-node setup, each node with dual 8-core Xeon CPUs.

In order to maximize performance and to maintain future expandability, I want to set these two nodes up with Infiniband (direct connection without a switch).

As I start delving into the world of Infiniband, I am starting to realise it can get complicated for a non-network engineer. I will, however, have to set up Infiniband on my own (without the help of my firm's IT department).

I plan on using the Mellanox MCX353A-FCBT ConnectX-3 VPI single-port QSFP FDR IB PCIe card. The choice of card is limited because I have to buy new from Dell, and it's either this or a much more expensive EDR adapter.

I hope to use Windows 10 Enterprise on both of the nodes.

So far I have read that:

  • I can install these cards in a PCIe x8 slot
  • Connect them with a QSFP FDR copper cable
  • Install the Mellanox driver (update firmware if required)
  • Install Mellanox OFED for Windows (WinOF 5.35 is the latest version and lists Windows 10 as a compatible OS)
  • Run the OpenSM service on one of my nodes to enable the Infiniband cards to talk to each other.
This is where I then get a bit lost.


  • Apparently I do not want to use IPoIB because of high latency and high computational demand. I do want to use the native Infiniband protocol (is this called RDMA?).
  • But if I don't use IPoIB, how do I assign IP addresses to the Infiniband cards and ensure they are on a separate subnet to my normal ethernet connections?
  • How do I ensure that the native protocol is used? Or is this up to the MPI software to activate?


I currently run my Fluent and CFX runs on a parallel distributed setup using HP Platform MPI 9.1.4.2. As far as I can tell, HP Platform MPI 9.1.4.2 does not support WinOF versions past 2.1. Thus, I will have to switch to Intel MPI in order to use the latest version of WinOF (5.35).


I'd appreciate any tips and pointers you may have.

Post #2: kyle (Senior Member), December 1, 2017, 16:15
If you're not using IPoIB, then there are no IP addresses associated with the Infiniband cards. Fluent and whatever MPI you end up using will use the ethernet connection to negotiate the RDMA connection.

It should be pretty easy. Just plug in the cable, install WinOF, fire up an OpenSM instance and start Fluent. The correct MPI library should be selected automatically.

Post #3: SLC (New Member), December 2, 2017, 12:58
In reply to kyle (post #2):
Thanks for your reply.

So is this the wrong way to set things up? It looks like it uses IPoIB, because the author sets up IPv4 addresses for the Infiniband cards: NEW TUTORIAL: setting 2-node cluster with infiniband (WIN7)

Post #4: evcelica (Erik, Senior Member), December 4, 2017, 10:49
You can set up IPoIB and assign IP addresses; it simply exists alongside the native Infiniband connection. I can transfer files over my IPoIB network at double the speed of the Ethernet. But the solver uses the native Infiniband path, not TCP. You can check this by opening Task Manager >> Networking: it will show no traffic on Ethernet or IPoIB during the solve, because everything goes over the native Infiniband link.

Post #5: SLC (New Member), December 4, 2017, 11:18
In reply to evcelica (post #4):
Ah ok, cool. My nodes will already be connected over 10 GbE.

Assuming my nodes are set up with both 10 GbE and native Infiniband, but not IPoIB, do I just use the Ethernet-based hostnames/IP addresses of my nodes to start the run in CFX/Fluent, and the MPI then opens the native connection through the Infiniband driver?

Post #6: evcelica (Erik, Senior Member), December 5, 2017, 13:58
Yes, I just use the computer names, and it uses native Infiniband.
If I specify the IP address of the IPoIB interface, it uses TCP over Infiniband, which is much slower.
If I specify the IP address of the gigabit LAN, it uses the standard gigabit network.
If I turn off the Infiniband connection and use the computer name, it uses the standard gigabit network.
The MPI checks the interconnects in a fixed order; if one is not available, it moves on to the next one on the list. You can set the interconnect order with environment variables.
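The fallback described above amounts to a first-available search over an ordered list. Below is a toy model of that logic, not the MPI's actual implementation; the interconnect names, the colon-separated order string and the availability map are all hypothetical (the real environment variable name and syntax depend on which MPI you use, so check its manual):

```python
from __future__ import annotations


def pick_interconnect(order: str, available: dict[str, bool]) -> str:
    """Return the first interconnect in `order` that is available.

    `order` is a hypothetical colon-separated preference list such as
    "ibv:tcp"; `available` maps each name to whether it can be used.
    """
    for name in order.split(":"):
        # Skip interconnects that are missing or switched off and
        # move on to the next entry, like the fallback in the post.
        if available.get(name, False):
            return name
    raise RuntimeError(f"no usable interconnect in {order!r}")


# Infiniband card enabled: the native path wins.
print(pick_interconnect("ibv:tcp", {"ibv": True, "tcp": True}))   # ibv
# Infiniband card disabled: falls back to TCP over the gigabit LAN.
print(pick_interconnect("ibv:tcp", {"ibv": False, "tcp": True}))  # tcp
```

This is why turning off the Infiniband card silently drops you onto gigabit Ethernet rather than producing an error.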

Post #7: digitalmg (M-G, New Member), December 8, 2017, 04:17
I was able to run Ansys Fluent successfully with MS-MPI 8.1.1 in Shared Memory on Local Machine mode. It does not work out of the box on Ansys 18.2; it needs some modifications:

  • Remove the Intel MPI entries from the PATH environment variable (or uninstall Intel MPI) to prevent a conflict between the MS-MPI and Intel MPI mpiexec.exe programs.
  • Copy the MS-MPI runtime manually to the folder Ansys expects.
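To see whether Intel MPI's mpiexec.exe is shadowing the MS-MPI one, you can list every PATH directory that contains mpiexec.exe; the first match is the one that actually gets launched. A minimal sketch (the helper name is hypothetical; Windows' built-in `where mpiexec` gives similar information):

```python
from __future__ import annotations

import os


def mpiexec_dirs(path_value: str, exe: str = "mpiexec.exe") -> list[str]:
    """List every PATH directory that contains `exe`, in search order.

    If more than one directory shows up (e.g. Intel MPI and MS-MPI),
    the first one is the mpiexec that Fluent will end up calling.
    """
    hits = []
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, exe)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    for d in mpiexec_dirs(os.environ.get("PATH", "")):
        print(d)
```

If an Intel MPI directory is listed before the Microsoft MPI Bin directory, remove it from PATH as described above.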

But I couldn't run in Distributed Memory on Cluster mode, even without Infiniband.
If that works, there should be no further obstacle to running Ansys Fluent on Windows 10 with the latest Mellanox Infiniband drivers.

Post #8: digitalmg (M-G, New Member), December 8, 2017, 09:38
Hi guys,
I finally got ANSYS Fluent running on a cluster with MS-MPI on Windows 10 over Mellanox ConnectX-3 Infiniband.

Don't bother with other MPIs; they have not supported Infiniband on Windows for a long time.
First, check that Intel MPI is not in PATH (environment variables), as both Intel MPI and MS-MPI use the file name mpiexec.exe.

You can use the MS-MPI bundled with HPC Pack 2012 R2, but you have to start smpd manually.
It works fine without any major changes, but in the host file never use this format:
node01:4
node02:4

Instead, use this format for MS-MPI:
node01 4
node02 4
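Converting an existing host file from the colon form to the MS-MPI form is mechanical. A minimal sketch, assuming one host per line and that the count after the last colon is a plain integer; the function name is hypothetical:

```python
from __future__ import annotations


def to_msmpi_hostfile(text: str) -> str:
    """Rewrite 'host:count' lines as 'host count' lines for MS-MPI.

    Lines that do not end in ':<integer>' (already converted lines,
    blank lines, anything else) are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        host, sep, count = stripped.rpartition(":")
        if sep and host and count.isdigit():
            out.append(f"{host} {count}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


print(to_msmpi_hostfile("node01:4\nnode02:4\n"), end="")
# node01 4
# node02 4
```

Only splitting on the last colon, and only when a number follows it, keeps the function from mangling lines that are already in the space-separated form.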

If you want to use MS-MPI 8.1.1, it includes a service called "MS-MPI Launch Service" that can start automatically, so you no longer have to start smpd each time. But you must add the "Bin" folder from "C:\Program Files\Microsoft MPI\Bin" to "C:\Program Files\ANSYS Inc\v182\fluent\fluent18.2.0\multiport\mpi\win64\ms" on every node to make it work.
Of course, you have to create the "ms" folder yourself.
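Since this copy has to be repeated on every node, a small script helps. A minimal sketch, assuming the default paths from the post and one reading of "add the Bin folder to ms", namely that it ends up as ms\Bin; if your Fluent version expects the files directly in ms, adjust `dest`. The function name is hypothetical, and writing under Program Files needs an administrator prompt:

```python
import shutil
from pathlib import Path

# Paths from the post (Ansys 18.2, default MS-MPI install location).
MSMPI_BIN = Path(r"C:\Program Files\Microsoft MPI\Bin")
FLUENT_MS = Path(
    r"C:\Program Files\ANSYS Inc\v182\fluent\fluent18.2.0"
    r"\multiport\mpi\win64\ms"
)


def copy_msmpi_bin(src: Path = MSMPI_BIN, ms_dir: Path = FLUENT_MS) -> Path:
    """Create the 'ms' folder if needed and copy the MS-MPI Bin folder into it.

    Returns the destination folder (ms_dir / src.name). Existing files are
    overwritten, so the script can safely be re-run on every node.
    """
    ms_dir.mkdir(parents=True, exist_ok=True)
    dest = ms_dir / src.name
    shutil.copytree(src, dest, dirs_exist_ok=True)  # needs Python 3.8+
    return dest
```

On each node, call `copy_msmpi_bin()` once with the defaults, or pass other paths for a different Ansys version.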

I have not yet tested the MS-MPI bundled with HPC Pack 2016.
Good luck.

Last edited by digitalmg; December 8, 2017 at 14:00.

Post #9: digitalmg (M-G, New Member), May 17, 2018, 11:18
For me, Windows 10 version 1709 did not work with Microsoft MPI and the Mellanox driver, but the problem no longer exists on Windows 10 version 1803.
I strongly recommend a fresh installation of Windows 10 1803, MS-MPI 9.0.1 or newer, and the latest Mellanox driver, so that MPI communication uses Network Direct.
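The recommendation above can be turned into a quick pre-flight check. A minimal sketch: Windows 10 version 1803 is OS build 17134 (1709 is build 16299), and the MS-MPI version is assumed to be entered as the release number (e.g. "9.0.1"), not the longer file version of the DLLs. Function names are hypothetical:

```python
from __future__ import annotations

# Windows 10 version 1803 corresponds to OS build 17134 (1709 is 16299).
MIN_WIN10_BUILD = 17134
MIN_MSMPI = (9, 0, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '10.0.17134' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_recommendation(windows_version: str, msmpi_release: str) -> bool:
    """Check a 'major.minor.build' Windows string and an MS-MPI release number."""
    build = parse_version(windows_version)[2]
    return build >= MIN_WIN10_BUILD and parse_version(msmpi_release) >= MIN_MSMPI
```

On Windows, `platform.version()` typically returns the "10.0.build" string to pass in, e.g. `meets_recommendation(platform.version(), "9.0.1")`.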

Post #10: Kate McKinnon (New Member), May 18, 2018, 15:53
Oh, so a fresh installation can solve everything?

Post #11: digitalmg (M-G, New Member), May 19, 2018, 03:54
A fresh installation solved my case, not everything.
I'm preparing an illustrated tutorial on setting up Infiniband on Windows 10 for Ansys Fluent.
The most important step is to make sure that the two MS-MPI nodes communicate with each other over Network Direct rather than TCP.

This can be checked with the sample test program attached below.
MPI.zip

Simply change node01 and node02 in the text file to your computer names (or the IP addresses of the onboard TCP LAN cards) and run it from the command line.
Leave the IB cards' IPv4 settings on automatic. After a couple of minutes they will each get a self-assigned "autoconfiguration" address (169.254.x.x) on the same subnet.
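The self-assigned addresses come from the IPv4 link-local block 169.254.0.0/16, which Windows uses when no DHCP server answers, as on a direct IB link. A small sketch to confirm that both IB cards landed in that block; the sample addresses are made up, so substitute the ones ipconfig reports on your nodes:

```python
import ipaddress

# IPv4 link-local block used for automatic (APIPA) addressing.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def same_autoconfig_subnet(addr_a: str, addr_b: str) -> bool:
    """True if both IPv4 addresses are self-assigned (inside 169.254.0.0/16)."""
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    return a in LINK_LOCAL and b in LINK_LOCAL


print(same_autoconfig_subnet("169.254.12.7", "169.254.200.3"))  # True
print(same_autoconfig_subnet("169.254.12.7", "192.168.1.10"))   # False
```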

If everything works, Network Direct will be reported as the communication interface. You can then apply the Ansys changes described in my previous posts.
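A complementary check, separate from the attached program, is to forbid MS-MPI's TCP (socket) path and see whether a two-node job still runs. The sketch below only builds the command line; it assumes MS-MPI's `-machinefile`, `-n` and `-env` options and its `MSMPI_DISABLE_SOCK` variable (confirm them with `mpiexec -help` on your install), and `pingpong.exe` is a hypothetical test program:

```python
from __future__ import annotations


def nd_only_command(hostfile: str, program: str, nprocs: int = 2) -> list[str]:
    """Build an MS-MPI mpiexec command line with the socket path disabled.

    With MSMPI_DISABLE_SOCK=1 MS-MPI cannot fall back to TCP, so if ranks
    on different nodes still talk to each other, they are using Network
    Direct (ranks on the same node may use shared memory instead).
    """
    return [
        "mpiexec",
        "-machinefile", hostfile,
        "-n", str(nprocs),
        "-env", "MSMPI_DISABLE_SOCK", "1",
        program,
    ]


print(" ".join(nd_only_command("hosts.txt", "pingpong.exe")))
# mpiexec -machinefile hosts.txt -n 2 -env MSMPI_DISABLE_SOCK 1 pingpong.exe
```

Use a host file with one process per node (e.g. "node01 1" and "node02 1") so the two ranks really cross the Infiniband link; the list can be passed straight to `subprocess.run(..., check=True)`.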

Best Regards

Post #12: digitalmg (M-G, New Member), May 25, 2018, 00:49
Hello,
A friend recommended running "ndinstall i" from a command prompt with administrator rights to check whether Network Direct is installed.
It may help in cases where the Mellanox drivers are not fully installed.

All times are GMT -4.