Seeking an overview and tips to Infiniband + Ansys (Windows 10) |
|
December 1, 2017, 09:23 |
Seeking an overview and tips to Infiniband + Ansys (Windows 10)
|
#1 |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14 |
Hi,
So I've got approval to purchase a 32-core setup for Ansys Fluent and CFX (see this thread: Hardware check: 32 core setup for CFD). It will be a two-node setup, each node with dual 8-core Xeon CPUs. To maximize performance and maintain future expandability, I want to connect the two nodes with Infiniband (a direct connection, without a switch). As I delve into the world of Infiniband I am starting to realise it can get complicated for a non-network engineer. I will, however, have to set up Infiniband on my own (without the help of my firm's IT department). I plan on using the Mellanox MCX353A-FCBT ConnectX-3 VPI Single-Port QSFP FDR IB PCIe card. The choice of card is limited because I have to purchase new from Dell, and it's either this or a much more expensive EDR adapter. I hope to use Windows 10 Enterprise on both nodes. So far I have read that:
I currently run my Fluent and CFX jobs on a parallel distributed setup using HP Platform MPI 9.1.4.2. As far as I can tell, HP Platform MPI 9.1.4.2 does not support WinOF versions past 2.1. I will therefore have to switch to Intel MPI in order to use the latest version of WinOF (5.35). I appreciate any tips and pointers you may have. |
|
December 1, 2017, 15:15 |
|
#2 |
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 18 |
If you're not using IPoIB, then there are no IP addresses associated with the Infiniband cards. Fluent and whatever MPI you end up using will use the ethernet connection to negotiate the RDMA connection.
It should be pretty easy. Just plug in the cable, install WinOF, fire up an OpenSM instance and start Fluent. The correct MPI library should be selected automatically. |
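A minimal sketch of that sequence on the node that will run the subnet manager (the service and tool names are as shipped with Mellanox WinOF; they may differ slightly between driver versions):

```shell
:: Back-to-back link, no switch: exactly one node must run OpenSM.
:: WinOF can register OpenSM as a Windows service during installation.
net start opensm

:: Verify the port is up before launching the solver:
vstat        :: WinOF utility: port state, link speed, GUIDs
ibstat       :: should report an Active state once OpenSM has swept the subnet
```

With the link active, Fluent launched in distributed parallel should pick the Infiniband-capable MPI path on its own, as described above.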
|
December 2, 2017, 11:58 |
|
#3 | |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14 |
Quote:
So is this the wrong way to set things up? (Looks like it uses IPoIB because he sets up IPv4 addresses for the Infiniband cards?): NEW TUTORIAL: setting 2-node cluster with infiniband (WIN7) |
||
December 4, 2017, 09:49 |
|
#4 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,167
Rep Power: 23 |
You can set up IPoIB and assign IP addresses; it will just be there in addition to the native Infiniband connection. I can transfer files over my IPoIB network at double the speed of ethernet. But the solver uses the native Infiniband path, not TCP. You can check by opening Task Manager >> Networking: it will show no traffic on ethernet or IPoIB during the solve, because everything is on the native Infiniband link.
|
|
December 4, 2017, 10:18 |
|
#5 | |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14 |
Quote:
Assuming my nodes are set up with both 10 GbE and native Infiniband, but not IPoIB, will I just use the ethernet-based hostnames/IP addresses of my nodes to initiate the run in CFX/Fluent, and the MPI will then initiate the native connection over the Infiniband driver? |
||
December 5, 2017, 12:58 |
|
#6 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,167
Rep Power: 23 |
Yes, I just use the computer names, and it uses native Infiniband.
If I specify the IP address of the IPoIB adapter, it uses TCP over Infiniband, which sucks. If I specify the IP address of the gigabit LAN, it uses the standard gigabit network. If I turn off the Infiniband connection and use the computer name, it uses the standard gigabit. It checks in a certain order whether each connection is available, and moves to the next one on the list. You can specify the interconnect order in the environment variables. |
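For example, HP/IBM Platform MPI reads the order from MPI_IC_ORDER, and Intel MPI has an analogous fabric selector. The exact keywords vary by MPI version and platform, so treat this as a sketch:

```shell
:: Platform MPI: try verbs (native Infiniband) first, then fall back to TCP
setx MPI_IC_ORDER "ibv:tcp"

:: Intel MPI: shared memory within a node, DAPL fabric between nodes
:: (fabric keywords differ between Intel MPI releases)
setx I_MPI_FABRICS "shm:dapl"
```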
|
December 8, 2017, 03:17 |
|
#7 |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 10 |
I could run Ansys Fluent with MS-MPI 8.1.1 successfully in shared-memory mode on the local machine. You cannot run it out of the box on Ansys 18.2 without some modifications:
The Intel MPI environment path variables have to be removed (or Intel MPI uninstalled) to prevent a conflict between the MS-MPI and Intel MPI mpiexec.exe programs, and the MS-MPI run-time has to be copied manually to the folder Ansys expects. But I couldn't run in distributed-memory mode on the cluster, even without Infiniband. If that works, there would be no more obstacles to implementing Ansys Fluent on Windows 10 with the latest Mellanox Infiniband drivers. |
|
December 8, 2017, 08:38 |
|
#8 |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 10 |
Hi guys,
Finally I could run MS-MPI on Windows 10 with Infiniband (Mellanox ConnectX-3) on a cluster for ANSYS Fluent. Don't look for other MPIs, as they have not supported Infiniband on Windows for a long time.
First, check that Intel MPI is not in PATH (environment variables), as both Intel MPI and MS-MPI use the file name mpiexec.exe.
You could use the MPI packed with HPC Pack 2012 R2, but you have to run smpd manually. It works fine without any major change. In the host file, however, never use this format:
node01:4 node02:4
Instead, use a format like this for MS-MPI:
node01 4
node02 4
If you use MS-MPI 8.1.1, there is a service called "MS-MPI Launch Service" that can be set to start automatically, which frees you from initiating smpd each time. But you have to copy the "Bin" folder from "C:\Program Files\Microsoft MPI\Bin" to "C:\Program Files\ANSYS Inc\v182\fluent\fluent18.2.0\multiport\mpi\win64\ms" in order to make it work on every node. Of course, you have to create the "ms" folder yourself.
MS-MPI packed with HPC Pack 2016 is not tested yet. Good luck.
Last edited by digitalmg; December 8, 2017 at 13:00. |
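The manual-smpd route can be sketched like this (host names and process counts are placeholders; smpd and mpiexec are MS-MPI's standard tools):

```shell
:: On every node, start the MS-MPI process manager in debug/foreground mode
:: (not needed if the MS-MPI Launch Service is installed and running):
smpd -d

:: machinefile.txt uses the MS-MPI format "<host> <nprocs>", one host per line:
::   node01 4
::   node02 4

:: Launch an 8-way distributed job across both nodes:
mpiexec -machinefile machinefile.txt -n 8 hostname
```

Running a trivial command like hostname first confirms that smpd connectivity works before trying a full Fluent session.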
|
May 17, 2018, 11:18 |
|
#9 |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 10 |
Windows 10 build 1709 does not work with Microsoft MPI and the Mellanox driver for me, but the problem no longer exists on Windows 10 build 1803.
I strongly recommend a fresh installation of Windows 10 1803, MS-MPI version 9.0.1 or newer, and the latest Mellanox driver, so that MPI communication can use Network Direct. |
|
May 19, 2018, 03:54 |
|
#10 |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 10 |
The fresh installation solved my case, but not everything.
I'm preparing a graphical tutorial on setting up Infiniband on Windows 10 with Ansys Fluent. The most important step is to make sure that two MS-MPI nodes can communicate with each other via Network Direct, and not over TCP. This can be done with a small test program, as attached: MPI.zip
Simply change node01 and node02 in the text file to your computer names or the IP addresses of the onboard TCP LAN cards, and execute it from the command line. Leave the IB cards' IPv4 settings as automatically assigned; they will get automatically assigned addresses in the same subnet after a couple of minutes. If everything goes fine, Network Direct will be reported as the communication interface. Then you can apply the changes to Ansys for the next stage as described in the previous posts.
Best Regards |
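Running the attached test in both modes makes the comparison explicit. The environment variable below is from MS-MPI's documented settings, and mpitest.exe stands in for the program in MPI.zip; treat the exact names as assumptions:

```shell
:: Run the test with Network Direct available (the default path):
mpiexec -hosts 2 node01 1 node02 1 mpitest.exe

:: Force TCP by disabling Network Direct, to compare latency/bandwidth:
mpiexec -env MSMPI_DISABLE_ND 1 -hosts 2 node01 1 node02 1 mpitest.exe
```

If the two runs report very different latencies, the first one is genuinely using Network Direct.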
|
May 25, 2018, 00:49 |
|
#11 |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 10 |
Hello,
A friend recommended running "ndinstall i" from a command prompt with administrative access in order to check whether Network Direct is installed or not. It may help in cases where the Mellanox drivers are not completely installed. |
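For reference, the ndinstall tool ships with the Mellanox WinOF package; the flag spelling below is an assumption and may differ by driver version:

```shell
:: (Re)install the Mellanox Network Direct provider if it is missing;
:: run from an elevated command prompt.
ndinstall -i
```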
|
January 2, 2020, 16:17 |
|
#12 | |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14 |
Quote:
|
||
January 4, 2020, 02:56 |
|
#13 |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14 |
I've got two nodes, each with Mellanox ConnectX-5 cards hooked up to a managed SX6012 switch.
Using WinOF-2 2.30.50000 on Windows Server 2019. The Infiniband is working correctly as Network Direct, and I'm seeing full bandwidth (56 Gbit/s) and correct latency (ca. 1 us) in the Mellanox benchmarks.
I've spent some time trying to get things to work for CFX and Fluent. My conclusions:
Intel MPI 2018.3.210 only supports using the Infiniband via IPoIB.
IBM Platform MPI 9.1.4.5 I can't get working whatsoever for distributed parallel runs, not even over TCP/IP, though this may be due to compatibility issues with Windows Server 2019.
I've installed MS HPC Pack 2016 (which includes MS-MPI), and this appears to work correctly running as a head node and a compute node. However, with this MS-MPI setup you have to submit CFX and Fluent jobs directly to the scheduler; you can't start simulations from CFX Solver Manager or the Fluent launcher (they don't appear to use the correct MPI hook, unlike the scheduler in HPC Pack 2016).
digitalmg: could you give a more detailed description of how you got MS-MPI working under Windows? It would be very much appreciated! |
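Submission through the scheduler can be sketched as follows. The share path, core count, and journal file are placeholders, and the exact Fluent arguments for scheduler submission vary by release; `job submit` is HPC Pack's command-line client:

```shell
:: Submit a Fluent batch run to the HPC Pack 2016 scheduler; the scheduler
:: launches mpiexec with the proper MS-MPI hooks on the compute nodes.
job submit /numcores:64 /workdir:\\headnode\share\case ^
    fluent 3ddp -t64 -g -i run.jou
```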
|
January 6, 2020, 11:33 |
|
#14 |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 10 |
Dear SLC
My experience is with a Mellanox ConnectX-3 card, WinOF 5.35, and Ansys Fluent (never tried CFX). Just install the latest MS-MPI version on both PCs and follow the above-mentioned hints carefully. It should work for you. |
|
December 30, 2020, 11:47 |
|
#15 |
Member
dab bence
Join Date: Mar 2013
Posts: 47
Rep Power: 13 |
I think a useful addendum to this thread is that only some versions of Windows support RDMA:
"Windows 10 Enterprise, Windows 10 Education, and Windows 10 Pro for Workstations now include SMB Direct client support." https://docs.microsoft.com/en-us/windows-server/storage/file-server/file-server-smb-overview
This is a useful blog: http://wchukov.blogspot.com/2018/05/infinibandrdma-on-windows.html |
|
January 5, 2021, 06:57 |
|
#16 | |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14 |
Quote:
As well as Windows Server, of course. |
||
January 10, 2021, 06:05 |
|
#17 |
Member
dab bence
Join Date: Mar 2013
Posts: 47
Rep Power: 13 |
Indeed.
Did you manage to iron out the process of getting decent inter-node performance with Fluent on Windows? It seems odd that Intel MPI is the only ANSYS install option, yet it does not provide the best performance. I have read in a couple of places that the overwhelming majority of Fluent clusters are Linux-based, which I guess explains why Windows is not high on their priority list for cluster optimization. |
|
January 30, 2021, 11:05 |
|
#18 | |
New Member
M-G
Join Date: Apr 2016
Posts: 28
Rep Power: 10 |
Quote:
The only solution for high-performance, scalable clustering in a Windows environment is MS-MPI with Mellanox cards. For two nodes you don't need Infiniband and TCP works fine, but you cannot scale well beyond two nodes without Infiniband. Alternatively, you can migrate to Linux, where other MPIs are supported. |
||
March 15, 2021, 11:29 |
|
#19 | |
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14 |
Quote:
Yes, I got Fluent and CFX working perfectly on my (now five-node) Windows Server 2019 cluster. The only option for full performance was Microsoft HPC Pack + MS-MPI + WinOF-2 + a managed Infiniband switch (you must use a managed switch when running WinOF-2). |
||
April 8, 2021, 16:00 |
|
#20 |
Senior Member
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,167
Rep Power: 23 |
SLC:
It looks like HPC Pack can be installed on Windows 10. Do you think Windows 10 could be used for a cluster with HPC Pack, as you did? Thanks, Erik |