Mellanox ConnectX-3 Infiniband problem with Platform MPI 9.1.3

#1   August 22, 2016, 02:16
digitalmg (M-G), New Member
Dear all,
I have recently purchased two used Mellanox MCX353A-FCBT InfiniBand VPI adapters.
On Windows 7 x64, Windows Server 2012 R2 and Windows 10 x64, I have installed the latest driver and firmware from the Mellanox website.

The two cards are connected directly, without a switch. After launching the opensm program they work fine, and the two nodes can ping each other with ibping.

The problem starts when I try to use them with Platform MPI 9.1.3 for ANSYS Fluent 17.1, regardless of the Windows version.
Unfortunately, MPI does not detect the card when InfiniBand is forced with the -IBAL switch.
Only TCP mode works.
The error generated is shown below (without -IBAL everything works, but over TCP only):


C:\Users\Administrator>"%MPI_ROOT%\bin\mpirun" -mpi64 -IBAL -prot -netaddr 192.168.5.1 -hostlist ews15,ews17 c:\MPI\pp.exe
mpirun: Drive is not a network mapped - using local drive.
c:\MPI\pp.exe: Rank 0:1: MPI_Init: didn't find active interface/port
c:\MPI\pp.exe: Rank 0:1: MPI_Init: Can't initialize RDMA device
c:\MPI\pp.exe: Rank 0:1: MPI_Init: Internal Error: Cannot initialize RDMA protocol
MPI Application rank 0 exited before MPI_Init() with status 1
c:\MPI\pp.exe: Rank 0:0: MPI_Init: didn't find active interface/port

I have checked that ibal.dll is located in Windows/system32.

Any ideas?

Thanks
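
When testing several driver versions, it can help to tell this RDMA start-up failure apart from a run that quietly uses TCP. The following is a minimal sketch, assuming the mpirun output has been captured as text. The helper name `rdma_failures` is hypothetical, and the marker strings are simply copied from the log above; they are not taken from any Platform MPI documentation.

```python
from typing import List

# Marker strings copied from the mpirun log quoted in this post.
# Assumption: other Platform MPI versions may word these messages differently.
RDMA_FAILURE_MARKERS = (
    "didn't find active interface/port",
    "Can't initialize RDMA device",
    "Cannot initialize RDMA protocol",
)


def rdma_failures(mpirun_output: str) -> List[str]:
    """Return the output lines that report an RDMA start-up failure."""
    return [
        line.strip()
        for line in mpirun_output.splitlines()
        if any(marker in line for marker in RDMA_FAILURE_MARKERS)
    ]


if __name__ == "__main__":
    sample = "\n".join([
        "mpirun: Drive is not a network mapped - using local drive.",
        r"c:\MPI\pp.exe: Rank 0:1: MPI_Init: didn't find active interface/port",
        r"c:\MPI\pp.exe: Rank 0:1: MPI_Init: Can't initialize RDMA device",
        "MPI Application rank 0 exited before MPI_Init() with status 1",
    ])
    # Print only the lines that point at the InfiniBand/RDMA layer.
    for line in rdma_failures(sample):
        print(line)
```

An empty result only means that none of these three messages appeared; it does not prove that InfiniBand was actually used.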

#2   August 22, 2016, 10:05
digitalmg (M-G), New Member
Dear all,
As ghost82 said in another thread, the WinOF 2.1.2 driver (which is no longer available) works fine and solved the problem.
But this works on Windows 7 only; it is not applicable to Windows 10 or Server 2012 R2.
Furthermore, that driver was released in 2010/9. Why does this happen?
Should IBM have updated its MPI, and simply did not?
So what are their customers supposed to do?

I'm confused. Any ideas?

#3   August 22, 2016, 11:56
evcelica (Erik), Senior Member
I couldn't get my ConnectX cards working yet either, so I'm still using my InfiniHost III cards. I may not be much help, but here are a few things to look at:

Did you cache your password for Platform MPI?
Run the command
"%AWP_ROOT171%\commonfiles\MPI\Platform\9.1.3.1\Windows\setpcmpipassword.bat"


Have you configured the firewall correctly to let the required programs through on your public network?


#4   August 22, 2016, 12:15
digitalmg (M-G), New Member
Dear Erik,
There is no problem with the parameters. The problem lies in the compatibility between Platform MPI and the Mellanox driver.
With the WinOF 2.1.2 driver everything is OK on Windows 7, and ANSYS Fluent speeds up to 210% with two nodes.
But my question is this: since 2010, Mellanox has released more than 10 drivers. Why do they no longer work with Platform MPI?

#5   August 26, 2016, 15:36
digitalmg (M-G), New Member
According to the link below, IBM does not support WinOF versions later than 2.1.
https://www.ibm.com/support/knowledg...platforms.html

I don't know the reason behind this.
According to the link below, Intel MPI supports Mellanox WinOF 4.4 and higher. Therefore there should be no problem for those who run ANSYS Fluent on Windows with Intel MPI and new Mellanox adapters.
https://software.intel.com/sites/def...es-windows.pdf
I have not tested it yet. I hope it works.

I don't understand why IBM does not update Platform MPI to support new Mellanox adapters on Windows.

#6   November 30, 2017, 17:21
SLC, Member
Is this still an issue?

It looks like Platform MPI still only supports WinOF 2.1.

Did you get the Mellanox MCX353A cards working using Intel MPI?

#7   December 2, 2017, 10:50
digitalmg (M-G), New Member
Hello,
The Intel MPI that comes with ANSYS 18.2.2 does not work with Mellanox WinOF 5.35 on Windows 10. I have also installed the latest Intel MPI (2018) on Windows, but it runs as a service and ANSYS still uses its own bundled 5.1.3.180 version for execution.
The ANSYS manual states that InfiniBand is only supported with MS-MPI on the Windows platform, but there is no MS-MPI folder in "C:\Program Files\ANSYS Inc\v182\fluent\fluent18.2.0\multiport\mpi\win64", which leads to an error when MS-MPI is selected. (Of course I have installed the latest MS-MPI from Microsoft as a service, but the ANSYS executable files for it do not exist.)

The only ways that work are to use Windows 7 with the WinOF 2.1 driver or to migrate to Linux.

#8   December 2, 2017, 11:54
SLC, Member
Quote:
Originally Posted by digitalmg View Post
The Intel MPI that comes with ANSYS 18.2.2 does not work with Mellanox WinOF 5.35 on Windows 10.
Do you get an error message when you try this combination?

According to the release notes of Intel MPI 5.1, Mellanox WinOF Rev 4.4 or higher is supported. Interestingly, the release notes mention compatibility with Windows Server, Windows 7, Windows 8, and Windows 8.1. It does not include Windows 10, but I wouldn't imagine it should be a problem...?

https://jp.xlsoft.com/documents/inte...es-windows.pdf

#9   December 2, 2017, 12:13
digitalmg (M-G), New Member
Since Intel MPI picks an available interconnect automatically, I don't get any error, but Ethernet is chosen by default. If I explicitly type infiniband in the interconnect field, I get the error below:
[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
I have 2 ideas:
1- With the latest Intel MPI (2018 Update 1) installed on both systems, a helloworld application should be tested from the command line with the dapl fabric forced. My guess is that ANSYS is compiled against the old Intel MPI; if this test works, we will have to wait for an upcoming ANSYS release to solve this.
Be aware that even if you install the latest Intel MPI, using the setimpipassword.bat file to cache the Windows credentials switches the MPI server location to the files embedded in ANSYS, which are version 5.1.3.180. Be careful about this.

2- When I asked Mellanox earlier about the supported MPI platforms (https://community.mellanox.com/thread/3402), they stated that only MS-MPI is supported. Idea 1 should also be tested with MS-MPI; if it works, the MS-MPI files should be copied manually to the location below and ANSYS launched again. It might work.
C:\Program Files\ANSYS Inc\v182\fluent\fluent18.2.0\multiport\mpi\win64\ms\Bin

If you have any idea, kindly let me know.
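
The command-line test in idea 1 could look roughly like the sketch below, which only assembles and prints the launch line. It assumes the `-genv`, `-hosts`, `-ppn` and `-n` options of Intel MPI's `mpiexec` and the `I_MPI_FABRICS` and `I_MPI_FALLBACK` variables as described in Intel's reference for the 5.x and 2018 releases; please check them against the version actually installed. The host names are the ones from the first post, and `c:\MPI\hello.exe` is a hypothetical test binary.

```python
from typing import List


def dapl_test_command(hosts: List[str], program: str, ranks_per_host: int = 1) -> List[str]:
    """Build an mpiexec argument list that forces the dapl fabric with no fallback."""
    if not hosts:
        raise ValueError("at least one host is required")
    return [
        "mpiexec",
        "-genv", "I_MPI_FABRICS", "dapl",  # assumed variable: force the dapl fabric
        "-genv", "I_MPI_FALLBACK", "0",    # assumed variable: fail instead of falling back to TCP
        "-hosts", ",".join(hosts),
        "-ppn", str(ranks_per_host),
        "-n", str(len(hosts) * ranks_per_host),
        program,
    ]


if __name__ == "__main__":
    # Print only; the line is meant to be run by hand from an Intel MPI command prompt.
    print(" ".join(dapl_test_command(["ews15", "ews17"], r"c:\MPI\hello.exe")))
```

With fallback disabled, a missing dapl provider should show up as a start-up error rather than as a silent switch to Ethernet, which is the behaviour described above.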

#10   December 3, 2017, 10:48
SLC, Member
http://www.ansys.com/-/media/ansys/c...-182.pdf?la=en

According to that document, CFX/Fluent under Windows 10 supports Infiniband with Intel MPI 5.1.3.

In the same document MS MPI is listed as only supported with Windows Server 2012.

So do we know for a fact that Mellanox WinOF 5.35 (latest version) does not support Intel MPI 5.1.3?

Have you contacted Ansys support?

#11   December 4, 2017, 04:53
SLC, Member
Have you tried to use either IBM Platform MPI 9.1.4 or Intel MPI 5.1.3 together with the OFED software from OpenFabrics.org (instead of using the Mellanox WinOF software)? The latest version is OFED 3.2

https://www.openfabrics.org/index.php/ofs-windows.html

#12   December 8, 2017, 03:10
digitalmg (M-G), New Member
OFED 3.2 cannot be installed on Windows 10. If you want to use Windows 7 or Server 2008 R2, you have no problem with the old Mellanox drivers.

On Windows 10, I have installed WinOF 4.95 through WinOF 5.35 on both nodes.
A diagnostic tool from Intel (IMB-RMA.exe) was run with Intel MPI 2018.1 for each driver set. None of them worked with the -DAPL switch.
This means Intel MPI does not support WinOF 4.95 and newer versions on Windows 10.

MS-MPI 8.1.1 was successfully tested with WinOF 5.35 in direct-connection mode. I have not been able to integrate it with ANSYS Fluent yet, but MS-MPI seems to be the only supported MPI for Windows 10 with the latest Mellanox drivers.
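
For anyone skimming the thread, the combinations reported so far can be collected into a small lookup. This is only a sketch that restates what the posts above say; the `Report` type and `working` helper are hypothetical names, and "latest WinOF (Aug 2016)" stands for the unspecified driver version used in the first post.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Report:
    mpi: str
    driver: str
    os: str
    infiniband_ok: bool
    source: str


# Results as reported in this thread; none of them were re-tested here.
REPORTS = [
    Report("Platform MPI 9.1.3", "WinOF 2.1.2", "Windows 7", True, "posts #2, #4"),
    Report("Platform MPI 9.1.3", "latest WinOF (Aug 2016)",
           "Windows 7 / Server 2012 R2 / Windows 10", False, "post #1, -IBAL fails"),
    Report("Intel MPI 5.1.3.180 (ANSYS 18.2.2)", "WinOF 5.35", "Windows 10", False,
           "posts #7, #9, dapl fabric not available"),
    Report("Intel MPI 2018.1", "WinOF 4.95 to 5.35", "Windows 10", False,
           "post #12, IMB-RMA test"),
    Report("MS-MPI 8.1.1", "WinOF 5.35", "Windows 10", True,
           "post #12, not yet integrated with Fluent"),
]


def working(reports: List[Report]) -> List[Report]:
    """Return only the combinations reported to run over InfiniBand."""
    return [r for r in reports if r.infiniband_ok]


if __name__ == "__main__":
    for r in working(REPORTS):
        print(f"{r.mpi} + {r.driver} on {r.os} ({r.source})")
```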

#13   January 22, 2019, 16:31
aural (Allen), New Member
Should I launch the opensm program on both PCs, or will one be just fine? Thank you!
Quote:
Originally Posted by digitalmg View Post
Dear all,
I have recently purchased two used Mellanox MCX353A-FCBT InfiniBand VPI adapters.
On Windows 7 x64, Windows Server 2012 R2 and Windows 10 x64, I have installed the latest driver and firmware from the Mellanox website.

The two cards are connected directly, without a switch. After launching the opensm program they work fine, and the two nodes can ping each other with ibping.

#14   January 23, 2019, 14:28
digitalmg (M-G), New Member
Just one is enough.

All times are GMT -4.