CFD Online Discussion Forums

blackpuma March 20, 2009 16:48

Problem running fluent with InfiniBand
 
Good evening,

I hope someone can help me. I have a new small cluster, my first one with InfiniBand. I am trying to run Fluent over InfiniBand, but it always fails with this error:

fluent_mpi.6.3.26: Rank 0:10: MPI_Init: dlopen failed: libmtl_common.so: cannot open shared object file: No such file or directory
fluent_mpi.6.3.26: Rank 0:10: MPI_Init: vapi_resolve_entrypoints() failed
fluent_mpi.6.3.26: Rank 0:10: MPI_Init: Can't initialize RDMA device
fluent_mpi.6.3.26: Rank 0:10: MPI_Init: MPI BUG: Cannot initialize RDMA protocol

I can start Fluent over Ethernet without any problem.

Where can I get the missing file, libmtl_common.so?

The OS is CentOS 5.2.

Goodbye
Blackpuma
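
The dlopen failure in the log above means the dynamic loader could not resolve libmtl_common.so on the node where that rank started. As a rough sketch, assuming Python is available on the nodes, the same lookup can be tried by hand. The helper name can_dlopen is made up for illustration, and the result reflects the architecture and library path of the Python interpreter, which may differ from those of the fluent_mpi binary.

```python
import ctypes


def can_dlopen(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # Library name copied from the MPI_Init error message.
    name = "libmtl_common.so"
    if can_dlopen(name):
        print(name + ": found by the dynamic loader")
    else:
        print(name + ": not found; check the installed InfiniBand "
              "packages and LD_LIBRARY_PATH on this node")
```

Running it on every node, not only the head node, shows whether the library is missing everywhere or only on some machines.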

shainer March 22, 2009 03:26

Have you started the subnet manager first to bring the IB network up?

blackpuma March 22, 2009 07:13

OpenSM is running on the head node, and an ibping worked. Do I have to install this program on every node?

Can someone tell me where I can get the file libmtl_common.so? Which package includes it?

shainer March 23, 2009 13:13

You can send an email to hpc@mellanox.com, and they will be able to help you. That address belongs to the HPC Advisory Council help desk (free :-) ).

Chinmay August 3, 2009 14:22

I have the same problem, with the following error:

Host spawning Node 0 on machine "cl1n004" (unix).
/home/cfd/FLUENT/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 3ddp -node -alnx86 -t16 -pib -mpi=hp -cnf=parallel -mport 10.0.1.4:10.0.1.4:38940:0
Starting /home/cfd/FLUENT/Fluent.Inc/fluent6.3.26/multiport/mpi/lnx86/hp/bin/mpirun -prot -vapi -e MPI_HASIC_VAPI=1 -e MPI_USE_MALLOPT_SBRK_PROTECTION=1 -e MPI_USE_MALLOPT_AVOID_MMAP=1 -f /tmp/fluent-appfile.25401
fluent_mpi.6.3.26: Rank 0:0: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes).
fluent_mpi.6.3.26: Rank 0:0: MPI_Init: Error intializing pin/unpin structures
fluent_mpi.6.3.26: Rank 0:0: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
MPI Application rank 0 exited before MPI_Init() with status 1
fluent_mpi.6.3.26: Rank 0:8: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes).
fluent_mpi.6.3.26: Rank 0:8: MPI_Init: Error intializing pin/unpin structures
fluent_mpi.6.3.26: Rank 0:8: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
MPI Application rank 8 exited before MPI_Init() with status 1
fluent_mpi.6.3.26: Rank 0:2: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes).
fluent_mpi.6.3.26: Rank 0:2: MPI_Init: Error intializing pin/unpin structures
fluent_mpi.6.3.26: Rank 0:2: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
MPI Application rank 1 killed before MPI_Init() with signal 15
MPI Application rank 2 exited before MPI_Init() with status 1
MPI Application rank 4 killed before MPI_Init() with signal 15
MPI Application rank 6 killed before MPI_Init() with signal 15
MPI Application rank 3 killed before MPI_Init() with signal 15
MPI Application rank 5 killed before MPI_Init() with signal 15
MPI Application rank 7 killed before MPI_Init() with signal 15
fluent_mpi.6.3.26: Rank 0:14: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes).
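
The error message spells out its own arithmetic: 20% of the reported physical memory, divided by the number of local ranks. A small sketch that reproduces those figures and estimates how much physical memory HP-MPI would need to see for the stated minimum to be met at the same percentage. The function names are illustrative, and the integer rounding is an assumption rather than something taken from HP-MPI documentation.

```python
# Figures copied from the MPI_Init error message.
REPORTED_PHYSICAL_BYTES = 134217728   # what HP-MPI reports as physical memory
PIN_PERCENTAGE = 20                   # percentage named in the message
LOCAL_RANKS = 8                       # ranks on this machine
MINIMUM_PINNABLE_BYTES = 14688256     # minimum required per rank


def pinnable_per_rank(physical_bytes, percentage, ranks):
    """Bytes each rank may pin: percentage of memory split across ranks."""
    return physical_bytes * percentage // 100 // ranks


def required_physical_bytes(minimum_bytes, percentage, ranks):
    """Smallest physical-memory figure that gives each rank minimum_bytes."""
    needed = minimum_bytes * ranks * 100
    return -(-needed // percentage)   # ceiling division


if __name__ == "__main__":
    per_rank = pinnable_per_rank(
        REPORTED_PHYSICAL_BYTES, PIN_PERCENTAGE, LOCAL_RANKS)
    print("pinnable per rank:", per_rank, "bytes")
    print("shortfall per rank:", MINIMUM_PINNABLE_BYTES - per_rank, "bytes")
    print("physical memory needed at this percentage:",
          required_physical_bytes(
              MINIMUM_PINNABLE_BYTES, PIN_PERCENTAGE, LOCAL_RANKS),
          "bytes")
```

The reported physical memory, 134217728 bytes, is only 128 x 2^20 bytes, so it is worth comparing with the RAM actually installed in the node. The message does not say whether MPI_PHYSICAL_MEMORY counts an Mbyte as 10^6 or 2^20 bytes, which is why the sketch prints plain bytes.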

blackpuma August 4, 2009 01:27

Good morning Chinmay!

Are you starting Fluent over InfiniBand or over Ethernet?

Try setting the hard and soft memlock limits to unlimited. To do that, add these two lines to the file /etc/security/limits.conf:

Code:

.
.
.
*              soft    memlock          unlimited
*              hard    memlock          unlimited
.
.
.

Add these lines on all nodes.
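
The limits in /etc/security/limits.conf are applied by pam_limits when a session starts, so they only affect logins made after the change. A minimal sketch for checking which memlock limit a process actually gets, assuming Python is installed on the nodes (the helper names are illustrative):

```python
import resource


def format_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value) + " bytes"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("memlock soft:", format_limit(soft))
    print("memlock hard:", format_limit(hard))
```

Running it through the same remote shell that starts the Fluent node processes, rather than from an interactive login, shows the limit those processes are likely to inherit.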

Chinmay August 4, 2009 12:50

Hi,
Thanks for your help.
I am trying to start Fluent over InfiniBand.
The hard and soft limits are already set to unlimited.

blackpuma August 4, 2009 13:51

Are all InfiniBand devices active?

Try ibstat:

Code:

CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.5.0
    Hardware version: a0
    Node GUID: 0x001e0bffff8446a4
    System image GUID: 0x001e0bffff8446a7
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 5
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x001e0bffff8446a5
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510868
        Port GUID: 0x001e0bffff8446a6

If not:

Do you have opensm installed, and is it running? It is the subnet manager.
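
On a cluster with many nodes, the port check above can be scripted. The following is only a sketch: it assumes the ibstat text has the same layout as the output pasted in this post, and the function name port_states and the shortened sample are made up for illustration.

```python
import re

# Shortened sample in the layout of the ibstat output shown above.
SAMPLE = """CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
        Port GUID: 0x001e0bffff8446a5
    Port 2:
        State: Down
        Physical state: Polling
        Port GUID: 0x001e0bffff8446a6
"""


def port_states(ibstat_text):
    """Map (CA name, port number) to that port's 'State:' value."""
    states = {}
    ca = None
    port = None
    for line in ibstat_text.splitlines():
        stripped = line.strip()
        ca_match = re.match(r"CA '(.+)'$", stripped)
        port_match = re.match(r"Port (\d+):$", stripped)
        if ca_match:
            ca, port = ca_match.group(1), None
        elif port_match:
            port = int(port_match.group(1))
        elif stripped.startswith("State:") and ca is not None and port is not None:
            states[(ca, port)] = stripped.split(":", 1)[1].strip()
    return states


if __name__ == "__main__":
    for (ca, port), state in sorted(port_states(SAMPLE).items()):
        print(ca, "port", port, "is", state)
```

Any port that Fluent is expected to use should report Active; a port stuck in Down or Initializing usually points back to cabling or the subnet manager.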

Chinmay August 8, 2009 06:53

reply from ibstat:

CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.2.0
    Hardware version: a0
    Node GUID: 0x0008f1040397e9f0
    System image GUID: 0x0008f1040397e9f3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510a68
        Port GUID: 0x0008f1040397e9f1

I can run Fluent over Ethernet but not over InfiniBand.

Chinmay August 8, 2009 06:58

Initially HP-MPI was not installed. I have now installed it (version 2.3.1) on the master node and on two other nodes, but I still cannot run Fluent over InfiniBand.

Stone August 28, 2011 01:16

Quote:

Originally Posted by Chinmay (Post 225116)
I have the same problem, with the following error:

Hi,
Have you solved your problem? I ran into the same problem recently. If you solved it, could you help me out? I would appreciate it.

mali28 September 20, 2012 05:35

Quote:

Originally Posted by Stone (Post 321943)
Hi,
Have you solved your problem? I ran into the same problem recently. If you solved it, could you help me out? I would appreciate it.

See the solution below:
http://www.eureka.im/1717.html


All times are GMT -4.