OpenFOAM - InfiniBand - locks :(
Until now we have run OpenFOAM 1.4.x over Ethernet on our simulation cluster.
To make better use of the cluster and of OpenFOAM's parallelism, we
decided to upgrade some nodes to InfiniBand, and at the same time we upgraded
the OS from SuSE 10.1 to openSUSE 11.1. The storage is a NetApp accessed
over NFS. The NetApp resides on our public LAN.
The IB distribution is OFED 1.5, and the frontend also acts as the
Subnet Manager.
What we observe now are locks (hangs) of the frontend and the simulation nodes. These
locks last up to 10-15 minutes.
During this time no login to the nodes or the frontend is possible.
Sometimes the jobs also die during the locks.
nfsstat -c shows a lot of retransmissions on these machines, e.g.:
Client rpc stats:
calls retrans authrefrsh
15873268 1651722867 0
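
To put these counters in proportion, here is a minimal Python sketch that computes the retransmission ratio from output in the three-line layout shown above. The function names (parse_client_rpc_stats, retrans_ratio) are made up for illustration, and the parser assumes the header line is followed directly by one line of column names and one line of values, which may not hold for every nfsstat version.

```python
def parse_client_rpc_stats(text):
    """Return the client RPC counters from `nfsstat -c` style text as a dict.

    Assumes the layout quoted in this post: a 'Client rpc stats:' line,
    then one line of column names, then one line of integer values.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        if ln.lower().startswith("client rpc stats"):
            names = lines[i + 1].split()
            values = [int(v) for v in lines[i + 2].split()]
            return dict(zip(names, values))
    raise ValueError("no 'Client rpc stats' section found")


def retrans_ratio(stats):
    """Retransmissions per RPC call; 0.0 if no calls were recorded."""
    calls = stats["calls"]
    return stats["retrans"] / calls if calls else 0.0


# Sample text copied from the output above.
sample = """Client rpc stats:
calls retrans authrefrsh
15873268 1651722867 0
"""

stats = parse_client_rpc_stats(sample)
ratio = retrans_ratio(stats)
print(f"calls={stats['calls']} retrans={stats['retrans']} ratio={ratio:.1f}")
```

With the numbers above, the retrans counter is more than 100 times the call counter, so more retransmissions than calls were recorded.
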
Right now we are out of ideas ...any help will be greatly appreciated...