CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (http://www.cfd-online.com/Forums/openfoam-solving/)
-   -   MPI + AMI mesh tutorial NOT working. tested anyone ? (http://www.cfd-online.com/Forums/openfoam-solving/106999-mpi-ami-mesh-tutorial-not-working-tested-anyone.html)

Giuliano69 September 14, 2012 10:28

MPI + AMI mesh tutorial NOT working. tested anyone ?
 
Hi,
we have a cluster of four workstations, on which we successfully run the motorbike tutorial on 16 cores (MPI + NFS version 3).
We had to downgrade NFS to version 3 to get things working, but with these parameters it works flawlessly:
Code:

/etc/fstab
192.168.0.17:/home/cfduser/OpenFOAM  /home/cfduser/OpenFOAM/  nfs _netdev,nfsvers=3,proto=tcp,noac,auto  0  0

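(In case it is useful, this is roughly how we double-check which NFS version a client actually negotiated; the output format varies a bit between distributions.)
Code:

nfsstat -m          # lists each NFS mount with its effective options (vers=3, proto=tcp, ...)
mount -t nfs,nfs4   # lists mounted NFS filesystems and their mount options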
But when we try the mixerVesselAMI2D case, the system gets stuck in createDynamicFvMesh.H. The main node's CPU sits at 99% in system time, and the slaves sit at 100% in a wait state, locked...

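For reference, this is roughly how we launch the case in parallel (a sketch of our workflow, not the exact commands; the tutorial's Allrun script handles the mesh generation, "machines" is our MPI hostfile, and the solver for this tutorial is pimpleDyMFoam in our install):
Code:

# from a copy of the mixerVesselAMI2D tutorial case, after the mesh has been built
decomposePar                                               # split the case per system/decomposeParDict
mpirun -np 16 -hostfile machines pimpleDyMFoam -parallel   # 16 processes across the 4 nodes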
Has anyone ever tried running the mixerVesselAMI2D tutorial with MPI?
Any idea what could be locking it up?

wyldckat September 15, 2012 08:36

Hi Giuliano,

Bridging what we already know from your posts at http://www.cfd-online.com/Forums/ope...ple-nodes.html (posts #11 and 12), here's what I think might be responsible for all of this:
  1. Server side of the NFS might be misconfigured for this task. Here's what we use at work:
    Code:

    fsid=0,crossmnt,rw,no_root_squash,sync,no_subtree_check
    Caution: use "fsid" very carefully, or don't use it at all. (A sketch of a full /etc/exports line is given after this list.)
  2. On the client side, we simply use the "defaults" option. openSUSE translates this to:
    Code:

    rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=server_ip,mountvers=3,mountport=59044,mountproto=udp,local_lock=none,addr=server_ip
    server_ip is the numerical IP address, which I've erased for posting ;)
    I could see these options by running:
    Code:

    mount
  3. According to your diagnosis, the lock-up occurs right at the point where all machines go to load the mesh files, possibly the exact same file. This is why I think something is misconfigured, given that it apparently won't allow more than two machines to access the same file via NFS.
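For illustration only, here is a hypothetical /etc/exports entry along those lines (I've left out "fsid" and "crossmnt" given the caution above, and reused the share path and subnet from your fstab; adjust to your setup):
Code:

# /etc/exports on the server (hypothetical entry)
/home/cfduser/OpenFOAM  192.168.0.0/24(rw,sync,no_subtree_check,no_root_squash)
# after editing, re-export on the server with:  exportfs -ra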
Right now I don't have access to the cluster resources at work, so I can't test this tutorial case myself. I'll post again either when I get access to more resources or when I stumble on something else about NFS.

edit: It seems to me that Ubuntu 12.04 and NFS don't play well together :( : http://ubuntuforums.org/showthread.php?t=1478413

Best regards,
Bruno

PS: I moved this thread to where it seemed to fit better, namely in the "running" sub-forum.

Giuliano69 October 12, 2012 11:40

Thanks Bruno for your kind help.

I'm surprised to see that you use "no_root_squash". Are they diskless clients?

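(On our side, a rough way we check whether root squashing is in effect; the test file name is arbitrary.)
Code:

# on a client, as root, inside the NFS mount
touch /home/cfduser/OpenFOAM/root_squash_test
ls -l /home/cfduser/OpenFOAM/root_squash_test   # owner root -> no_root_squash; owner nobody -> root_squash is active
rm /home/cfduser/OpenFOAM/root_squash_test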
From man exports (on no_root_squash): the root user on a client machine is also treated as root when accessing files on the NFS server.

Have you also tried NFS 4?

We found MPI performance very poor over NFS on top of XFS.
May I ask which kind of filesystem you are using?

Have you done any parallel filesystem tests? :-)

Giuliano69 October 12, 2012 12:07

May I ask you for a performance test?

Writing to a local directory I get 90 Gb/sec. We have all 4 nodes on one HP 1910 switch.

If I run
Code:

time dd bs=1M count=128 if=/dev/zero of=/home/cfduser/OpenFOAM/speedtest2 conv=fdatasync

I get a speed of 20 Gb/s.
Not so good, not so bad...

May I ask what speed you get with your configuration?

PS
conv=fdatasync gives the real (longer) write time, because it waits for the data to actually reach the disk before dd finishes.
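(For comparison we run the same command against a local path and against the NFS share; the local path here is just an example, and dd prints the measured throughput on its last line.)
Code:

# local disk (example path)
time dd bs=1M count=128 if=/dev/zero of=/tmp/speedtest_local conv=fdatasync

# NFS share
time dd bs=1M count=128 if=/dev/zero of=/home/cfduser/OpenFOAM/speedtest_nfs conv=fdatasync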

wyldckat October 12, 2012 16:47

Hi Giuliano,

Quote:

Originally Posted by Giuliano69 (Post 386306)
I'm surprised to see that you use "no_root_squash". Are they diskless clients?

I vaguely remember it was due to some weird permissions problem... right now I keep it with that option, simply because "it works".

Quote:

Originally Posted by Giuliano69 (Post 386306)
Have you also tried NFS 4?

I have tried many times and always failed :( Either because it was very new at the time, or I was very clumsy with it....

Quote:

Originally Posted by Giuliano69 (Post 386306)
We found MPI performance very poor over NFS on top of XFS.
May I ask which kind of filesystem you are using?

The standard ext3 and ext4. We haven't bothered with this, because writing to disk isn't the bottleneck for the cases we run.

Quote:

Originally Posted by Giuliano69 (Post 386306)
Have you done any parallel filesystem tests? :-)

I've never done it. If you search the forum, you might find something...

Quote:

Originally Posted by Giuliano69 (Post 386311)
May I ask you for a performance test?

Writing to a local directory I get 90 Gb/sec.

:eek: You mean 90MB/s, correct? 90 Megabyte per second?
Because 90 Gigabit per second would be... with today's technology... maybe an array of 10 SSDs in RAID0? ;)
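(For scale: 90 Gbit/s ÷ 8 ≈ 11 GByte/s of sustained writes, and even 20 Gbit/s would be 2.5 GByte/s, while a single gigabit Ethernet link tops out around 125 MByte/s.)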

Quote:

Originally Posted by Giuliano69 (Post 386311)
May I ask what speed you get with your configuration?

Now I'm curious... I'll do the test when I can and report back!

Best regards,
Bruno

