MPI + AMI mesh tutorial NOT working. Has anyone tested it?

#1   September 14, 2012, 10:28
New Member
 
Giuliano Lotta
Join Date: May 2012
Posts: 12
Hi,
We have a cluster of four workstations, on which we successfully run the motorbike tutorial on 16 cores (MPI + NFS version 3).
We had to downgrade NFS to version 3 to get things working, but with the following entry it works flawlessly:
Code:
/etc/fstab
192.168.0.17:/home/cfduser/OpenFOAM  /home/cfduser/OpenFOAM/  nfs _netdev,nfsvers=3,proto=tcp,noac,auto  0  0
But when we try the mixerVesselAMI2D case, the system gets stuck in createDynamicFvMesh.H. The main node sits at 99% CPU in sys time, and the slaves sit at 100% in wait state, locked...

Has anyone ever tried the mixerVesselAMI2D tutorial with MPI?
Any idea what could be locking up the tutorial?
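
One way to narrow down where a hang like this sits is to look for processes in uninterruptible sleep (state D) on the waiting nodes, since tasks blocked on NFS I/O usually show up that way. A minimal sketch, assuming the procps version of ps; d_state is a hypothetical helper name, not part of OpenFOAM or NFS:

```shell
# Filter ps-style output (columns: PID STAT WCHAN COMMAND) down to the
# header line plus every process whose STAT starts with "D"
# (uninterruptible sleep, typically waiting on I/O).
d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# On a live node this would be used as (procps ps assumed):
#   ps -eo pid,stat,wchan:32,comm | d_state
```

If the solver processes on the slave nodes appear in this list with an NFS- or RPC-related wait channel, that would point towards the shared filesystem rather than MPI itself.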

#2   September 15, 2012, 08:36
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,975
Blog Entries: 45
Hi Giuliano,

Bridging what we already know from your posts at http://www.cfd-online.com/Forums/ope...ple-nodes.html (posts #11 and 12), here's what I think might be responsible for all of this:
  1. Server side of the NFS might be misconfigured for this task. Here's what we use at work:
    Code:
    fsid=0,crossmnt,rw,no_root_squash,sync,no_subtree_check
    Caution: use "fsid" very carefully, or don't use it at all.
  2. On the client side, we simply use the "defaults" option. openSUSE translates this to:
    Code:
    rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=server_ip,mountvers=3,mountport=59044,mountproto=udp,local_lock=none,addr=server_ip
    server_ip is the numerical IP address which I've erased when posting
    I could see these options by running:
    Code:
    mount
  3. According to your diagnosis, the lock-up occurs right when all machines try to load the mesh files, possibly the exact same file. This is why I think something is misconfigured, given that it won't allow more than 2 machines to access the same file via NFS.
Right now I don't have access to the cluster resources at work, so I can't test this tutorial case myself. I'll post again either when I have access to more resources or when I stumble on something else about NFS.
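
To compare the client options shown in item 2 with an /etc/fstab entry, it can help to pull individual keys out of the comma-separated option string. A rough sketch; mount_opt is a hypothetical helper name, not an NFS tool:

```shell
# Print the value of one key from a comma-separated mount option string.
# Flags without a value (e.g. "noac") print an empty line.
mount_opt() {
    opts=$1
    key=$2
    printf '%s\n' "$opts" | tr ',' '\n' |
        awk -F= -v k="$key" '$1 == k { print $2; exit }'
}

# Option string as reported by "mount" in item 2 (shortened):
mount_opt "rw,relatime,vers=3,rsize=1048576,wsize=1048576,hard,proto=tcp" rsize
# Option string from the fstab entry in post #1:
mount_opt "_netdev,nfsvers=3,proto=tcp,noac,auto" nfsvers
```

Note that fstab accepts nfsvers= while the mount output reports the same setting as vers=, so both keys are worth checking when comparing machines.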

edit: It seems that Ubuntu 12.04 and NFS don't play well together: http://ubuntuforums.org/showthread.php?t=1478413

Best regards,
Bruno

PS: I moved this thread to where it seemed to fit better, namely in the "running" sub-forum.

Last edited by wyldckat; September 15, 2012 at 08:44. Reason: see "edit:"

#3   October 12, 2012, 11:40
New Member
 
Giuliano Lotta
Join Date: May 2012
Posts: 12
Thanks, Bruno, for your kind help.

I'm surprised to see that you use "no_root_squash". Are they diskless clients?

From "man exports": with this option, the root user on a client machine is also treated as root when accessing files on the NFS server.

Have you also tried NFS 4?

We found MPI performance very poor on NFS with XFS.
May I ask which kind of filesystem you are using?

Any parallel filesystem test ? :-)

#4   October 12, 2012, 12:07
New Member
 
Giuliano Lotta
Join Date: May 2012
Posts: 12
May I ask you to run a performance test?

writing on a local directory I get 90Gb sec. We have all the 4 nodes on one HP switch 1910.

If I run
Code:
time dd bs=1M count=128 if=/dev/zero of=/home/cfduser/OpenFOAM/speedtest2 conv=fdatasync

I get a speed of 20 Gb/s.
Not so good, not so bad...

May I ask what speed you get in your configuration?

PS
conv=fdatasync gives the real (longer) writing time, because dd waits for the data to be physically written before it finishes.
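
Since megabytes and gigabits are easy to mix up when reading dd results, here is a small sketch that converts a byte count and an elapsed time into both units. The throughput helper is a hypothetical name, and the 1.5 s elapsed time is an illustrative assumption, not a measurement from this cluster:

```shell
# Convert bytes written and elapsed seconds into MB/s and Gbit/s.
throughput() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        printf "%.1f MB/s = %.2f Gbit/s\n", b / s / 1e6, b * 8 / s / 1e9
    }'
}

# bs=1M count=128 writes 128 * 1048576 = 134217728 bytes;
# the 1.5 s below is an assumed elapsed time for illustration only.
throughput 134217728 1.5   # prints "89.5 MB/s = 0.72 Gbit/s"
```

The byte count and the "real" time reported by time dd can be passed in directly, which makes results from different machines comparable in the same unit.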

#5   October 12, 2012, 16:47
Retired Super Moderator
 
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,975
Blog Entries: 45
Hi Giuliano,

Quote:
Originally Posted by Giuliano69 View Post
I'm surprised to see that you use, "no_root_squash" . Are they diskless clients ?
I vaguely remember it was due to some weird permissions problem... right now I keep it with that option, simply because "it works".

Quote:
Originally Posted by Giuliano69 View Post
have you also tried nfs 4?
I have tried many times and always failed, either because it was very new at the time, or because I was very clumsy with it...

Quote:
Originally Posted by Giuliano69 View Post
We found MPI performance very poor on nfs with XFS.
May I ask you which kind of filesystem are you using ?
The standard ext3 and ext4. We haven't bothered with this, because writing to disk isn't the bottleneck for the cases we run.

Quote:
Originally Posted by Giuliano69 View Post
Any parallel filesystem test ? :-)
I've never done it. If you search the forum, you might find something...

Quote:
Originally Posted by Giuliano69 View Post
May I ask you a performance test ?

writing on a local directory I get 90Gb sec.
You mean 90MB/s, correct? 90 Megabyte per second?
Because 90 Gigabit per second would be... with today's technology... maybe an array of 10 SSDs in RAID0?

Quote:
Originally Posted by Giuliano69 View Post
May I ask you your speed in your configuration ?
Now I'm curious... I'll do the test when I can and report back!

Best regards,
Bruno
