CFD Online (www.cfd-online.com), OpenFOAM Running, Solving & CFD forum

Something weird encountered when running OpenFOAM in parallel on multiple nodes

#1  May 2, 2013, 04:03
Qiu Xiaoping (xpqiu), New Member
Join Date: Apr 2013 | Location: IPE CAS China | Posts: 8
Hello everyone,
Recently I encountered something weird when trying to run OpenFOAM in parallel on multiple nodes. I have two nodes (A123 and A122), each with 8 cores. After a lot of trial and error, my case ran successfully in parallel with 16 processes. However, when I monitored the network traffic between A123 and A122 during the run, I saw something I find strange.
Let me describe my situation in detail; I hope you can give me some advice.


The following steps describe how I set up the parallel run on multiple nodes:

1
Both A123 and A122 run 64-bit CentOS 5.4 with OpenFOAM 2.1.1 installed. The OpenFOAM package was downloaded from centFOAM (http://sourceforge.net/projects/centfoam/files/5.x/); I just decompressed the tarball and sourced the bashrc in ../etc/. The ThirdParty package was used, with Open MPI 1.5.3. With this setup OpenFOAM works fine: both serial cases and parallel cases on a single node ran successfully.
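Because mpirun starts the remote processes through a non-interactive ssh session, the OpenFOAM environment has to be visible there as well, not only in the interactive shell where bashrc was sourced. A minimal check, assuming bash and relying on the fact that OpenFOAM's etc/bashrc sets WM_PROJECT_VERSION (the function name is hypothetical):

```bash
# Report whether the OpenFOAM environment and Open MPI's mpirun are visible
# in the current shell. WM_PROJECT_VERSION is set by OpenFOAM's etc/bashrc.
check_foam_env() {
  if [ -z "${WM_PROJECT_VERSION:-}" ]; then
    echo "OpenFOAM environment not sourced" >&2
    return 1
  fi
  if ! command -v mpirun >/dev/null 2>&1; then
    echo "mpirun not found on PATH" >&2
    return 1
  fi
  echo "OpenFOAM $WM_PROJECT_VERSION, mpirun: $(command -v mpirun)"
}
```

Run it on both nodes. To see what a non-interactive remote shell sees, `ssh A122 'echo $WM_PROJECT_VERSION; command -v mpirun'` from A123 should print the same version and an mpirun path; if it prints nothing, the line that sources etc/bashrc probably needs to be in ~/.bashrc on that node.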

2
Then I changed the ssh settings so that A123 and A122 can log in to each other without a password (i.e. on A123 I can log in to A122 just by typing "ssh A122", and vice versa, without entering a password).
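The usual one-time setup for this is `ssh-keygen -t rsa` (with an empty passphrase) followed by `ssh-copy-id A122` on A123, and the same in the other direction on A122. Since mpirun cannot answer a password prompt, a quick test is to run ssh with BatchMode=yes, which makes it fail instead of prompting. A small sketch that checks a list of hosts this way (the function name is hypothetical):

```bash
# Check that each host accepts a non-interactive (key-based) ssh login.
# BatchMode=yes makes ssh fail rather than ask for a password;
# -n stops ssh from reading this shell's stdin.
check_ssh_hosts() {
  local h status=0
  for h in "$@"; do
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$h" true; then
      echo "$h: passwordless ssh OK"
    else
      echo "$h: ssh login failed or needs a password" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_ssh_hosts A123 A122` run on each node should report OK for both names, including the node's own name, since mpirun may also log in to the local host.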

3
I created a directory named "shares" on both A123 and A122 (mkdir -p ~/shares). Then, as root on A122, I mounted the shares directory of A123 onto ~/shares over NFS (mount -t nfs A123:$HOME/shares ~/shares). So the directory ~/shares on A123 is shared by A123 and A122.

4
I created a hostfile named "hosts_2-8" with the following contents:
A123 cpu=8
A122 cpu=8
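The slot counts in the hostfile have to add up to the -np value and to numberOfSubdomains in decomposeParDict. A small sketch that totals them, assuming one host per line with an optional cpu=N (the form used above) or slots=N entry; counting a bare host name as one slot is an assumption of this sketch, and the function name is hypothetical:

```bash
# Total the slots in an MPI hostfile: "host cpu=N" or "host slots=N";
# a bare host name counts as 1; blank lines and # comments are skipped.
count_slots() {
  awk '
    /^[ \t]*(#|$)/ { next }
    {
      n = 1
      for (i = 2; i <= NF; i++)
        if ($i ~ /^(cpu|slots)=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
      total += n
    }
    END { print total + 0 }
  ' "$1"
}
```

With the hostfile above, `count_slots hosts_2-8` gives 16, so the run could also be started as `mpirun --hostfile hosts_2-8 -np "$(count_slots hosts_2-8)" pisoFoam -parallel`.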

5
After that, I copied my case to ~/shares on A123, ran blockMesh, set up decomposeParDict, ran decomposePar, and finally ran "mpirun --hostfile hosts_2-8 -np 16 pisoFoam -parallel".
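For reference, a minimal decomposeParDict for 16 subdomains can be written from the shell as below. The scotch method is an assumption (the post does not say which method was used); it needs no coefficients, whereas simple or hierarchical would also need an n (nx ny nz) entry. The helper name write_decompose_dict is hypothetical.

```bash
# Write <case>/system/decomposeParDict requesting N subdomains with scotch.
write_decompose_dict() {
  local case_dir=$1 n=$2
  mkdir -p "$case_dir/system"
  cat > "$case_dir/system/decomposeParDict" <<EOF
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains $n;

method          scotch;
EOF
}

# Typical sequence from the case directory under ~/shares:
#   blockMesh
#   write_decompose_dict . 16
#   decomposePar
#   mpirun --hostfile hosts_2-8 -np 16 pisoFoam -parallel > log.pisoFoam 2>&1
```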

So far everything looked fine. From the log I could see that the domain was decomposed into 16 subdomains and the case was computed with 16 processes (8 on A123 and 8 on A122; one of the processes on A123 was the master, the rest were slaves).

Then I tried to monitor the network traffic between A123 and A122. I installed and ran iftop (http://www.ex-parrot.com/~pdw/iftop/) on A122, and it turned out that packets were exchanged between A123 and A122 only at the moment the run began and when it ended; in between, not a single byte of traffic could be observed!

As far as I know, when OpenFOAM runs in parallel, data is exchanged between the processes, so there should be network traffic between A123 and A122 throughout the run.

Did I make a mistake, either in setting up the parallel run on multiple nodes or in my understanding of MPI? Or should I use other software to monitor the network traffic? By the way, I also tried Wireshark, and the result was the same. I even tried pulling out the Ethernet cable during the run, and the run aborted, as expected.
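One tool-independent way to see whether anything crosses a given interface is to sample the kernel's byte counters in /proc/net/dev twice and take the difference. A minimal sketch, assuming Linux and an interface name such as eth0 (the real name can be checked with ifconfig); the function names are hypothetical, and the numbers are totals for the interface, not per process:

```bash
# Print "<rx_bytes> <tx_bytes>" for IFACE from /proc/net/dev (or another file).
# The ":" after the interface name is turned into a space first, because
# older kernels print large counters with no space after it.
iface_bytes() {
  local iface=$1 table=${2:-/proc/net/dev}
  awk -v want="$iface" '
    { line = $0; sub(/:/, " ", line); split(line, f, " ") }
    f[1] == want { print f[2], f[10]; found = 1 }
    END { exit !found }
  ' "$table"
}

# Sample IFACE twice, SECS seconds apart, and print the average byte rates.
iface_rate() {
  local iface=$1 secs=${2:-5} rx0 tx0 rx1 tx1
  [ "$secs" -gt 0 ] || return 1
  read -r rx0 tx0 < <(iface_bytes "$iface") || return 1
  sleep "$secs"
  read -r rx1 tx1 < <(iface_bytes "$iface") || return 1
  echo "$iface: rx $(( (rx1 - rx0) / secs )) B/s, tx $(( (tx1 - tx0) / secs )) B/s"
}
```

Running `iface_rate eth0 5` on A122 while pisoFoam is running, and repeating it for each interface listed in /proc/net/dev, shows which interface (if any) carries the traffic. `netstat -tnp` (as root) additionally lists any TCP connections owned by pisoFoam and the addresses they use.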

I hope you can give me some advice. Thank you!

xpqiu

#2  May 2, 2013, 04:30
Håkon Strandenes (haakon), Senior Member
Join Date: Dec 2011 | Location: Norway | Posts: 111
I guess one possible explanation is that Open MPI might use TCP/IP (actually SSH over TCP/IP) only for establishing and closing network links at a lower layer of the OSI model. Establishing low-level data links can give much better performance than IP-based protocols. Such communication might not be "spotted" by iftop or similar software, which probably works on top of the IP protocol.
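One way to check this hypothesis is to ask Open MPI which byte transfer layer (BTL) components it was built with, and then to restrict it to TCP on a named interface for a test run. If a component such as openib is listed and the nodes have matching hardware, Open MPI may prefer it over tcp for messages between nodes, and that traffic would not show up on the Ethernet interface. A sketch, assuming the Ethernet interface is called eth0; the helper list_btls is hypothetical, while ompi_info and the btl and btl_tcp_if_include MCA parameters are standard Open MPI:

```bash
# Extract the BTL component names from `ompi_info` output read on stdin.
# ompi_info prints lines like "  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.5.3)".
list_btls() {
  awk -F': ' '/MCA btl:/ { split($2, w, " "); print w[1] }'
}

# Usage:
#   ompi_info | list_btls
# Test run restricted to TCP over eth0 (sm = shared memory within a node, self = loopback):
#   mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 \
#          --hostfile hosts_2-8 -np 16 pisoFoam -parallel
```

If the restricted run shows steady traffic in iftop while the default run does not, the default run was most likely using a different transport.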

Do the lights on your network interfaces (at the back of the machine) blink? If so, I guess everything is working. If not, you should check that the computation is running on both machines and not just one. Try running ps -A | grep Foam on both machines to list all running OpenFOAM applications. Both machines should show the same number of processes (8).
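The same check can be run for both machines from one terminal. A sketch, assuming the passwordless ssh from step 2; it counts processes whose command name in ps -A output is exactly the given name (the helper names count_procs and procs_per_host are hypothetical):

```bash
# Count lines of `ps -A` output (read on stdin) whose command name is exactly NAME.
count_procs() {
  awk -v name="$1" '$NF == name { n++ } END { print n + 0 }'
}

# Print "<host>: <count>" for each host, running `ps -A` there over ssh.
procs_per_host() {
  local name=$1 h
  shift
  for h in "$@"; do
    printf '%s: %s\n' "$h" "$(ssh -n "$h" ps -A | count_procs "$name")"
  done
}
```

For the 16-process run, `procs_per_host pisoFoam A123 A122` should print 8 for each host.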

#3  May 2, 2013, 04:59
Qiu Xiaoping (xpqiu), New Member
Thanks haakon,
The network works fine, and after typing "ps -A | grep Foam" I can see 8 processes named "pisoFoam" on both A123 and A122, so I think the run is normal.
As for the protocol, I am not familiar with it; I will search the web. Thank you for the information.

xpqiu


Tags: multiple nodes, network traffic, parallel computation




All times are GMT -4.