Losing Log when running in parallel

February 27, 2014, 18:37
#1
New Member
David H.
Join Date: Oct 2013
Posts: 25
Hi everyone, I've recently been running some simulations on a cluster at my school for research.

I've been having an issue that seems to come up when I try to restart a run, or continue one by changing the controlDict inputs as appropriate.

The issue is that I lose my log, even though I can see from "top" that my process is running on the head node, and from "lsload" that the work is being distributed across the other nodes as well.
By "lose my log", I mean the log usually prints the header and a few items but then stops at the next step. There also does not seem to be any write output while these processors are spinning their bits with reckless abandon.

I'm running OpenFOAM using the following command, which I usually copy and paste:

Code:
mpirun -np 48 pimpleFoam -parallel > log &
then

Code:
tail -f log
to view the progress.
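
A note on the redirection: `> log` captures only standard output, so anything mpirun or the solver prints to standard error goes to the terminal rather than into the log. Below is a minimal sketch of capturing both streams in one file. `run_logged` is a hypothetical helper name introduced here, not part of OpenFOAM or MPI.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdout and stderr merged
# into a single log file.
# Order matters: "> file" points stdout at the file first, then
# "2>&1" points stderr at the same place.
run_logged() {
    logfile=$1
    shift
    "$@" > "$logfile" 2>&1
}

# Possible usage on the cluster (assumed, mirrors the command above):
#   run_logged log mpirun -np 48 pimpleFoam -parallel &
#   tail -f log
```

With both streams in the log, an error message that would otherwise scroll past in the terminal ends up next to the solver output it belongs to.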

Any ideas?

Here's what I've got now:
Code:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.1.0                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 2.1.0-bd7367f93311
Exec   : pimpleFoam -parallel
Date   : Feb 27 2014
Time   : 21:20:21
Host   : "Cluster1"
PID    : 21310

Last edited by djh2; February 27, 2014 at 21:43.

February 28, 2014, 01:43
#2
Senior Member
Bernhard
Join Date: Sep 2009
Location: Delft
Posts: 790
Can you show the complete submit script?

Also, some schedulers write two files, <jobname>.[o|e]<jobid>, which might give you some hints about what went wrong.

Final question: what is the layout of your system? 48 CPUs on a single node? Or are you using multiple nodes?

February 28, 2014, 08:54
#3
New Member
David H.
Join Date: Oct 2013
Posts: 25
I'm not using a script to run the job (like Allrun), although maybe I should make one.

In general, this is my method:
1) scp my files from my desktop (remote) onto the cluster
2) run blockMesh
3) run decomposePar
4) run the job in parallel using mpirun.
5) run tail to follow the log progress
6) run reconstructPar
7) run paraFoam

As I have said previously, I'm using
Code:
mpirun -np 48 pimpleFoam -parallel > log &
as the command to start my parallel job.
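
For what it's worth, steps 2 to 6 above could be collected into a small script along these lines. This is only a sketch: `step`, `run_case` and the `DRY_RUN` variable are hypothetical names introduced for the example, and the scp and paraFoam steps are left out because they happen on the desktop side or are interactive.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command, or just print it when DRY_RUN
# is set, so the sequence can be checked without OpenFOAM installed.
step() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mesh, decompose, solve in parallel, reconstruct. Stops at the first
# step that fails. The rank count must match numberOfSubdomains in
# system/decomposeParDict.
run_case() {
    nprocs=${1:-48}
    step blockMesh || return 1
    step decomposePar || return 1
    step mpirun -np "$nprocs" pimpleFoam -parallel || return 1
    step reconstructPar || return 1
}

# Possible usage from the case directory (assumed):
#   run_case 48 2>&1 | tee log
```

Running it through `tee` keeps the job in the foreground, shows the progress live, and writes the same text to the log, so there is no separate `tail -f` step.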

This is running on a cluster of five nodes with 12 processors each. There have been times when everything goes as you'd expect: tail -f log shows me the running logfile and I can watch the steps go by. Other times it doesn't behave, and what you see at the end of my first post is all the log provides.

For example, I ran the simulation from 0 to 0.01 with 0.001 write intervals. Then I reconstructed the results and viewed them (they looked okay). Then I modified the controlDict to a later endTime, since the simulation was on the right track. That is when the problem occurs. However, I can no longer reproduce any "successful" runs; it now seems that even a clean copy of my files produces no log output.
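
To make the controlDict change concrete, here is a rough sketch of the edit, assuming the run should continue from the last written time. `extend_run` is a hypothetical helper, and it assumes the `startFrom` and `endTime` entries each sit on one line in the usual "keyword value;" form.

```shell
#!/bin/sh
# Hypothetical helper: switch startFrom to latestTime and set a new
# endTime in a controlDict. Writes to a temporary file and then moves
# it into place, which avoids relying on "sed -i".
extend_run() {
    dict=$1
    new_end=$2
    sed -e 's/^\([[:space:]]*startFrom[[:space:]]\{1,\}\)[^;]*;/\1latestTime;/' \
        -e "s/^\([[:space:]]*endTime[[:space:]]\{1,\}\)[^;]*;/\1${new_end};/" \
        "$dict" > "$dict.tmp" && mv "$dict.tmp" "$dict"
}

# Possible usage from the case directory (assumed):
#   extend_run system/controlDict 0.05
```

The patterns are anchored at the start of the line, so a line such as `stopAt endTime;` is left alone.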

Is there significance to having two spaces between "parallel" and ">"? I noticed this usage in another posted topic, and now it seems that my simulation is behaving.
Code:
mpirun -np 48 pimpleFoam -parallel  > log &

February 28, 2014, 09:20
#4
Senior Member
Alexey Matveichev
Join Date: Aug 2011
Location: Nancy, France
Posts: 1,930
Hi,

Do I understand correctly that you don't have any batch system on the cluster? You just log in and run a simulation with
Code:
mpirun -np 48 pimpleFoam -parallel > log &
?

If that is the case, then you're trying to run all 48 of your processes on one node, because I do not see a --hostfile or --host option in your command.

Because there are lots of possible causes for this behavior (you run all your processes on one node and it can't launch, NFS caching delays the log file so results do not appear immediately, etc.), you need to provide more information about the environment you're using before anyone can suggest anything.
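
One quick way to see where the ranks actually land is to launch `hostname` through the same launcher and count the lines per host. The sketch below only does the counting: `ranks_per_host` is a hypothetical helper that reads one hostname per line on standard input.

```shell
#!/bin/sh
# Hypothetical helper: count how many times each hostname appears on
# stdin, most frequent first.
ranks_per_host() {
    sort | uniq -c | sort -rn
}

# Possible usage (assumed, same launcher and rank count as the job):
#   mpirun -np 48 hostname | ranks_per_host
```

If every line comes back with the head node's name, all processes are running on one machine; a spread across node names means the launcher is picking up a host list from somewhere.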

February 28, 2014, 09:41
#5
New Member
David H.
Join Date: Oct 2013
Posts: 25
The host file seems to be managed by the cluster. When I run (the simulation is on 60 procs now, same cluster though):
Code:
mpirun -np 60 pimpleFoam -parallel  > log &
then later it shows:

Code:
 [6]+  Stopped                 mpirun -f /shared/opt/mpihosts -np 60 pimpleFoam -parallel > log
I looked into how to restart a stopped process, so I typed "bg" to continue the solution.

I think this might be the issue that I'm having, because even though I use
Code:
> log &
for the output, something I do in the terminal, whether it is checking the load or viewing the log, is causing the process to stop solving.

The strange part is, even though it was "Stopped", the load was still 100%.

I think we can chalk this one up to a "Linux amateur" problem, and not software related. Thanks for your time and input.
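
One thing worth ruling out, as an assumption rather than a diagnosis: a background job that tries to read from the terminal is stopped by the shell (bash usually reports this as "Stopped (tty input)"), and pressing Ctrl-Z while the job is in the foreground also produces a "Stopped" entry. The sketch below starts the command with standard input taken from /dev/null, so it has no reason to touch the terminal. `start_detached` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background with stdin
# from /dev/null and both output streams in a log file, then print the
# PID of the background process.
start_detached() {
    logfile=$1
    shift
    "$@" < /dev/null > "$logfile" 2>&1 &
    echo "$!"
}

# Possible usage on the cluster (assumed):
#   start_detached log mpirun -np 60 pimpleFoam -parallel
#   tail -f log
```

If the job still shows up as "Stopped" in `jobs` when started this way, the cause is something other than terminal input.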

Last edited by djh2; February 28, 2014 at 13:10.

All times are GMT -4.