CFD Online Discussion Forums

Shrinivas July 24, 2007 14:16

Parallel computation
 
Hi,

I am running a flow solver in parallel using MPI. While it is running, the calculation stops and the output file says

that a node (xxx) is "waiting too long for completion".

Can anyone suggest a solution? Has anyone encountered this problem before, and what is the remedy?

Thanks

Shrini

agg July 25, 2007 11:39

Re: Parallel computation
 
Could it be that node xxx is waiting on a blocking receive from some other node, say yyy, and yyy never posts the matching send to xxx?
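agg's scenario can be reproduced without an MPI installation. In the toy model below (not MPI code), Python threads stand in for ranks xxx and yyy, and a `queue.Queue` stands in for the message path from yyy to xxx. The receive is given a hypothetical `timeout` so that a send that is never posted is reported instead of stalling forever.

```python
import queue
import threading

def rank_xxx(inbox, timeout, log):
    """Rank xxx posts a blocking receive from rank yyy."""
    try:
        msg = inbox.get(timeout=timeout)  # blocks until yyy sends or the timeout expires
        log.append(f"xxx received {msg!r}")
    except queue.Empty:
        log.append("xxx: receive from yyy never matched")

def rank_yyy(outbox, send_it, log):
    """Rank yyy is supposed to send to xxx; send_it=False models the missing send."""
    if send_it:
        outbox.put("boundary data")
    log.append("yyy finished its step")

def run(send_it, timeout=0.5):
    channel = queue.Queue()  # hypothetical stand-in for the yyy -> xxx message
    log = []
    threads = [
        threading.Thread(target=rank_xxx, args=(channel, timeout, log)),
        threading.Thread(target=rank_yyy, args=(channel, send_it, log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

print(run(send_it=True))
print(run(send_it=False))
```

Note that yyy finishes happily in both cases; only the receiving side notices that anything is wrong.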

Use a debugger or insert print statements just before and after the receive statements to see where exactly it is getting stuck.
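In C or Fortran this means a print, followed by a flush, around each receive call. The helper below is a hypothetical Python equivalent (`traced` is not part of any library). It wraps any blocking call, and after a hang the last "before" line with no matching "after" names the stuck call. Flushing matters because output still sitting in a buffer can be lost when a hung job is killed.

```python
import queue
import sys
import time

def traced(label, rank, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), logging a line before and after it.

    flush=True pushes each line out immediately, so the trail survives
    even if the process is killed while blocked inside fn.
    """
    stamp = time.strftime("%H:%M:%S")
    print(f"[rank {rank}] {stamp} before {label}", file=sys.stderr, flush=True)
    result = fn(*args, **kwargs)
    stamp = time.strftime("%H:%M:%S")
    print(f"[rank {rank}] {stamp} after  {label}", file=sys.stderr, flush=True)
    return result

if __name__ == "__main__":
    inbox = queue.Queue()
    inbox.put("halo")
    # Usage: wrap the blocking receive; a hang leaves "before" without "after".
    traced("recv halo from rank 1", 0, inbox.get)
```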

Shrinivas July 25, 2007 11:47

Re: Parallel computation
 
Thanks Agg,

OK. Actually, the solver runs for about 10000 time steps and then stops responding (stalls). When I kill the job, the output file shows that node (xxx) is "waiting too long for completion". I checked the send/receive calls too, and they seem fine. Could this have something to do with the load-balancing algorithm?

Also, when I check the status of a particular node, it reports that the node is running and that the job is running, but the output file from the code is no longer being appended to. This forces me to delete the job, and the diagnostic report then says that 3 of the 4 nodes are waiting to complete.
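When 3 of 4 ranks report that they are waiting, the rank worth examining is usually the fourth, which is either still computing or blocked at a different call. Assuming each rank logs a checkpoint (time step plus a named phase) as it goes, a hypothetical helper like the one below picks out the rank that is behind. The phase names and step numbers are illustrative, not taken from the solver.

```python
# Hypothetical per-step checkpoints, in the order each rank passes them.
PHASES = ["start", "halo exchange", "solve", "allreduce dt"]

def find_laggards(last_checkpoint):
    """Return ranks whose last logged (step, phase) is behind the furthest rank.

    last_checkpoint: dict mapping rank -> (time step, phase name).
    """
    def position(cp):
        step, phase = cp
        return (step, PHASES.index(phase))

    furthest = max(position(cp) for cp in last_checkpoint.values())
    return sorted(r for r, cp in last_checkpoint.items() if position(cp) < furthest)

progress = {
    0: (10412, "allreduce dt"),
    1: (10412, "allreduce dt"),
    2: (10412, "halo exchange"),
    3: (10412, "allreduce dt"),
}
print(find_laggards(progress))  # [2]
```

Here ranks 0, 1 and 3 are waiting at the collective, and rank 2 never got past its halo exchange, so that exchange is where to look.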

Thanks for the help,

br,

_shrini

agg July 26, 2007 14:37

Re: Parallel computation
 
Does the problem run to completion on one processor?

Load balancing may be a problem. However, why would the problem occur only after 10000 time steps? There must be some collective communication (e.g. the time step calculation using an allreduce) at which all processors wait at the end of each time step, so a load-balance problem should show up at every time step. You said you are using a flow solver. What variables are you computing? u, v, w, p, rho?
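The point about the time-step allreduce is that it is a synchronization point that every rank must reach every step. The toy model below is not MPI. Python threads stand in for ranks, and a hypothetical `ToyAllreduceMin` built on `threading.Barrier` stands in for an allreduce with a MIN operation. It shows both halves of the argument: with all ranks present the global dt is the minimum of the local values, and if one rank never arrives, all the others are stuck waiting.

```python
import threading

class ToyAllreduceMin:
    """Toy allreduce(MIN) across threads standing in for ranks (not MPI)."""

    def __init__(self, nranks, timeout=None):
        self._values = [None] * nranks
        # Every rank must arrive; a rank that never does leaves the others
        # blocked until the timeout, after which they get BrokenBarrierError.
        self._barrier = threading.Barrier(nranks, timeout=timeout)

    def __call__(self, rank, value):
        self._values[rank] = value
        self._barrier.wait()           # all contributions are in
        result = min(self._values)
        self._barrier.wait()           # all ranks have read before values are reused
        return result

def run_step(local_dts, stuck_rank=None, timeout=0.5):
    """Each rank proposes a local dt; the global dt is the minimum."""
    nranks = len(local_dts)
    allreduce = ToyAllreduceMin(nranks, timeout)
    out = {}

    def rank_main(rank):
        if rank == stuck_rank:
            return                     # models a rank that never reaches the collective
        try:
            out[rank] = allreduce(rank, local_dts[rank])
        except threading.BrokenBarrierError:
            out[rank] = "timed out"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

dts = [0.02, 0.01, 0.03, 0.015]
print(run_step(dts))                 # every rank gets 0.01
print(run_step(dts, stuck_rank=2))   # ranks 0, 1 and 3 time out
```

A real MPI allreduce normally has no timeout, so the three waiting ranks would simply block. That behavior would be consistent with a "3 of 4 nodes waiting" report.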

Shrinivas July 26, 2007 15:00

Re: Parallel computation
 
Thanks agg,

I mean it is not specific to 10000 time steps; it happens abruptly. Also, the problem does not always occur: I have hit it in 3 of the 15 runs of this case.
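A hang that shows up in only some runs can point to a race rather than a deterministic bug. One well-known example is a pairwise exchange in which both partners call a blocking `MPI_Send` before their `MPI_Recv`. The MPI standard calls such a program unsafe, because whether it completes depends on whether the library buffers the messages. The toy model below is not MPI. `SyncChannel` is a hypothetical unbuffered channel representing the worst case, in which no buffering is available. `exchange` shows the usual fix of ordering the calls by rank (lower rank sends first). In real MPI code, `MPI_Sendrecv` or nonblocking `MPI_Isend`/`MPI_Irecv` followed by `MPI_Waitall` serve the same purpose.

```python
import queue
import threading

class SyncChannel:
    """One-message unbuffered channel: send() returns only after recv() takes
    the message (models a blocking send that the library does not buffer)."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._taken = threading.Event()

    def send(self, msg, timeout):
        self._taken.clear()
        self._slot.put(msg)
        if not self._taken.wait(timeout):
            raise TimeoutError("send was never matched by a receive")

    def recv(self, timeout):
        msg = self._slot.get(timeout=timeout)  # raises queue.Empty on timeout
        self._taken.set()
        return msg

def exchange(rank, partner, chans, value, safe, timeout=0.5):
    """Swap one value with partner. Unsafe: everyone sends first."""
    out, inc = chans[(rank, partner)], chans[(partner, rank)]
    if not safe or rank < partner:
        out.send(value, timeout)
        return inc.recv(timeout)
    got = inc.recv(timeout)            # higher rank receives first
    out.send(value, timeout)
    return got

def run_pair(safe):
    chans = {(0, 1): SyncChannel(), (1, 0): SyncChannel()}
    results, errors = {}, {}

    def worker(rank):
        try:
            results[rank] = exchange(rank, 1 - rank, chans, f"halo from {rank}", safe)
        except (TimeoutError, queue.Empty) as exc:
            errors[rank] = exc

    threads = [threading.Thread(target=worker, args=(r,)) for r in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors

print(run_pair(safe=True))   # both ranks receive their partner's halo
print(run_pair(safe=False))  # both sends time out: the classic send-send deadlock
```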

Yes, I am computing u, v, w, rho and T. It is an incompressible flow solver with structured mesh blocks and an unstructured decomposition of the blocks, i.e. adjacent blocks may be oriented differently: the i axis of one block may coincide with the j axis of a neighboring block.
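With this kind of decomposition, ghost-cell data crossing a block interface has to be re-indexed from the neighboring block's (i, j) orientation into the local one before it is used, and both sides must agree on how many values are exchanged. The following is a minimal sketch. It assumes an interface face is stored as a list of rows and that its orientation is described by three hypothetical flags.

```python
def reorient_face(face, swap_ij=False, flip_i=False, flip_j=False):
    """Map a 2-D face (list of rows, indexed [i][j]) from a neighbor's
    orientation into the local one.

    swap_ij: the neighbor's i axis runs along the local j axis (transpose).
    flip_i / flip_j: that local axis runs in the opposite direction;
    applied after the swap, in local index space.
    """
    out = [list(row) for row in face]
    if swap_ij:
        out = [list(col) for col in zip(*out)]
    if flip_i:
        out = out[::-1]
    if flip_j:
        out = [row[::-1] for row in out]
    return out

face = [[1, 2, 3],
        [4, 5, 6]]
print(reorient_face(face, swap_ij=True))               # [[1, 4], [2, 5], [3, 6]]
print(reorient_face(face, swap_ij=True, flip_i=True))  # [[3, 6], [2, 5], [1, 4]]
```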

Currently I am running 8 blocks on 8 processors, with load balancing turned off.
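With one block per processor and load balancing off, the balance is fixed by the block sizes. A common measure is the ratio of the largest per-rank load to the mean, since the slowest rank sets the pace at every collective. The block sizes below are invented for illustration.

```python
def imbalance(cells_per_rank):
    """Load-imbalance factor: max load / mean load (1.0 = perfectly balanced)."""
    mean = sum(cells_per_rank) / len(cells_per_rank)
    return max(cells_per_rank) / mean

# Hypothetical cell counts for 8 blocks mapped one-to-one onto 8 ranks.
blocks = [120_000, 118_000, 121_000, 119_000, 160_000, 122_000, 117_000, 123_000]
print(f"{imbalance(blocks):.3f}")  # 1.280
```

A ratio well above 1.0 would slow every time step by roughly the same amount. That fits agg's point that imbalance alone would be visible throughout the run, not only as an occasional hang.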

Thanks for everything,

Shrinivas


All times are GMT -4.