CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   Parallel run crashing (https://www.cfd-online.com/Forums/openfoam-solving/58253-parallel-run-crashing.html)

matejfor December 4, 2008 08:15

Dear all,
I have a problem with a simulation crashing in parallel.

solver: oodles (OF 1.5.x)
what works: the simulation runs in serial, and it runs in parallel on a local quad core

what is the problem: a distributed parallel run using mpich (the latest) and SGE for distribution on a dual- and quad-core Linux cluster crashes after several timesteps. The crash does not depend on whether data is being saved.

The crash happens with different solvers and geometries. Other codes using the same mpich have no problem. The crash is not a problem of convergence or of the solver, but of mpich communication between the different nodes.

Has anyone faced the same problem? Is any extra tweaking of mpich needed for OpenFOAM?

Any hint is welcome.
thanks
matej
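Since the crash only appears when the job is distributed across nodes through SGE, one thing worth ruling out is a mismatch between the slots SGE granted and the hosts mpich is told to use. The sketch below is only an illustration and not a confirmed fix for this thread. It assumes the usual SGE `$PE_HOSTFILE` layout (host name, slot count, queue, processor range) and writes mpich-style `host:slots` lines. The function names and the host names in the example are hypothetical.

```python
import os


def pe_hostfile_to_machinefile(text):
    """Convert SGE $PE_HOSTFILE content into mpich-style 'host:slots' lines.

    Assumed input layout, one line per host:
        <hostname> <slots> <queue> <processor-range>
    """
    lines = []
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        lines.append(f"{host}:{slots}")
    return "\n".join(lines)


def total_slots(text):
    """Sum the slot counts granted by SGE across all hosts."""
    return sum(
        int(fields[1])
        for fields in (raw.split() for raw in text.splitlines())
        if len(fields) >= 2
    )


if __name__ == "__main__":
    # Inside an SGE job the path is provided in $PE_HOSTFILE;
    # outside one, fall back to a hypothetical two-node example.
    path = os.environ.get("PE_HOSTFILE")
    if path:
        with open(path) as handle:
            content = handle.read()
    else:
        content = (
            "node01 4 all.q@node01 UNDEFINED\n"
            "node02 2 all.q@node02 UNDEFINED\n"
        )
    print(pe_hostfile_to_machinefile(content))
    print("total slots:", total_slots(content))
```

The total slot count should match `numberOfSubdomains` in the case's `decomposeParDict`. If it does not, the job is being started with a different number of processes than the case was decomposed for.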

liuch May 10, 2009 20:05

Dear Matej,

We're facing the same problem on our 8-core Xeon Linux cluster. MPICH works very well on a single CPU but fails with more than one CPU. It seems to be a system problem, and we're working with the sys admin to try to solve it. Thanks for sharing.

Best regards,
Chun-Ho

matejfor May 11, 2009 02:09

Hi,

We worked around the problem by running on a single machine. Meanwhile our admin upgraded the kernels on the nodes and the problem somehow disappeared. Now we can run distributed, but with very low efficiency, which seems to be a network-related problem.

matej
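To put a number on "low efficiency", the usual definitions are speedup S = T_serial / T_parallel and parallel efficiency E = S / N for N processes. The sketch below is a minimal illustration of those two formulas. The wall-clock times in the example are hypothetical placeholders, not measurements from this thread.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel for the same case and time range."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / N, where N is the number of processes."""
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical timings in seconds; replace with the wall-clock times
    # of a serial run and a distributed run of the same case.
    t_serial = 100.0
    t_distributed = 50.0
    n_procs = 4
    print("speedup:", speedup(t_serial, t_distributed))
    print("efficiency:", efficiency(t_serial, t_distributed, n_procs))
```

Comparing the efficiency of a run on one multi-core machine with a run of the same size spread over several nodes helps separate network cost from the cost of the decomposition itself.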

liuch May 11, 2009 02:18

Hi Matej,

Thanks for your reply. Yes, I'm pretty sure it's system related and directly connected to the network/MPI settings. We have another OpenFOAM installation using OpenMPI on a dual-processor, quad-core machine, and we have never had such an "SAE" problem there.

Our sys admin is currently busy with some other stuff. We'll work together later this week.

Best regards,
Chun-Ho
