CFD Online Discussion Forums


Rebecca513 August 20, 2012 11:54

Problem running OF on cluster
 
Dear all,

I am trying to run OF on a cluster. I installed CentFOAM, which has OF-2.1.1.

I have run the tutorial case pitzDaily, both parallel and non-parallel on the cluster without a problem.

But when I tried to run my own mesh, it gave me this error:

--> FOAM FATAL IO ERROR:
wrong token type - expected Scalar, found on line 3 the word 'nan'

file: /scratch/gpfs/hangdeng/FOAM_Run/test1/system/data::solverPerformance::p at line 3.


From function operator>>(Istream&, Scalar&)
in file lnInclude/Scalar.C at line 91.

FOAM exiting



I thought my mesh might have a problem. However, I ran the same mesh and case set-up on my workstation and everything is fine. My workstation has OF-2.0.x.

I am not sure whether it is because of the version difference, or there is something more complicated that went wrong on the cluster installation.

If anyone has any idea or suggestion, I would greatly appreciate it.

Thank you so much for your help.

Best,

Hang

Rebecca513 August 21, 2012 16:37

On the cluster, the error happened at time step 99. This is how it looks:

Time = 98

smoothSolver: Solving for Ux, Initial residual = 0.521635, Final residual = 0.00282749, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.610594, Final residual = 0.00331592, No Iterations 8
smoothSolver: Solving for Uz, Initial residual = 0.42209, Final residual = 0.00283999, No Iterations 7
GAMG: Solving for p, Initial residual = 0.30105, Final residual = 7.36962e+55, No Iterations 100
GAMG: Solving for p, Initial residual = 0.849026, Final residual = 3.30525e+93, No Iterations 100
time step continuity errors : sum local = 9.55575e+155, global = 6.81616e+147, cumulative = 6.81616e+147
ExecutionTime = 439.68 s ClockTime = 444 s

Time = 99

smoothSolver: Solving for Ux, Initial residual = 0.566129, Final residual = 0.000374191, No Iterations 2
smoothSolver: Solving for Uy, Initial residual = 0.63567, Final residual = 0.00245801, No Iterations 2
smoothSolver: Solving for Uz, Initial residual = 0.50241, Final residual = 0.000574002, No Iterations 2
GAMG: Solving for p, Initial residual = nan, Final residual = nan, No Iterations 100
GAMG: Solving for p, Initial residual = nan, Final residual = nan, No Iterations 100

It is obvious that the residuals for both p and U are too high (the final residual for p already blows up to 7.36962e+55 at Time = 98), which leads to this 'nan' error.

However, the log file on my workstation looks quite normal:

Time = 98

smoothSolver: Solving for Ux, Initial residual = 0.00059195, Final residual = 4.37997e-06, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.000782548, Final residual = 6.39388e-06, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.000572221, Final residual = 4.79188e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 0.00922742, Final residual = 8.04101e-06, No Iterations 5
GAMG: Solving for p, Initial residual = 0.00850465, Final residual = 7.19074e-06, No Iterations 5
time step continuity errors : sum local = 5.29957e-05, global = -1.78685e-07, cumulative = -0.000110236
ExecutionTime = 499.88 s ClockTime = 500 s

Time = 99

smoothSolver: Solving for Ux, Initial residual = 0.000572748, Final residual = 4.24225e-06, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.000761879, Final residual = 6.23747e-06, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.000557909, Final residual = 4.68015e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 0.00920748, Final residual = 8.09545e-06, No Iterations 5
GAMG: Solving for p, Initial residual = 0.00850217, Final residual = 7.26579e-06, No Iterations 5
time step continuity errors : sum local = 5.35504e-05, global = -2.14871e-07, cumulative = -0.000110451
ExecutionTime = 503.95 s ClockTime = 504 s

Given that the case set-ups are the same, I am not sure why the computation process has gone wrong on the server.
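
One way to pin down where a run like this first goes wrong is to scan the solver log for the first time step that reports a 'nan' residual or a final residual larger than the initial one. The following is only a minimal sketch: it assumes the log uses the same line format as the excerpts above, and the function name first_bad_step is made up for illustration.

```shell
# Print the first "Time = N" value whose solver lines report nan,
# or a final residual larger than the initial residual.
# Heuristic only: a healthy linear solve should reduce the residual.
first_bad_step() {
    awk '
        /^Time = / { t = $3 }
        /Solving for/ {
            # Fields look like: Initial residual = X, Final residual = Y,
            for (i = 1; i <= NF; i++) {
                if ($i == "Initial") init = $(i + 3)
                if ($i == "Final")   fin  = $(i + 3)
            }
            sub(/,$/, "", init)
            sub(/,$/, "", fin)
            if (init == "nan" || fin == "nan" || fin + 0 > init + 0) {
                print t
                exit
            }
        }
    ' "$1"
}

# Usage (the log file name is an assumption):
#   first_bad_step log.simpleFoam
```

On the cluster excerpt quoted above this reports 98, since that is where the final residual for p exceeds the initial one. On the workstation excerpt it prints nothing.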

Can anyone give me some idea or suggestion? I truly appreciate it!

Thank you.

Best,

Hang

wyldckat August 21, 2012 16:59

Hi Hang,

From your first post, the address seems a bit strange:
Code:

system/data::solverPerformance::p
Are you using a customized solver?

On the cluster:
  • Does the error occur with only 1 machine, or does it not matter how many machines you use?
  • Is the main "system" folder of the case visible to all nodes?

The differences shown in the second post are indeed very large; the initial residuals are roughly 1000 times smaller on your own machine with 2.0.x.

I believe CentFOAM still has an install option for 2.0.x as well. The other possibility would be to install 2.1.1 on your machine.
Besides you testing things on your side, we'll need to know at least the following:
  • What solver are you using, or at least based on which solver?
    • If you're using a custom solver, can your case work with the original solver?
  • How many cells or points does your mesh have?
  • Does the mesh have any cyclic, mapped, wedge or any other special boundary condition?
    • If it does, which decomposition method did you use?
  • Was the mesh generated in parallel or in serial (single-core)? This is mostly relevant in case it was made with snappyHexMesh.
  • Does running checkMesh in parallel give the same output with both versions of OpenFOAM?
Best regards,
Bruno

Rebecca513 August 21, 2012 17:17

Hi Bruno,

Thank you for the reply!

I tried to run in parallel earlier (decomposed using the simple method) and it gave me similar errors. I thought the issue was related to parallel computation, so I tried running the mesh on a single core instead. The errors in the posts are from the single-core run.

So,
  • Does the error occur with only 1 machine, or does it not matter how many machines you use?
    I guess it doesn't matter how many machines I use.
  • Is the main "system" folder of the case visible to all nodes?
    I think so.

  • What solver are you using, or at least based on which solver?
    simpleFoam, no change to the solver has been made
  • How many cells or points does your mesh have?
    it is an unstructured mesh (not generated by snappyHexMesh), 67562 points and 338756 cells.
  • Does the mesh have any cyclic, mapped, wedge or any other special boundary condition?
    No

Thank you~

Best,

Hang

wyldckat August 22, 2012 17:58

Hi Hang,

I think you forgot to answer this question:
Quote:

Does running checkMesh in parallel give the same output with both versions of OpenFOAM?
Without an example case where we're able to reproduce this very same problem, all that is left is for you to test this on your side, namely:
  • If you are using CentOS on your workstation, try installing 2.1.1.
  • Or try installing 2.0.x on your cluster.
I say this because there are simply too many changes made between the two versions of OpenFOAM, to be able to assess the one change that might have caused this to happen.

Although, the one detail that comes to mind is that the configuration of "fvSolution" might have some minor differences between the two versions. For example, if you run a command similar to this one:
Code:

diff -Nur ~/OpenFOAM/OpenFOAM-2.0.x/tutorials/incompressible/simpleFoam/pitzDaily ~/OpenFOAM/OpenFOAM-2.1.x/tutorials/incompressible/simpleFoam/pitzDaily
You'll get output similar to this:
Code:

@@ -1,7 +1,7 @@
 /*--------------------------------*- C++ -*----------------------------------*\
 | =========                |                                                |
 | \\      /  F ield        | OpenFOAM: The Open Source CFD Toolbox          |
-|  \\    /  O peration    | Version:  2.0.0                                |
+|  \\    /  O peration    | Version:  2.1.x                                |
 |  \\  /    A nd          | Web:      www.OpenFOAM.org                      |
 |    \\/    M anipulation  |                                                |
 \*---------------------------------------------------------------------------*/
@@ -80,12 +80,18 @@
 
 relaxationFactors
 {
-    p              0.3;
-    U              0.7;
-    k              0.7;
-    epsilon        0.7;
-    R              0.7;
-    nuTilda        0.7;
+    fields
+    {
+        p              0.3;
+    }
+    equations
+    {
+        U              0.7;
+        k              0.7;
+        epsilon        0.7;
+        R              0.7;
+        nuTilda        0.7;
+    }
 }

As you can see, the relaxation parameters have been regrouped... er, wait, this does indeed look like what might be triggering the error you're getting! By default, the relaxation parameters might be set to 1 or higher than the ones you have in your case!
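
To see quickly which of the two layouts a given fvSolution file uses, a small check like the following can help. It is only a sketch: it relies on the formatting used in the tutorial files (the keyword relaxationFactors and its closing brace both at the start of a line), and the function name relaxation_layout is made up for illustration.

```shell
# Report how relaxationFactors is laid out in an fvSolution file:
#   grouped - has "fields" and/or "equations" sub-dictionaries (2.1-style)
#   flat    - entries listed directly (2.0-style)
#   none    - no relaxationFactors block found
relaxation_layout() {
    awk '
        /^relaxationFactors/ { inside = 1; seen = 1; next }
        inside && /^}/       { inside = 0 }
        inside && $1 ~ /^(fields|equations)$/ { grouped = 1 }
        END { print (grouped ? "grouped" : (seen ? "flat" : "none")) }
    ' "$1"
}

# Usage:
#   relaxation_layout system/fvSolution
```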




By the way, you can safely have more than one version of OpenFOAM on your machines. For example, instead of having this in "~/.bashrc":
Code:

source $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc
You can have this:
Code:

alias of20x='source $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc'

alias of210='source $HOME/OpenFOAM/OpenFOAM-2.1.0/etc/bashrc'

Then on each new terminal, run of210 or of20x to start the desired environment.


Best regards,
Bruno

Rebecca513 August 23, 2012 11:36

Hi Bruno,

Thank you for the reply.

I copied the system files from the OF21 tutorial and changed the values accordingly, but it is still giving me the same error.

I will try to install OF20 to see if it works.

About
'Does running checkMesh in parallel give the same output with both versions of OpenFOAM?'

I am not sure how to run checkMesh in parallel. Could you elaborate on that a little bit?

Thank you so much.

Best,

Hang

wyldckat August 23, 2012 15:04

Hi Hang,

Quote:

Originally Posted by Rebecca513 (Post 378350)
I copied the system files from the OF21 tutorial and changed the values accordingly, but it is still giving me the same error.

Unfortunately, this is one of the reasons why switching between OpenFOAM versions isn't straightforward. You should compare all of the tutorials you've based your case on.

But still, my usual suggestion is to create a small and simple case that can reproduce the same error, then share it here on the forum. Usually, a modified tutorial does the trick. Of course the same steps should be taken for execution, whenever possible. For example, mapFields and so on.

Quote:

Originally Posted by Rebecca513 (Post 378350)
About
'Does running checkMesh in parallel give the same output with both versions of OpenFOAM?'

I am not sure how to run checkMesh in parallel. Could you elaborate on that a little bit?

:confused: It's simple! The same way you run simpleFoam in parallel, you can run checkMesh as well! :D
For example, with foamJob:
Code:

foamJob -s -p checkMesh
foamJob -s -p simpleFoam
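
For reference, foamJob with -p starts the application in parallel on an already decomposed case, and -s also sends the output to the screen. Without foamJob, the usual equivalent is to call mpirun directly and pass -parallel to the application. A minimal sketch that builds such a command line from the case's decomposeParDict follows. The function names are made up for illustration, and the command is only printed, not executed.

```shell
# Read numberOfSubdomains from a decomposeParDict-style file.
n_subdomains() {
    awk '$1 == "numberOfSubdomains" { sub(/;$/, "", $2); print $2; exit }' "$1"
}

# Print (not run) the mpirun command for an application, e.g. checkMesh.
# The second argument defaults to the dictionary of the current case.
parallel_cmd() {
    app=$1
    dict=${2:-system/decomposeParDict}
    printf 'mpirun -np %s %s -parallel\n' "$(n_subdomains "$dict")" "$app"
}

# Usage, from inside a decomposed case directory:
#   parallel_cmd checkMesh
#   parallel_cmd simpleFoam
```

On a cluster, the printed command usually still needs the scheduler's own host list or launcher options, which depend on the site.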

Best regards,
Bruno

Rebecca513 August 23, 2012 22:39

Hi Bruno,

I ran checkMesh on my workstation and on the cluster; the logs are uploaded to the link below.

Now the problem is that if I cut 1/10th of the mesh out and run it (using all the system files from OF20), it works on the cluster with and without -parallel. But when the mesh is larger, the problem starts to pop up.

I uploaded the case which failed on the cluster here: http://www.princeton.edu/~hangdeng/. I would appreciate it if you could take a look.

Thank you.

Best,

Hang

wyldckat August 24, 2012 18:46

Hi Hang,

Access is restricted on that link. I can see the list of files, but I don't have permissions for downloading.

Cut 1/10th... do you mean that you're simulating only part of the whole volume, or the cell count is 1/10th (i.e., a coarser mesh)?

Best regards,
Bruno

Rebecca513 August 24, 2012 19:28

Hi Bruno,

Sorry for the confusion, I meant part of the mesh, not a coarser mesh.

Apologies, I didn't realize the link was restricted.

Do you mind giving me your email address through private message so that I can share it with you through Dropbox or Google Drive?


Thank you~

Best,

Hang

Rebecca513 August 24, 2012 19:51

Hi Bruno,

Never mind, I have changed the permissions, so you should be able to download the files from this link: http://www.princeton.edu/~hangdeng/

Thank you.

Best,

Hang

wyldckat August 25, 2012 06:32

Hi Hang,

I've confirmed that this problem is triggered as soon as we switch from OpenFOAM 2.0.x to 2.1.0. I've tried doing some minor adjustments in "fvSchemes", reducing the relaxation parameters, even tried re-decomposing + using scotch; and tried converting the mesh using foamFormatConvert in case it was some sort of mesh incompatibility...
And nothing worked!

BUT! I've found an interesting solution :D polyDualMesh!

Here are the steps I've taken:
  1. Removed the processor folders.
  2. Changed decomposition from simple to scotch.
  3. Converted the mesh:
    Code:

    polyDualMesh 30 -overwrite
    This converted the mesh from "tetrahedra: 339501" to "polyhedra: 67664" :D
  4. Decomposed and ran:
    Code:

    decomposePar
    foamJob -p -s simpleFoam

  5. The case was solved at a blazing speed :D The case converged very fast (less than 100 iterations), even if the skew faces count went from 1 to 17 :rolleyes:
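
The steps above can be collected into a small helper. This is only a sketch: the function name and the DRY_RUN switch are made up for illustration, step 2 (setting the method to scotch in system/decomposeParDict) is left as a manual edit, and a checkMesh call has been added after the conversion as an extra sanity check.

```shell
# Re-run a case on a polyhedral dual mesh, following the steps listed above.
# Set DRY_RUN=1 to print the commands instead of executing them.
# Must be called from inside the case directory.
polydual_rerun() {
    angle=${1:-30}          # feature angle; 30 is the value used above
    run=${DRY_RUN:+echo}    # expands to "echo" only when DRY_RUN is set

    # Step 1: remove the old decomposition.
    $run rm -rf processor* &&
    # Step 2 is manual: set "method scotch;" in system/decomposeParDict.
    # Step 3: convert the mesh in place.
    $run polyDualMesh "$angle" -overwrite &&
    # Extra check (not one of the original steps): inspect the new mesh.
    $run checkMesh &&
    # Step 4: decompose and run in parallel.
    $run decomposePar &&
    $run foamJob -p -s simpleFoam
}

# Usage:
#   DRY_RUN=1 polydual_rerun 30    # preview the commands
#   polydual_rerun 30              # actually run them
```

Since -overwrite replaces the original mesh, it is worth keeping a copy of the case before trying different feature angles.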

A few more notes on the changes needed from OpenFOAM 2.0 to 2.1:
  • As you've seen before, relaxation parameters were regrouped.
  • Convergence control for your case should be something like this:
    Code:

    SIMPLE
    {
        nNonOrthogonalCorrectors 1;
        residualControl
        {
            p              1e-5;
            U              1e-5;
        }
    }



Conclusion: if you want, you can/should report this bug to the OpenFOAM team, since this seems to be a very strange numerical discrepancy, mainly due to the tetrahedral mesh. Sharing the case with them is crucial, since this seems to be a very isolated problem.
I think you already know, but in case you don't, the bug tracker for OpenFOAM is this one: http://www.openfoam.com/mantisbt/

Best regards,
Bruno

Rebecca513 August 25, 2012 17:25

Hello Bruno,

Thank you soooo much!

polyDualMesh works! At least for the single-core case.

But I was not able to run decomposePar with scotch. I followed this tutorial (http://web.student.chalmers.se/group...elLucchini.pdf) in setting up the dict file:
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 4;

method          scotch;

simpleCoeffs
{
    n               ( 2 1 1 );
    delta           0.001;
}

hierarchicalCoeffs
{
    n               ( 2 2 1 );
    delta           0.001;
    order           xyz;
}

metisCoeffs
{
    processorWeights
    {
        1
        1
        1
        1
    };
}

manualCoeffs
{
    dataFile        "";
}

distributed     no;

roots           ( );


// ************************************************************************* //

but it gives me errors:

Selecting decompositionMethod scotch


--> FOAM FATAL ERROR:
You are trying to use scotch but do not have the scotchDecomp library loaded.
This message is from the dummy scotchDecomp stub library instead.

Please install scotch and make sure that libscotch.so is in your LD_LIBRARY_PATH.
The scotchDecomp library can then be built in $FOAM_SRC/parallel/decompose/decompositionMethods/scotchDecomp

Am I missing something?

Relating to polyDualMesh:
(1) Could you please elaborate on the number '30'? I actually posted a thread (http://www.cfd-online.com/Forums/ope...ydualmesh.html) a while ago about polyDualMesh, where I used 60 but failed to convert the mesh.
(2) After the conversion, my understanding is that the geometry of the object should not be changed, right?

Also, I have other even larger and more complex meshes. I will try on the cluster, and let you know whether they work as well!

Thank you~

Best,

Hang

wyldckat August 26, 2012 08:33

Hi Hang,

It looks like "scotch" isn't built for some reason :(
Perhaps on the cluster with OpenFOAM 2.1.1 it is working as intended.
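
The error message asks for libscotch.so to be in LD_LIBRARY_PATH, so a first check is whether any directory on that path actually contains it. A minimal sketch, with a made-up function name; it only looks for the file and does not verify that the scotchDecomp library itself was built:

```shell
# List every file whose name starts with the given library name
# in the directories of a colon-separated search path.
# The search path defaults to the current LD_LIBRARY_PATH.
find_lib_on_path() {
    lib=$1
    search_path=${2:-${LD_LIBRARY_PATH:-}}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        [ -n "$dir" ] || continue
        for f in "$dir"/"$lib"*; do
            [ -e "$f" ] && printf '%s\n' "$f"
        done
    done
    IFS=$old_ifs
}

# Usage, after sourcing the OpenFOAM environment:
#   find_lib_on_path libscotch.so
#   find_lib_on_path libscotchDecomp.so
```

If the first call prints nothing, the scotch library itself is not visible in that environment, which matches the message from the stub library.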

As for polyDualMesh: the value is the feature angle that the converter uses when looking at the mesh. I basically got lucky, because the other value I had tried was 150 and the result was a lot worse. A few more indications on how to use it:
  • http://openfoamwiki.net/index.php/Po...esh_generation
  • From here: http://www.idurun.com/?p=367
    Quote:

    Where the feature angle is that the minimum angle between two faces.
  • Don't forget to run checkMesh after running polyDualMesh, to assess how good/bad the result is.
    edit: keep in mind that if the errors aren't very bad (skew faces <6 instead of 4), then it might work as intended!
  • And also check the other options:
    Code:

    polyDualMesh -help
And yes, the meshes should be identical, when it comes to geometrical representation. If not, then something went very wrong, possibly due to a bad feature angle.
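
A cheap first check that the geometry survived the conversion is to compare the domain bounding box reported by checkMesh before and after polyDualMesh. The sketch below assumes the checkMesh log contains a line starting with "Overall domain bounding box" and that both logs were saved to files; the function names are made up for illustration. Matching bounding boxes are necessary but not sufficient, so this does not replace looking at the mesh.

```shell
# Extract the bounding box coordinates from a saved checkMesh log.
bounding_box() {
    sed -n 's/^ *Overall domain bounding box *//p' "$1" | head -n 1
}

# Succeed only if both logs report exactly the same bounding box text.
same_bounding_box() {
    [ "$(bounding_box "$1")" = "$(bounding_box "$2")" ]
}

# Usage (log file names are assumptions):
#   checkMesh > log.checkMesh.tet
#   polyDualMesh 30 -overwrite
#   checkMesh > log.checkMesh.poly
#   same_bounding_box log.checkMesh.tet log.checkMesh.poly && echo "same box"
```

The comparison is textual, so small differences in the printed digits count as a mismatch and are worth a closer look rather than an automatic failure.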

Best regards,
Bruno

Rebecca513 August 26, 2012 18:09

Hello Bruno,

Thank you for your reply. That clears a lot of things up~

I tried scotch on the cluster, but it is not working there either. I will see whether simple can be used as an alternative.

Best,

Hang


All times are GMT -4.