CFD Online Discussion Forums


mrshives November 6, 2013 18:18

Error depends on number of partitions.
 
Hello,

Background:
I have a simulation with a rather complicated set of CEL expressions and user functions. It represents a lab-scale tidal turbine using an actuator disk approach. The turbine is not physically represented; rather, its forces are calculated from the simulated flow field and tabulated lift and drag coefficients. These forces are then applied to the control volumes within a subdomain.
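As a rough illustration of the force calculation described above, here is a minimal Python sketch that interpolates tabulated lift and drag coefficients and converts them to forces per unit span. It is not the CEL from this simulation: the table values, the function names (`interp`, `section_forces`) and the inputs are all hypothetical placeholders.

```python
from bisect import bisect_right

# Hypothetical lookup table: angle of attack [deg] -> Cl, Cd.
# Placeholder numbers for illustration only, not data from the thread.
ALPHA_DEG = [0.0, 5.0, 10.0, 15.0]
CL = [0.0, 0.5, 1.0, 1.2]
CD = [0.01, 0.02, 0.04, 0.10]


def interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped at the table ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def section_forces(rho, w_rel, chord, alpha_deg):
    """Lift and drag per unit span [N/m] from the tabulated coefficients."""
    q = 0.5 * rho * w_rel ** 2  # dynamic pressure of the relative flow
    lift = q * chord * interp(alpha_deg, ALPHA_DEG, CL)
    drag = q * chord * interp(alpha_deg, ALPHA_DEG, CD)
    return lift, drag


# Example call with made-up water density, relative speed, chord and angle.
lift, drag = section_forces(1000.0, 2.0, 0.1, 7.5)
```

In an actuator disk model these sectional forces would then be distributed as momentum sources over the control volumes of the subdomain; that step is omitted here.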

The Problem:
When I try to run the simulation on 4 partitions (Local MPI), it crashes with the error message below. When I run on 3 partitions, the simulation runs perfectly.

Previous versions of the simulation ran fine on any number of partitions, but then I added some functionality (a "stall delay" model). To do this, I added two "Additional Variables" (Lcoeff, Dcoeff), which are defined within my domain using algebraic expressions. As part of its formulation, the stall delay correction uses the lift and drag coefficients at the blade root and at the half-span position. So I created monitor points at those locations and use the "probe(Lcoeff)@mp1" functionality to get the values I need. It was after I added these functions and additional variables that I started to have problems.

The question:
Does anyone know why this might occur? One possible explanation is that a monitor point needs to be located in the same partition as the location where the expression using it is applied. But I'm not so sure about that.
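To make that hypothesis concrete, here is a toy Python model of it. This is not how CFX implements `probe()`; the node ownership, field values and function names are hypothetical. It only shows how a value that exists on one partition could turn into an invalid number (NaN) on the others unless it is shared between them.

```python
import math


def probe_on_rank(rank_nodes, point, field):
    """Return the field value if this partition owns the point, else NaN."""
    return field[point] if point in rank_nodes else math.nan


def shared_probe(per_rank_values):
    """Combine per-partition results by keeping the one value that is not NaN."""
    valid = [v for v in per_rank_values if not math.isnan(v)]
    return valid[0] if valid else math.nan


# Hypothetical setup: 8 nodes split over 4 partitions, monitor point at node 2.
partitions = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
lcoeff = {n: 0.1 * n for n in range(8)}
mp1 = 2

local = [probe_on_rank(p, mp1, lcoeff) for p in partitions]

# Each partition evaluates an expression using only its own local result:
# partitions that do not own the point propagate NaN.
naive = [2.0 * v for v in local]

# With the probed value shared first, every partition gets a valid number.
shared = shared_probe(local)
safe = [2.0 * shared for _ in partitions]
```

If something like this were the cause, whether the run fails would depend on where the partition boundaries fall relative to the monitor point and the subdomain, which would be consistent with the error appearing for some partition counts and not others.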

So what?:
I suppose I could just run with 3 partitions, but I plan to use this simulation methodology later on for simulations of many turbine rotors on a cluster. If errors occur unpredictably depending on the number of partitions, that could be very problematic in the future. So I want to understand why this is happening.

CCL:
I've attached some of the relevant CCL. I think the problem is somewhere in there. This is only part of the CCL for the simulation; if you want to see more, please let me know.

The (cryptic - as usual) error message:
+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Floating point exception: Invalid number |
| |
| |
| |
| |
| |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Stopped in routine FPX: C_FPX_HANDLER |
| |
| |
| |
| |
| |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| An error has occurred in cfx5solve: |
| |
| The ANSYS CFX solver exited with return code 1. No results file |
| has been created. |
+--------------------------------------------------------------------+

ghorrocks November 6, 2013 21:03

First of all - a very nicely posed question. I wish all questions could be as clear as this one :)

Your idea that it could be the probe point linking to a region in another partition is possible. But on rare occasions CFX does give weird numerical errors on multiprocessor runs - so it might just be CFX weirdness and not something you can do much about.

A workaround which might be more palatable is to use a different partitioning algorithm. There is usually little performance difference between the various algorithms, so you can choose a different one without a noticeable speed loss. Using one of the coordinate bisection approaches, you might be able to keep the point and the linked region in the same partition, and that might avoid the problem for any number of partitions.
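The idea behind this workaround can be sketched in Python with a simplified, single-direction variant of coordinate bisection: sort the points along one axis and cut them into equal-count slabs. This is not the CFX partitioner, and the grid, indices and function names below are hypothetical. It only shows that the choice of cutting direction decides whether a probe point and a nearby cell end up together.

```python
def slab_partition(points, n_parts, axis=0):
    """Assign each point a partition index using equal-count slabs along one axis."""
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    part = [0] * len(points)
    for rank, i in enumerate(order):
        part[i] = rank * n_parts // len(points)
    return part


def same_partition(points, n_parts, axis, a, b):
    """True if points a and b land in the same slab."""
    part = slab_partition(points, n_parts, axis)
    return part[a] == part[b]


# Hypothetical 6 x 2 grid of cell centres (x, y); index = 2 * x + y.
points = [(x, y) for x in range(6) for y in range(2)]
probe = 4  # cell at (2, 0), standing in for the monitor point
disk = 5   # cell at (2, 1), standing in for a cell in the subdomain

together_3 = same_partition(points, 3, 0, probe, disk)  # cut along x, 3 parts
together_4 = same_partition(points, 4, 0, probe, disk)  # cut along x, 4 parts
split_y = same_partition(points, 2, 1, probe, disk)     # cut along y, 2 parts
```

Note that when several points share the same coordinate along the cutting axis, a slab boundary can still fall between them, so the cutting direction has to be chosen with the geometry in mind.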

singer1812 November 7, 2013 10:40

Quick question for you to check:

Did you build this in CFX 14 and try running it with the CFX 12.0 solver?

Your CCL file seems to indicate something like this.

mrshives November 8, 2013 16:38

I'm using 14.0. It is strange that the CCL says "results version = 12.0".
I'm running locally and only have v14.0 installed.

mrshives November 8, 2013 16:39

Thanks, Glen. In the end I'll use a different formulation that doesn't use the monitor points, since it relies only on local values.


All times are GMT -4.