CFD Online Discussion Forums > CFX
Fatal overflow in linear solver occurs when executing the solution in parallel
(https://www.cfd-online.com/Forums/cfx/224753-fatal-overflow-linear-solver-occur-when-execute-solution-parallel.html)

karachun March 3, 2020 10:27

Fatal overflow in linear solver occurs when executing the solution in parallel
 
5 Attachment(s)
Hi, folks!

Let me introduce my problem and physics description first.

I'm performing a sloshing analysis in a 2D rectangular box (256x256 mm), driven by an impulsive acceleration load. The maximum acceleration is 8 g, the load is ramped, and the load duration is 70 ms.
The working fluids are air and water, both incompressible. The turbulence model is SST. I use the homogeneous multiphase model with the Standard Free Surface Model option, include the buoyancy model, and switch the surface tension model off.
I worked through the "Flow around bump" tutorial before this analysis and mostly use the solver settings from that tutorial.
My computational domain is closed; the boundary conditions consist of four walls and two symmetry planes. I initialize the transient simulation using expressions, setting the tank to be half filled with water.
For convergence control I set 3 to 5 coefficient loops and select the High Resolution scheme for all equations. I also enable Advanced Options -> Volume Fraction Coupling -> Coupled. I use double precision for all runs.
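For illustration, the initialization expressions are of the following kind (a rough sketch only – the expression names, the water density value and the exact hydrostatic formula here are placeholders, not copied verbatim from my CCL):
Code:

  LIBRARY:
    CEL:
      EXPRESSIONS:
        WaterLevel = 0.128 [m]                            # half of the 256 mm tank height
        DenWater = 997 [kg m^-3]                          # illustrative water density
        WaterVFIni = step((WaterLevel - y)/1[m])          # 1 below the free surface, 0 above
        AirVFIni = 1 - WaterVFIni
        PresIni = DenWater*g*WaterVFIni*(WaterLevel - y)  # simple hydrostatic head in the water
      END
    END
  END

These expressions are then referenced in the domain initialization for the volume fractions and the relative pressure.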

Now let's proceed to the failure I encountered.

At the very beginning of the mesh convergence/timestep investigation I ran into a strange error. When I run the problem in serial I am able to solve it and obtain results. However, when I run the solution in parallel, it diverges at the first timestep. Even if I solve the first few timesteps on a single core and then restart in parallel, I still get divergence.

The error message is
Code:

+--------------------------------------------------------------------+
 | ERROR #004100018 has occurred in subroutine FINMES.                |
 | Message:                                                          |
 | Fatal overflow in linear solver.                                  |
 +--------------------------------------------------------------------+

Therefore I have two questions:
1) What am I doing wrong, and how can I fix the issue with the parallel solution?
2) Are there any other mistakes in my physics/numerics setup?

I attach the CCL file, two meshes (4 and 8 mm element size), and two output files – the successful solution and the failed one.

Thanks in advance!

Opaque March 3, 2020 12:46

If you want to investigate what might be the problem, you can take advantage of the fact you solved the problem in serial (1 core).

Run the problem in parallel using the same initial conditions/guess as in the serial run. Set both simulations to stop before the timestep that had failed in parallel before.

Now you should have two results files, as well as two output files.

Compare the two output files using a graphical file difference tool to see what is different between them. Ignore the obvious things such as parallel settings, partitioning information (for now), etc.

Are the diagnostics of the solution steps the same, or close enough? If not, then you have something to investigate further. In theory, both solutions should proceed identically if the solutions of the linear equations are identical.
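For instance, any file difference tool will do; even from a command line, something like this (file names are placeholders) shows where the two runs start to drift apart:
Code:

  diff serial_run.out parallel_run.out | less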

Hope the above helps,

karachun March 3, 2020 13:11

Thanks for the answer.
Unfortunately my simulation fails at the second coefficient loop of the first timestep.
Code:

  ======================================================================
 TIME STEP =    1 SIMULATION TIME = 1.0000E-04 CPU SECONDS = 1.684E+01
 ----------------------------------------------------------------------
 | SOLVING : Wall Scale                                              |
 ----------------------------------------------------------------------
 |      Equation      | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.00 | 2.7E-04 | 2.7E-04 | 31.8  8.9E-02  OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.09 | 2.4E-05 | 1.6E-04 | 39.5  6.3E-02  OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.30 | 7.0E-06 | 4.8E-05 | 39.5  6.3E-02  OK|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    1              CPU SECONDS = 1.750E+01
 ----------------------------------------------------------------------
 |      Equation      | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom-Bulk          | 0.00 | 9.6E-25 | 7.7E-24 |      8.2E+20  * |
 | V-Mom-Bulk          | 0.00 | 2.2E-23 | 1.8E-22 |      1.1E+21  * |
 | W-Mom-Bulk          | 0.00 | 0.0E+00 | 0.0E+00 |      0.0E+00  OK|
 | Mass-Water          | 0.00 | 2.5E-44 | 2.0E-43 |      5.2E+22  * |
 | Mass-Air            | 0.00 | 1.3E-45 | 1.0E-44 | 15.8  9.1E+22  * |
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE-Bulk        | 0.00 | 9.6E-16 | 2.8E-14 | 10.6  4.7E-10  OK|
 | O-TurbFreq-Bulk      | 0.00 | 6.2E-02 | 1.0E+00 | 17.3  8.9E-07  OK|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    2              CPU SECONDS = 1.878E+01
 ----------------------------------------------------------------------
 |      Equation      | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 
 +--------------------------------------------------------------------+
 | ERROR #004100018 has occurred in subroutine FINMES.                |
 | Message:                                                          |
 | Fatal overflow in linear solver.                                  |
 +--------------------------------------------------------------------+

Everything goes wrong right from the beginning.
In addition, I ran the problem on another PC and the error is gone – the solution runs normally in parallel. Maybe I should reinstall ANSYS.

Opaque March 3, 2020 13:39

I would compare the output files between the successful run and the failed run to understand what is different at the start of the run.

Similarly, the suggestion above applies to the diagnostics in the first coefficient loops. In theory they must be identical, but there are subtle differences in parallel that should go away once converged.

Gert-Jan March 3, 2020 13:52

Did you try 2 or 3 partitions?
Did you try an alternative partitioning method like recursive bisection (instead of MeTiS)?
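For example, when launching from the command line, the number of partitions and the partitioning method can be selected with cfx5solve options (the definition-file name is a placeholder; check cfx5solve -help for the exact option names in your version):
Code:

  cfx5solve -def sloshing.def -double -par-local -partition 3 -part-mode rcb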


Regards, Gert-Jan

karachun March 3, 2020 14:18

Gert-Jan, Opaque,
I will try these recommendations tomorrow.

karachun March 4, 2020 03:44

Opaque,
The output files are different from the beginning. It looks like the solution already diverged during the first coefficient loop.

Failed parallel output
Code:

======================================================================
 TIME STEP =    1 SIMULATION TIME = 1.0000E-04 CPU SECONDS = 1.661E+01
 ----------------------------------------------------------------------
 | SOLVING : Wall Scale                                              |
 ----------------------------------------------------------------------
 |      Equation      | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.00 | 2.1E-04 | 2.1E-04 | 46.9  1.0E-01  ok|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.10 | 2.1E-05 | 1.7E-04 | 46.9  1.3E-01  ok|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.34 | 7.3E-06 | 6.2E-05 | 46.9  1.3E-01  ok|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    1              CPU SECONDS = 1.734E+01
 ----------------------------------------------------------------------
 |      Equation      | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom-Bulk          | 0.00 | 7.0E-12 | 7.9E-11 |      1.8E+08  F |
 | V-Mom-Bulk          | 0.00 | 1.3E-10 | 1.5E-09 |      1.8E+08  F |
 | W-Mom-Bulk          | 0.00 | 0.0E+00 | 0.0E+00 |      0.0E+00  OK|
 | Mass-Water          | 0.00 | 2.0E-18 | 2.3E-17 |      2.5E+09  F |
 | Mass-Air            | 0.00 | 6.3E-21 | 7.5E-20 | 15.7  4.1E+09  F |
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE-Bulk        | 0.00 | 2.3E-06 | 3.4E-06 | 11.0  7.5E-10  OK|
 | O-TurbFreq-Bulk      | 0.00 | 1.2E-01 | 1.0E+00 | 12.8  3.5E-15  OK|
 +----------------------+------+---------+---------+------------------+

Normal serial output
Code:

======================================================================
 TIME STEP =    1 SIMULATION TIME = 1.0000E-04 CPU SECONDS = 2.055E+00
 ----------------------------------------------------------------------
 | SOLVING : Wall Scale                                              |
 ----------------------------------------------------------------------
 |      Equation      | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.00 | 2.1E-04 | 2.1E-04 | 39.1  8.5E-02  OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.08 | 1.7E-05 | 3.6E-05 | 46.7  6.6E-02  OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk      | 0.30 | 5.1E-06 | 1.1E-05 | 46.7  6.6E-02  OK|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    1              CPU SECONDS = 2.382E+00
 ----------------------------------------------------------------------
 |      Equation      | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom-Bulk          | 0.00 | 2.9E-02 | 3.2E-01 |      2.0E-03  OK|
 | V-Mom-Bulk          | 0.00 | 4.1E-02 | 4.7E-01 |      1.8E-03  OK|
 | W-Mom-Bulk          | 0.00 | 0.0E+00 | 0.0E+00 |      0.0E+00  OK|
 | Mass-Water          | 0.00 | 5.8E-06 | 4.8E-05 |      5.1E-03  OK|
 | Mass-Air            | 0.00 | 2.6E-07 | 2.4E-05 | 15.8  1.5E-02  ok|
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE-Bulk        | 0.00 | 3.0E-07 | 2.1E-06 |  8.6  1.5E-15  OK|
 | O-TurbFreq-Bulk      | 0.00 | 1.2E-01 | 1.0E+00 |  9.3  1.3E-16  OK|
 +----------------------+------+---------+---------+------------------+

Gert-Jan,
I have tried different partitioning methods but the solution still diverges at the first iteration.
Some meshes (I have 8, 4 and 2 mm variants) run on 3 cores, and some run on 2 and 3 cores but fail on 4 cores. The coarse 8 mm case can run on all four cores.

Gert-Jan March 4, 2020 04:13

This is strange. I would ask ANSYS.

Also, in Post, I would check how the partitioning is done (look for the partition number). I would partition in the vertical or horizontal direction.

Btw, do you now have 1 element in the 3rd dimension? I would also perform a test with 2 elements.

karachun March 4, 2020 06:06

2 Attachment(s)
I now use four elements through the thickness, to be safe, but I still get divergence.
I checked the partition numbers on the mesh – they look adequate.

ghorrocks March 4, 2020 18:03

If areas of very high gradients (such as free surfaces) align with partition boundaries you can get convergence problems. It is best to make sure partition boundaries do not align with free surfaces. Based on your images of the partitions you are using, it appears this is contributing.

I would try other partitioning algorithms (e.g. recursive bisection) and check that they give you a better partition pattern. I would think vertical stripes would probably be a good pattern for you. But as your free surface sloshes around all over the place it might be challenging to find a partition shape which avoids the free surface for the entire run; you will have to compromise a bit there.

karachun March 5, 2020 04:45

Thanks.
Today, in one of my test runs, I observed an error that may confirm your statement. I ran the model on 3 cores and the solution ran fine for some time, but then I got sudden divergence (at one timestep the model ran as usual and at the next everything diverged). When I changed back to the serial solver the error disappeared.
I will try different partitioning methods and report back here if I have success.
By the way, if the problem is caused by large gradients, is it possible to reduce these gradients somehow?
The goal of my calculation is to obtain a pressure time history to use in a finite element analysis. Therefore I can neglect some physics that has only a minor impact on the wall pressure.
As I understand it, for this problem I should account for two main features:
-) the bulk flow of water;
-) the pressure change inside the tank.
I have already performed a convergence study and can say that I can neglect turbulence effects and use the laminar viscous model.
A study of the homogeneous vs. inhomogeneous multiphase model is underway. Best practices recommend using the inhomogeneous model for problems where the interface does not remain intact, but again, this interphase interaction may not affect the results that I want to obtain.

ghorrocks March 5, 2020 05:37

Quote:

I observed an error that may confirm your statement.
I am not just a pretty face, you know :)

If your simulation is super-sensitive to the free surface lining up with the partition boundary this suggests your model is very numerically unstable. A free surface simulation in a square box should not be very numerically unstable - so your problem is likely to actually be poor model setup causing instability. So to fix the root cause you should improve the numerical stability.

Here are some tips:
* Double precision numerics
* Smaller timestep (how did you set the time step? Did you guess? If so then you guessed wrong)
* Improve mesh quality
* Better initial conditions
* Check the physics is correctly configured
* Tighter convergence tolerance.

ghorrocks March 5, 2020 05:44

Just had a look at your setup.
* You have a fixed time step size. Unless this is the result of a time step sensitivity study this will be wrong. I recommend you change to adaptive time stepping, converging on 3-5 coeff loops per iteration.
(Actually, your simulation reaches convergence later on in 3 or 4 coeff loops so your time step probably is not too far off for this convergence tolerance)
* You have min 3, max 5 coeff loops per iteration. Why have you done this? Set this to no minimum and max 10.
* Have you checked your convergence tolerance is adequate? You should do a sensitivity check on this.
* I see this is a pseudo-2D simulation. In that case make the thickness in the z direction equal to the element size in the X or Y directions. This will make your elements closer to aspect ratio 1.

karachun March 6, 2020 03:08

5 Attachment(s)
I tried to partition the domain into four vertical stripes and it failed, but the solution with three horizontal partitions (so that the whole initial free surface belongs to one partition) runs fine. However, I cannot be sure that at some point during the simulation the free surface location won't cause this error again.

* Increased the geometry thickness to make the elements closer to a 1:1 aspect ratio – done.

* I have changed the timestepping control to adaptive timestepping. Here are my timestep controls.
Code:

TIME STEPS:
  First Update Time = 0.0 [s]
  Initial Timestep = 1e-6 [s]
  Option = Adaptive
  Timestep Update Frequency = 1
  TIMESTEP ADAPTION:
    Maximum Timestep = 0.001 [s]  # based on the input data discretization level I need in the FEA analysis
    Minimum Timestep = 1e-10 [s]
    Option = Number of Coefficient Loops
    Target Maximum Coefficient Loops = 5
    Target Minimum Coefficient Loops = 3
    Timestep Decrease Factor = 0.8
    Timestep Increase Factor = 1.06
  END
...
SOLVER CONTROL:
...
CONVERGENCE CONTROL:
  Maximum Number of Coefficient Loops = 10
  Minimum Number of Coefficient Loops = 1
  Timescale Control = Coefficient Loops
END

I have attached the CCL file with the new model setup.

* I ran a sensitivity study to determine an adequate RMS residual level.
The results at 1e-3 and 1e-4 are pretty close. I use the time history of the force acting on the side wall and the pressure at one point as convergence parameters.
The case with 5e-5 is still solving; I will come back to this point when I have the results.

By the way, I have noticed that the solver controls convergence only for the main flow parameters like mass, momentum and volume fraction. On the other hand, CFX allows the turbulence residuals to remain much coarser. I didn't assign special controls to the turbulence residuals. Is this the intended solver behaviour?
For example, my target RMS is 1e-5, the flow parameters are converged, and the turbulence residuals are 5e-5 for K and 3e-4 for Omega at the third coefficient loop, yet the solver doesn't iterate further but starts a new timestep.

Gert-Jan March 6, 2020 03:28

My 50 cents:

If you run these kinds of multiphase simulations, then convergence on residuals is quite hard, so 1e-4 might be hard to reach. Better to add multiple monitor points, like your pressure point, and monitor pressure, velocity and volume fraction.
To make sure that you reach convergence within a timestep, switch on the option "Monitor Coefficient Loop Convergence". You can find this at the top of the CFX-Pre tab Output Control > Monitor. This will give you the progression of the variables within a timestep. Best results are obtained if you have flatliners everywhere.

This also allows you to create a graph in the Solver Manager to plot these coefficient loops. I would also recommend plotting the time step size. You can do this by creating a monitor in CFX-Pre with the option "Expression" and the variable "Time Step Size". These things won't help the solver, but they graphically show you what the solver is doing and where it has difficulties.
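In CCL, the corresponding output control section looks roughly like this (a sketch only – the monitor point names and coordinates are placeholders, and the key names should be checked against a CCL dump of your own case):
Code:

  OUTPUT CONTROL:
    MONITOR OBJECTS:
      Monitor Coefficient Loop Convergence = On
      MONITOR POINT: Wall Pressure Point
        Option = Cartesian Coordinates
        Cartesian Coordinates = 0.0 [m], 0.05 [m], 0.0 [m]
        Output Variables List = Pressure,Water.Volume Fraction
      END
      MONITOR POINT: Timestep Size
        Option = Expression
        Expression Value = Time Step Size
      END
    END
  END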

ghorrocks March 6, 2020 04:27

Note that for this sensitivity analysis, rather than the normal approach of comparing important variables (which you seem to be doing quite nicely), you should look at how numerically stable the result is.

Maybe consider choosing the most unstable configuration – 4 partitions with MeTiS appears to crash your initial setup very early – so try tighter convergence and smaller time steps on this configuration and see if it no longer crashes.

karachun March 6, 2020 05:31

5 Attachment(s)
To Gert-Jan:
Thanks for the advice, I'll use the Coefficient Loop Convergence monitor from now on.
I have already plotted the timestep size. For RMS 1e-4, most of the time the timestep is equal to or larger than 1e-4 s, but sometimes it falls to 1e-5 s.
I also monitor the residual history of the main flow quantities (mass, momentum, volume fraction); most of the time convergence is OK and the residuals are below the desired level, but sometimes there are "spikes" where the solver cannot converge within 10 coefficient loops. As I mentioned before, it looks like the solver does not consider the turbulence residuals, or uses much looser convergence criteria for them.
Here are my additional convergence statistics.

To ghorrocks:
Unfortunately, with four partitions the solution, in 95% of cases, diverges at the second coefficient loop of the first timestep, before the residual criteria even come into play.
At this point I assume it is "unsafe" to launch the solution on many cores, even with three cores and "horizontal stripes" partitions. Some variants (different mesh size, physics, residuals) run normally on multiple cores and some fail somewhere in the middle of the solution time. I cannot recognize a pattern in these failures.

Gert-Jan March 6, 2020 05:39

You can also partition in a radial direction, or use a specified direction. Why not try (1,1,0) or (1,2,0)? Then it is not in line with your free surface. Alternatively, use more elements.

karachun March 6, 2020 08:55

3 Attachment(s)
While trying to launch the solution on many cores I found another issue. Following the documentation, I set the Pressure Level Information (placing the point inside the air phase) and my results changed dramatically. When I check the pressure contour I don't see any difference. It is strange, because I set the pressure distribution using an expression (hydrostatic pressure) and I supposed that initialization with that expression would be enough.

Opaque March 6, 2020 12:15

Have you looked into the previous output file for a warning regarding the pressure level information?

If you have a closed system, the pressure level is undefined. Some setups may get away without it, but the initial conditions are not guaranteed to define the level.

karachun March 6, 2020 13:35

Yes, I have this warning message. But the point with coordinates (0.004, 0.252, 0.0) is placed in the region filled with air, so when I define the Pressure Level Information manually I am performing an equivalent action, I suppose.
Code:

  +--------------------------------------------------------------------+
 |                  Reference Pressure Information                  |
 +--------------------------------------------------------------------+

 Domain Group: Default Domain
 
  Pressure has not been set at any boundary conditions.
  The pressure will be set to  0.00000E+00 at the following location:
  Domain      : Default Domain
  Node        :        1 (equation        1)
  Coordinates : ( 4.00000E-03, 2.52000E-01, 0.00000E+00).

 +--------------------------------------------------------------------+
 |                      ****** Notice ******                        |
 | This is a multiphase simulation in a closed system.                |
 | A global correction will be applied to the volume fractions to    |
 | accelerate mass conservation.                                      |
 +--------------------------------------------------------------------+

 Domain Group: Default Domain
 
  Buoyancy has been activated.  The absolute pressure will include
  hydrostatic pressure contribution, using the following reference
  coordinates: ( 4.00000E-03, 2.52000E-01, 0.00000E+00).

Here is the part of the CCL where I set the Pressure Level manually.

Code:

    PRESSURE LEVEL INFORMATION:
      Cartesian Coordinates = 0.128 [m], 0.192 [m], 0 [m]
      Option = Cartesian Coordinates
      Pressure Level = 0 [atm]
    END


karachun March 10, 2020 09:16

2 Attachment(s)
Small update – I have recalculated the test cases using the coefficient loop control and my solution looks stable. With adaptive timestepping most of the timesteps converge in fewer than 10 loops.
Based on the pressure results I can judge that an RMS residual level of 1e-4 is adequate for this simulation.

karachun March 12, 2020 04:45

By now I have checked all the numerical and physics settings.
Here is a summary of my tests:
-) An RMS residual level of 1e-4 is adequate.
-) Turbulence and surface tension can be neglected.
-) The homogeneous multiphase model is preferable, even though the whitepapers recommend using the inhomogeneous model.
-) Manually placing the point for Pressure Level Information is mandatory. Automatic pressure level point selection produces unrealistic results.
-) Running the solution in parallel may or may not lead to convergence problems. Use the parallel solution with caution.
I caution that my conclusions apply only to my simulation settings and only in the context of my problem goals. They do not fit all sloshing problems.

I thank Opaque, Gert-Jan and ghorrocks for the help. I learned a lot of new CFX tricks from you all in this thread. Thank you!

ghorrocks March 12, 2020 05:02

Thanks for the summary.

It is quite rare for parallel simulations to be different to serial simulations. So your case is one of the rare exceptions.

Gert-Jan March 12, 2020 06:01

I just found out that the problem of free surfaces aligning with partition boundaries is recognized by CFX and is described in paragraph 7.18.5.9 of the manual.

Ashkan Kashani April 25, 2023 22:43

3 Attachment(s)
Hello. I would appreciate your comments on the following.

Problem description: The flow underneath a floating stationary rectangular body.
For more modelling details: See the attached CCL file.
What's wrong? I'm facing the same divergence problem when doing parallel runs. I'm running the simulation on 128 cores. Everything starts off smoothly. But at some point during the transient solution, the linear solver starts to fail (as signified by "F", see Figure 1), which persists until the CFX solution crashes eventually with the following message:
+--------------------------------------------------------------------+
| ERROR #004100018 has occurred in subroutine FINMES.                 |
| Message:                                                            |
| Fatal overflow in linear solver.                                    |
+--------------------------------------------------------------------+
As discussed above, I also suspect that the partitioning is to blame. My suspicion is supported by the fact that the free surface happens to coincide with some interfaces of the adjacent partitions, see Figure 2.

My questions:
1- In case the partitioning topology is involved, how can I ensure other partitioning methods do a better job?
Quote:

Originally Posted by ghorrocks (Post 760525)
* I see this is a pseudo-2D simulation. In that case make the thickness in the z direction equal to the element size in the X or Y directions. This will make your elements closer to aspect ratio 1.

2- Is the above a precautionary measure, or is it necessary? That is, would it cause numerical instability in the CFX solver if the extrusion dimension far exceeds the elements' in-plane size? Consider the case where only one element lies between the two lateral faces to which a symmetry boundary condition has been prescribed.

3- Any other recommendations to get it to converge more easily?

ghorrocks April 25, 2023 23:00

Making the element aspect ratio closer to 1 always improves the numerical stability. But whether your simulation has a problem with numerical stability depends on what you are modelling, how you have set the simulation up, mesh quality and many other factors. So some simulations will be very sensitive to this, and some will not.

Ashkan Kashani April 25, 2023 23:48

I would also appreciate your comment on the problem with partitioning.

ghorrocks April 26, 2023 00:17

You edited your question and changed it after I had answered it! Please don't do that in future. If you have another question or need further clarification, please add a new post to the thread.

Yes, the partitioning might be affecting stability. Then just change the partitioning algorithm. There are many different partitioning algorithms, look in the documentation for available options.

FINMES error: See FAQ https://www.cfd-online.com/Wiki/Ansy...do_about_it.3F

Gert-Jan April 26, 2023 04:16

Not sure if it helps, but here I would suggest trying to partition in the x-direction, perpendicular to the free surface. In the Solver Manager / Define Run / Partitioner tab you can select various methods, of which a main axis direction is one of the options.

I would not start with 128 partitions, because that might lead to many thin slices, but with fewer. Just give it a try.

Ashkan Kashani April 28, 2023 01:37

Thank you ghorrocks and Gert-Jan. I've got two more questions regarding your responses.
1- Since in my pseudo-2D simulation all the elements share the same size in the direction of mesh extrusion, the resulting aspect ratio will have a broad range that may exceed 1 by far, no matter what value is set for the thickness (which is equal to the extrusion length between the symmetry planes). Given that, how can I keep the aspect ratio close to 1 in order to improve numerical stability?
2- I would like to try partitions that are aligned in one specific direction (normal to the free surface in my setup). However, I can't find the User Specified Direction partitioning method among the options given for the command -part-mode in the documentation. The only options are 'metis-kway' (MeTiS k-way), 'metis-rec' (MeTiS Recursive Bisection), 'simple' (Simple Assignment), 'drcb' (Directional Recursive Coordinate Bisection), 'orcb' (Optimized Recursive Coordinate Bisection) and 'rcb' (Recursive Coordinate Bisection). How do I set this up?
I would appreciate your help.

Gert-Jan April 28, 2023 02:21

Regarding point 1, use the same mesh size everywhere and make your extrusion depth the same as your mesh size. There is no workaround here. But you can use larger mesh sizes in the air and water far away from the free surface. However, when using the user-specified direction partitioning method, I don't know what will happen if you have fewer elements along the length of your sky and sea floor than you have partitions. Better to perform a few tests here.

Regarding point 2, there are multiple options:
1) You can create an Execution Control in CFX-Pre (Insert > Solver > Execution Control). There you have the same options as in the Solver Manager, so you can select the partitioning method you like. This partitioning information is written to the definition file, so it is already available when running from the command line. No need to add additional settings.
2) If you start from a results file on the command line and do not specify anything, the solver will run the case with the same settings the results file was run with. The partitioning settings were written to the results file, so everything is already available.
3) You can extract the execution control settings from a clean definition file by typing:
cfx5cmds -read -def <file.def> -ccl <settings_def.ccl>
Do the same for a successful results file:
cfx5cmds -read -def <file.res> -ccl <settings_res.ccl>
Then, using a text editor, copy and paste the execution control section from the results CCL into the definition CCL.
Then overwrite the old settings in the definition file with the new settings by:
cfx5cmds -write -def <file.def> -ccl <settings_def.ccl>
and off you go.
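Put together, option 3 is just the following sequence (file names are placeholders):
Code:

  # dump the CCL from the definition file and from a successful results file
  cfx5cmds -read -def model.def -ccl settings_def.ccl
  cfx5cmds -read -def previous_run.res -ccl settings_res.ccl
  # copy the EXECUTION CONTROL section from settings_res.ccl into settings_def.ccl
  # with a text editor, then write the edited CCL back into the definition file
  cfx5cmds -write -def model.def -ccl settings_def.ccl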

Ashkan Kashani May 28, 2023 13:05

Quote:

Originally Posted by Ashkan Kashani (Post 848792)
I'm facing the same divergence problem when doing parallel runs. I'm running the simulation on 128 cores. Everything starts off smoothly. But at some point during the transient solution, the linear solver starts to fail, which persists until the CFX solution crashes eventually with the "Fatal overflow in linear solver" error. [...]

I would like to report my finding on what appears to be causing the instability in my case, in the hope that somebody may find it helpful.
Trying different partitioning methods only DELAYED the solver failure rather than getting rid of it entirely. Eventually, I realized that the instability stems from a large extrusion length, which leads to very high aspect ratio elements that are apparently implicated in the solver failure. Fixing this stabilized the solution greatly.

ghorrocks May 28, 2023 18:48

Improving mesh quality always helps, and sometimes in ways you would not expect. Good to hear you got it working.

Ashkan Kashani June 23, 2023 12:12

Hello again :)
I've got another relevant inquiry so I'm posting it here. I appreciate your comments.
I have observed cases where the linear solver keeps failing ('F' is returned in the output file). Still, the transient solution seems to go on unaffected, i.e. the RMS values remain low and the monitor points (such as lift) do not show anything odd evolving.
1- Why does the linear solver failure not mess up the RMS values (RMS values are still well below the tolerance)? Are those two unrelated matters?
2- Under such circumstances, are the results still reliable regardless of the recurrent failure of the linear solver?

ghorrocks June 23, 2023 23:20

You need to understand the structure of the solver to answer those questions.

When you do iterations in CFX you are seeing the outer loop of the solver. At each of these iterations the coefficients are updated to account for the non-linear nature of the Navier-Stokes equations. But at each iteration the non-linear parts of the Navier-Stokes equations are linearised, which leaves you with a set of linear equations to solve. CFX uses a multigrid solver to solve these linear equations. If you have studied numerical methods you would know that there are many linear equation solvers out there – from direct methods such as matrix inversion (which solve the equations exactly in one go, but the amount of computation required is impractical for all but small problems) to iterative solvers (which iterate towards the solution and can use far fewer calculations than direct solvers, but do not give an exact answer, so they require iterations until they are close enough).

So the "F" for the linear solver shows the Multigrid solver has not iterated to its specified accuracy. As the outer equations progress the inner linear solver is really just giving you better linearisation coefficients. The inner solution just needs to be good enough that the linearisation is better than the last outer iteration. This can be done with quite poor inner equation convergence. That is why the convergence reported on the inner equations is only 0.1 or 0.01, that is all which is required. And often coarser than that is adequate as well - which is why your simulation still converges despite the poor linear solver failure.

Of course, if the linear solver does a really bad job and the linearisation gets worse, then the outer loops are going to diverge and the run will crash. So you cannot go too far with this. But if the linear solver works moderately well, that is often still enough for the outer loop to converge.

Ashkan Kashani July 7, 2023 11:34

Thanks ghorrocks. As always, thoroughly answered.

