CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   SU2 (https://www.cfd-online.com/Forums/su2/)
-   -   Code hangs while writing volume solution (https://www.cfd-online.com/Forums/su2/115374-code-hangs-while-writing-volume-solution.html)

grjmpower March 29, 2013 06:44

Code hangs while writing volume solution
 
2 Attachment(s)
Hi,

SU^2 hangs while writing the volume solution in version 2.0.2. (The surface solution seems to be written OK.) This happens at the end of the run or whenever a WRT_SOL_FREQ is specified. I don't have this problem with the ONERA M6 case in the TestCases, and I can't see any major differences in the output options of the two configuration files. This case does have around 3 million elements, so that might be a factor.

I've attached the configuration and output files.

Thanks,

Greg

economon April 11, 2013 14:02

Hi Greg,

Thanks for the feedback - we are aware of the limitations in version 2.0.2. This was a developer release, and the volume output routines are still being updated.

We will be releasing another developer version next week, and this version will be able to handle larger cases due to further improvements in the I/O routines. We are working to improve the parallel performance of the code in general (both memory & scalability), and we will continue to release these updates as we go over the next several months. When you get the chance, please try your cases with the updated versions.

Cheers,
Tom

dtucker April 13, 2013 11:53

Restart File...
 
4 Attachment(s)
This may or may not be an associated problem, but I've noticed in a recent run that a restart_flow.dat wasn't created.

I had modified my config file so that WRITE_SOL_FREQ = 999999 and I could be sure that the flow solution wouldn't be written until the solution had converged (from a previous run I expected this around 7000 iterations). I set it up this way because I am running 2.0.2 and knew the flow solution process would take a while.

Is there any relationship between WRITE_SOL_FREQ and the frequency/sequence of writing the restart file?? ...I didn't think there would be.

On a subsequent run I changed my convergence criteria to force it to write a flow solution and the restart file looks to have been created at the end of the process. I can't tell for sure when it was generated, but the last I looked yesterday evening it had not been.

Particulars:
- about 6.3 million cells in my mesh
- using 72 processors on a Cray XT5

See attachments for details:
Run 1 - first run config and output files...NO RESTART FILE WAS CREATED
Run 2 - second run config and output files...restart file was created

Thanks!
Dave

PS: I do understand 2.0.2 is in development, but I haven't had any luck with 2.0 due to memory issues. Thanks for your work thus far!

dtucker April 14, 2013 18:14

Follow-up
 
I've done a few subsequent runs and am finding the same thing; the restart file is not being generated until the run completes!

This could waste a lot of time if something goes wrong with the run. Would you let me know when the restart file should be generated, and how often it is updated?

Thanks!

Quote:

Originally Posted by dtucker (Post 420302)
This may or may not be an associated problem, but I've noticed in a recent run that a restart_flow.dat wasn't created...


economon April 18, 2013 13:46

Hi David (& Greg),

As the output routines have been changing quite a bit recently, I would recommend trying the new developer release (V2.0.3) that will be posted later today. Note that further updates to the output routines are still in the works, but they should be more stable with this release.

As for the restart files, they will indeed be generated according to the frequency specified with the WRT_SOL_FREQ option. Also, in V2.0.3, we have a new module named SU2_SOL which can generate a solution file given an SU2 mesh and restart file as input. This will be called automatically when using the parallel_computation.py script (only restarts are written which are merged into a solution at the end of the simulation). Lastly, if you want to completely turn off volume and surface solution writing during a simulation, please try the following options:

WRT_VOL_SOL= NO
WRT_SRF_SOL= NO

which will turn off all volume and surface solution file writing while still writing the restart files.

Thanks for trying SU2, and look for more updates to the output as we go. Hope this helps!
Tom
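
For anyone who wants to switch these options per run without hand-editing, here is a minimal sketch in Python. It assumes the config file is plain `KEY= VALUE` text with `%` starting comment lines, as in the SU2 test cases. The helper name `set_options` is hypothetical and is not part of SU2.

```python
import re


def set_options(cfg_text, options):
    """Return cfg_text with each KEY in options set to the given value.

    Existing 'KEY= value' lines are rewritten in place; keys that are not
    present are appended at the end. Lines starting with '%' never match
    the KEY pattern, so comments are left untouched.
    """
    remaining = dict(options)
    out = []
    for line in cfg_text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9_]+)\s*=", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            line = f"{key}= {remaining.pop(key)}"
        out.append(line)
    # Append any option that was not already in the file.
    out.extend(f"{key}= {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative config text only; not taken from the attached files.
    sample = "% output options\nWRT_SOL_FREQ= 100\nWRT_VOL_SOL= YES\n"
    print(set_options(sample, {"WRT_VOL_SOL": "NO", "WRT_SRF_SOL": "NO"}))
```

Reading the real config with `open(...).read()`, passing it through `set_options`, and writing the result to a new file keeps the original config intact.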

dtucker April 19, 2013 17:35

Ahhhh!
 
Thanks, I'm going to add those points to my "master config" file; it is good to be able to do preliminary runs without writing the flow/surface solutions.

I'm compiling 2.0.3 (rev1) now, and will follow up here. (If I forget to, no news is good news!)

Cheers!
Dave

dtucker April 21, 2013 15:27

Volume Solution
 
I'm editing this comment as new data is available. I have gotten a flow.dat file written, so I've answered my own question: it'll only be written once a converged restart file is available.

The process from start to finish to write the flow.dat took between 2-8 hours (I can't figure out how to determine more accurately - I was asleep). My mesh is about 6.4 million cells, should it take that long??

...I should add that this was using 12 processors.


Original Comment:
Quote:

Originally Posted by economon (Post 421602)
...As for the restart files, they will indeed be generated according to the frequency specified with the WRT_SOL_FREQ option. Also, in V2.0.3, we have a new module named SU2_SOL which can generate a solution file given an SU2 mesh and restart file as input. This will be called automatically when using the parallel_computation.py script (only restarts are written which are merged into a solution at the end of the simulation)...

Ok, so I'm getting an output of: surface_flow.csv, restart_flow17.dat, and history.plt. This is a so-far unconverged simulation, do I need the solution converged before the flow solution (volume) will be written?

...I don't understand your point above:
Quote:

Originally Posted by economon (Post 421602)
...(only restarts are written which are merged into a solution at the end of the simulation)...

Would you re-iterate the process and requirements of getting my hands on a volume flow solution? What is the syntax to call the SU2_SOL?

...I feel like I'm close, but missing something.

Thanks!
Dave

grjmpower April 21, 2013 15:34

Using 2.0.3:

I am also not getting any flow files created. The partition surface files are created, but no partition flow files. No combined surface flow file is created and then the SU2_SOL code hangs while it is writing the flow file (or that is what the output is indicating).

dtucker April 22, 2013 16:40

Volume Solution
 
Ok, I'm really abusing electrons here, but I didn't want to lose any of the points...

I re-ran a group of simulations and actually timed it from "Writing flow solution" to "Exit Success". It took an hour for my 1 degree AoA case.

That still seems a bit long to me. Is that unusual?

(That is now the only question I'd like addressed; my questions below are Overcome By Events.) :)

Quote:

Originally Posted by dtucker (Post 422153)
I'm editing this comment as new data is available. I have gotten a flow.dat file written, so I've answered my own question: it'll only be written once a converged restart file is available.

The process from start to finish to write the flow.dat took between 2-8 hours (I can't figure out how to determine more accurately - I was asleep). My mesh is about 6.4 million cells, should it take that long??

...I should add that this was using 12 processors.


Original Comment:


Ok, so I'm getting an output of: surface_flow.csv, restart_flow17.dat, and history.plt. This is a so-far unconverged simulation, do I need the solution converged before the flow solution (volume) will be written?

...I don't understand your point above:


Would you re-iterate the process and requirements of getting my hands on a volume flow solution? What is the syntax to call the SU2_SOL?

...I feel like I'm close, but missing something.

Thanks!
Dave


economon May 2, 2013 13:06

Hi guys,

Can you please check the memory usage during file writing? We would like to know whether it is a memory issue, i.e. there is a leak or your machine is maxing out the memory and it is causing long delays in file writing, or if there is an actual issue in the new output routines. One of the motivations for using the new SU2_SOL module is that it has much less memory overhead than the solver.

Thanks,
Tom
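
One rough way to check this on a single Linux workstation is sketched below in Python, using only the standard library. It runs a command and reports the peak resident memory of the child process through `resource.getrusage`. The wrapper name `run_and_report` is hypothetical. On a cluster, where the job is started through a launcher such as aprun, this would most likely measure only the launcher on the login node, so the site's own job accounting tools would be needed there instead.

```python
import resource
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd, wait for it, and return (exit_code, peak_rss).

    ru_maxrss is the largest resident set size among the child processes
    that have terminated and been waited for. On Linux the value is in
    kilobytes; other platforms may use different units.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Example (assumed invocation): python peak_mem.py SU2_SOL config_SOL.cfg
    # With no arguments, fall back to a trivial command so the script still runs.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    code, peak = run_and_report(cmd)
    print(f"exit code {code}, peak RSS {peak} (kilobytes on Linux)")
```

If the reported peak is close to the installed RAM, swapping is a plausible cause of a very slow file write. A small peak would point back at the output routines themselves.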

dtucker May 11, 2013 16:06

Hmmm...
 
Quote:

Originally Posted by economon (Post 424748)
Hi guys,

Can you please check the memory usage during file writing? We would like to know whether it is a memory issue, i.e. there is a leak or your machine is maxing out the memory and it is causing long delays in file writing, or if there is an actual issue in the new output routines. One of the motivations for using the new SU2_SOL module is that it has much less memory overhead than the solver.

Thanks,
Tom

Sorry for the delay; I've been on vacation. I'm afraid I don't know how to look into that; I've been using the Kraken supercomputer in Knoxville and I'm not sure I can retrieve that info. I do know that I was splitting the job across 12 processors. Does that help?

Dave

PS: This was my output after the iterations were complete:

Code:

  ------------------------- Exit Success (SU2_CFD) ------------------------

Application 5600742 resources: utime ~280145s, stime ~355s

-------------------------------------------------------------------------
|    _____   _    _   ___                                               |
|   / ____| | |  | | |__ \    Web: su2.stanford.edu                     |
|  | (___   | |  | |    ) |   Twitter: @su2code                         |
|   \___ \  | |  | |   / /    Forum: www.cfd-online.com/Forums/su2/     |
|   ____) | | |__| |  / /_                                              |
|  |_____/   \____/  |____|   Suite (Solution Exporting Code)           |
|                             Release 2.0.3                             |
-------------------------------------------------------------------------

------------------------ Physical case definition -----------------------
Input mesh file name: rev8.su2

-------------------------- Output information ---------------------------
The output file format is Tecplot ASCII (.dat).
Flow variables file name: flow.

------------------- Config file boundary information --------------------
Navier-Stokes wall boundary marker(s): Navier-Stokes_Wall.
Far-field boundary marker(s): Farfield.
Symmetry plane boundary marker(s): Symmetry.
Inlet boundary marker(s): Inlet.
Outlet boundary marker(s): Outlet.

---------------------- Read grid file information -----------------------
Three dimensional problem.
6858554 interior elements (incl. halo cells). 1269928 points (incl. ghost points)
Identify vertices.

------------------------- Solution Postprocessing -----------------------
Reading and storing the solution from restart_flow18.dat.
Writing the volume solution.

------------------------- Exit Success (SU2_SOL) ------------------------

Application 5601705 resources: utime ~49326s, stime ~23s
the command: aprun -n 12 /lustre/scratch/dtucker/local/bin_power_2.0.3/SU2_CFD config_CFD.cfg
the location: /lustre/scratch/dtucker/Project/rev8/run18/15
the command: aprun -n 12 /lustre/scratch/dtucker/local/bin_power_2.0.3/SU2_SOL config_SOL.cfg
the location: /lustre/scratch/dtucker/Project/rev8/run18/15
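
The two `utime` figures in this log give a rough idea of the cost of each step. The Python sketch below divides them by the 12 processes of the `aprun -n 12` commands. It assumes that aprun reports user time summed over all processes, which the log itself does not confirm, so the results are an estimate and not a measured wall-clock time.

```python
def per_process_seconds(total_utime_s, n_procs):
    """Average user CPU time per process, assuming utime is a sum over processes."""
    return total_utime_s / n_procs


if __name__ == "__main__":
    # utime values copied from the two "Application ... resources" lines above.
    runs = {"SU2_CFD": 280145, "SU2_SOL": 49326}
    for name, utime in runs.items():
        secs = per_process_seconds(utime, 12)
        print(f"{name}: {secs:.1f} s per process ({secs / 60:.1f} min)")
```

Under that assumption, SU2_SOL used about 4110 s (roughly 68 minutes) per process, against about 6.5 hours per process for the SU2_CFD run.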


economon May 21, 2013 03:33

Hi Dave,

One more thing you can try... Another advantage of SU2_SOL is that new solution files can always be generated on the fly using just the restart and config files. This can be done on any machine with any number of cores. It is even handy for converting between different solution file types, if need be, without rerunning expensive simulations.

For instance, if it seems like you are having memory problems on one machine due to a large mesh, you could move the restart and config files to a different machine with more GB per core (or possibly a regular workstation with a large amount of RAM) and simply run SU2_SOL (on any number of cores) to generate your solution file.

Hope this helps!
Tom
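
The log earlier in this thread shows SU2_SOL being called as `SU2_SOL config_SOL.cfg`, optionally behind a launcher such as `aprun -n 12`. A minimal Python sketch of the same call on another machine follows. The function names are hypothetical, and it assumes that `SU2_SOL` is on the PATH. Since the log also shows SU2_SOL reading the mesh file, the mesh probably has to be copied along with the restart and config files.

```python
import shutil
import subprocess


def su2_sol_command(config_file, launcher=None):
    """Build the argument list, e.g. ['SU2_SOL', 'config_SOL.cfg'].

    launcher is an optional prefix such as ['aprun', '-n', '12'].
    """
    return list(launcher or []) + ["SU2_SOL", config_file]


def run_su2_sol(config_file, launcher=None):
    """Run SU2_SOL and raise if it is missing or exits with an error."""
    if shutil.which("SU2_SOL") is None:
        raise FileNotFoundError("SU2_SOL not found on PATH")
    return subprocess.run(su2_sol_command(config_file, launcher), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_su2_sol(...) to execute it.
    print(" ".join(su2_sol_command("config_SOL.cfg")))
```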

economon August 10, 2013 13:54

Hi guys,

Just an update: with V2.0.5, we made major improvements to file IO, including adjustments to the algorithms for merging solution files. You should see much better performance for large cases.

Cheers,
Tom


All times are GMT -4.