
swap vs RAM

#1 - February 8, 2016, 12:11 - KingKraut (Jo Mar)
Dear all,

I have the following problem with the simulation of a case on a server. The server has 32 cores and 8*64 GB = 512 GB of RAM, but only 4 GB of swap (if I am reading the output of the "top" command correctly).

At a certain point the application keeps aborting with a segmentation fault, in OpenFOAM 2.1.1 as well as 2.2.2, so I guess it is not an OpenFOAM-version-specific problem. Moreover, I ran the case on several grids (0.5, 1.7 and 4.5 million cells, of different complexity). The smaller grids compute fine, but the large grid aborts with the segmentation fault, so I am fairly sure it is an issue with the allocated memory. Since the code runs on the smaller grids, the solver code itself should be fine, and the same large configuration works without problems on a different cluster, so I can exclude the solver as the reason.

Searching for the problem here and in other forums, I found that both the allocated memory and the available swap are possible causes. Since we have 512 GB of RAM, I would prefer not to enlarge the swap partition on the running system, but rather make the solver and OpenFOAM use the available RAM.
In the logfile I can even see that the medium-sized case uses only ~8 GB of memory and no swap at all. The simulation of the largest case, however, does not compute; it aborts before even the first time step is finished.

My guess is that I should somehow be allocating more RAM to the process. However, I could not find any command or option within OpenFOAM to set this. Can anybody tell me whether I am going in the right direction, or, even better, offer a solution for this issue? At the moment I am pretty much stuck.

All help is highly appreciated. Thanks to anyone looking into this!

Best regards
KingKraut

Edit: I am a little confused that OpenFOAM does not automatically use the available RAM, which should be sufficient to compute the problem... Maybe it needs some "help" finding it?

#2 - February 9, 2016, 03:18 - akidess (Anton Kidess)
1. Typically a segmentation fault is not a memory limit issue.
2. If you are sure it is, your HPC queue manager (PBS, SGE, whatever) can impose much stricter memory limits.

#3 - February 9, 2016, 03:38 - KingKraut (Jo Mar)
Dear akidess,
thanks a lot for your reply.

1: From my search on the internet I suspected it might be a memory limit issue. The fact that it works fine on a smaller grid made this assumption plausible to me, and the fact that the combination of the large grid with the identical solver works on a larger HPC cluster also made me think that the available memory is the problem.
What else could be the reason for the segmentation fault, then? I have heard of the option to recompile OpenFOAM in debug mode; until now I hoped I could try something else first, in case this is a common problem...

2: No queue manager is installed on the server where the problem occurs, so I don't think the limit comes from there. However, does OpenFOAM offer an option to specify how much RAM it may use? I have not found or heard of one, but I would like to make sure I am not missing something simple...

Thanks again for your help!

Best
Kraut

#4 - February 9, 2016, 04:16 - akidess (Anton Kidess)
Yeah, I guess you could do that. You might want to try posting the log file and error message here first, though. OpenFOAM does not have a memory limiter as far as I am aware. Also, 4.5M cells is by no means a very large mesh nowadays.

#5 - February 9, 2016, 04:45 - KingKraut (Jo Mar)
Dear akidess,

the error message really is very short. Run in serial, it only shows "Segmentation fault" in the terminal. Please see the attached complete logfile log_pisoResistanceFoam_serial.

In parallel (16 cores) it gives the attached output log_pisoResistanceFoam_parallel. Apparently the application manages to proceed a little further and prints some more of the custom solver's output, but then it crashes with a segmentation fault as well (same with 32 cores). These are the last few lines that appear beyond the output of the serial run:

Quote:
pulse duration: 1
time step: 0.0001
nTimeStepsInPulse: 10000
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 17405 on node Wzmbx001 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Does this output help?

What really puzzles me is that everything works fine on the smaller meshes, and the configuration with the larger mesh does not cause any problems on a different cluster either. But here, on our "little" server, the larger mesh crashes like this. This is also why I thought it was a RAM issue, since I see no difference between the cases other than the mesh size...

Thanks a lot again for your help and time!

Best
Kraut
Attached Files
File Type: txt log_pisoResistanceFoam_serial.txt (12.9 KB, 5 views)
File Type: txt log_pisoResistanceFoam_parallel.txt (13.9 KB, 2 views)

#6 - February 9, 2016, 08:13 - akidess (Anton Kidess)
I was expecting a stack trace. If you are running the tool in the background, you will need to redirect stderr along with stdout. By the way, you should have mentioned that you are using a custom application; it may be that some of your code is bogus.

#7 - February 9, 2016, 08:44 - KingKraut (Jo Mar)
Hi akidess,

I am very sorry that I did not mention the solver was custom! Of course I should have written that, but I must have forgotten. Please excuse the omission!

I tried to redirect stderr to stdout by adding 2>&1 to the end of the command that runs the solver, like this:

Quote:
pisoResistanceFoam > log_pisoResistanceFoam_serial 2>&1
However, this does not change the output. Do I have to do this somewhat differently in OpenFOAM?
I will go and have a look at how to obtain a more detailed stack trace. I have not yet recompiled OpenFOAM in debug mode; is this necessary to get a proper stack trace?

Apart from that, I don't think the custom solver code is faulty, because it works without problems on the smaller cases. Furthermore, on another platform it worked and still works fine on the same grid that keeps failing here...

Thank you very much again for all your replies so far! I guess some of my problems and questions might be rather inexperienced - but I am indeed a beginner with OpenFOAM.
So many thanks for your help!

Best
Kraut

#8 - February 9, 2016, 09:47 - olivierG (Olivier)
Hello,
As Anton says, I don't think this is a memory limit. Usually a 4-5 million cell case uses 4-10 GB of RAM, which is far below the 64 GB you have (or even 512 GB?).
Just check the memory usage and you will see.

Your trouble may instead come from the processor boundaries and the way you partition the mesh: which method do you use? simple? scotch, metis, ...? Try changing the way you split the mesh.
And since you use a custom solver, this could really be the bogus part (e.g. a missing gSum, ...).

regards,
olivier

#9 - February 9, 2016, 10:05 - KingKraut (Jo Mar)
Hello olivierG,

thanks a lot for the reply.
I checked the memory usage, and as you said, the same solver code uses only ~5 GB on a smaller case (~1.7 million cells).

For the decomposition of the larger grid I use the scotch method, and this works fine on a different cluster with the custom solver code. The decomposition of the grid has no effect on the execution of the solvers that come with OpenFOAM, such as icoFoam or pisoFoam.
Yesterday I added the preservePatches option to the decomposition, because there are some cyclicAMI patches in the geometry. But this did not have any effect on the problem either...

Thanks again for your help!

Best
Johannes

#10 - March 8, 2016, 13:15 - KingKraut (Jo Mar)
Hey all,

I finally found the reason for the segmentation fault!

It lies in the custom solver code; however, the code itself is not faulty as such. I will try to explain the situation, since I now know the problem, but not the solution...

The following is the line in the code which causes the problem:

Quote:
scalar outFluxArray[nOutlets][nCycles][nTimeStepsInPulse];

The array is created to store the outflow values of the solver at the outlets of the geometry (nOutlets), in order to check the problem for periodicity over a maximum number of cycles (nCycles) and test a termination condition. It is evaluated for every time step in the pulse (nTimeStepsInPulse).

By trying out different combinations I found that the solver does not produce a segmentation fault as long as the number of entries in outFluxArray stays below a certain limit (which depends on the machine).

On my virtual machine with 4 cores it breaks down with 12 outlets, 10000 time steps and 10 cycles (1.2 million entries in outFluxArray). On a larger server (computation on 24 cores) it works fine up to an outFluxArray with fewer than 1.5 million entries.
On a larger HPC cluster, parallelized over 128 cores, the solver worked with nOutlets=43, nCycles=10 and nTimeStepsInPulse=10000. I will try a larger configuration as well, but I guess at some point it will break down there, too...
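A rough back-of-the-envelope check (my estimate, assuming 8-byte scalars in a double-precision build and the common Linux default stack limit of 8 MiB, see "ulimit -s") shows why the failure sets in right around this size:

Code:
#include <cstdio>

int main()
{
    // Sizes of the failing case on the 4-core virtual machine.
    const long nOutlets = 12, nCycles = 10, nTimeStepsInPulse = 10000;
    const long bytes = nOutlets*nCycles*nTimeStepsInPulse*sizeof(double);

    // 1.2 million doubles = 9.6 MB, i.e. ~9.2 MiB, which is above the
    // usual 8 MiB default stack limit, so a stack-allocated array of
    // this size overflows the stack before the first time step.
    std::printf("outFluxArray needs %.1f MiB of stack\n",
                bytes/1024.0/1024.0);
    return 0;
}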

I guess there is no easy solution other than reworking the whole periodicity check in the solver code (the variable is used in various places...). However, in the small hope that there is an alternative within OpenFOAM to define this outFluxArray as above and still allow it to have this many entries, I am posting this here. Maybe the outFluxArray can be declared with a different type than a plain scalar array? I tried looking into scalar.H within OpenFOAM, but I think I am a little off track there, looking for a solution... At least I am glad I found the problem.

By the way, I am using OpenFOAM versions 2.2.2 and 2.1.1; the error occurs in both. Maybe this is an issue that is solved in newer versions?

So thanks to everybody taking the time to read and look into this! I highly appreciate it, even though I am not very optimistic that this can be solved as easily as I might wish. Thanks again.

Best regards
Johannes


PS: I hope my explanations were clear enough!!

#11 - March 9, 2016, 04:09 - akidess (Anton Kidess)
I'm a little rusty, but I faintly remember you do not want to allocate large arrays on the stack. Move it to the heap:
Code:
scalar* outFluxArray = new scalar[nOutlets*nCycles*nTimeStepsInPulse];
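Since the three-dimensional array becomes one flat heap block, the loops have to compute the index by hand. A minimal sketch of the index math (my naming, not your actual solver code), plus the matching cleanup:

Code:
// Maps the former outFluxArray[i][j][k] onto the flat heap block,
// with i < nOutlets, j < nCycles, k < nTimeStepsInPulse.
inline label flatIndex
(
    const label i, const label j, const label k,
    const label nCycles, const label nTimeStepsInPulse
)
{
    return (i*nCycles + j)*nTimeStepsInPulse + k;
}

// usage:
//   outFluxArray[flatIndex(i, j, k, nCycles, nTimeStepsInPulse)] = flux;
// and once the solver is done with the array:
//   delete[] outFluxArray;

If I remember correctly, a Foam::List<scalar> (or a std::vector<double>) would also live on the heap and free its memory automatically, which avoids the manual delete[].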

#12 - March 11, 2016, 04:35 - KingKraut (Jo Mar)
Dear akidess,

thanks a lot for the suggestion!
I replaced the line (and the other uses of the variable) in the code as you proposed, and the code no longer produces the segmentation fault!
Now I only have to figure out how to change the loops that refer to the variable so that it actually yields the correct values... But I should manage that.
Thank you very much again for your help!

Best
Johannes

#13 - March 11, 2016, 11:10 - KingKraut (Jo Mar)
Adapting the solver code to this new type of variable was simple enough. However, the change noticeably decreases the performance of the solver (a factor of about 4/3 per computed time step on 24 cores), which I had expected given what I understand about the two types of memory, heap and stack.
I guess I will have to look into assigning more memory to the stack, or find other ways to improve the solver code in this regard...

Thanks a lot again for the help!

#14 - March 11, 2016, 11:46 - akidess (Anton Kidess)
Did you place the "new" command inside a loop? Then you've got yourself a textbook memory leak. The access performance should be near identical.
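Schematically, the leaky pattern would look like this (a sketch, not your actual code):

Code:
// Leak: a fresh block is allocated every time step and never freed,
// so the resident memory grows until the allocator gives up.
while (runTime.loop())
{
    scalar* outFluxArray =
        new scalar[nOutlets*nCycles*nTimeStepsInPulse];
    // ... fill and evaluate outFluxArray ...
}   // only the pointer goes out of scope; the block itself survives

// Correct: allocate once before the loop, delete[] once after it.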

#15 - March 11, 2016, 11:55 - KingKraut (Jo Mar)
I have to admit I might have been a bit hasty with my estimate.
The computation seemed slower for the first ~20 minutes, but then leveled off at a similar time per time step as the former version.
Sorry about that. Thanks anyway for your fast reply!

I didn't put the initialization of the variable inside a loop, so everything still works nicely, apart from this slower start. But I guess that could also just be the server having an off day! =)

Best
Johannes

#16 - July 19, 2016, 04:39 - KingKraut (Jo Mar)
Dear all,

With the solution found back then, a new problem occurred a couple of weeks ago. The solver with the array initialized by
Quote:
scalar* outFluxArray = new scalar[nOutlets*nCycles*nTimeStepsInPulse];
worked fine on most grids, up to a certain cell count (~7 million). Beyond that, the runs broke off after a couple of time steps with a memory corruption:
Quote:
*** glibc detected *** *** glibc detected *** /zhome/academic/HLRS/xwu/xwujomar/OpenFOAM/xwujomar-2.2.2-gnu/platforms/crayxeGccDPOpt/bin/newFoam/zhome/academic/HLRS/xwu/xwujomar/OpenFOAM/xwujomar-2.2.2-gnu/platforms/crayxeGccDPOpt/bin/newFoam: : munmap_chunk(): invalid pointermunmap_chunk(): invalid pointer: 0x: 0x00000100039576e0000001000395eb80 ***
***
*** glibc detected *** /zhome/academic/HLRS/xwu/xwujomar/OpenFOAM/xwujomar-2.2.2-gnu/platforms/crayxeGccDPOpt/bin/newFoam: malloc(): memory corruption: 0x0000010003c649f0 ***
*** glibc detected *** /zhome/academic/HLRS/xwu/xwujomar/OpenFOAM/xwujomar-2.2.2-gnu/platforms/crayxeGccDPOpt/bin/newFoam: munmap_chunk(): invalid pointer: 0x0000010003a8f000 ***
*** glibc detected *** /zhome/academic/HLRS/xwu/xwujomar/OpenFOAM/xwujomar-2.2.2-gnu/platforms/crayxeGccDPOpt/bin/newFoam: malloc(): memory corruption: 0x0000010003984a50 ***
or alternatively with a corrupted double-linked list:
Quote:
*** glibc detected *** /zhome/academic/HLRS/xwu/xwujomar/OpenFOAM/xwujomar-2.2.2-gnu/platforms/crayxeGccDPOpt/bin/heapFoam: corrupted double-linked list: 0x00000100047df950 ***
Since the machine I am computing on is big enough to handle a large stack array of the form
Quote:
scalar outFluxArray[nOutlets][nCycles][nTimeStepsInPulse];
with more than 10 million entries (at least the segmentation fault because of which I opened this thread does not occur again), I went back to this previous form.

So for now I could work around the problem by going back to the former solver. However, I don't understand why the solver with this variable stored on the heap produces a memory corruption. How does memory come to be written or freed where it is not supposed to be? And why does this only happen at a later point in the computation? Before the error occurs, ~100 time steps complete without any problems. Furthermore, I did not experience any problems with the heap-stored variable on smaller grids.
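My best guess so far (an assumption, not a confirmed diagnosis): glibc prints these malloc()/munmap_chunk() messages when something writes past the end of a heap block and corrupts the allocator's bookkeeping, and the crash only surfaces later, at an unrelated allocation or free, which would match the ~100 good time steps. A bounds guard like the following sketch (hypothetical helper, my naming) should show whether an out-of-range index is the culprit:

Code:
// Bounds-checked write into the flat array; aborts with a clear
// message instead of silently trampling neighbouring heap blocks.
inline void setOutFlux
(
    scalar* outFluxArray, const label size,
    const label i, const label j, const label k,
    const label nCycles, const label nTimeStepsInPulse,
    const scalar value
)
{
    const label idx = (i*nCycles + j)*nTimeStepsInPulse + k;

    if (idx < 0 || idx >= size)
    {
        FatalErrorIn("setOutFlux")
            << "index " << idx << " outside 0.." << size - 1
            << abort(FatalError);
    }

    outFluxArray[idx] = value;
}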

I guess the only way to find out will be proper debugging, and I hope I find the time to do it. But has anyone experienced a similar problem (memory corruption with a custom solver after a few time steps on certain mesh sizes) and could shed some light on it?

Thanks to all of you for reading and looking into this! All help is highly appreciated. As soon as I have any news, I will of course post it here!

Best regards and thanks a lot!
Johannes
