|
February 8, 2016, 12:11 |
swap vs RAM
|
#1 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Dear all,
I have the following problem with the simulation of a case on a server. The server possesses 32 cores and 8*64GB = 512GB RAM, however only 4 GB swap (if I see it correctly at the output of the "top" command in a terminal. At a certain point the application keeps breaking off with a segmentation fault in OF211 as well as OF222, so I guess it is not a OF-version specific problem. Moreover, I ran the case on several grids (0.5 Mio, 1.7 Mio and 4.5 Mio cells with of different complexity). The smaller grids compute fine, however the large grid breaks off with the segmentation fault. So I am pretty sure, that it is an issue with the allocated memory. Furthermore, since the code is running on the smaller grids, the solver code should be fine, too... Apart from that the large configuration is working fine on a different cluster. Searching for the problem here and in other forums, I realized, that the allocated memory as well as the available swap could be possible reasons for this. Since we possess of 512 GB RAM I would prefer not to increase the swap partition in the running system, but rather make the solver and openFOAM use the disposable RAM. In the logfile I can even see, that for tthe medium-sized case, the computation is only using ~8GB of memory and no swap at all. However, the simulation on the largest case does not compute. It breaks off before even the first time-step is computed. Since the code is running on the smaller grids, the solver code should be fine, too... Apart from that the same large case is working fine on a different cluster. So I exclude this to be the reason for the problem. My guess is, that I should be allocating more RAM to the process somehow. However, I could not find any command or option, which I can set within openFOAM for this. Can anybody tell me if I am going in the right direction? Or even better a solution for this issue? At the moment I am pretty much stuck on this. All help is highly appreciated. 
Thanks to anyone looking into this! Best regards KingKraut Edit: I am a little confused, that OpenFOAM does not automatically use the available RAM, which should be sufficient to compute the problem... Maybe it needs some "help" finding it? |
|
February 9, 2016, 03:18 |
|
#2 |
Senior Member
Anton Kidess
Join Date: May 2009
Location: Germany
Posts: 1,377
Rep Power: 29 |
1. Typically a segmentation fault is not a memory limit issue.
2. If you are sure it is, your HPC queue manager (PBS, SGE, whatever) may impose much stricter memory limits than the hardware provides.
__________________
*On twitter @akidTwit *Spend as much time formulating your questions as you expect people to spend on their answer. |
|
February 9, 2016, 03:38 |
|
#3 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Dear akidess,
thanks a lot for your reply.
1: From my search on the internet I suspected it might be a memory limit issue, and the fact that it works fine on a smaller grid made this assumption plausible to me. Apart from that, the combination of the large grid with the identical solver works on a larger HPC cluster, which made me think that the available memory is the problem. What else could cause this segmentation fault? I have heard of the option to recompile OpenFOAM in debug mode; until now I hoped to try a different route first, in case it was a common problem...
2: No queue manager is installed on the server where the problem occurs, so I don't think it comes from there. However, does OpenFOAM offer an option to set how much RAM to use? I have not found one, but I would like to make sure I am not missing something simple...
Thanks again for your help! Best Kraut |
|
February 9, 2016, 04:16 |
|
#4 |
Senior Member
Anton Kidess
Join Date: May 2009
Location: Germany
Posts: 1,377
Rep Power: 29 |
Yeah, I guess you could do that. You might want to try posting the log file and error message here first, though. OpenFOAM does not have a memory limiter as far as I am aware. Also, 4.5M cells is by no means a very large mesh nowadays.
__________________
*On twitter @akidTwit *Spend as much time formulating your questions as you expect people to spend on their answer. |
|
February 9, 2016, 04:45 |
|
#5 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Dear akidess,
the error message really is very short. Run in serial, it only shows "segmentation fault" in the terminal; please see the attached complete logfile log_pisoResistanceFoam_serial. In parallel (16 cores) it gives the attached output log_pisoResistanceFoam_parallel. The application manages to proceed a little further and prints some more of the custom solver's output, but then it crashes with a segmentation fault again (the same with 32 cores). Below are the last few lines that appear beyond the serial output: Quote:
What really irritates me is that everything works fine on the smaller meshes, and that the configuration with the larger mesh causes no problems on a different cluster, yet here on our "little" server the larger mesh crashes like this. This is also why I suspected a RAM issue, since I see no difference between the cases other than the mesh size... Thanks a lot again for your help and time! Best Kraut |
|
February 9, 2016, 08:13 |
|
#6 |
Senior Member
Anton Kidess
Join Date: May 2009
Location: Germany
Posts: 1,377
Rep Power: 29 |
I was expecting a stack trace. If you are running the tool in the background you will need to redirect stderr along with stdout. By the way, you should have mentioned you are using a custom application. It may be that some of your code is bogus.
__________________
*On twitter @akidTwit *Spend as much time formulating your questions as you expect people to spend on their answer. |
|
February 9, 2016, 08:44 |
|
#7 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Hi akidess,
I am very sorry that I did not mention the solver is custom! Of course I should have written that, but I must have forgotten it. Please excuse me! I tried to redirect stderr to stdout by adding 2>&1 at the end of the command that runs the solver, like this: Quote:
I will have a look at how to obtain a more detailed stack trace. I have not yet recompiled OpenFOAM in debug mode; is that necessary to get a useful stack trace? Apart from that, I don't think the custom solver code is faulty, because it works without problems on the smaller cases, and on another platform it runs fine on the very grid that keeps failing here... Thank you very much again for all your replies so far! Some of my questions may be rather inexperienced, but I am indeed a beginner with OpenFOAM. So many thanks for your help! Best Kraut |
|
February 9, 2016, 09:47 |
|
#8 |
Senior Member
Olivier
Join Date: Jun 2009
Location: France, grenoble
Posts: 272
Rep Power: 17 |
Hello,
As Anton says, I don't think this is a memory limit. A 4-5 million cell mesh usually uses 4-10 GB of RAM, which is far below the 64 GB you have (or even 512 GB?). Just check the memory usage and you will see. But your trouble may come from the processor boundaries and the way you partition: which method do you use? simple? scotch? metis? Try changing the way you split the mesh. And since you use a custom solver, that could really be the buggy part (a missing gSum, ...). Regards, olivier |
|
February 9, 2016, 10:05 |
|
#9 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Hello olivierG,
thanks a lot for the reply. I checked the memory usage, and as you said, the same solver code uses only ~5 GB on a smaller case (~1.7 million cells). For the decomposition of the larger grid I use the scotch method, and this works fine on a different cluster with the custom solver code. The decomposition of the grid has no effect on solvers that come with OpenFOAM, such as icoFoam or pisoFoam. Yesterday I added the preservePatches option to the decomposition, because there are some cyclicAMI patches in the geometry, but this did not have an effect on the problem either... Thanks again for your help! Best Johannes |
|
March 8, 2016, 13:15 |
|
#10 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Hey all,
I finally found the reason for the segmentation fault! It lies in the custom solver code; however, the code itself is not faulty. I will try to explain the situation, since I now know the problem, but not the solution... The following line in the code causes the problem: Quote:
By trying out different combinations I found that the solver does not produce a segmentation fault as long as the number of entries in outFluxArray stays below a certain limit (depending on the machine). On my virtual machine with 4 cores it breaks down with 12 outlets, 10000 time steps and 10 nCycles (1.2 million entries in outFluxArray). On a larger server (computation on 24 cores) it works fine up to an outFluxArray with fewer than 1.5 million entries. On a larger HPC cluster, parallelized on 128 cores, the setup worked with nOutlets=43, nCycles=10 and nTimeStepsInPulse=10000. I will try a larger configuration as well, but I guess at some point it will break down there too...

I guess there is no easy solution other than rewriting the solver code to check for periodicity (the variable is used in various places...). However, in the small hope that there is an alternative within OpenFOAM to define this outFluxArray as above and still allow this many entries, I post this here. Maybe a different definition for this outFluxArray other than scalar could be used? I tried to look into scalar.H within OpenFOAM, but I think I am a little off track there, looking for a solution...

At least I am glad I found the problem. By the way, I am using OpenFOAM versions 2.2.2 and 2.1.1; the error occurs in both. Maybe this is an issue solved in newer versions? So thanks to everybody taking the time to read and look into this! I highly appreciate it, even though I am not very optimistic that this can be solved as easily as I might wish. Thanks again. Best regards, Johannes
PS: I hope my explanations were clear enough! |
|
March 9, 2016, 04:09 |
|
#11 |
Senior Member
Anton Kidess
Join Date: May 2009
Location: Germany
Posts: 1,377
Rep Power: 29 |
I'm a little rusty, but I faintly remember that you do not want to allocate large arrays on the stack. Move it to the heap:
Code:
scalar* outFluxArray = new scalar[nOutlets*nCycles*nTimeStepsInPulse];
__________________
*On twitter @akidTwit *Spend as much time formulating your questions as you expect people to spend on their answer. |
|
March 11, 2016, 04:35 |
|
#12 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Dear akidess,
thanks a lot for the suggestion! I replaced the line (and the other uses of the variable) in the code as you proposed, and the code no longer produces the segmentation fault! Now I only have to adapt the loops that refer to the variable so that it actually gives the correct values, but I should manage that. Thank you very much again for your help! Best Johannes |
|
March 11, 2016, 11:10 |
|
#13 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
The adaptation of the solver code to this new type of variable was simple enough. However, the change noticeably decreases the performance of the solver (a factor of about 4/3 per computed timestep on 24 cores), which I had expected from what I understand of the difference between heap and stack memory.
I guess I will have to look into assigning more memory to the stack, or into other ways to improve the solver code in this regard... Thanks a lot again for the help! |
|
March 11, 2016, 11:46 |
|
#14 |
Senior Member
Anton Kidess
Join Date: May 2009
Location: Germany
Posts: 1,377
Rep Power: 29 |
Did you place the "new" command inside a loop? Then you've got yourself a text-book memory leak. The access performance should be near identical.
__________________
*On twitter @akidTwit *Spend as much time formulating your questions as you expect people to spend on their answer. |
|
March 11, 2016, 11:55 |
|
#15 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
I have to admit I may have been too hasty with my estimate.
The computation seemed to run slower for the first ~20 minutes, but then leveled off at a similar time per timestep as the former version. Sorry about that, and thanks for your fast reply! I did not put the initialization of the variable in a loop, so it still works nicely apart from this slower start. But I guess that could also just be the server's form on the day! =) Best Johannes |
|
July 19, 2016, 04:39 |
|
#16 |
Member
Jo Mar
Join Date: Jun 2015
Posts: 54
Rep Power: 10 |
Dear all,
the solution found back then led to a new problem a couple of weeks ago. The solver with the array initialized by Quote:
Quote:
Quote:
Quote:
So for now I could work around the problem with the former solver. However, I don't understand why the solver with this variable stored on the heap produces a memory corruption. How can memory be written or freed where it is not supposed to be? And why does this only happen at a later point in the computation? Before the error occurs, ~100 timesteps are computed without any problems, and I did not experience any problems with the heap-stored variable on smaller grids. I guess the only way to find out will be proper debugging, and I hope I find the time for it. But has anyone experienced a similar problem (memory corruption with a custom solver after a few timesteps on certain mesh sizes) and could shed some light on this? Thanks to all of you reading and looking into this! All help is highly appreciated. As soon as I have any news, I will of course post it here! Best regards and thanks a lot! Johannes |
|