CFD Online Discussion Forums - Virtual memory problem with parallel runs

Page 1 of 2

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)

- OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)

- - Virtual memory problem with parallel runs (https://www.cfd-online.com/Forums/openfoam-solving/60577-virtual-memory-problem-parallel-runs.html)

Ali (Ali)

January 2, 2005 16:16

Dear friends,

I ran a case with around 300,000 cells on 32 processors without a problem. When I increased the grid points to around 1,200,000 and used 64 processors, it didn't start and gave the following error. I don't think this number of cells is too much for 64 processors. Has anybody experienced a similar error? I would appreciate it if you let me know what's wrong.

The error message:
-------------------------------------
new cannot satisfy memory request.
This does not necessarily mean you have run out of virtual memory.
It could be due to a stack violation caused by e.g. bad use of pointers or an out of date shared library

Henry Weller (Henry)

January 2, 2005 19:06

It sounds like the case is running on one processor, or maybe as 64 copies, each on its own processor. FOAM uses between 1 and 2 kB per cell depending on the code, so 1.2e6 cells means roughly 1.2 to 2.4 GB, which sounds like it would fill 32-bit addressing for some of the codes.
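Henry's rule of thumb can be turned into a quick back-of-the-envelope check. The sketch below takes his 1 to 2 kB per cell as the only input and assumes a perfectly balanced decomposition; the function names are made up for illustration and are not FOAM utilities.

```python
KIB = 1024

def estimate_bytes(n_cells: int, bytes_per_cell: int) -> int:
    """Rough memory footprint: number of cells times an assumed per-cell cost."""
    return n_cells * bytes_per_cell

def per_process_bytes(n_cells: int, n_procs: int, bytes_per_cell: int) -> int:
    """Footprint of one subdomain, assuming a perfectly balanced decomposition
    and ignoring the extra storage needed for processor boundaries."""
    cells_per_proc = -(-n_cells // n_procs)  # ceiling division
    return estimate_bytes(cells_per_proc, bytes_per_cell)

n_cells = 1_200_000
for kib in (1, 2):
    whole = estimate_bytes(n_cells, kib * KIB)
    each = per_process_bytes(n_cells, 64, kib * KIB)
    print(f"{kib} KiB/cell: whole mesh {whole / 2**30:.2f} GiB, "
          f"per process on 64 procs {each / 2**20:.1f} MiB")
```

The whole mesh comes out at roughly 1.1 to 2.3 GiB, while a correctly decomposed share is only a few tens of MiB per process. An allocation failure on every process is therefore consistent with Henry's suspicion that each process is loading the full mesh.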

Ali (Ali)

January 2, 2005 19:40

Thanks a lot Henry,

You are right. I had got such an error when I tried to run it on one machine, but this time there are a lot more processors. I'm using PBS to submit jobs, which land on 64 randomly assigned processors of a larger cluster of IBM dual 3.0 GHz BladeCenter processors with over 2 GB of memory each. Actually, for the smaller job (300,000 cells), the 32 parallel processors were only a little faster than a single machine (with slightly more memory and approximately the same CPU speed). Is there any special partitioning method, or another way of improving the parallel efficiency for irregular geometries? (At the moment I'm using the simple decomposition method.)

The thing is that I got this message twice when I submitted this job, and on the third try it worked. Surprisingly, it ran on 64 processors even after I had decreased the number of subdomains from 64 to 32 in decomposeParDict and decompositionDict. It seems that whatever the number of subdomains in these two dicts, it runs on 64 processors and gives no error. Is this what usually happens, or should it give an error or a message saying that the number of subdomains is not the same as the number of processors requested?

Regards,

Henry Weller (Henry)

January 3, 2005 06:40

I am surprised the speed-up was so small, we get much more than this. What is the inter-connect speed of your machine?
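To put "only a little faster" in numbers, speed-up and parallel efficiency are commonly defined as S = T1/Tp and E = S/p. A minimal sketch; the timings are hypothetical placeholders, not Ali's measurements.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up S = T_1 / T_p."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E = S / p (1.0 means perfect scaling)."""
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical wall-clock times per time step, in seconds.
t1, t32 = 64.0, 40.0
print(f"S = {speedup(t1, t32):.2f}, E = {efficiency(t1, t32, 32):.1%}")
```

A 32-processor run that is only modestly faster than one machine corresponds to an efficiency of a few percent, which suggests that time is going into communication rather than computation; that is one plausible reason to ask about the interconnect.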

There are three decomposition techniques supplied with FOAM; try the other two and look at the decomposition statistics that decomposePar prints, which will give you an idea of how effective each approach is for your case.
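decomposePar reports, per processor, the number of cells and the number of faces shared with each neighbour. One way to compare decompositions is sketched below: a load imbalance (largest subdomain over the mean) and a worst-case ratio of shared faces to cells. The metric names and the sample figures are illustrative assumptions, not decomposePar output.

```python
from statistics import mean

def decomposition_quality(n_cells, n_shared_faces):
    """Summarise per-processor statistics like those printed by decomposePar.

    n_cells[i]: cells on processor i
    n_shared_faces[i]: faces processor i shares with all of its neighbours
    Returns (load imbalance, worst shared-faces-per-cell ratio).
    """
    imbalance = max(n_cells) / mean(n_cells)  # 1.0 = perfectly balanced
    ratio = max(f / c for f, c in zip(n_shared_faces, n_cells))
    return imbalance, ratio

# Hypothetical statistics for two candidate decompositions of the same mesh.
cand_a = ([80_000, 75_000, 70_000, 75_000], [12_000, 15_000, 15_000, 12_000])
cand_b = ([75_000, 75_000, 75_000, 75_000], [6_000, 7_000, 7_000, 6_000])
for name, (cells, faces) in (("A", cand_a), ("B", cand_b)):
    imb, r = decomposition_quality(cells, faces)
    print(f"candidate {name}: imbalance {imb:.3f}, max shared faces/cell {r:.3f}")
```

Lower values of both numbers are generally better: the imbalance bounds how long the fastest processors wait for the slowest, and the face ratio is a rough proxy for communication per unit of work.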

You might also find it useful to play with

scheduledTransfer 1;
floatTransfer 0;
nProcsSimpleSum 16;

in .OpenFOAM-1.0/controlDict. In particular, floatTransfer can be set to 1 so that data are transferred between processors as floats rather than doubles; you could also try changing scheduledTransfer and/or nProcsSimpleSum.
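The effect of floatTransfer on message size can be estimated directly: each value sent across a processor boundary shrinks from 8 bytes (double) to 4 bytes (float), halving the payload at the cost of precision in the transferred values. The sketch assumes one value per shared face per component and ignores message headers; it illustrates the arithmetic and is not how FOAM actually packs its buffers.

```python
DOUBLE_BYTES = 8
FLOAT_BYTES = 4

def exchange_bytes(shared_faces: int, n_components: int, float_transfer: bool) -> int:
    """Bytes sent to one neighbour when swapping a face field across a processor
    boundary: one value per shared face per component."""
    width = FLOAT_BYTES if float_transfer else DOUBLE_BYTES
    return shared_faces * n_components * width

# Hypothetical: 10,000 shared faces, vector field (3 components).
for flag in (False, True):
    print(f"floatTransfer {int(flag)}: {exchange_bytes(10_000, 3, flag)} bytes")
```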

I don't understand why you have two decomposition dictionaries; you should have only one, and of course the information in it should correspond to the decomposition you are using!

sampaio

June 9, 2005 16:27

I got the same message as Ali when trying to decompose the case (decomposePar). Should I run it with mpirun? (I will try as soon as the parallel machine I am running on comes back to life...)

Processor 2
Number of cells = 1042872
Number of faces shared with processor 1 = 260718
Number of faces shared with processor 3 = 260718
Number of boundary faces = 9928

Processor 3
Number of cells = 1042872
Number of faces shared with processor 2 = 260718
Number of faces shared with processor 0 = 260718
Number of boundary faces = 9928
new cannot satisfy memory request.
This does not necessarily mean you have run out of virtual memory.
It could be due to a stack violation caused by e.g. bad use of pointers or an out of date shared library
Aborted
[luizebs@green01 oodles]$

mattijs

June 10, 2005 05:22

decomposePar has to hold the undecomposed case and all the pieces it decomposes into. So it uses on average twice the storage the single mesh uses.
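Mattijs's "twice the storage" gives a simple feasibility test for the machine that runs decomposePar. The sketch assumes you have measured the serial mesh's memory yourself (for instance by watching top while a serial utility loads the case); the factor of 2 is the average figure from this post, not a guaranteed upper bound.

```python
def decompose_peak_bytes(serial_mesh_bytes: float, factor: float = 2.0) -> float:
    """Rule of thumb from the thread: decomposePar holds the undecomposed mesh
    plus all the pieces, i.e. roughly twice the single-mesh storage."""
    return factor * serial_mesh_bytes

def fits_on_node(serial_mesh_bytes: float, node_ram_bytes: float) -> bool:
    """True if the estimated decomposePar peak fits in the node's memory."""
    return decompose_peak_bytes(serial_mesh_bytes) <= node_ram_bytes

GIB = 2**30
# Hypothetical: the serial mesh occupies 1.5 GiB and the node has 2 GiB of RAM.
print(decompose_peak_bytes(1.5 * GIB) / GIB, fits_on_node(1.5 * GIB, 2 * GIB))
```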

Maybe you just ran out of memory? What does 'top' show when you run decomposePar?

sampaio

June 10, 2005 13:48

Yeah. I did run out of memory.

Then I tried to run it with lamexec (I am not sure I could; I am very inexperienced in parallel computing...):

lamexec -np 4 decomposePar . GL3 </dev/null>& logd &

Is this the right command? (I tried with mpirun first, but it looks like decomposePar is not an MPI application, is it?)

Note that "Processor 3" appears 4 times in the output; I have only pasted the last 2.

Thanks for your help,
luiz

Processor 3
Number of cells = 1042872
Number of faces shared with processor 2 = 260718
Number of faces shared with processor 0 = 260718
Number of boundary faces = 9928

Processor 3
Number of cells = 1042872
Number of faces shared with processor 2 = 260718
Number of faces shared with processor 0 = 260718
Number of boundary faces = 9928
new cannot satisfy memory request.
This does not necessarily mean you have run out of virtual memory.
It could be due to a stack violation caused by e.g. bad use of pointers or an out of date shared library
1765 (n1) exited due to signal 6
[luizebs@green01 oodles]$

mattijs

June 10, 2005 14:37

You cannot run decomposePar in parallel.

sampaio

June 10, 2005 14:54

But then, how can I run in parallel a big mesh that does not fit in the memory of a single node?

Does that mean that the size of a mesh I can run in parallel is limited by the memory of a single node?

Thanks a lot,
luiz

mattijs

June 10, 2005 15:09

What normally is being done:
- have a computer with a lot of memory to do the decomposition on.
- run on smaller nodes.

Even better:
- do your mesh generation in parallel (and no, blockMesh does not run in parallel)

sampaio

June 10, 2005 16:02

Thanks, Mattijs.
Since this is the computer with the most memory I have, I have no option but #2.

But I have no Gambit or other mesh generator on this parallel machine, which means I would have to generate a Gambit mesh on another computer (single node) and convert it with a FOAM utility (gambitToFoam). Question: does gambitToFoam run in parallel, or does it have the same limitation as blockMesh?

If I cannot find a way to use Gambit on a parallel machine, I will probably have to use decomposePar, which will again run into memory problems, right?

What about this: I sequentially construct 4 meshes (one per node, each 4 times smaller) using blockMesh and manually copy each generated polyMesh dir into processor0-3/constant/polyMesh.
Then I change the boundary conditions, trying to mimic a boundary file generated by decomposePar.

Do you think it would work?

Thanks,
Luiz

hjasak

June 10, 2005 16:07

Nope. decomposePar orders the faces on parallel boundaries in a special way (i.e. the ordering is the same on both sides of the parallel interface). The chance of getting this right without using decomposePar is slim, AND you need to know exactly what you're doing...
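To see why independently generated pieces are unlikely to match, consider two processors holding the same interface faces. Unless face k on one side is the same physical face as face k on the other, the coupling is wrong. The sketch below sorts faces by their (rounded) centres to reach a common order; that is just one way to obtain agreement and is an assumption, not a description of what decomposePar does internally.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]

def face_centre(points: List[Point]) -> Tuple[float, ...]:
    """Arithmetic mean of a face's vertices (adequate for this illustration)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def centre_keys(faces: List[List[Point]], ndigits: int = 9):
    """Rounded face centres, used as a geometric key shared by both sides."""
    return [tuple(round(c, ndigits) for c in face_centre(f)) for f in faces]

def common_order(faces: List[List[Point]]) -> List[int]:
    """Face indices sorted by centre, so both processors agree on the order."""
    keys = centre_keys(faces)
    return sorted(range(len(faces)), key=lambda i: keys[i])

def orders_match(side_a: List[List[Point]], side_b: List[List[Point]]) -> bool:
    """True if face k on side A coincides with face k on side B."""
    return centre_keys(side_a) == centre_keys(side_b)

q0 = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
q1 = [(1, 0, 1), (2, 0, 1), (2, 1, 1), (1, 1, 1)]
side_a = [q0, q1]
side_b = [q1[::-1], q0[::-1]]  # same faces, other numbering and orientation
print(orders_match(side_a, side_b))  # False
a_sorted = [side_a[i] for i in common_order(side_a)]
b_sorted = [side_b[i] for i in common_order(side_b)]
print(orders_match(a_sorted, b_sorted))  # True
```

A real processor patch also needs the vertex order within each face and the shared points to correspond across the interface, which this sketch ignores; that extra bookkeeping is part of what makes the manual route hard.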

Hmm,

Hrv

sampaio

June 10, 2005 18:05

Thanks Hrvoje,
In my case, the geometry is homogeneous in the z-direction, and I am planning to partition it in the z-direction as well. Does this improve my chances?
Again, this is my only option, since I have no way to generate a parallel mesh ready to be used by FOAM (in other words, without first running gambitToFoam or decomposePar).

BTW, which FOAM utilities can be run in parallel (with either mpirun or lamexec)? gambitToFoam, for instance? renumberMesh?

Thanks a lot again,
luiz

hjasak

June 10, 2005 19:44

Well, you might have half a chance but...
- you'll have to do a ton of mesh manipulation by hand because I bet the front and back planes will be numbered differently
- unless you grade the mesh (such that faces have different areas and you get a matching error), you might keep getting a running code with rubbish results
- Mattijs might have written some utilities for re-ordering parallel (cyclic?) faces, which may be re-used. (I'm sure he'll pitch in with some ideas - thanks, Mattijs) :-)

To put it straight, I have personally written the parallel mesh decomposition and reconstruction tools and I wouldn't want to be in your skin... It would be much easier to find a 64-bit machine and do the job there.

Alternatively, make thick slices for each CPU, decompose the mesh and then use mesh refinement on each piece separately to get the desired number of layers (or something similar).
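For a mesh that is homogeneous in z, cutting it into slabs along z keeps every processor interface a plane of nx × ny faces. The sketch assumes a structured nx × ny × nz block with layers numbered along z; it only counts cells and interface faces per slab and does not build processor patches or refine anything.

```python
def slab_owner(k: int, nz: int, n_procs: int) -> int:
    """Processor owning z-layer k when nz layers are split into n_procs
    contiguous slabs of (nearly) equal thickness."""
    return k * n_procs // nz

def slab_stats(nx: int, ny: int, nz: int, n_procs: int):
    """Cells per processor, and faces on each interface between adjacent slabs,
    for a structured nx*ny*nz block decomposed into slabs along z."""
    cells = [0] * n_procs
    for k in range(nz):
        cells[slab_owner(k, nz, n_procs)] += nx * ny
    return cells, nx * ny

# Hypothetical 100 x 50 x 64 block split into 4 slabs.
print(slab_stats(100, 50, 64, 4))
```

When nz is not a multiple of the number of processors, the slabs differ by at most one layer, so the load imbalance stays small for meshes with many layers.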

BTW, have you considered how you are going to look at results? paraFoam does not run in parallel either. Maybe some averaging in the homogeneous direction or interpolation to a coarser mesh is in order.
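Averaging over the homogeneous direction reduces a 3-D field to a 2-D plane that a serial viewer can handle. A minimal sketch, assuming the cells form a structured block numbered with x fastest and z slowest; an unstructured FOAM mesh would first need a cell-to-(i, j, k) mapping, which this sketch does not provide.

```python
def z_average(field, nx, ny, nz):
    """Average a cell field over the homogeneous z direction.

    Assumes structured storage with cell (i, j, k) at index
    i + nx * (j + ny * k), i.e. x varies fastest; returns an nx*ny plane
    stored with index i + nx * j.
    """
    if len(field) != nx * ny * nz:
        raise ValueError("field size does not match nx*ny*nz")
    plane = [0.0] * (nx * ny)
    for k in range(nz):
        offset = nx * ny * k
        for ij in range(nx * ny):
            plane[ij] += field[offset + ij]
    return [v / nz for v in plane]

# Hypothetical 2 x 1 x 3 field: value k in the first column, 10 + k in the second.
field = [0, 10, 1, 11, 2, 12]
print(z_average(field, 2, 1, 3))  # [1.0, 11.0]
```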

As for utilities, call them with no arguments and (most of them) should tell you. Off the cuff, I would say that mesh manipulation tools won't work in parallel but data post-processing (apart from graphics) will.

Good luck,

Hrv

sampaio

June 10, 2005 21:55

"BTW, have you considered how you are going to look at results - paraFoam does not run in parallel either. Maybe some averaging in the homogeneous direction or interpolation to a coarser mesh is in order."

Yes. I've already built a coarser mesh and mapped onto it successfully (from a less refined mesh case).

Thanks a lot for your comments...
luiz

sampaio

June 13, 2005 16:47

OK. I reduced the mesh a little and was able to run decomposePar without problems.

But when I run the case (with mpirun), I get:
(It still looks like I have a memory problem, or rather, hopefully not me but my simulation, doesn't it?)

[0] Case : GL2meules
[0] Nprocs : 4
[0] Slaves :
3
(
green02.5942
green03.4885
green04.4738
)

Create time

Create mesh, no clear-out for time = 150

MPI_Bsend: unclassified: No buffer space available (rank 2, MPI_COMM_WORLD)
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): - MPI_Bsend()
Rank (2, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 22401 failed on node n0 (192.168.0.1) with exit status 1.
-----------------------------------------------------------------------------
[1]+ Exit 1 mpirun -np 4 glLES . GL2meules -parallel 1>&logm
[luizebs@green01 oodles]$

Thanks a lot,
luiz

henry

June 13, 2005 16:54

What happens if you increase MPI_BUFFER_SIZE?

sampaio

June 13, 2005 17:10

The same thing. (only rank 2 changed to rank 1)

What should this value be? It was 20000000.

Thanks,
luiz

green02.6630
green03.5573
green04.5426
)

Create time

Create mesh, no clear-out for time = 150

MPI_Bsend: unclassified: No buffer space available (rank 1, MPI_COMM_WORLD)
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 23152 failed on node n0 (192.168.0.1) with exit status 1.
-----------------------------------------------------------------------------
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Bsend()
Rank (1, MPI_COMM_WORLD): - main()
[luizebs@green01 oodles]$ echo %MPI_BUFFER_SIZE
%MPI_BUFFER_SIZE
[1]+ Exit 1 mpirun -np 4 glLES . GL2meules -parallel </dev/null>&logm
[luizebs@green01 oodles]$

mattijs

June 14, 2005 05:52

Too hard to calculate.

Just double it (and make sure to run 'lamwipe' and 'lamboot' so that lamd picks up the new setting) and try again. Keep doing so until you no longer get this message.
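The MPI standard requires a buffer attached for MPI_Bsend to hold every pending buffered message plus a fixed per-message overhead (MPI_BSEND_OVERHEAD, whose value is implementation-specific). The sketch below computes that lower bound for a hypothetical set of messages and lists the values the "keep doubling" advice would try, starting from Luiz's 20000000. The message sizes and the 100-byte overhead are placeholders; each candidate still has to be exported as MPI_BUFFER_SIZE, followed by lamwipe and lamboot, by hand.

```python
def bsend_lower_bound(message_bytes, overhead_per_message):
    """Minimum attached-buffer size if all these MPI_Bsend messages are pending
    at the same time: payloads plus a fixed per-message overhead."""
    return sum(message_bytes) + overhead_per_message * len(message_bytes)

def doubling_candidates(start, limit):
    """Values to try in turn when following the 'double it until it works' advice."""
    size = start
    while size <= limit:
        yield size
        size *= 2

# Hypothetical: two neighbours, each sent 100,000 faces x 3 components x 8 bytes.
msgs = [100_000 * 3 * 8] * 2
print(bsend_lower_bound(msgs, overhead_per_message=100))  # 100 is a placeholder
print(list(doubling_candidates(20_000_000, 160_000_000)))
```

In practice the bound can be far too small, because a code may have many buffered messages in flight at once (several fields, or mesh data during start-up), which is presumably why doubling by trial is the pragmatic route.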

sampaio

June 15, 2005 16:01

Thanks Mattijs,
It is working now.

But what would be the consequences of an unnecessarily high value for this buffer size?

Where can I learn more about all these things (mostly Linux, and running Linux in parallel)? I feel so weak. (Only later did I find out that I should put export MPI_BUFFER_SIZE=xxxxx in my bashrc, and I am not even sure why... I suspect it has to do with exporting it to all nodes instead of just the current one...)

Could you provide some pointers (Linux and parallel stuff)? Books, online tutorials, etc.

I really feel the need to understand better what is happening, but I don't know where to start...

thanks,
luiz

All times are GMT -4.