CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Programming & Development (https://www.cfd-online.com/Forums/openfoam-programming-development/)
-   -   MPI & shared memory (https://www.cfd-online.com/Forums/openfoam-programming-development/213112-mpi-shared-memory.html)

usv001 December 15, 2018 03:57

MPI & shared memory
 
Dear Foamers,

I would like to know if the following is possible:

Say that I am running a case in parallel. Assuming that all the cores are within the same node, is it possible to allocate shared memory on the heap that is visible to all the cores? Specifically, if each processor creates a field as shown below,

Code:

scalarField* fieldPtr(new scalarField(n));
Can one core access the field created by another core using the pointer address?

Has anyone implemented something like this before? If so, how to go about doing it?

USV

olesen December 30, 2018 06:30

Currently there is no DMA or RDMA wrapping in OpenFOAM. You will have to create your own MPI communicators, access windows, etc.
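
To give an idea of what that involves, here is a minimal sketch of a node-local shared allocation using raw MPI-3 shared-memory windows (plain MPI, outside OpenFOAM's Pstream layer; the array size and variable names are purely illustrative):

Code:

#include <mpi.h>

// Sketch: rank 0 on each node allocates one array that every rank on the
// same node can read and write directly through a shared-memory window.
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    // Communicator containing only the ranks that share this node's memory
    MPI_Comm nodeComm;
    MPI_Comm_split_type
    (
        MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodeComm
    );

    int nodeRank;
    MPI_Comm_rank(nodeComm, &nodeRank);

    // Node-local rank 0 provides the storage; the other ranks attach to it
    const MPI_Aint nCells = 1000;                           // illustrative size
    MPI_Aint mySize = (nodeRank == 0 ? nCells : 0)*sizeof(double);

    double* field = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared
    (
        mySize, sizeof(double), MPI_INFO_NULL, nodeComm, &field, &win
    );

    // Every rank queries the base address of rank 0's segment
    MPI_Aint qSize;
    int qDisp;
    MPI_Win_shared_query(win, 0, &qSize, &qDisp, &field);

    // ... all ranks on the node may now access field[0..nCells-1],
    // with explicit synchronisation (e.g. MPI_Win_fence) between phases ...
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}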

usv001 December 31, 2018 00:47

MPI/OpenMP Hybrid Programming in OpenFOAM
 
Thank you Mark.

After a little scouring of the Internet, I came to the same conclusion. However, there is a simple but limited alternative, which is to use OpenMP. Since I wrote my own schemes and solver, I was able to incorporate quite a bit of OpenMP parallelism into the code. For those using existing solvers/schemes, unfortunately, this won't help much unless you rewrite the schemes with OpenMP pragmas.

To compile with OpenMP, add the '-fopenmp' flag in the file '$WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt', so that it reads like this:
Code:

$:cat $WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt
c++DBUG    =
c++OPT      = -O2 -fopenmp

Remember to source the etc/bashrc file before compiling.
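
Something along these lines, assuming the solver is built with wmake and the OpenFOAM environment has already been loaded once (the solver directory is hypothetical):

Code:

source $WM_PROJECT_DIR/etc/bashrc   # or the full path to your installation's etc/bashrc
cd /path/to/mySolver                # hypothetical solver directory
wclean && wmake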

In your solver/scheme, include "omp.h" if you call the OpenMP runtime functions (the pragmas themselves compile without it). After this, you're pretty much set. You can parallelize loops as follows:

Code:

#pragma omp parallel for
forAll(neighbour, celli)
{
    ...
}

When running in MPI/OpenMP mode, I decomposed the domain into the number of NUMA nodes that I am using rather than the total number of cores. After that, I set the number of OMP threads to the number of cores present in each NUMA node using the environment variable 'OMP_NUM_THREADS'. For instance, let's say that there are 4 NUMA nodes in each socket and each NUMA node consists of 6 cores (i.e. 24 cores per socket). If I wish to use 2 sockets in total, I can decompose the domain into 8 sub-domains and run the solver as follows:

Code:

export OMP_NUM_THREADS=6
mpirun -np 8 --map-by ppr:1:numa:pe=6 solver -parallel

This says that I would like to start 8 MPI processes (the same as the number of sub-domains) and that each process should be bound to one NUMA node, with 6 cores/threads allocated to each process. Inside each NUMA node, OpenMP can then parallelize over the 6 available cores/threads.

A word of caution though. This may not run any faster (in fact, it ran much slower in many cases) unless a significant portion of the code (i.e. the heavy-duty loops) is parallelized and the OpenMP overhead is kept small. Usually, the benefits start showing at higher core counts, when MPI traffic starts to dominate. In other cases, I think the built-in MPI parallelism alone is more efficient.

Lastly, I am no expert in these areas, just an amateur, so there could be things I am missing and better ways of doing this. Feel free to correct my mistakes and suggest improvements...

Cheers,
USV

olesen December 31, 2018 05:06

Quote:

Originally Posted by usv001 (Post 720416)
Thank you Mark.

After a little scouring of the Internet, I came to the same conclusion. However, there is a simple but limited solution which is to use OpenMP.
...
To compile with OpenMP, add the '-fopenmp' flag in the file '$WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt', so that it reads like this:
Code:

$:cat $WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt
c++DBUG    =
c++OPT      = -O2 -fopenmp


The preferred method is to use the COMP_OPENMP and LINK_OPENMP definitions instead (in your Make/options file) and do NOT touch the wmake rules. Apart from less editing and easier upgrading, these are also defined for clang and Intel as well as gcc.
Take a look at the cfmesh integration for examples of using these defines, as well as various OpenMP directives.
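
For reference, a minimal sketch of how such a Make/options might look (the COMP_OPENMP/LINK_OPENMP lines are the part that matters here; the finiteVolume entries are just typical placeholders for whatever the solver already compiles against, and the exact form may differ between OpenFOAM versions, so check applications/test/openmp/Make/options in your installation):

Code:

EXE_INC = \
    $(COMP_OPENMP) \
    -I$(LIB_SRC)/finiteVolume/lnInclude

EXE_LIBS = \
    $(LINK_OPENMP) \
    -lfiniteVolume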

Note that it is also good practice (I think) to guard your OpenMP pragmas with #ifdef/#endif so that you can rapidly enable/disable them. Sometimes debugging MPI + OpenMP can be rather "challenging".

olesen December 31, 2018 05:15

Quote:

Originally Posted by usv001 (Post 720416)
A word of caution though. This may not run any faster (in fact, it ran much slower in many cases) unless a significant portion of the code (i.e. the heavy-duty loops) is parallelized and the OpenMP overhead is kept small.

Memory bandwidth affects many codes (not just OpenFOAM). You should give this a read:
https://www.ixpug.org/images/docs/IX...g-OpenFOAM.pdf

zhangyan December 31, 2018 20:02

Hi
I'm also interested in this issue.
I would like to ask whether it is possible to create a shared class whose member variables take up a lot of memory.

PS: For OpenMP in OpenFOAM, I've found a GitHub repository.

usv001 January 2, 2019 09:00

Hello Mark,

Quote:

Originally Posted by olesen (Post 720425)
The preferred method is to use the COMP_OPENMP and LINK_OPENMP definitions instead (in your Make/options file) and do NOT touch the wmake rules. Apart from less editing, easier upgrading etc, these are also defined for clang and Intel as well as gcc.
Take a look at the cfmesh integration for examples of using these defines, as well as various openmp directives.

That looks interesting. I tried to look for them but couldn't find anything relevant. Could you please post an example of what the Make/options file should look like?

By the way, when OpenMP is not linked, the relevant pragmas are ignored by the compiler. This happens in both GCC and ICC. I don't use Clang though. So, I guess there is no need for guards.

Quote:

Originally Posted by olesen (Post 720426)
Memory bandwidth affects many codes (not just OpenFOAM). You should give this a read :
https://www.ixpug.org/images/docs/IX...g-OpenFOAM.pdf

I agree with you completely. I have been doing some preliminary profiling of my code and memory accesses are taking up nearly 80% of the computation time! Clearly, OpenFOAM would do better with more vectorization.

USV

olesen January 2, 2019 10:05

Quote:

Originally Posted by usv001 (Post 720596)
Hello Mark,

That looks interesting. I tried to look for them but couldn't find anything relevant. Could you please post an example of what the Make/options file should look like?

By the way, when OpenMP is not linked, the relevant pragmas are ignored by the compiler. This happens in both GCC and ICC. I don't use Clang though. So, I guess there is no need for guards.


The simplest example is applications/test/openmp/Make/options (in 1712 and later).


If you check the corresponding source file (Test-openmp.C) you'll perhaps see what I mean about the guards. As a minimum, you need a guard around the #include <omp.h> statement.
After that you can decide to use any of the following approaches:
  1. Just use the pragmas and let the compiler decide to use/ignore.
  2. Guard with the standard #ifdef _OPENMP
  3. Guard with the cfmesh/OpenFOAM #ifdef USE_OMP

The only reason I suggest the USE_OMP guard is to let you explicitly disable OpenMP for benchmarking and debugging as required, simply by changing the Make/options entry. If you don't need this for benchmarking, debugging, etc., no worries.
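
As a rough illustration of options 2 and 3, a minimal sketch of the guard pattern (the smooth function and the field are made up for the example; USE_OMP is assumed to be supplied via a -DUSE_OMP entry in Make/options or by the build rules):

Code:

// Guarded include: only pulled in when OpenMP is actually enabled
#ifdef USE_OMP
#include <omp.h>
#endif

#include "scalarField.H"

// Hypothetical helper: halves every value in the field
void smooth(Foam::scalarField& fld)
{
    // Guarded pragma: without USE_OMP this compiles as a plain serial loop
    #ifdef USE_OMP
    #pragma omp parallel for
    #endif
    forAll(fld, i)
    {
        fld[i] *= 0.5;
    }
}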



Quote:

Originally Posted by usv001 (Post 720596)
I agree with you completely. I have been doing some preliminary profiling of my code and memory accesses are taking up nearly 80% of the computation time! Clearly, OpenFOAM would do better with more vectorization.

I wouldn't draw the same conclusion at all; rather, vectorization makes the most sense when the arithmetic intensity is much higher (see the roofline model in the CINECA presentation).

