Parallel computing and OpenMP

October 23, 2006, 09:58

Dear Friends,

I am planning to parallelise my 2D unsteady code. I would like to know your comments on the use of MPI/OpenMP for parallelising such a code. I have read quite a bit of material on MPI vs OpenMP. What I would also like to know is whether OpenMP can be used on a distributed-memory architecture, and how easy that is.

Regards and Thanks in advance,

Ganesh

October 23, 2006, 12:08 (rt)

Generally: OpenMP is designed for shared-memory multiprocessor machines and MPI is designed for distributed-memory machines, but MPI also supports shared memory and works fine on shared-memory machines.

If you look at the literature, it is implied that the performance of MPI is better than that of OpenMP, but that for large-scale problems a combination is recommended (see a recent article in the Parallel Computing journal on this subject). But why? (MPI would seem to be more time-consuming, since it may communicate over a network such as TCP/IP, while OpenMP transfers data without a network.) In my opinion, it is because in OpenMP we directly break up loops and vectors, while in MPI we break the data into smaller blocks and then work on them, so the local data is small and we certainly gain cache efficiency (fewer cache misses). (See articles on this subject in Int. J. High Per. Computing; some related articles are freely available online.)

Ease of implementation: OpenMP is certainly easier to implement, usually by adding a few (directive) lines to the code.

October 24, 2006, 03:23 (Steve)

My experience is that MPI scales better than OpenMP. Also, implementations of OpenMP that I've used are difficult to profile, since all CPUs are 100% loaded, even if doing nothing.

October 24, 2006, 08:23 (Mani)

This could be a huge discussion in which 100 people have 100 different opinions. The answer depends largely on the type and size of problem you are solving and on the serial algorithm you are trying to parallelize. So, unless you tell Ganesh what you are working on, having had a "good" or "bad" experience with either MPI or OpenMP doesn't really say anything.

October 27, 2006, 08:01

Dear Mani,

I am presently trying to parallelise an unsteady 2D FVM solver. In its serial form the solver is used for problems like laminar vortex shedding past a cylinder, lift hysteresis of pitching airfoils and aeroelastic computations. The 2D problems I would like to test my parallel code on have around 30,000-70,000 volumes in the domain. With the serial code on reasonably fine grids, these unsteady problems can take 13-15 hours of computational time, so my first aim in parallelising is to speed up the computation. Later I would like to look into the scalability of the code and other issues. So, for the long run, in terms of both programming effort and scalability, I would like to hear the experience and comments of those who have parallelised CFD codes.

Regards and Thanks in advance,

Ganesh

October 27, 2006, 09:03

Hi Ganesh,

My experience in implementing parallel CFD codes is restricted to the Finite Element Method with hybrid parallelism (MPI or OpenMP alone, or both at the same time). After having implemented a 2D code, I started a 3D code from scratch with the MPI part built in (I guess it's easier to build a parallel code from scratch than to change an old one). This subject is broad enough to spend several days discussing without reaching any conclusion, so I'll give you only my impressions of the parallelism models and their basic ideas (I won't say anything about specific mathematical methods or parallel algorithms).

Easiness or hardness is relative, but most of the time you'll find people saying that OpenMP is easier to implement than MPI. OK, OpenMP is easy as long as you don't have memory dependencies. A simple example of a memory dependency in unstructured grids is a loop over elements (or cells) that accumulates into nodal (or point) values. Since more than one thread can be working on different cells that share the same node, your result will probably be undefined or polluted. This can be solved by defining blocks of cells in which no two cells share nodal data. In my case, I use a mesh coloring algorithm to build these blocks. Of course, your loop will have to be transformed to work through these blocks of elements accordingly.
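To make the coloring idea concrete, here is a minimal serial sketch in Python. It is not the code described above, just one simple (greedy) way to build such blocks; the function names and the tiny triangle mesh are made up for illustration. In an OpenMP code, the loop over the elements of one block is the loop that could be shared among threads.

```python
from collections import defaultdict

def color_elements(elements):
    """Greedy coloring: elements that share a node get different colors.

    `elements` is a list of node-index tuples (hypothetical connectivity).
    Returns a list of blocks; the elements in one block touch disjoint
    sets of nodes.
    """
    node_colors = defaultdict(set)   # colors already used at each node
    blocks = []                      # blocks[c] = elements with color c
    for e, nodes in enumerate(elements):
        used = set()
        for n in nodes:
            used |= node_colors[n]
        c = 0
        while c in used:             # smallest color unused at all of e's nodes
            c += 1
        if c == len(blocks):
            blocks.append([])
        blocks[c].append(e)
        for n in nodes:
            node_colors[n].add(c)
    return blocks

def assemble(elements, elem_values, n_nodes):
    """Scatter each element's value to its nodes, one color block at a time."""
    nodal = [0.0] * n_nodes
    for block in color_elements(elements):
        # Within a block no node is touched twice, so this inner loop
        # is the one that could safely be split among threads.
        for e in block:
            for n in elements[e]:
                nodal[n] += elem_values[e]
    return nodal

# Four triangles on a 3x2 grid of nodes (made-up mesh for illustration)
elements = [(0, 1, 3), (1, 4, 3), (1, 2, 4), (2, 5, 4)]
blocks = color_elements(elements)
nodal = assemble(elements, [1.0, 1.0, 1.0, 1.0], n_nodes=6)
```

Greedy coloring does not guarantee the fewest possible blocks, and very small blocks give each thread little work, so a real code would likely balance block sizes as well.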

Regarding MPI, I'd say that the difficulty is more in the implementation than in understanding the parallelism model. MPI makes the parallel problem easier to understand (at least in my case) because you can clearly see that, after partitioning your model, you have several pieces of the problem that need to be solved almost independently on each processor, and that there are some data to be combined across these pieces of the domain.
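The "pieces plus some shared data" picture can be sketched without any MPI at all. The Python below is a serial stand-in, assuming a simple 1D averaging update and one ghost cell per side; all function names are hypothetical, and the plain copies in `exchange_ghosts` take the place of what would be sends and receives between processes in a real MPI code.

```python
def serial_step(u):
    """One Jacobi-style smoothing sweep with fixed end values."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new

def partition(u, n_parts):
    """Split u into contiguous pieces, each padded with one ghost cell per side."""
    size = len(u) // n_parts
    pieces = []
    for p in range(n_parts):
        lo = p * size
        hi = len(u) if p == n_parts - 1 else lo + size
        pieces.append([0.0] + u[lo:hi] + [0.0])
    return pieces

def exchange_ghosts(pieces):
    """Stand-in for message passing: copy neighbours' edge values into ghosts."""
    for p in range(len(pieces) - 1):
        pieces[p][-1] = pieces[p + 1][1]   # right ghost <- neighbour's first owned cell
        pieces[p + 1][0] = pieces[p][-2]   # left ghost  <- neighbour's last owned cell

def local_step(piece, is_first, is_last):
    """Update the owned cells of one piece using only local data and ghosts."""
    new = piece[:]
    start = 2 if is_first else 1           # skip the fixed physical boundary
    stop = len(piece) - 2 if is_last else len(piece) - 1
    for i in range(start, stop):
        new[i] = 0.5 * (piece[i - 1] + piece[i + 1])
    return new

def parallel_step(u, n_parts):
    """Partition, exchange ghost values, update each piece, glue back together."""
    pieces = partition(u, n_parts)
    exchange_ghosts(pieces)
    last = n_parts - 1
    pieces = [local_step(pc, p == 0, p == last) for p, pc in enumerate(pieces)]
    return [x for pc in pieces for x in pc[1:-1]]   # drop ghosts

u0 = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
```

Calling `parallel_step(u0, 2)` should give the same list as `serial_step(u0)`. Checking a decomposed update against the serial one in this way is a cheap test before any real message passing is written.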

Scalability is another long, long story. It depends on several things: system architecture, algorithms, implementation, etc., but *in my case* I've been getting better results with MPI than with OpenMP on an SGI Altix system.

For someone planning to implement a parallel code, I'd say that it's important to understand the basic concepts behind the parallel models (memory dependency: OpenMP, data partitioning: MPI) before starting the job. It'll make your decision and implementation easier. Furthermore, you *should* see a performance gain with whichever parallel model you choose. Whether the speedup is poor or wonderful, you must get at least some gain in performance; otherwise it is an indication that you've done something wrong, in the concept or in the implementation.

Hope this helps
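As a quick way to put numbers on "some gain", the usual measures are speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of processors). A minimal Python sketch follows; the timings are invented for illustration and are not measurements from any code in this thread.

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup per processor (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical timings: 14 h serial, 4 h on 4 processors (assumed numbers)
s = speedup(14.0, 4.0)
e = efficiency(14.0, 4.0, 4)
```

A speedup at or below 1 would be the "something is wrong" case described above.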

Cheers

Renato.

October 27, 2006, 09:43 (Mani)

Ganesh, how efficient MPI and OpenMP are will depend mostly on the type of solver you are using, i.e. the solution algorithm and data structure.

When people say it's difficult to implement MPI code, I suspect they are talking about parallelizing an existing code (what you are trying to do). If you were to use domain decomposition with MPI, your code would have to be restructured, and depending on the code that may not be a trivial task, even if you have done it before. You need to start thinking in SPMD (single program, multiple data) mode. Some serial codes are already cut out for that; others may need more attention.

The benefit, of course, is a potentially higher efficiency. I have run similar computations efficiently on my code based on mpich (and so have many other people), but that code was designed to be parallel almost from scratch. At the very least, I can tell you that we have probably spent as much time on designing the parallel aspects of the code (general unstructured interfacing of structured multiblock grids) as designing the flow solver (this may be exaggerated in my memory, but to do it right, it will take a significant amount of time). However, yours can be far simpler than mine, if you either have an unstructured grid, or you use multiblock in a fully structured way.

I would say OpenMP is good for three things: for quick and dirty parallelization, as the only viable method if your solution algorithm does not allow for large chunk data decomposition, and last but not least as a complementary level of parallelization on top of any MPI code (I haven't tried the latter, yet).

October 27, 2006, 10:15 (Andrew)

13-15 hours is not a very long computational time for an unsteady flow simulation. I have studied spatially developing mixing layers, and found that in 2-D a given simulation took around 24-30 hours to run, whereas a fully 3-D flow, with transition to turbulence and fully developed flow contained within the domain, took ~800 hours to complete for the same time step and number of time steps. Parallelisation of the code would be of obvious benefit in the 3-D case, and is a work in progress for me at present.

I also have no doubt that the results you obtain from vortex shedding simulations would be greatly improved by adding the extra dimension to the calculation.
