About the OpenMP

August 19, 2007, 20:24

I am solving a 2D compressible inviscid flow problem on a grid of 1000 by 1000 cells. It takes an enormous amount of time to get a result on a single machine. Someone recommended that I use OpenMP or MPI to reduce the computational cost, but I am new to the subject. Can anyone suggest a way to learn OpenMP? I don't have much time to spend on it, so I am looking for the way that takes the least effort.

Thanks in advance.

Regards

August 20, 2007, 11:09

I don't know what your computing constraints are, but developing technology is opening another possibility. Computers with quad-core processors are on the market now, and (desktop) chips with eight cores are said to be on the way. The paradigm is to organize the software into multiple threads which can run simultaneously. There is a single memory and a single operating system, which simplifies the communication problem found with multi-computer/multi-operating system configurations. NASA is said to be getting a new supercomputer, expected to be the 64th most powerful in the world, with 2000 processors but one shared memory, running under a single Linux operating system.

At the same time, Intel has released as open source some software called "Threading Building Blocks" or TBB. This is a set of C++ classes that are supposed to simplify the use of parallelism. Often a program has loops which could be computed in parallel. TBB uses what is called "task-based parallelism." For instance, you might replace a "for" statement with a "parallel_for". A task scheduler automatically splits the loop into a number of pieces depending on the number of processors available. The same code will run on one-, two-, four-, eight-core, etc. machines without further intervention. (A single-core processor is used to debug the portion of the code that does not involve parallelism.) As for load balancing, if one core runs out of things to do, it will "steal" tasks from the queue of another core that is busy.
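The loop-splitting idea described above can be sketched without TBB itself. What follows is a minimal standard-library C++ sketch, not TBB code: `chunked_parallel_for` and its `grain` parameter are hypothetical names invented for this illustration, and idle threads simply claim the next chunk from a shared counter, which is a simpler stand-in for the task stealing that TBB's scheduler is described as doing.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <system_error>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): call body(i) for every i in
// [begin, end), handing out chunks of `grain` iterations to whichever
// worker thread is free. body must be safe to call concurrently.
void chunked_parallel_for(std::size_t begin, std::size_t end, std::size_t grain,
                          const std::function<void(std::size_t)>& body) {
    assert(grain > 0);
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;  // the core count may be reported as unknown

    std::atomic<std::size_t> next(begin);  // start of the next unclaimed chunk
    auto work = [&]() {
        for (;;) {
            const std::size_t lo = next.fetch_add(grain);
            if (lo >= end) break;  // every chunk has been claimed
            const std::size_t hi = std::min(lo + grain, end);
            for (std::size_t i = lo; i < hi; ++i) body(i);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 1; w < workers; ++w) {
        try {
            pool.emplace_back(work);
        } catch (const std::system_error&) {
            break;  // no more threads available: carry on with what we have
        }
    }
    work();  // the calling thread takes chunks as well
    for (std::thread& t : pool) t.join();
}
```

A call such as `chunked_parallel_for(0, n, 1024, [&](std::size_t i) { y[i] = a * x[i]; });` runs unchanged on one core or many, which is the property the post attributes to `parallel_for`. Because a thread that finishes early just claims another chunk, uneven iteration costs are balanced to some degree.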

The Intel home page for this is http://osstbb.intel.com . There are a number of articles about TBB on the net, for instance http://www.devx.com/cplus/Article/33334. There is a book "Intel Threading Building Blocks" by James Reinders (O'Reilly).

I have not used TBB, or even downloaded the code to look at it. I have read through the book once to understand the scope of the problems, but I don't understand the details yet.

Another possibility is to use a GPGPU (general-purpose graphics processing unit). Nvidia has a new processor out now named "Tesla". This is one of their graphics boards but without connections for a display monitor. Nvidia claims it will do 0.5 teraflops ($1500). It uses a SIMD (single instruction, multiple data) architecture. They have a C-based programming toolkit called CUDA to make programming "easier". Unfortunately, Tesla is single precision, but I think I saw somewhere that they will have a 64-bit (double-precision) product by the end of the year. There is even a company that claims they will code your application for you to run on a GPGPU.

ATI has graphics processors that can be used in a similar way. I have not seen it mentioned in print, but you can bet that AMD bought ATI to get this GPGPU technology to compete with Intel in the commodity supercomputer field, not to build graphics support chips.

August 20, 2007, 12:34

Ten years ago I tried OpenMP on an 8-CPU SGI. It was pretty good for the first few additional CPUs, but ultimately the drop-off was disappointing. It was better to run multiple serial jobs on the hardware than to force through a single parallel one. Now that Intel has finally caught up, the same will probably be true.

August 20, 2007, 18:31

Along with all the good stuff that Jonas mentions, in the past there was the High Performance Fortran (HPF) initiative. In the future we will possibly see the Fortress language and compiler under development at Sun Microsystems. But all these things take time to investigate and recode for. OpenMP and MPI are more widely established, and will be faster for you to recode to. There was a good book that introduced both, but I cannot recall its title or author. Just search on Amazon.

August 23, 2007, 08:03

May I ask what your intended target is? How many processors, and what type of interconnect? Is it NUMA (non-uniform memory access)? How many problems do you need to solve? Are you developing code that will be used often by others, or is it just code for one specific problem?

OpenMP is much easier to program, but it scales poorly.

If all the cores share the same bus (Intel architecture), you will eventually saturate the memory bandwidth. If each core has its own memory and shares it through NUMA (AMD architecture), then the situation is much better.

If you are going to write it from scratch, you should take a look at ScaLAPACK. I have never worked with it, but I'm pretty sure learning it is much faster than writing your own MPI code.

By the way, for OpenMP, please check www.openmp.org; there are a lot of good tutorials there.

August 28, 2007, 05:47

I would say OpenMP is as scalable as anything else on a given architecture. It just depends on how you program it.

The best 'commodity' platform today is AMD Opteron with its ccNUMA memory subsystem. It allows you to get perfect speedup with shared memory/OpenMP (VERY easy to program), provided that you allocate the data properly: on NUMA architectures, every CPU should mostly access its local memory.

This is very easily achievable for all kinds of FD (finite-difference) schemes on structured meshes: you simply split your domain into blocks (e.g. by Z dimension index) and place each block on a separate CPU using so-called 'first touch', meaning that a memory page is placed in the local memory of the CPU that first writes to it. Then, with OpenMP, you program all the loops in the normal (sequential) way and add OpenMP compiler directives.
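A minimal sketch of this pattern, under several assumptions: the function names `allocate_first_touch` and `sweep` are hypothetical, the grid is stored row-major and split by row index `j` (the 2D analogue of splitting by Z), and the four-neighbour average is only a placeholder stencil, not a compressible flow solver. It also assumes the operating system uses a first-touch page placement policy and that threads stay on the CPU where they started.

```cpp
#include <cassert>
#include <cstddef>

// Allocate an nx-by-ny grid WITHOUT initialising it in the calling thread,
// then let each OpenMP thread write ("first touch") its own block of rows,
// so those pages end up in that thread's local memory.
double* allocate_first_touch(std::size_t nx, std::size_t ny, double value) {
    double* u = new double[nx * ny];  // plain new[] does not write to the array
    #pragma omp parallel for schedule(static)
    for (long j = 0; j < static_cast<long>(ny); ++j)
        for (std::size_t i = 0; i < nx; ++i)
            u[j * nx + i] = value;
    return u;
}

// One sweep written as an ordinary sequential loop plus a directive.
// The same static schedule gives each thread (almost) the same rows it
// touched first, so most of its accesses are to local memory.
void sweep(const double* u, double* unew, std::size_t nx, std::size_t ny) {
    assert(nx >= 3 && ny >= 3);
    #pragma omp parallel for schedule(static)
    for (long j = 1; j < static_cast<long>(ny) - 1; ++j)
        for (std::size_t i = 1; i + 1 < nx; ++i) {
            const std::size_t c = j * nx + i;  // centre point, row-major index
            unew[c] = 0.25 * (u[c - 1] + u[c + 1] + u[c - nx] + u[c + nx]);
        }
}
```

Built with OpenMP enabled (for example `-fopenmp` with GCC), the loops run in parallel; built without it, the directives are ignored and the same code runs serially. Only rows on the boundary between two blocks need data from a neighbouring CPU's memory.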

MPI will take you much more time, since you need to program all the communication explicitly, which is generally a pain. I would only do it if the available SMP/Opteron platform were not fast enough.

Cheers,

Marcin

Posted	Title	Post	Author
August 19, 2007, 20:24	About the OpenMP	#1	jinwon park
August 20, 2007, 11:09	Re: About the OpenMP	#2	Jonas Holdeman
August 20, 2007, 12:34	Re: About the OpenMP	#3	Steve
August 20, 2007, 18:31	Re: About the OpenMP	#4	Ananda Himansu
August 23, 2007, 08:03	Re: About the OpenMP	#5	Arpiruk
August 28, 2007, 05:47	Re: About the OpenMP	#6	Marcin
