Main CFD Forum

About the OpenMP

August 19, 2007, 20:24
About the OpenMP
jinwon park
I am solving a 2D compressible inviscid flow on a grid of 1000 by 1000 cells. It takes an enormous amount of time to get a result on a single machine. Someone recommended OpenMP or MPI to reduce the computational cost, but I am new to that subject. Can anyone suggest a way to learn OpenMP? I don't have much time, so I am looking for a way to learn it with the least effort.

Thanks in advance.



August 20, 2007, 11:09
Re: About the OpenMP
Jonas Holdeman
I don't know what your computing constraints are, but developing technology is opening another possibility. Computers with quad-core processors are on the market now, and (desktop) chips with eight cores are said to be on the way. The paradigm is to organize the software into multiple threads which can run simultaneously. There is a single memory and a single operating system, which simplifies the communication problem found in multi-computer/multi-OS configurations. NASA is said to be getting a new supercomputer, expected to be the 64th most powerful in the world, with 2000 processors but one shared memory, running under a single Linux operating system.

At the same time, Intel has released into the public domain some software called "Threading Building Blocks" (TBB). This is a set of C++ classes that are supposed to simplify the use of parallelism. Often a program has loops which could be computed in parallel. TBB uses what is called "task-based parallelism." For instance, you might replace a "for" statement with a "parallel_for". A task scheduler automatically splits the loop into a number of pieces depending on the number of processors available. The same code will run on one-, two-, four-, or eight-core machines without further intervention. (A single-core processor is used to debug the portion of the code that does not involve parallelism.) As for load balancing, if one core runs out of things to do, it will "steal" tasks from the queue of another processor that is busy.

Intel's home page has more on this, and there are a number of articles about TBB on the net. There is also a book, "Intel Threading Building Blocks" by James Reinders (O'Reilly).

I have not used TBB, or even downloaded the code to look at it. I have read through the book once to understand the scope of the problems, but I don't understand the details yet.

Another possibility is to use a GPGPU (general-purpose graphics processing unit). Nvidia has a new processor out now named "Tesla". This is one of their graphics boards, but without connections for a display monitor. Nvidia claims it will do 0.5 teraflops ($1500). It uses a SIMD (single instruction, multiple data) architecture. They have a C++ compiler called CUDA to make programming "easier". Unfortunately, Tesla is single precision, but I think I saw somewhere that they will have a 64-bit product by the end of the year. There is even a company that claims they will code your application for you to run on a GPGPU.

ATI has graphics processors that can be used in a similar way. I have not seen it mentioned in print, but you can bet that AMD bought ATI to get this GPGPU technology to compete with Intel in the commodity supercomputer field, not to build graphics support chips.


August 20, 2007, 12:34
Re: About the OpenMP
Ten years ago I tried OpenMP on an 8-CPU SGI. It was pretty good for the first few additional CPUs, but ultimately the drop-off was disappointing. It was better to run multiple serial jobs on the hardware than to force through a single parallel one. Now that Intel has finally caught up, the same will probably be true.

August 20, 2007, 18:31
Re: About the OpenMP
Ananda Himansu
Along with all the good stuff that Jonas mentions, in the past there was the High Performance Fortran (HPF) initiative, and in the future we will possibly see the Fortress language and compiler under development at Sun Microsystems. But all these things take time to investigate and recode for. OpenMP and MPI are more widely established, and will be faster for you to recode to. There was a good book that introduced both, whose title and author I cannot recall; just search on Amazon.

August 23, 2007, 08:03
Re: About the OpenMP
May I ask what your intended target is? How many processors, and what type of interconnect? Is it NUMA (non-uniform memory access)? How many problems do you need to solve? Are you developing code that will be used by others very often, or is it just coding for a specific problem?

OpenMP is much easier to program, but it scales poorly.

If all the cores share the same bus (Intel architecture), you will eventually saturate the memory bandwidth. If each core has its own memory and shares it via NUMA (AMD architecture), then conditions are much better.

If you are going to write it from scratch, you should take a look at ScaLAPACK. I have never worked with it, but I'm pretty sure learning it is much faster than writing your own MPI code.

By the way, for OpenMP, look for tutorials online; there are a lot of good ones.


August 28, 2007, 05:47
Re: About the OpenMP
I would say OpenMP is as scalable as anything else on a given architecture. It just depends on how you program it.

The best 'commodity' platform today is AMD's Opteron with its cache-coherent NUMA memory subsystem. It allows you to get perfect speedup with shared memory/OpenMP (VERY easy to program), provided that you allocate the data properly: on NUMA architectures, every CPU should mostly access its local memory.

This is very easily achievable for all kinds of FD schemes on structured meshes: you simply split your domain into blocks (e.g. by the Z dimension index) and place each block on a separate CPU using so-called 'first touch'. Then, with OpenMP, you program all the loops in the normal (sequential) way, adding OpenMP compiler directives.

MPI will take you much more time, since you need to program all the communication explicitly, which is generally a pain. I would only do it if the available SMP/Opteron platform were not fast enough.



