No difference in calculation speed between serial and OpenMP code?


May 10, 2020, 06:37   #1
New Member
Anh Dinh Le (AnhDL)
Join Date: Apr 2020
Posts: 24
Dear experts,

I wrote my own Fortran code for multiphase flow problems. My code was originally in serial form. I submitted the batch job file to use 20 CPUs for my code. The calculation time was much faster than with the ifort command only. Here is the batch job file:
#!/bin/bash
#SBATCH --job-name=testOMP
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --exclusive
#SBATCH --time=0-20:00:00
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
time ./2D-test


Then I modified the serial code to OpenMP parallel using !$omp parallel do / !$omp end parallel do. However, there was no difference in calculation speed between the serial and the parallel code. The way I modified it is shown in the attached picture (part of the code).

Please advise. Did I modify the code correctly?
Attached Images
File Type: jpg omp.jpg (123.9 KB, 17 views)
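(For reference, a minimal sketch of this kind of !$omp parallel do modification; the array names, loop bounds, and update formula below are placeholders, not the actual code from the attachment.)
Code:
! build with OpenMP enabled, e.g.  ifort -qopenmp omp_sketch.f90
program omp_sketch
   implicit none
   integer, parameter :: nx = 279, ny = 70
   real(8) :: p(nx,ny), pnew(nx,ny)
   integer :: i, j

   p = 1.0d0
   !$omp parallel do private(i)
   do j = 2, ny-1
      do i = 2, nx-1
         ! placeholder update standing in for the real multiphase kernel
         pnew(i,j) = 0.25d0*(p(i-1,j) + p(i+1,j) + p(i,j-1) + p(i,j+1))
      end do
   end do
   !$omp end parallel do
   print *, 'pnew(2,2) =', pnew(2,2)
end program omp_sketch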

May 12, 2020, 17:12   #2
Super Moderator
flotus1 (Alex)
Join Date: Jun 2012
Location: Germany
Posts: 3,399
That's not a whole lot of information. And I don't know what to make of this statement:
Quote:
The calculation time was much faster than with the ifort command only
What's the point of comparing compile time to run time? And it immediately begs the question: is your job large enough to scale across 20 threads?

There is not too much to work with, and to be honest, I don't have a whole lot of first-hand experience with OpenMP for this type of structured grid. Here is a more general checklist for "my OpenMP code doesn't scale", in no particular order and definitely incomplete.
  • Did you remember to compile with an OpenMP flag?
  • Is your case large enough to scale across N cores?
  • Are you measuring run time in a meaningful way? Using "time" alone on the whole code, you might be measuring a lot of serial setup time. Instrument your code with timing measurements and/or use profilers (see the first sketch after this list).
  • Are you sure the subroutine you parallelized is the run-time bottleneck? Again, instrument your code with timing measurements and/or use profilers.
  • Do you have a load-balancing problem? A static schedule for the OpenMP loop with a smaller chunk size can help.
  • Is NUMA getting in your way? Setting the loop schedule to static and applying proper "first touch" initialization can help, along with binding the OpenMP threads to individual cores. The latter is always a good idea anyway (see the second sketch after this list).
  • Did you produce race conditions or some other nasty bugs that hinder parallelization? Intel provides handy tools like Inspector. Use them!
  • Still not sure what is holding your code back? Intel VTune Amplifier. Last time I checked, both Inspector and VTune were free to download and use.
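
To make the compile-flag and timing points concrete, here is a minimal sketch (illustrative only, not taken from the code in question) that times just the parallel kernel with omp_get_wtime:
Code:
! Build with the OpenMP flag, e.g.  ifort -qopenmp time_kernel.f90
! (gfortran: -fopenmp). Without the flag, the !$omp lines are treated as
! comments and the code silently stays serial.
program time_kernel
   use omp_lib                  ! omp_get_wtime(), omp_get_max_threads()
   implicit none
   integer, parameter :: n = 2000
   real(8) :: a(n,n), t0, t1
   integer :: i, j

   a = 0.0d0
   t0 = omp_get_wtime()         ! time only the region of interest,
                                ! not the whole run including setup/IO
   !$omp parallel do private(i)
   do j = 1, n
      do i = 1, n
         a(i,j) = sin(dble(i)) * cos(dble(j))
      end do
   end do
   !$omp end parallel do
   t1 = omp_get_wtime()

   print *, 'threads =', omp_get_max_threads(), ' kernel time [s] =', t1 - t0
end program time_kernel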
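Along the same lines, a sketch of the static-schedule / first-touch / thread-binding points (the array names and sizes are again placeholders). Thread binding itself is controlled outside the code, e.g. with the standard OMP_PROC_BIND and OMP_PLACES environment variables next to OMP_NUM_THREADS in the batch script:
Code:
! First-touch initialization: initialize the arrays with the same schedule(static)
! loop that the solver will later use, so each thread places the memory pages it
! will actually work on in its local NUMA node.
!
! Assumed additions to the batch script (standard OpenMP environment variables):
!   export OMP_PROC_BIND=close
!   export OMP_PLACES=cores
program first_touch
   implicit none
   integer, parameter :: nx = 4096, ny = 4096
   real(8), allocatable :: p(:,:), pnew(:,:)
   integer :: i, j

   allocate(p(nx,ny), pnew(nx,ny))

   !$omp parallel do schedule(static) private(i)
   do j = 1, ny
      do i = 1, nx
         p(i,j)    = 0.0d0
         pnew(i,j) = 0.0d0
      end do
   end do
   !$omp end parallel do

   print *, 'initialized', 2*nx*ny, 'cells'
end program first_touch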

Further reading: https://moodle.rrze.uni-erlangen.de/...iew.php?id=310
Yes, that's the material for a full 3-day course. Turning a serial code into OpenMP parallel is the easy part; getting decent performance and scaling is the harder part.

May 15, 2020, 13:15   #3
New Member
Pedro Costa (pcosta)
Join Date: Jul 2017
Posts: 9
You are only parallelizing the outer loop. You would need to have something like:
Code:
!$OMP DO COLLAPSE(2)
to parallelize the two loops... I don't think that will solve your problem, but you never know.
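Applied to the kind of loop nest sketched in the first post (again placeholder names, not the actual code), this would look like:
Code:
!$omp parallel do collapse(2)
do j = 2, ny-1
   do i = 2, nx-1
      pnew(i,j) = 0.25d0*(p(i-1,j) + p(i+1,j) + p(i,j-1) + p(i,j+1))
   end do
end do
!$omp end parallel do
With collapse(2), the two loops are merged into a single iteration space of (nx-2)*(ny-2) iterations that is distributed among the threads, and both loop indices are made private automatically.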

May 15, 2020, 20:00   #4
New Member
Anh Dinh Le (AnhDL)
Join Date: Apr 2020
Posts: 24
Quote:
Originally Posted by pcosta View Post
You are only parallelizing the outer loop. You would need to have something like:
Code:
!$OMP DO COLLAPSE(2)
to parallelize the two loops... I don't think that will solve your problem, but you never know.
Pcosta,

I did try COLLAPSE(2) and COLLAPSE(3); the result was the same.
I tried to optimize the serial code structure, and the calculation has sped up now (a bit).
I still don't know how to improve the parallel run. Maybe the parallel calculation cannot show an improvement for such a small number of grid points (279 x 70), or maybe I made a wrong batch job for OpenMP.

May 16, 2020, 04:17   #5
Senior Member
Paolo Lampitella (sbaffini)
Join Date: Mar 2009
Location: Italy
Posts: 2,151
Blog Entries: 29
I strongly suggest avoiding testing on a system with a job scheduler that might be shared with others... at least not this kind of test, where you don't know if everything works correctly (even if you might be alone on a given node).

These days there is no PC, or even cell phone for that matter, with only a single core anymore, so you have plenty of options when it comes to testing.

Only go to larger machines when you are absolutely sure that everything works on the smaller ones and, of course, when you know how to use their job schedulers.

In this specific case, however, the size of the problem looks far too small to achieve any relevant speedup in parallel. This would be true with MPI, but it is even more so with OpenMP.

If you are absolutely sure that the only problem might be the size of your task, just throw something bigger at it, like 10 or 100 times bigger.
