OpenFoam and OpenMPI

Post #1 by Pablo (pablodecastillo), April 20, 2012, 15:04
Hello,

I am testing a machine with 64 cores (16x4) at 2.2 GHz. My case is optimized to run on 8 cores.

If I submit 1 job (decomposed over 8 cores), the job needs over 1000 seconds to compute 1 second of simulation.

If I submit 4 jobs (8 cores each), every job needs 2500 seconds to compute 1 second of simulation. RAM usage is only 10%.

If I submit 6 jobs (8 cores each), every job needs 4000 seconds to compute 1 second of simulation. RAM usage is only 15%.

If I submit 8 jobs (64 cores in total), it is really, really slow (20% RAM usage).

Can anybody give me a clue about what is happening?
The machine runs Ubuntu 11.10 with the binary version of OpenFOAM 2.1.

Basically, 1 parallel job works perfectly, but performance is poor when more than 1 job runs at the same time.

Pablo
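
One way to read the timings in this post is as aggregate throughput: simulated seconds produced per wall-clock second, summed over all concurrent jobs. The sketch below is only arithmetic on the figures quoted above, taking "over 1000 seconds" as 1000; the names `reported` and `aggregate_throughput` are made up for illustration.

```python
# Wall-clock seconds needed per simulated second, per job, as reported above.
# Keys are the number of concurrent 8-core jobs. "Over 1000 s" is taken as 1000 s.
reported = {1: 1000.0, 4: 2500.0, 6: 4000.0}


def aggregate_throughput(jobs, wall_per_sim_second):
    """Simulated seconds produced per wall-clock second, summed over all jobs."""
    return jobs / wall_per_sim_second


baseline = aggregate_throughput(1, reported[1])

# Throughput of each configuration relative to a single 8-core job.
relative = {
    jobs: aggregate_throughput(jobs, wall) / baseline
    for jobs, wall in reported.items()
}

for jobs, ratio in relative.items():
    per_job_slowdown = reported[jobs] / reported[1]
    print(f"{jobs} job(s): each job {per_job_slowdown:.1f}x slower, "
          f"total throughput {ratio:.2f}x of one job")
```

On these numbers, four concurrent jobs deliver roughly 1.6 times the throughput of a single job rather than 4 times, and six jobs deliver slightly less than four.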

Post #2 by Pablo (pablodecastillo), April 22, 2012, 13:52
It seems that updating to OpenMPI 1.5.3 improves things, but not enough. Any ideas?

Post #3 by Olivier (olivierG), April 23, 2012, 06:31
Hello,

I am not sure I understand correctly:
- You have one machine (not a cluster) with 64 cores: 4 CPUs of 16 cores each?
If this is the case, you may be bandwidth limited. Try decomposing your case over fewer than 8 cores (4 or 2) or over more (16), and try again with 4/6/8 jobs.

regards,
olivier

Post #4 by Bruno Santos (wyldckat), April 23, 2012, 06:44
Greetings to all!

I was going to write about processor affinity, but this seems to have already been discussed on Pablo's other thread: http://www.cfd-online.com/Forums/har...arameters.html

Best regards,
Bruno

Last edited by wyldckat; April 23, 2012 at 06:44. Reason: talked -> discussed

Post #5 by Pablo (pablodecastillo), April 23, 2012, 08:28
Thanks Bruno and Olivier,

In the end it is a problem with the machine's architecture: it is an AMD Opteron with 64 cores (4 sockets with 16 cores each), and each floating-point unit is shared between 2 cores, so it has only 32 FPUs.

This means that for numerical calculations it behaves like 32 cores; beyond that, performance goes down.
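
To relate the job counts from the first post to this hardware, one can count MPI ranks per FPU. The sketch below assumes 8 ranks per job, 64 cores and 32 shared FPUs as described above, and an idealized placement in which ranks are spread evenly over the FPUs. Real placement depends on the scheduler and on processor affinity, so this is rough bookkeeping only.

```python
CORES = 64          # 4 sockets x 16 cores, from the post above
CORES_PER_FPU = 2   # each FPU is shared by a pair of cores
FPUS = CORES // CORES_PER_FPU
RANKS_PER_JOB = 8   # each case is decomposed over 8 cores


def ranks_per_fpu(jobs):
    """Average number of MPI ranks competing for one FPU (idealized, even spread)."""
    return jobs * RANKS_PER_JOB / FPUS


for jobs in (1, 4, 6, 8):
    load = ranks_per_fpu(jobs)
    note = "FPUs must be shared" if load > 1 else "one FPU per rank is possible"
    print(f"{jobs} job(s): {jobs * RANKS_PER_JOB} ranks, "
          f"{load:.2f} ranks per FPU ({note})")
```

By this count, FPU sharing becomes unavoidable above 4 jobs (32 ranks). It does not by itself account for the slowdown already reported at 4 jobs, so other factors, such as the bandwidth limit Olivier mentioned or rank placement, may also contribute.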

Post #6 by Bruno Santos (wyldckat), April 23, 2012, 08:48
Quote (originally posted by pablodecastillo):
"This means that for numerical calculations it behaves like 32 cores; beyond that, performance goes down."

With proper optimization options when building OpenFOAM and OpenMPI, it might be possible to overcome or minimize that issue! The simplest way is to use Gcc 4.6, preferably one of the latest releases, 4.6.2 or 4.6.3.

Post #7 by Pablo (pablodecastillo), April 23, 2012, 09:30
Hi Bruno,

Can you share those proper optimization options for AMD?
My OpenFOAM was compiled with Gcc 4.6.1 and the default options.

Pablo

Post #8 by Bruno Santos (wyldckat), April 23, 2012, 17:00
Hi Pablo,

For the 6200 series (I saw this post of yours), see the Gcc table from here: http://developer.amd.com/Assets/Comp...f-62004200.pdf
Caution: Do not use "-ffast-math".

From what I can see, it would be best to use Gcc 4.7.0.
(edit: I probably was thinking of the previous generation of AMD, when I wrote 4.6.3...)

The files that need modifications are:
Code:
wmake/rules/linux64Gcc/cOpt
wmake/rules/linux64Gcc/c++Opt
Add flags to the lines that start with "cOPT" and "c++OPT", respectively.
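
As a rough illustration of that edit, the Python sketch below appends flags to the assignment line of a rules file held in a string. The sample file content and the helper name `append_opt_flags` are assumptions made for this example, and the flags shown are placeholders (`-mprefer-avx128` and `-ftree-vectorize` come up later in this thread); check the AMD table and your Gcc version before choosing real ones. Editing the two files by hand in a text editor works just as well.

```python
def append_opt_flags(rules_text, variable, extra_flags):
    """Return rules_text with extra_flags appended to the `variable = ...` line."""
    out = []
    for line in rules_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == variable:
            # Keep the existing flags (e.g. -O3) and add the new ones after them.
            parts = [value.strip()] + list(extra_flags)
            line = f"{variable} = " + " ".join(p for p in parts if p)
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed content of wmake/rules/linux64Gcc/c++Opt; your copy may differ.
sample_cpp_opt = "c++DBUG     =\nc++OPT      = -O3\n"

patched = append_opt_flags(
    sample_cpp_opt, "c++OPT", ["-mprefer-avx128", "-ftree-vectorize"]
)
print(patched)
```

The same call with "cOPT" would cover the cOpt file. OpenFOAM has to be rebuilt afterwards for the new flags to take effect.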

As for installing Gcc 4.7.0... it depends on the Linux distribution you have, because some already have it somewhere; others will require you to do a custom build.

If your gcc and g++ binaries then have different names (e.g. gcc47), see here how you can tweak OpenFOAM to use your version: http://www.cfd-online.com/Forums/ope...tml#post278809 post #2

Best regards,
Bruno

Last edited by wyldckat; April 23, 2012 at 17:01. Reason: see "edit:"

Post #9 by Pablo (pablodecastillo), April 23, 2012, 18:02
Hi Bruno,

This afternoon I added c++OPT = -O3 -mprefer-avx128 -ftree-vectorize -ffast-math (same for cOpt),
and I got 20 to 25 better performance in speed; this was with gcc 4.6.

Tomorrow I will try with 4.7, as you suggest.

Why is -ffast-math not a good idea, if the main trouble with these machines is that there is only one FPU for every 2 cores?

Pablo

Post #10 by Bruno Santos (wyldckat), April 23, 2012, 18:24
Quote (originally posted by pablodecastillo):
"Why is -ffast-math not a good idea, if the main trouble with these machines is that there is only one FPU for every 2 cores?"

As the document states:
"Enable faster, less precise math operations"

And quoting from gcc's online manual (http://gcc.gnu.org/onlinedocs/gcc-4....e-Options.html):
"[...] it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications."
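
One concrete reason behind that warning is that floating-point addition is not associative, and -ffast-math permits the compiler to regroup such operations. The Python snippet below does not involve Gcc at all; it is only an illustration that regrouping the same three additions changes the last bits of the result, which is the kind of difference a numerical solver may or may not tolerate.

```python
# Same three numbers, two groupings of the additions.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(repr(left_to_right))
print(repr(regrouped))
print("identical:", left_to_right == regrouped)
```

The two results differ only in the last digit, but the comparison is false, so code that relies on a fixed evaluation order can behave differently once the compiler is free to reorder.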

Post #11 by Pablo (pablodecastillo), April 23, 2012, 18:31
Hi Bruno,

It seems that -mprefer-avx128 is the main factor in improving the speed. Did you get improved speed with 4.7?

Thanks

Post #12 by Bruno Santos (wyldckat), April 24, 2012, 05:17
Hi Pablo,

I don't have access to one of the latest AMD CPUs, so I can't test this particular speedup. All I know is that Gcc 4.7 has improved support for this (new) generation of AMD CPUs. And the only other compiler that should support them is Open64.

Best regards,
Bruno

Post #13 by Pablo (pablodecastillo), April 24, 2012, 05:27
Any idea how to compile OpenFOAM with Open64?

Post #14 by Bruno Santos (wyldckat), April 24, 2012, 05:33
Quote (originally posted by pablodecastillo):
"Any idea how to compile OpenFOAM with Open64?"

I've never tested it, because Open64 is somewhat of an experimental compiler and OpenFOAM is highly demanding in terms of C++ standards.

For comparison, the Intel C++ Compiler (ICC) requires OpenFOAM to have some modified templates, adjusted just for ICC. This is because ICC is unable to do everything that Gcc does. Therefore, by this measure, it's best to stay with Gcc.

Post #15 by Anton Kidess (akidess), June 14, 2012, 13:15
Testing interFoam (damBreakFine) on an AMD Opteron(tm) Processor 6134, I only get a negligible speedup using Gcc 4.7 compared to Gcc 4.4.6! This is with the stock build options (except for the compiler executable), but I do want to test using the compiler flags suggested above.

Post #16 by Pablo (pablodecastillo), June 14, 2012, 14:28
Hello,

There is a document from AMD on HPC computing. To really improve performance, you must compile with the recommended flags and modify the BIOS.

Post #17 by Anton Kidess (akidess), June 14, 2012, 15:05
Can you elaborate?

Post #18 by Pablo (pablodecastillo), June 14, 2012, 15:22
If you send me an email, I can send you the AMD paper.

Post #19 by Bruno Santos (wyldckat), June 14, 2012, 16:54
Greetings to all!

I googled for "amd hpc gcc flags" (without the quotes) and the first hit was a very interesting tutorial: http://developer.amd.com/documentati...anceGains.aspx

As for the 6100 Opteron series, looks like this is the proposed compiler spec cheat-sheet: http://developer.amd.com/Assets/Comp...f-61004100.pdf
And don't forget: do NOT use ICC for AMD...

Official (shortcut) page for the series: http://developer.amd.com/Magny-Cours

Best regards,
Bruno

Post #20 by Anton Kidess (akidess), June 15, 2012, 04:50
Bruno, I followed this guide: http://developer.amd.com/assets/AMDGCCQuickRef.pdf

I think it's basically the same as the one you posted, but older. On a single-core damBreakFine run, using the flags they suggest there (-march=amdfam10, -mabm, -msse4a), ExecutionTime was reduced by 10% (compared to gcc 4.7 without the extra flags).

All times are GMT -4.