OpenFOAM and OpenMPI
Hello,
I am testing a machine with 64 cores at 2.2 GHz (16x4). My case is optimized to run on 8 cores. If I submit 1 job (decomposed over 8 cores), the job needs over 1000 seconds of wall time per second of simulated time. If I submit 4 jobs (8 cores each), every job needs 2500 seconds per simulated second, and only 10% of the RAM is in use. If I submit 6 jobs (8 cores each), every job needs 4000 seconds per simulated second, and only 15% of the RAM is in use. If I submit 8 jobs (64 cores in total), it is really, really slow (20% RAM). Can anybody give me a clue about what is happening? The machine runs Ubuntu 11.10 with the binary version of OpenFOAM 2.1. Basically, a single parallel job works perfectly, but performance drops when more than one job is running. Pablo |
It seems that updating to OpenMPI 1.5.3 improves things, but not enough. Any ideas?
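To put the timings in this post side by side, here is a minimal Python sketch that converts them into a per-job slowdown and a total throughput over all jobs. It assumes "over 1000 seconds" can be rounded to 1000; the function names are made up for this illustration.

```python
# Wall-clock seconds needed per simulated second, as reported in the post.
# Key: number of concurrent 8-core jobs. "over 1000" is rounded to 1000 (assumption).
reported = {1: 1000.0, 4: 2500.0, 6: 4000.0}


def aggregate_throughput(jobs, wall_per_sim_second):
    """Simulated seconds produced per wall-clock hour, summed over all jobs."""
    return jobs * 3600.0 / wall_per_sim_second


def slowdown(jobs):
    """How much slower each job runs compared with running alone."""
    return reported[jobs] / reported[1]


for jobs, wall in sorted(reported.items()):
    print(f"{jobs} job(s): {slowdown(jobs):.1f}x slower per job, "
          f"{aggregate_throughput(jobs, wall):.2f} simulated s per hour in total")
```

With these numbers the total throughput goes from about 3.6 simulated seconds per hour (1 job) to about 5.8 (4 jobs) and then drops to 5.4 (6 jobs), so the machine stops gaining anything somewhere around 32 busy cores.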
|
Hello,
I am not sure I understand correctly: you have one machine (not a cluster) with 64 cores, i.e. 4 CPUs of 16 cores each? If this is the case, you may be bandwidth limited: try decomposing your case over fewer than 8 cores (4 or 2) or over more (16), and try again with 4/6/8 jobs. Regards, olivier |
Greetings to all!
I was going to write about processor affinity, but this seems to have already been discussed on Pablo's other thread: http://www.cfd-online.com/Forums/har...arameters.html Best regards, Bruno |
Thanks Bruno and Olivier,
In the end it is a problem with the machine's architecture: it is an AMD Opteron with 64 cores (4 sockets with 16 cores each), and each pair of cores shares one floating point unit, so it has only 32 FPUs. This means that for numerical calculations it behaves like a 32-core machine; with more processes than that, performance goes down. |
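If only 32 FPUs are available, one way to think about placing several 8-rank jobs is to give every rank a core from a different FPU pair. The Python sketch below only builds such core lists. It assumes that core IDs 2k and 2k+1 share an FPU, which is a guess about the numbering and should be checked on the real machine (for example with lscpu or hwloc); the function names are hypothetical.

```python
def cores_one_per_fpu(total_cores=64, cores_per_fpu=2):
    """Return the first core ID of each FPU-sharing group.

    Assumption: consecutive core IDs (0/1, 2/3, ...) share one FPU.
    """
    return list(range(0, total_cores, cores_per_fpu))


def assign_jobs(num_jobs, ranks_per_job=8, total_cores=64, cores_per_fpu=2):
    """Give each job its own block of core IDs, one core per FPU."""
    usable = cores_one_per_fpu(total_cores, cores_per_fpu)
    needed = num_jobs * ranks_per_job
    if needed > len(usable):
        raise ValueError(
            f"{needed} ranks requested but only {len(usable)} FPUs available")
    return [usable[j * ranks_per_job:(j + 1) * ranks_per_job]
            for j in range(num_jobs)]


for job, cores in enumerate(assign_jobs(4)):
    print(f"job {job}: cores {','.join(map(str, cores))}")
```

The resulting lists could then be handed to whatever pinning mechanism is in use, such as taskset or an Open MPI rankfile. Asking for a fifth 8-rank job raises an error, because 40 ranks no longer fit on 32 FPUs.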
Hi Bruno,
Can you share the proper optimization options for AMD? My OpenFOAM was compiled with Gcc 4.6.1 and the default options. Pablo |
Hi Pablo,
For the 6200 series (I saw this post of yours), see the Gcc table from here: http://developer.amd.com/Assets/Comp...f-62004200.pdf Caution: Do not use "-ffast-math". By what I can see, it would be best to use Gcc 4.7.0. (edit: I probably was thinking of the previous generation of AMD, when I wrote 4.6.3...) The files that need modifications are: Code:
wmake/rules/linux64Gcc/cOpt
As for installing Gcc 4.7.0... it depends on the Linux distribution you have, because some already have it somewhere; others will require you to do a custom build. If your gcc and g++ binaries then have different names (e.g. gcc47), see here how you can tweak OpenFOAM to use your version: http://www.cfd-online.com/Forums/ope...tml#post278809 post #2 Best regards, Bruno |
Hi Bruno,
This afternoon I added c++OPT = -O3 -mprefer-avx128 -ftree-vectorize -ffast-math (same for cOpt) and got 20 to 25% better speed; that was with gcc 4.6. Tomorrow I will try with 4.7, as you suggest. Why is -ffast-math not a good idea, if the main trouble with these machines is that there is only one FPU for every 2 cores? Pablo |
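For anyone who wants to script this change rather than edit the rules files by hand, here is a rough Python sketch that rewrites an optimization variable in the text of a wmake rules file. The sample file contents are an assumption about what the file looks like, the helper name is made up, and the flag list is the one from this post with -ffast-math left out, following Bruno's caution.

```python
import re


def set_opt_flags(rules_text, variable, flags):
    """Replace the value assigned to `variable` (e.g. 'c++OPT' or 'cOPT')."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(variable) + r"[ \t]*=).*$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + " " + " ".join(flags), rules_text)
    if count == 0:
        raise ValueError(f"{variable} not found in rules text")
    return new_text


# Assumed contents of the C++ rules file; check your own copy first.
sample = "c++DBUG     =\nc++OPT      = -O3\n"

# Flags from the post above, without -ffast-math.
flags = ["-O3", "-mprefer-avx128", "-ftree-vectorize"]

print(set_opt_flags(sample, "c++OPT", flags))
```

The function works on text only, so reading and writing the actual file (and keeping a backup of it) is left to the caller. OpenFOAM has to be rebuilt afterwards for the new flags to take effect.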
Hi Bruno,
It seems that -mprefer-avx128 is the main factor in the speed improvement. Did you get improved speed with 4.7? Thanks |
Hi Pablo,
I don't have access to one of the latest AMD CPUs, so I can't test this particular speedup. All I know is that Gcc 4.7 has improved support for this (new) generation of AMD CPUs. And the only other compiler that should support them is Open64. Best regards, Bruno |
Any idea how to compile OpenFOAM with Open64?
|
Open64 is somewhat of an experimental compiler + OpenFOAM is highly demanding in terms of C++ standards = so I've never tried it. For comparison, the Intel C++ Compiler (ICC) requires OpenFOAM to have some modified templates, adjusted just for ICC. This is because ICC is unable to do everything that Gcc does. Therefore, by this measure of comparison, it's best to stay with Gcc. |
Testing on AMD Opteron(tm) Processor 6134, interFoam, damBreakFine I only get a negligible speedup using Gcc 4.7 compared to Gcc 4.4.6! This is using the stock build options (except for the compiler executable), but I do want to test using the compiler flags suggested above.
|
Hello,
There is a document from AMD on HPC computing: to really improve the performance, you must compile with the recommended flags and modify the BIOS. |
Can you elaborate?
|
If you send me an email, I can send you the AMD paper.
|
Greetings to all!
I googled for "amd hpc gcc flags" (without the quotes) and the first hit was a very interesting tutorial: http://developer.amd.com/documentati...anceGains.aspx As for the 6100 Opteron series, looks like this is the proposed compiler spec cheat-sheet: http://developer.amd.com/Assets/Comp...f-61004100.pdf And don't forget: do NOT use ICC for AMD... ;) Official (shortcut) page for the series: http://developer.amd.com/Magny-Cours ;) Best regards, Bruno |
Bruno, I followed this guide: http://developer.amd.com/assets/AMDGCCQuickRef.pdf
I think it's basically the same as the one you posted, but older. On a single-core damBreakFine run, using the flags they suggest there (-march=amdfam10, -mabm, -msse4a), ExecutionTime was reduced by 10% (compared to gcc 4.7 without the extra flags).
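A comparison like this can be scripted by pulling the last ExecutionTime entry out of each solver log. The Python sketch below assumes the usual "ExecutionTime = ... s  ClockTime = ... s" log lines; the two log fragments are made up for illustration and are not measurements from this thread.

```python
import re


def last_execution_time(log_text):
    """Return the last 'ExecutionTime = <x> s' value found in a solver log."""
    matches = re.findall(r"ExecutionTime\s*=\s*([0-9.eE+-]+)\s*s", log_text)
    if not matches:
        raise ValueError("no ExecutionTime entry found")
    return float(matches[-1])


def reduction_percent(baseline, candidate):
    """Percentage by which `candidate` is faster than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Made-up log fragments, for illustration only.
log_default = ("ExecutionTime = 99.5 s  ClockTime = 100 s\n"
               "ExecutionTime = 200 s  ClockTime = 201 s\n")
log_tuned = ("ExecutionTime = 90.1 s  ClockTime = 91 s\n"
             "ExecutionTime = 180 s  ClockTime = 181 s\n")

base = last_execution_time(log_default)
tuned = last_execution_time(log_tuned)
print(f"ExecutionTime reduced by {reduction_percent(base, tuned):.1f}%")
```

Both runs should use the same case, the same end time and the same number of cores, otherwise the percentages are not comparable.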
|
Hi Anton,
Well, the "march=barcelona" option might give you a few more percentage points, since the "amdfam10" is for the mainstream processors, not Opteron specific ;) Best regards, Bruno PS: This line on the reference guide you followed just cracked me up: Quote:
|
Thanks Bruno. According to the gcc documentation, barcelona and fam10 seem to be equivalent: http://gcc.gnu.org/onlinedocs/gcc/i3...4-Options.html. Am I reading that wrong?
|
Good catch! I was hoping they wouldn't do the exact same optimizations :(
|