How to optimize OpenFOAM for Core i7 CPUs using extended instruction sets
Hi,
I tried to compile and optimize OpenFOAM for some new Core i7 CPUs with AVX2 and FMA. As far as I understand, the default settings use the generic x86_64 instruction set. I forced the compiler to optimize for the extended instruction set by adding the -march=corei7 flag in /wmake/rules/linux64Gcc/c++Opt and /wmake/rules/linux64Gcc/cOpt. The compiler successfully used the settings, but my first benchmarks did not show any noticeable effect. I've been using a single thread for my cases in order to rule out MPI wait times and measure the raw CPU performance.
I've got two questions regarding this issue:
1. Is this the best or correct way to set the compiler flags?
2. What performance gain can be expected from optimized binaries?
Many thanks,
Cutter
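On question 1, one way to apply the flag repeatably is a small helper that appends it to the optimisation variable in each rule file. This is only a sketch under stated assumptions: it assumes the rule files define their flags on lines beginning with "c++OPT =" and "cOPT =" (check your own files first), and that $WM_PROJECT_DIR points at the OpenFOAM installation. The function name add_march_flag is made up for illustration.

```shell
# Sketch only: append a flag to the optimisation variable of a wmake rule file.
# Assumption: the file contains a line such as "c++OPT      = -O3".
add_march_flag() {
    rule_file="$1"   # e.g. .../wmake/rules/linux64Gcc/c++Opt
    var_name="$2"    # c++OPT or cOPT
    flag="$3"        # e.g. -march=core-avx2

    # Do nothing if the flag is already present, so the call can be repeated.
    if grep -q -F -e "$flag" "$rule_file"; then
        return 0
    fi

    # Keep a .bak copy and append the flag to the end of the variable's line.
    sed -i.bak "s|^\(${var_name}[[:space:]]*=.*\)\$|\1 ${flag}|" "$rule_file"
}

# Usage (paths assume a sourced OpenFOAM environment):
#   add_march_flag "$WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt" c++OPT -march=core-avx2
#   add_march_flag "$WM_PROJECT_DIR/wmake/rules/linux64Gcc/cOpt"   cOPT   -march=core-avx2
```

The libraries and solvers have to be rebuilt afterwards for the new flag to take effect.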
Greetings Cutter,
In theory, AVX should increase the performance of mathematical operations in any application, once it is compiled with the necessary options. I'm not sure if, or how much, OpenFOAM takes advantage of this, although the compiler usually takes care of this kind of optimization either way. It also depends on the GCC version you're using. It's possible that you're using a GCC version that is new enough to do this optimization by default, which would explain why you don't notice any performance difference with and without the option. Therefore, please provide the following details:
Best regards, Bruno
Hi all,
I have the same experience as Cutter. Over time I have tried many OpenFOAM versions, GCC versions, CPUs and operating systems, without getting any measurable improvement from machine-specific optimisation. My last test was a few weeks ago, with GCC 4.9.2 on very recent hardware with two different CPUs. The march option was correctly applied in both cases and the compilation itself took about 3 times longer, but the running time of the motorBike tutorial was almost exactly the same, both for meshing and for the solution. It would be interesting to know if anyone has a different experience and could point out the compiler options used.
Best regards, Francesco
Hi,
Thanks to both of you for the initial feedback! I'm currently targeting the following two CPU models (obtained via cat /proc/cpuinfo and g++ --version):
* Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz, g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
* Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)
I'm currently doing the research on the first of the two machines, which is running on a Fedora release 19 (Schrödinger’s Cat) with KDE desktop installation:
Code:
$ uname -a
Code:
$ g++ -dM -E -x c /dev/null | grep -i -e avx -e fma
Code:
$ g++ -march=core-avx2 -dM -E -x c /dev/null | grep -i -e avx -e fma
Code:
$ g++ -march=native -dM -E -x c /dev/null | grep -i -e avx -e fma
Best regards,
Cutter
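The three macro checks above can be wrapped into one helper so that different -march settings are easy to compare. This is a minimal sketch: the function names are invented for illustration, and the compiler is taken from $CXX with g++ as the fallback.

```shell
# Reduce a "-dM -E" preprocessor dump (read from stdin) to the names of the
# AVX/FMA related macros, one per line, sorted.
filter_simd_macros() {
    grep -i -e avx -e fma | awk '$1 == "#define" { print $2 }' | sort
}

# Run the compiler with the given flags and list the SIMD macros it defines.
simd_macros_for() {
    "${CXX:-g++}" "$@" -dM -E -x c /dev/null | filter_simd_macros
}

# Usage:
#   simd_macros_for                    # compiler defaults
#   simd_macros_for -march=core-avx2
#   simd_macros_for -march=native
```

If the list printed for the default call is shorter than the one for -march=native, the default build is not targeting the extended instruction set.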
Hi Cutter,
Nice checks! Now we know the compiler is doing its job, or at least is enabling the set of instructions specific to the CPUs, as I think we all expected. Now the questions are: is it able to use them when compiling OpenFOAM? Does this make any difference to the execution time?
Francesco
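One rough way to approach the first question is to disassemble a compiled library and count the lines that touch the 256-bit %ymm registers, which only AVX-family instructions use. The sketch below assumes objdump from binutils with its default AT&T syntax, and a sourced OpenFOAM environment for the usage line. The function name is hypothetical. A non-zero count only shows that such instructions were emitted somewhere, not that they sit in the hot loops.

```shell
# Count disassembly lines (read from stdin) that reference a %ymm register.
# Prints 0 when there are none; "|| true" hides grep's non-zero exit status
# in that case.
count_ymm_lines() {
    grep -c '%ymm[0-9]' || true
}

# Usage (library chosen only as an example):
#   objdump -d "$FOAM_LIBBIN/libfiniteVolume.so" | count_ymm_lines
```

Comparing the count for a default build with the count for a -march build of the same library shows whether the flag changed the generated code at all.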
Hi, Francesco,
Recently, I compared the performance of OpenFOAM built with icc and with gcc. The two configurations are:
#1. icc 15.0.0, OpenFOAM-2.4.0, running on an E5-2680v3 @ 2.5 GHz, compiled with the -xHost -O3 flags, OS: CentOS 6.5 x64, RAM: DDR4
#2. gcc 4.8.1, OpenFOAM-2.3.0, running on an E5-2697v2 @ 2.7 GHz, compiled with the default -m64 flag, OS: CentOS 7.0 x64, RAM: DDR3
NOTE a): "-xHost will cause icc/icpc or icl to check the cpu information and find the highest level of extended instructions support to use."
NOTE b): The E5-2680v3 supports AVX2 instructions while the E5-2697v2 doesn't.
I ran the cavity flow case in $FOAM_TUT/incompressible/icoFoam/cavity without modifying any files in it, using only one process.
Results:
The icc configuration (#1) takes 0.16 s
The gcc configuration (#2) takes 0.15 s
As you can see, almost the same! I hope this test helps,
-- Lianhua
Greetings to all!
I've had this thread on my to-do list and I haven't reached a solution yet. Nonetheless, I've done some basic tests that can at least give us a feeling for the speed-up we can hope for. The repository is available here: https://github.com/wyldckat/avxtest
The source code does not depend on OpenFOAM and needs only GCC (4.7 or newer) to build. The summary results were as follows (using an AMD A10-7850K):
As for OpenFOAM, I still need to look into this in more detail. The compiler should be able to vectorize things on its own, but it seems that the code must be written in a way that lets the compiler recognise "oh, this I can vectorize like so and so".
Best regards, Bruno