[Other] Comparison of OpenFOAM on i7, Xeon@32 cores, Xeon Phi Knights Landing, Tesla K20m

ma-tri-x · September 28, 2016, 08:48

Hey users, I thought this might interest you.

I ran a simple damBreak case with interFoam (100x100x100 cells, 115 time steps) on the following machines:

- Intel Core i7, 3.4 GHz, 4 cores, with OpenFOAM 2.3.0
- Tesla K20m with RapidCFD
- Intel Xeon, 2.2 GHz, 32 cores
- Xeon Phi Knights Landing (KNL), 64 cores

So this covers more or less every kind of architecture currently available.
I think RapidCFD and OpenFOAM 2.3.0 are comparable in their usage: as far as I know, RapidCFD is a port of OpenFOAM 2.3.0 to CUDA.

Here are the results (machine: computation time in s):
- OF, Knights Landing Xeon Phi, 64 cores: 142.17
- OF, CPU Xeon 2.0 GHz, 32 cores: 309.28
- RapidCFD, Tesla K20m at 100 W/250 W: 558.05
- OF, CPU Core i7 3.40 GHz, 4 cores: 1687.29
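To express these timings as relative performance, here is a small Python sketch that divides the Core i7 time by each machine's time. The timings are the ones reported above. The machine labels are shortened for this example, and the comparison covers time-to-solution only.

```python
# Computation times in seconds, as reported above
# (interFoam damBreak, 100x100x100 cells, 115 time steps).
times_s = {
    "KNL Xeon Phi, 64 cores": 142.17,
    "Xeon, 32 cores": 309.28,
    "RapidCFD, Tesla K20m": 558.05,
    "Core i7, 4 cores": 1687.29,
}

baseline = times_s["Core i7, 4 cores"]

# Speedup relative to the 4-core i7 run: how many times faster each machine finished.
speedup = {machine: baseline / t for machine, t in times_s.items()}

# Print fastest first.
for machine, s in sorted(speedup.items(), key=lambda kv: -kv[1]):
    print(f"{machine:24s} {times_s[machine]:8.2f} s  {s:5.2f}x")
```

Relative to the i7, this gives roughly 11.87x for the KNL, 5.46x for the 32-core Xeon and 3.02x for the K20m.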

So the Knights Landing is well ahead. It seems OpenFOAM was already prepared for vectorization, at least to some extent. For the KNL, OF was compiled with Cray MPI and icc/icpc 17.xxxx, using the vectorization flag "-xmic-avx512" instead of "-mmic".

I tried to find the market prices, but I can't guarantee their accuracy:
- Xeon Phi KNL: 6000 € + host hardware
- 32-core Xeon: > 4000 € (no exact figure)
- Tesla K20m: no longer seems to be in stock; K20: $2500
- Core i7: about $330 + host hardware

The computation times are attached as a bar chart. Here is also the blockMeshDict I used:

Code:

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      blockMeshDict;
}

convertToMeters 1;

vertices
(
    (0 0 0)  // Vertex bld = 0 
    (1 0 0)  // Vertex brd = 1 
    (1 0 -1)  // Vertex frd = 2 
    (0 0 -1)  // Vertex fld = 3 

    (0 1 0)  // Vertex blt = 4 
    (1 1 0)  // Vertex brt = 5 
    (1 1 -1)  // Vertex frt = 6 
    (0 1 -1)  // Vertex flt = 7 
);

blocks
(
    hex (0 1 2 3   4 5 6 7) (100 100 100) simpleGrading (1 1 1)
);

edges
(
);

boundary
(
    Wall
    {
        type wall;
        faces
        (
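            // vertex order: right-hand-rule normal points out of the block
            (0 3 2 1)  // y = 0
            (4 5 6 7)  // y = 1
            (0 4 7 3)  // x = 0
            (2 6 5 1)  // x = 1
            (7 6 2 3)  // z = -1
            (1 5 4 0)  // z = 0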
        );
    }
);

mergePatchPairs
(
);
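blockMesh expects each boundary face's vertices to be listed so that, seen from inside the block, they run clockwise. Equivalently, the right-hand-rule normal of the face points out of the block. Below is a minimal Python sketch that checks a face against the vertex list above. The helper names are invented for this example, and the check assumes planar faces on this single convex block.

```python
# Vertex coordinates from the blockMeshDict above (convertToMeters 1).
VERTICES = [
    (0, 0, 0), (1, 0, 0), (1, 0, -1), (0, 0, -1),
    (0, 1, 0), (1, 1, 0), (1, 1, -1), (0, 1, -1),
]

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def points_outward(face):
    """True if the right-hand-rule normal of `face` points away from the block centre."""
    p = [VERTICES[i] for i in face]
    normal = cross(sub(p[1], p[0]), sub(p[2], p[1]))
    to_face = sub(centroid(p), centroid(VERTICES))
    return dot(normal, to_face) > 0

for face in [(0, 1, 2, 3), (0, 3, 2, 1)]:
    print(face, "outward" if points_outward(face) else "inward")
```

With these coordinates, (0 1 2 3) is reported as inward and (0 3 2 1) as outward. Every entry in `faces` can be checked the same way.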

wyldckat · September 28, 2016, 16:37

Hi ma-tri-x,

Many thanks for the report and tests!

If you could test building with OpenFOAM-dev on the Knights Landing, you should see a considerable improvement! Paul Edwards from Intel has been working directly with the OpenFOAM Foundation to improve performance even further!
Look for the abstract "Performance Optimization of OpenFOAM on the new Intel® Xeon Phi™ Processor" on the Agenda page for the 4th Annual OpenFOAM User Conference 2016: http://www.esi-group.com/company/eve...ce-2016/agenda

Also, OpenFOAM-dev already has dedicated wmake rules for the KNL: "linux64IccKNL" and "linux64GccKNL"

By the way, which model of KNL are you using? Is it the one that is directly installed on the motherboard or the PCI-E card edition?

Best regards,
Bruno

ma-tri-x · September 29, 2016, 06:26

Hi wyldckat!

Thanks for the quick reply! Yes, I was considering going to Cologne, but my schedule won't allow it.

As far as I know, it's the newest version of the KNL, integrated on the motherboard, and definitely not the PCIe card. It is part of the HLRN's HPC system.

Good news that OpenFOAM is going to be optimized for the KNL!

Sumeet Patil · April 3, 2017, 06:00

Hi,

Has anyone worked on profiling OpenFOAM on Intel Xeon Phi?
Can you help me out? I'm unable to profile the OpenFOAM solver execution on the MIC.

pm11dt · February 27, 2018, 13:46

Quote:

Originally Posted by ma-tri-x

Here are the results (machine: computation time in s):
- OF, Knights Landing Xeon Phi, 64 cores: 142.17
- OF, CPU Xeon 2.0 GHz, 32 cores: 309.28
- RapidCFD, Tesla K20m at 100 W/250 W: 558.05
- OF, CPU Core i7 3.40 GHz, 4 cores: 1687.29

So the Knights Landing is well ahead. It seems OpenFOAM was already prepared for vectorization, at least to some extent. For the KNL, OF was compiled with Cray MPI and icc/icpc 17.xxxx, using the vectorization flag "-xmic-avx512" instead of "-mmic".

I'm assuming you decomposed the cases as follows: Xeon Phi -np 64, Xeon 2.0 GHz -np 32, Core i7 -np 4.

In that case, how can you say the Xeon Phi is faster when you're running on twice as many processors (compared to the Xeon 2.0 GHz)? Obviously it's going to be twice as fast on twice as many processors!
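Here is a quick way to put numbers on this argument. It assumes, as above, one MPI rank per core, which the original poster did not confirm. The sketch first predicts the KNL time from the Xeon time under the assumption that run time is simply inversely proportional to core count. It then compares the total core-seconds (cores × wall time) spent on each machine. This is only ideal-scaling arithmetic and ignores clock rate, vector width and memory bandwidth.

```python
# Assumption (from the reply above, not confirmed by the original poster):
# one MPI rank per core, i.e. -np 64, -np 32 and -np 4.
runs = {  # name: (cores, wall time in s)
    "KNL Xeon Phi": (64, 142.17),
    "Xeon": (32, 309.28),
    "Core i7": (4, 1687.29),
}

def ideal_time(reference, target):
    """Wall time of `target` predicted from `reference` by perfect linear scaling in core count."""
    ref_cores, ref_time = runs[reference]
    target_cores, _ = runs[target]
    return ref_time * ref_cores / target_cores

def core_seconds(name):
    """Total compute spent on a run: cores x wall time."""
    cores, wall = runs[name]
    return cores * wall

predicted = ideal_time("Xeon", "KNL Xeon Phi")
print(f"KNL predicted from Xeon by core count: {predicted:.2f} s, "
      f"measured: {runs['KNL Xeon Phi'][1]:.2f} s")
for name in runs:
    print(f"{name:13s} {core_seconds(name):9.2f} core-s")
```

Under this assumption, linear scaling from the Xeon predicts 154.64 s for the KNL, against the 142.17 s reported.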

I'm looking in more depth into OpenFOAM speedups on Xeon Phi KNL-based systems. I'm not yet seeing anything like the speedups reported in the literature, even when compiling with special flags etc.

ma-tri-x · March 6, 2018, 06:35

Quote:

In that case, how can you say the Xeon Phi is faster when you're running on twice as many processors (compared to the Xeon 2.0 GHz)? Obviously it's going to be twice as fast on twice as many processors!

Obviously, you haven't got much experience with speedups on different machines. Speedups are not determined by the number of cores alone. If they were, why don't we get:
64 * 1.1 GHz = 70.4 GHz
32 * 2.0 GHz = 64 GHz
--> almost the same speed, yet the KNL has proven to be twice as fast.

Or even:
4 * 3.4 GHz = 13.6 GHz
32 * 2.0 GHz = 64 GHz
--> a factor of 4.7,
but the measured ratio is 1687.29/309.28 = 5.46.

It depends on what you want to compare and how well the software can use the hardware resources (a processor is characterized by more than its clock rate and core count). It also depends on the compiler and its flags, and sometimes even on the operating system.
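The cores × GHz comparison above can be reproduced in a few lines of Python. This sketch uses the figures quoted in the reply, including the 1.1 GHz KNL clock used there. For each pair of machines, it puts the speedup predicted by the aggregate clock next to the speedup measured from the wall times.

```python
# Figures as used in the reply above: cores, clock in GHz, measured wall time in s.
# The 1.1 GHz KNL clock is the value used in that reply.
machines = {
    "KNL Xeon Phi": (64, 1.1, 142.17),
    "Xeon": (32, 2.0, 309.28),
    "Core i7": (4, 3.4, 1687.29),
}

def core_ghz(name):
    """Aggregate clock: number of cores times clock rate."""
    cores, ghz, _ = machines[name]
    return cores * ghz

def predicted_and_measured(fast, slow):
    """Speedup of `fast` over `slow`: predicted by the cores x GHz ratio vs. measured from wall times."""
    predicted = core_ghz(fast) / core_ghz(slow)
    measured = machines[slow][2] / machines[fast][2]
    return predicted, measured

for fast, slow in [("KNL Xeon Phi", "Xeon"), ("Xeon", "Core i7")]:
    p, m = predicted_and_measured(fast, slow)
    print(f"{fast} vs {slow}: cores x GHz predicts {p:.2f}x, measured {m:.2f}x")
```

In both pairs, the measured ratio (2.18x and 5.46x) exceeds the cores × GHz prediction (1.10x and 4.71x). That gap is the point being made above.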

pm11dt · March 10, 2018, 09:00

You make a good point; however, you didn't make your basis of comparison clear at all in your post. You didn't even mention the clock speed of the KNL cores.

I mean, ultimately, wouldn't the best comparison be based on performance for systems of a given FLOPS rating?

Also, you're right, but in addition to your point, you would expect the KNL to run slower on 64 cores (compared to 32 on the Xeon) because of the MPI latency and communication overheads involved.

And I am aware of the vectorisation capabilities of the KNL nodes, compiling with -O3 optimisation and such.

I hope to do some serious work on this subject and potentially even publish. It will be interesting to see whether the same performance boost holds across multiple Xeon Phi compute nodes (scaling across nodes).

My work generally involves cases running on 240 cores or more (standard 2.5 GHz Xeon cores), so this is an area I am invested in.
