www.cfd-online.com > Forums > OpenFOAM

Intel's MPI and performance tools in OpenFOAM

December 7, 2007, 07:18   #1
Hans-Joachim Plum (New Member)
Hi,

I'm working as an applications engineer at Intel and was involved in running OpenFOAM on Intel platforms, in particular in checking a replacement of Open MPI with Intel's MPI product.

I saw OpenFOAM (simpleFoam) run over 30% faster with Intel's MPI, as compared to OpenMPI, on crucial benchmarks of an important end user of OpenFOAM and Intel cooperation partner.

This also enabled Intel's performance analysis tools to be used with OpenFOAM.

Is this interesting for you? Have you ever thought about a model to link OpenFOAM with commercial MPI libraries?

Would appreciate your interest in this story ...

December 7, 2007, 23:40   #2
a a saha (Member)
Hello Hans,

It was interesting to note that Intel's MPI runs faster than the OpenMPI implementation in OpenFOAM. I would be keen to check the same performance improvement on Opteron-based platforms. It would be great if you could help me get Intel's MPI working with OpenFOAM 1.4.1.

December 8, 2007, 08:41   #3
Srinath Madhavan (a.k.a pUl|) (Senior Member)
Hans, when you say 30% faster, I would like to know the basis you use for that comparison. Do you mean to imply that the parallel speedup for, say, a 2-CPU job is 30% higher when using Intel's MPI as opposed to OpenMPI? If it's just the solution time, then I don't think there is anything surprising there, as Intel's compilers/libraries are optimized to work exceptionally well on Intel platforms (which, by the way, are rarely used in any of the clusters at my university).

On a related note, what exactly is your stand on using multi-core systems (dual/quad/octa) for parallel CFD computing? Is Intel aware that memory bandwidth is the real bottleneck when switching to multi-core systems?
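The distinction between parallel speedup and raw solution time can be made concrete; a minimal sketch with made-up timings (none of these numbers come from the benchmarks discussed in this thread):

```python
def speedup(t_serial, t_parallel):
    """Classic parallel speedup: serial wall time over parallel wall time."""
    return t_serial / t_parallel

# Hypothetical 2-CPU job with a serial baseline of 1000 s.
t_serial = 1000.0
t_two_cpu_a = 600.0  # assumed 2-CPU wall time with MPI library A
t_two_cpu_b = 460.0  # assumed 2-CPU wall time with MPI library B

# A faster MPI shrinks only the parallel time; a serial run contains no
# message passing, so its time is unchanged and the measured speedup
# rises along with the runtime improvement.
print(speedup(t_serial, t_two_cpu_a))  # ~1.67
print(speedup(t_serial, t_two_cpu_b))  # ~2.17
```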

December 10, 2007, 07:43   #4
Hans-Joachim Plum (New Member)
Hello,

to respond to the comments of a a saha and Srinath:

- Intel MPI is a commercial product, not as easy to get as OpenMPI. That's why I was wondering whether the OpenFOAM creators could think of a model to support building against other MPIs that users might already have a license for.

- The over 30% comes >>just<< from the MPI. It compares two runs on exactly the same cluster (here an Intel quad-core cluster with 16 nodes, that is 64 cores): OpenMPI vs. Intel MPI. Everything except the MPI coincides; only the MPI makes the difference.

December 10, 2007, 12:38   #5
Srinath Madhavan (a.k.a pUl|) (Senior Member)
Hello Hans,

I am not sure I understand your answer. When you compare the speedup (for example, see [1]), I wish to know whether Intel's MPI gives a 30% increase when compared to Open MPI.

Secondly, you mention that you ran the tests on 16 nodes (each node featuring a quad-core CPU). How did you assign the processes?

i) Did you schedule one MPI process per node, so that each process would communicate through an interconnect? If so, which interconnect (gigabit, InfiniBand, Quadrics, etc.)?

ii) Did you schedule MPI processes by filling each node and then moving to the next one? In this case, for 2 and 4 processes, all program instances would run on the same quad-core node; only when moving to 6 or 8 processes would the interconnect be used.

iii) What case did you run (as in, how big)? How much RES memory did the serial run consume? How many time steps did you run the case for?


[1] http://www.cfd-online.com/OpenFOAM_D...es/1/4626.html
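The two placement policies in (i) and (ii) above amount to different rank-to-node maps; a small sketch (the 16-node, quad-core figures are taken from the setup described in this thread, the function names are mine):

```python
def spread(rank, n_nodes):
    """Policy (i): round-robin, one process per node first; even a
    2-process job then communicates across the interconnect."""
    return rank % n_nodes

def fill(rank, cores_per_node):
    """Policy (ii): pack each node full before moving on; up to 4
    processes on a quad-core node stay inside shared memory."""
    return rank // cores_per_node

# A 4-process job on 16 quad-core nodes lands very differently:
print([spread(r, 16) for r in range(4)])  # [0, 1, 2, 3] -> four nodes
print([fill(r, 4) for r in range(4)])     # [0, 0, 0, 0] -> one node
```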

December 11, 2007, 00:37   #6
a a saha (Member)
I have run my parallel cases on both Xeon- and Opteron-based machines. The parallel performance of the Opteron-based machines is far superior to that of the Xeon-based machines with OpenMPI.

December 11, 2007, 05:51   #7
Hans-Joachim Plum (New Member)
Hi,

the comparisons are as follows:

I run simpleFoam twice; in both runs >>everything<< coincides:

- test case, compilers, compiler flags, run command, the cluster to run on, mapping of processes to the parallel nodes of the cluster

>> except <<

- I link Intel's MPI in the first run, OpenMPI in the second.

Then the first run, just by managing the message passing better, gives 30% better runtime (not speedup), e.g. 230 s instead of 300 s wall-clock time.

This happens throughout all the different styles of mapping, be it 1 process per node, 2, or 4.

I'm afraid I cannot disclose details of the test case, as it's confidential.

December 11, 2007, 14:34   #8
Srinath Madhavan (a.k.a pUl|) (Senior Member)
Thank you Hans. As I mentioned before, if there is no improvement in the speedup, I doubt anyone will be that interested. Throw in a few more processors and you will always get better runtimes, as long as the speedup stays close to linear. Besides, to my knowledge the Intel architecture is rarely used in clusters anymore. We had a very prominent Xeon-based cluster in our university, but that was 3-4 years ago. Now everything has been changed to AMD (HyperTransport technology) and/or IBM POWER (far superior memory bandwidth, generous L3 cache, etc.).

December 12, 2007, 05:47   #9
Francesco Del Citto (Senior Member)
Srinath, I'm sorry, but I don't agree with you.
If you can save 30% of the computational time just by changing the MPI library, that's a huge improvement, even if the parallel speedup doesn't change. Saving 30% of the time means running 30% more simulations in the same time, and in the end it means saving the money for buying and maintaining 30% more hardware while getting the same performance.
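This arithmetic lines up with the illustrative timings given earlier in the thread (230 s vs. 300 s wall-clock time): a roughly 23% shorter wall time is a roughly 30% higher simulation throughput. A quick sketch:

```python
# Wall-clock times from the earlier example in this thread.
t_openmpi, t_intelmpi = 300.0, 230.0

frac_time_saved = 1.0 - t_intelmpi / t_openmpi  # fraction of wall time saved
throughput_gain = t_openmpi / t_intelmpi - 1.0  # extra runs per unit time

print(f"{frac_time_saved:.1%} less wall time")         # 23.3%
print(f"{throughput_gain:.1%} more simulations/time")  # 30.4%
```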

About clusters: it's not true that Intel CPUs are not used anymore, trust me!

@Hans: what if you have a proprietary network (like Myrinet or Quadrics) instead of Gigabit?

Francesco

December 12, 2007, 14:52   #10
Jens Klostermann (Senior Member)
Hi,

Of course it would be nice for somebody who already has a license for Intel's MPI to get their licensed software to work with (in this case) OpenFOAM. Since OpenFOAM is under the GNU license, what keeps Intel from supplying an interface between their proprietary MPI and OpenFOAM, in source code or binary form (like the NVIDIA drivers for Linux)?

One point I didn't get: was the 30% wall-time reduction for an Ethernet interconnect only, or is it also valid for an InfiniBand interconnect? If it is only for Ethernet, there is also a promising "free" alternative (also up to 30% wall-time reduction): GAMMA, which is already implemented/supported by OpenFOAM.

@Srinath: We did some benchmarking with Intel and Opteron systems, and the Intel systems are, for different cases, mostly as fast or even faster.

Jens

December 12, 2007, 15:21   #11
Srinath Madhavan (a.k.a pUl|) (Senior Member)
Francesco, I respect your opinion. My observations are merely based on experience and discussions with people who have been working in High Performance Computing (HPC) for quite some time. AMD still rules over Intel when you factor in price and power consumption and compare the performance. IBM still rules over both AMD and Intel when it comes to processors suited for HPC. The prices of IBM servers are, of course, exorbitant.

I'm no AMD fanboy. I still respect Intel and their processors, but only when it comes to desktop use. In fact, I chose Intel over AMD when I bought my Fujitsu notebook simply because Intel supports free software 3D graphics drivers. Nevertheless, I will elaborate the reason for my skepticism. Without mentioning the size of the test case, there really is no reason to get excited over a 30% improvement. Intel is famous for posting benchmarks of commercial CFD codes (e.g. Fluent) and claiming superiority over AMD. However, their benchmarks are based on relatively small test cases. Increase the size of the problem and Intel struggles to match the performance that AMD can deliver (thanks to its HyperTransport technology, which reduces the FSB bottleneck by providing separate paths to memory and all other PC components through the motherboard chipset). This is also why Intel processors have a higher L2 cache, in an attempt to offset the loss in performance that comes from having to use the Front Side Bus (which manages both memory and I/O communications) every time. I chose to believe neither Intel nor AMD when it came to benchmarking. I did all the tests myself (some of which I got around to summarizing and posted in this forum). In the majority of the parallel tests I saw that AMD gave much better results. Let me see Intel give me a better speedup at a reasonable price and I will gladly recommend it.

As regards Intel MPI, I will admit outright that I am biased towards open solutions. The very fact that Intel releases its very own compiler and MPI libraries is clear evidence that its processors have some performance pathways that are not documented, so that gcc and other free alternatives cannot exploit them. In other words, Intel (like any other company, including AMD) wants to get additional revenue by promoting the use of its compilers etc. Nothing surprising there, eh? I'm sure there are folks who would love to get the best of both worlds (free and commercial) as long as it benefits them. But then again, this is the consequence of practical choices, isn't it? I choose open solutions not just because they are free, but also because they promote growth and productivity more than commercial alternatives do.

December 13, 2007, 08:19   #12
Hans-Joachim Plum (New Member)
All,

interesting discussion. In terms of judging the 30%, I agree with Francesco Del Citto: what counts is the run time and the number of simulations per unit time you can run. Speedup is a largely overestimated (and also abused) measure; I can easily make an application 2x slower but improve its parallel speedup ...
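The 2x-slower-but-better-speedup effect is easy to demonstrate with made-up numbers (nothing here is measured; it only illustrates how speedup alone can mislead):

```python
def speedup(t_serial, t_parallel):
    """Serial wall time over parallel wall time."""
    return t_serial / t_parallel

# Suppose a 4-way run spends its time on compute plus communication.
# "Pessimizing" the compute by 2x leaves the communication unchanged,
# so the code scales better even though every run gets slower.
orig = speedup(100.0, 25.0 + 15.0)    # compute 25 s + comm 15 s -> 2.5
slower = speedup(200.0, 50.0 + 15.0)  # compute 50 s + comm 15 s -> ~3.08
print(orig, slower)  # higher "speedup", yet 65 s instead of 40 s wall time
```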

I also understand the scepticism. Let me assure you that the measurements were done with significant test cases, but sorry, no more details are possible.

As to the interconnect used for those 30%: it was InfiniBand ...

December 13, 2007, 08:41   #13
Christian Winkler (Member)
Hans,

thank you for this statement. IMHO, run time on a given number of CPUs is the only measure that counts. It goes hand in hand with speedup anyway.

Talking about Intel MPI: I suppose you will have difficulties introducing this into an open-source community ...
But what about a contribution, let's say an "OpenFOAM Intel-MPI Special Edition" ... that would be nice, right? ;-))

Kind regards
Christian

December 13, 2007, 09:45   #14
Hans-Joachim Plum (New Member)
Christian,

thanks for the comment. Well, OpenFOAM's installation could provide for a branch linking other MPIs: if a client has the library, it works; if not, he simply continues to use OpenMPI. Actually pretty easy: due to the nice encapsulation in libPstream.so, this could keep 99.9% of the code/compilation untouched.

December 13, 2007, 11:30   #15
Francesco Del Citto (Senior Member)
That's true. I compiled OpenFOAM on a proprietary network using its MPI library. It's really very easy!
And the performance improvement was fantastic in my case, especially for small numbers of cells per CPU.

December 16, 2007, 15:51   #16
Alberto Passalacqua (Senior Member)
Just a small suggestion.

There are various simple test cases in the literature, with all the details needed to reproduce them. For example, a simple but computationally intensive test case is a direct numerical simulation of channel flow, with predetermined flow conditions and solver settings.

This kind of test case is easily scalable, adaptable to high computational resources, and not covered by secrecy agreements.

This would allow Intel to make its results public, with detailed information and specific hints on how to obtain them, increasing its credibility.

With kind regards,
Alberto
__________________
Alberto Passalacqua

GeekoCFD - A free distribution based on openSUSE 64 bit with CFD tools, including OpenFOAM. Available as live DVD/USB, hard drive image and virtual image.
OpenQBMM - An open-source implementation of quadrature-based moment methods

December 16, 2007, 15:58   #17
Alberto Passalacqua (Senior Member)
Just some links to test cases:

- ERCOFTAC database: http://cfd.mace.manchester.ac.uk/cgi-bin/cfddb/ezdb.cgi?ercdb+search+retrieve+&& &*%%%%dm=Line

- iCFD database (cases with detailed results too): http://cfd.cineca.it/cfd
