CFD Online Discussion Forums

Clif Upton August 8, 2002 17:58

p3 vs p4 for CFD: winner p3
We've been benchmarking our new p4-2.26 GHz (1 GB DDR RAM) against an older laptop with a p3-800 MHz (512 MB SDRAM) and against a p4-1.4 GHz (512 MB RDRAM). All computers were by Dell.

We designed a suite of CFD problems of different sizes, with the largest requiring slightly more than 512 MB so that the p3 and p4-1.5 had to page a little.

The results are pretty amazing. For the smaller problems, performance scaled roughly in proportion to processor speed, e.g. the p3-800 was ~2.6 times slower than the p4-2.26. As the problem size got larger, however, the p3 became 2 times(!) faster than the p4-1.5 and almost 50% faster than the p4-2.26.

We tried to explain the performance of the p4-1.5 by its smaller cache (256 KB vs 512 KB on the other computers), but what would explain a computer with a 3x slower processor and slower memory beating the brand new p4 is beyond comprehension.

Any ideas or experience?

andy August 9, 2002 05:37

Re: p3 vs p4 for CFD: winner p3
Paging for a CFD simulation! I know the efficiency of the Intel CISC chips is dreadful (i.e. the number of effective operations you get per clock tick for real problems) but surely this is a bit extreme.

* In-cache benchmark performance, although widely discussed and important for shifting hardware, is almost wholly irrelevant for real CFD predictions in my experience.

* What has always mattered for big CFD predictions since microprocessors replaced "proper" processors is getting the values from memory to the microprocessor. This involves the performance of the main board and support chips which generally does not seem to get much attention. (I know effectively nothing about this for your hardware).

If my recollection is not letting me down (don't trust it), I think the PIII collects significantly more values from memory on each visit than the P4, and that the product of number of values * memory speed is actually in favour of the PIII compared to the P4. Hence, if most of the extra values collected from memory are used, then the PIII should go faster than the P4 when you are running a large job and the processors are mainly idling, waiting for values from memory.

What is actually happening is probably slightly more involved than the above, but it may well be at the heart of what is being observed.
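
The argument above can be sketched as a toy estimate in which the time for one sweep over the grid is bounded below by whichever is slower, the arithmetic or the memory traffic. Every number in it (flops per cell, bytes fetched per cell, sustained flop rates and bandwidths) is a hypothetical placeholder chosen for illustration, not a measurement of any machine discussed in this thread, and the function name `sweep_time` is likewise invented.

```python
def sweep_time(cells, flops_per_cell, bytes_per_cell, flops_per_s, bytes_per_s):
    """Lower bound on the time for one sweep over the grid, in seconds."""
    compute = cells * flops_per_cell / flops_per_s   # time if arithmetic were the only cost
    memory = cells * bytes_per_cell / bytes_per_s    # time if memory traffic were the only cost
    return max(compute, memory)

# Hypothetical machines (placeholder numbers, not measurements):
# one with a slow clock but better sustained bandwidth, one the other way round.
slow_clock = dict(flops_per_s=0.4e9, bytes_per_s=0.8e9)
fast_clock = dict(flops_per_s=1.2e9, bytes_per_s=0.5e9)

CELLS = 1_000_000
FLOPS_PER_CELL = 200

# Vary only how many bytes must come from main memory per cell.
for label, bytes_per_cell in [("little memory traffic", 8), ("heavy memory traffic", 400)]:
    t_slow = sweep_time(CELLS, FLOPS_PER_CELL, bytes_per_cell, **slow_clock)
    t_fast = sweep_time(CELLS, FLOPS_PER_CELL, bytes_per_cell, **fast_clock)
    print(f"{label}: slow-clock {t_slow:.3f} s, fast-clock {t_fast:.3f} s")
```

With these made-up inputs the fast-clock machine wins when little data has to come from memory, and the slow-clock machine wins once memory traffic dominates. Real behaviour also depends on cache line size, prefetching and paging, none of which this sketch models.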

steve August 9, 2002 09:44

Re: p3 vs p4 for CFD: winner p3
We have some 1 GHz P3s and various speeds of P4 Xeons, and everything we have run shows that the P4s give speedups that are even better than the clock rate difference (i.e. a 2 GHz P4 is more than 2x as fast as the P3). I think that the extra cache (Xeon) makes a lot of the difference, at least for our code.

Clif Upton August 9, 2002 10:16

Re: p3 vs p4 for CFD: winner p3
Steve - how much cache do your Xeons have? Their standard setup has the same 512 KB as any p4 faster than 2 GHz would. Thx

Triton August 9, 2002 12:55

Re: p3 vs p4 for CFD: winner p3
You could provide more details of your benchmark project, such as array sizes, compiler, and compiler optimization options. However, based on your description, it could be due to the different cache sizes.

Clif Upton August 9, 2002 15:51

Re: p3 vs p4 for CFD: winner p3
We ran problems from 140,000 grid cells up to 1.2M cells. As expected, performance depended not only on the grid size but also on the problem structure. For the 140,000-cell problems the p4-2.26 was indeed nearly 3x faster than the p3-800. However, at 1.2M cells it was 50% slower.

Cache in both the p3 and p4 is the same: 512 KB. On Intel's advice we have just recompiled the solver with their latest v6 compiler. It did help: now the p4-2.26 is only 5-10% slower...
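
As a rough back-of-envelope check on how these grids compare with the cache, the sketch below estimates memory per cell. It rests on two assumptions not stated in the thread: that the 1.2M-cell case is the one needing "slightly more than 512 MB" (taken here as exactly 512 MiB), and that memory use grows linearly with cell count.

```python
MIB = 1024 * 1024
KIB = 1024

largest_cells = 1_200_000      # largest grid quoted in the thread
largest_bytes = 512 * MIB      # assumption: "slightly more than 512 MB" taken as 512 MiB
smallest_cells = 140_000       # smallest grid quoted in the thread
cache_bytes = 512 * KIB        # cache size quoted for the p3 and the p4-2.26

# Assumption: memory footprint is linear in the number of cells.
bytes_per_cell = largest_bytes / largest_cells
smallest_bytes = smallest_cells * bytes_per_cell
cache_ratio = smallest_bytes / cache_bytes

print(f"approx. bytes per cell:       {bytes_per_cell:.0f}")
print(f"approx. smallest problem:     {smallest_bytes / MIB:.1f} MiB")
print(f"smallest problem / cache:     {cache_ratio:.0f}x")
```

Under those assumptions even the smallest problem is on the order of a hundred times larger than a 512 KB cache, so none of the benchmark cases would be running out of cache.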

Triton August 10, 2002 10:45

Re: p3 vs p4 for CFD: winner p3
If you're using Intel's compiler, you have a chance to exploit the full horsepower of the p3 and p4.

For P3, try the following compiler options:

    ifc -O3 -xK -tpp6 -o executable_file source.f90

For P4, use:

    ifc -O3 -xW -tpp7 -o executable_file source.f90

steve August 11, 2002 11:42

Re: p3 vs p4 for CFD: winner p3
Clif, I looked again at the machines that we ran a set of tests on and was surprised to find that the 1 GHz P3 and the 1.7 GHz P4 both had 256 KB of cache. I had thought the P4 had more, but was mistaken. The 1.7 GHz P4 was always significantly faster (esp. on a big problem) than the P3. For what it's worth, all our code is compiled with Absoft Fortran rather than Intel's compiler. Steve

Clif Upton August 12, 2002 11:02

Re: p3 vs p4 for CFD: winner p3
We were using Intel compiler v5. We upgraded to v6 last Friday to see if that would help. It did: now the p4-2.26 runs not 50% slower than the p3-800, but about the same...

We are talking to Intel to see if there is something that we missed. In all previous transitions, from p to p2 and from p2 to p3, there has never been anything like that. Processors simply scaled with their frequency.

Thank you for your input.

Jonas Larsson August 13, 2002 02:34

Re: p3 vs p4 for CFD: winner p3
This is not in line with our observations - we have tested both commercial codes and in-house codes (compiled both with AbSoft Fortran and Intel's new Fortran compiler). The P4 runs all our CFD codes very well and often runs faster per MHz than our P3s. Note that the memory is very critical for the P4. PC800 RDRAM is still best. If you have PC600 RDRAM it will affect performance significantly.

Clif Upton August 13, 2002 15:32

Re: p3 vs p4 for CFD: winner p3
Jonas - we were using Intel 5 and are now trying Intel 6. The latter has helped a lot so far, except that we have been unable to compile the software for parallel computation (the same code compiled fine with v5). Here is a trivial sample code for which Intel 6 reports an error:


      program test
      integer N, fi
      real eps, dt

      fi = 100
      open( fi, file = 'input.dat' )
      read( fi, '(i8)' ) N
      read( fi, '(e13.5)' ) dt
      read( fi, '(e13.5)' ) eps
      close( fi )
      end
!----------------------------------------------------------------------!


     100 ! N
  1.00000E-02 ! dt
  1.00000E-05 ! eps

----- make file to compile it:

@echo off
ifl -ogood test.for
ifl -obad -Qopenmp -Qfpp2 test.for
