CFD Online Discussion Forums - CD-adapco
benchmark results (http://www.cfd-online.com/Forums/cd-adapco/52654-benchmark-results.html)

stefan August 24, 2001 02:55

benchmark results
 
These are the results of a benchmark I ran with STAR v3.15, using a simple case with approx. 500,000 cells (setup commands below).

Machines:

SGI Origin 2000, 8 * R10000, 195 MHz

Linux PC with P3, 1 GHz

Linux PC cluster, 8 * P4, 1.7 GHz, 100 Mbit network

Results for serial run (times as reported by STAR as "ELAPSED TIME" in the .run file):

R10000: 24473 s

P3: 16638 s

P4: 4841 s

This means that (for this case) the P4 is about 5 times faster than the R10000 (24473/4841 ≈ 5.1) and 3.4 times faster than the P3 (16638/4841 ≈ 3.4)!

Results for HPC:

R10000:

CPUs    time (s)    Speedup vs. serial
2       13926       1.76
4       6887        3.55
8       3009        8.13

P4:

CPUs    time (s)    Speedup vs. serial
2       2504        1.93
4       1332        3.63
6       1034        4.68
8       901         5.37  (optimized METIS decomposition failed, used basic)

For the cluster, the problem seems to be too small to get adequate speedup with more than 4 CPUs. Scaling should be better for a problem with more cells and equations.
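As a quick cross-check, the speedup and parallel efficiency (speedup divided by number of CPUs) can be recomputed directly from the elapsed times; a minimal Python sketch, using only the times from the tables above:

# Minimal sketch: recompute speedup and parallel efficiency from the
# elapsed times reported above (all times in seconds).
serial = {"R10000": 24473, "P3": 16638, "P4": 4841}
parallel = {
    "R10000": {2: 13926, 4: 6887, 8: 3009},
    "P4": {2: 2504, 4: 1332, 6: 1034, 8: 901},
}

# Serial comparison: ratio of elapsed times.
for cpu in ("R10000", "P3"):
    print(f"P4 vs {cpu}: {serial[cpu] / serial['P4']:.1f}x faster")

# Parallel runs: speedup = t_serial / t_parallel, efficiency = speedup / CPUs.
for machine, runs in parallel.items():
    for ncpu, t in runs.items():
        speedup = serial[machine] / t
        print(f"{machine}, {ncpu} CPUs: speedup {speedup:.2f}, "
              f"efficiency {speedup / ncpu:.2f}")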

As it would be interesting to compare these results with other machines, you are invited to run the benchmark yourself and post the results here.

The case I used can easily be set up using the following commands:

*set ncel 32
*set nvrt ncel + 1
vc3d 0 1 ncel 0 1 ncel 0 1 ncel
*set coff ncel * ncel * ncel
*set voff nvrt * nvrt * nvrt
cset news cran 0 * coff + 1 1 * coff$cgen 3 voff cset,,,vgen 1 1 0 0
cset news cran 2 * coff + 1 3 * coff$cgen 3 voff cset,,,vgen 1 0 0 1
cset news cran 4 * coff + 1 5 * coff$cgen 3 voff cset,,,vgen 1 0 1 0
cset news cran 6 * coff + 1 7 * coff$cgen 3 voff cset,,,vgen 1 1 0 0
cset news cran 8 * coff + 1 9 * coff$cgen 3 voff cset,,,vgen 1 0 0 -1
cset news cran 10 * coff + 1 11 * coff$cgen 3 voff cset,,,vgen 1 0 -1 0
cset news cran 12 * coff + 1 13 * coff$cgen 3 voff cset,,,vgen 1 1 0 0
vmer all
n
vcom all
y
cset all
axis z
view -1 -1 1
plty ehid
cplo
live surf
cset news gran,,0.01
cplo
bzon 1 all
cset news gran 6.99
cplo
bzon 2 all
cset news shell
cdele cset
ccom all
y
dens const 1000
lvis const 1e-3
rdef 1 inlet
0.025
rdef 2 outlet
split 1
iter 10000
geom,,1e-2
prob,,

The run converges in 383 (or 384) iterations (using single precision).
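If you run the benchmark yourself, the timing can be pulled out of the .run file automatically; a minimal Python sketch, assuming the line of interest contains the phrase "ELAPSED TIME" followed by the value in seconds (the exact layout of the .run file may differ between STAR versions):

import re
import sys

# Minimal sketch: extract the reported elapsed time from a STAR .run file.
# Assumption: the line of interest contains "ELAPSED TIME" followed by a
# number; the exact file layout may differ between STAR versions.
def elapsed_time(path):
    pattern = re.compile(r"ELAPSED\s+TIME\D*([0-9]+(?:\.[0-9]+)?)")
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                return float(m.group(1))
    return None

if __name__ == "__main__":
    print(elapsed_time(sys.argv[1]))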

(I had some difficulties posting this message in a readable form. The tables and commands were concatenated into single strings, and blanks and tabs were deleted. So I had to add the "-"s and the blank lines. How can this be avoided?)

Jonas Larsson August 24, 2001 08:33

Re: benchmark results
 
Very interesting results, thanks for sharing them with us!

About how to post preformatted text - take a look at question 3 here.


Stephan Meyer August 31, 2001 06:25

Re: benchmark results
 
Here are some results obtained on an IBM RS/6000 SP-SMP with POWER3-II 375 MHz CPUs. All computations were done on 8-way Nighthawk II nodes.

Serial run: 9400 s elapsed time, 8753 s CPU time

Values and comparisons below are elapsed times.

CPUs    time (s)    Speedup vs. serial
2       4190        2.24
4       1924        4.88
8       930         10.11

steve September 10, 2001 09:48

Re: benchmark results
 
Your results are quite interesting and I am glad you are sharing them. I think you drew one possibly invalid conclusion concerning scalability, although, as you say, running a much larger job might help. The P4 cluster is using 100 Mbit Ethernet as its message-passing medium, and that is just not good enough for 8 machines of that speed. When running in parallel, everything has to be balanced or the slowest component becomes the choke point; in this case it is the Ethernet. There are two relatively cheap things you can do to improve the performance (assuming you have not done these already):

1) Put a second Ethernet card in each machine and dedicate it to MPI, while the first handles general network traffic (NFS, FTP, X...).

2) Connect the cards to an Ethernet switch rather than a hub.

If this does not help, then you likely need a better (and far more expensive) message-passing medium to get good scaling.
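To get a feel for why the shared 100 Mbit link becomes the choke point, one can compare a rough estimate of the communication time per iteration against the computation time per iteration; a minimal Python sketch in which the per-interface data volume and the decomposition are illustrative assumptions, not measurements from this benchmark:

# Back-of-envelope sketch: compute vs. communication time per iteration on
# the P4 cluster. Values marked "assumed" are illustrative guesses, not
# measurements from this benchmark.
cells = 500_000                  # approximate problem size (from this thread)
serial_time = 4841.0             # s, P4 serial elapsed time (from this thread)
iterations = 383                 # iterations to convergence (from this thread)

link_bandwidth = 100e6 / 8       # bytes/s on an ideal 100 Mbit link
bytes_per_face_cell = 200        # assumed data exchanged per interface cell per iteration

compute_per_iter = serial_time / iterations

for ncpu in (2, 4, 8):
    # Assumed slab decomposition: ncpu - 1 interfaces of roughly cells**(2/3) cells each.
    interface_cells = (ncpu - 1) * cells ** (2 / 3)
    comm = interface_cells * bytes_per_face_cell / link_bandwidth
    work = compute_per_iter / ncpu
    print(f"{ncpu} CPUs: compute {work:.2f} s/iter, communication {comm:.2f} s/iter")

Even with optimistic assumptions, the communication volume grows with the number of partitions while the per-CPU work shrinks, which is consistent with the efficiency drop seen beyond 4 CPUs.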

Steve

