benchmark results

August 24, 2001, 03:55   #1
benchmark results
stefan (Guest)
These are the results of a benchmark I did with STAR v3.15. I used a simple case with approx. 500000 cells (see below).

Machines:

SGI Origin 2000, 8 * R10000, 195 MHz

Linux PC with P3, 1 GHz

Linux PC cluster, 8 * P4, 1.7 GHz, 100 Mbit network

Results for serial run (times as reported by STAR as "ELAPSED TIME" in the .run file):

R10000: 24473 s

P3: 16638 s

P4: 4841 s

This means that (for this case) the P4 is 5 times faster than the R10000 and 3.4 times faster than the P3!

Results for HPC:

R10000:

CPUs   time (s)   Speedup vs. serial
  2     13926          1.76
  4      6887          3.55
  8      3009          8.13

P4:

CPUs   time (s)   Speedup vs. serial
  2      2504          1.93
  4      1332          3.63
  6      1034          4.68
  8       901          5.37   (optimized METIS decomposition failed, used basic)

For the cluster, the problem seems to be too small to get an adequate speedup with more than 4 CPUs. This should be better for a problem with more cells and equations.
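
For reference, the speedup column above is just the serial elapsed time divided by the parallel elapsed time. A small Python sketch (the times are the ones reported above; the efficiency figure, speedup divided by number of CPUs, is added here for illustration and is not in the original tables):

# Speedup and parallel efficiency from the reported "ELAPSED TIME" values (seconds).
serial = {"R10000": 24473, "P4": 4841}
parallel = {
    "R10000": {2: 13926, 4: 6887, 8: 3009},
    "P4": {2: 2504, 4: 1332, 6: 1034, 8: 901},
}

for machine, runs in parallel.items():
    print(machine)
    for ncpu, t in sorted(runs.items()):
        speedup = serial[machine] / t      # e.g. 4841 / 901 = 5.37 for the P4 on 8 CPUs
        efficiency = speedup / ncpu        # fraction of ideal linear scaling
        print(f"  {ncpu} CPUs: speedup {speedup:.2f}, efficiency {efficiency:.0%}")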

As it would be interesting to compare these results with other machines, you are invited to run the benchmark yourself and post the results here.

The case I used can easily be set up using the following commands:

*set ncel 32
*set nvrt ncel + 1
vc3d 0 1 ncel 0 1 ncel 0 1 ncel
*set coff ncel * ncel * ncel
*set voff nvrt * nvrt * nvrt
cset news cran 0 * coff + 1 1 * coff$cgen 3 voff cset,,,vgen 1 1 0 0
cset news cran 2 * coff + 1 3 * coff$cgen 3 voff cset,,,vgen 1 0 0 1
cset news cran 4 * coff + 1 5 * coff$cgen 3 voff cset,,,vgen 1 0 1 0
cset news cran 6 * coff + 1 7 * coff$cgen 3 voff cset,,,vgen 1 1 0 0
cset news cran 8 * coff + 1 9 * coff$cgen 3 voff cset,,,vgen 1 0 0 -1
cset news cran 10 * coff + 1 11 * coff$cgen 3 voff cset,,,vgen 1 0 -1 0
cset news cran 12 * coff + 1 13 * coff$cgen 3 voff cset,,,vgen 1 1 0 0
vmer all
n
vcom all
y
cset all
axis z
view -1 -1 1
plty ehid
cplo
live surf
cset news gran,,0.01
cplo
bzon 1 all
cset news gran 6.99
cplo
bzon 2 all
cset news shell
cdele cset
ccom all
y
dens const 1000
lvis const 1e-3
rdef 1 inlet
0.025
rdef 2 outlet
split 1
iter 10000
geom,,1e-2
prob,,

The run converges in 383 (or 384) iterations (using single precision).
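
As a rough check on the quoted cell count (this relies on an assumption of mine: each of the seven "cgen 3" lines adds two translated copies of the selected block, giving 15 blocks of ncel^3 cells in total):

ncel = 32                        # cells per edge of one block ("*set ncel 32" above)
cells_per_block = ncel ** 3      # 32768 cells in the initial vc3d block
blocks = 1 + 7 * 2               # assumed: 1 original block + 2 copies per "cgen 3" line
print(blocks * cells_per_block)  # 491520, i.e. the "approx. 500000 cells" quoted above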

(I had some difficulties posting this message in a readable form. The tables and commands were concatenated into single strings, and blanks and tabs were deleted, so I had to add the "-"s and the blank lines. How can this be avoided?)

August 24, 2001, 09:33   #2
Re: benchmark results
Jonas Larsson (Guest)
Very interesting results, thanks for sharing them with us!

About how to post preformatted text: take a look at question 3 here.


August 31, 2001, 07:25   #3
Re: benchmark results
Stephan Meyer (Guest)
Here are some results obtained on an IBM RS/6000 SP-SMP with POWER3-II 375 MHz CPUs. All computations have been done on 8-way Nighthawk II nodes.

Serial run: 9400 s elapsed time, 8753 s CPU time

Values and comparisons below are elapsed times.

CPUs   time (s)   Speedup vs. serial
  2      4190          2.24
  4      1924          4.88
  8       930         10.11

September 10, 2001, 10:48   #4
Re: benchmark results
steve (Guest)
Your results are quite interesting and I am glad you are sharing them. I think you drew one possibly invalid conclusion concerning scalability, although, as you say, running a much larger job might help. The P4 cluster uses 100 Mbit Ethernet as its message-passing medium, and that is simply not good enough for 8 machines of that speed. When running in parallel, everything has to be balanced or the slowest component becomes the choke point; in this case it is the Ethernet. There are two relatively cheap things you can do to improve performance (assuming you have not done them already):

1) Put a second Ethernet card in each machine and dedicate it to MPI, while the first handles general network traffic (NFS, FTP, X, ...); see the sketch below.

2) Connect the cards to an Ethernet switch rather than a hub.

If this does not help, it is likely that you need a better (and far more expensive) message-passing medium to get good scaling.
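
To illustrate suggestion 1, a generic sketch (the hostnames and addresses are invented, and the exact hosts/machines file your MPI or STAR-HPC installation reads may differ): give each node a second address on a private subnet served by the dedicated card, and use those names wherever the parallel run expects its node list.

# /etc/hosts on every node: eth0 carries general traffic, eth1 is reserved for MPI
192.168.0.1   node1        # eth0: NFS, FTP, X, ...
192.168.0.2   node2
10.0.0.1      node1-mpi    # eth1: message passing only
10.0.0.2      node2-mpi

# node list for the parallel run: use the "-mpi" names so MPI traffic stays on eth1
node1-mpi
node2-mpi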

Steve
