CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   CFX (http://www.cfd-online.com/Forums/cfx/)
-   -   Parallel speed up (http://www.cfd-online.com/Forums/cfx/18992-parallel-speed-up.html)

Soren May 21, 2002 06:38

Parallel speed up
 
Hi,

Does anyone have experience with a dual-processor computer and CFX-5.5 under Linux?

What kind of speed-up is common compared to a single processor?

Thanks a lot.

Regards Soren


Holidays May 21, 2002 11:01

Re: Parallel speed up
 
I seem to remember that you can obtain a fairly linear relationship, assuming you solve a problem large enough to dilute the effects of the partitioning (I saw a CFX presentation), but contact your vendor, since CFX is very likely to have done the comparison.

Neale May 21, 2002 12:02

Re: Parallel speed up
 
CFX-5.5 gets speedups of 1.6-1.8 on Linux, depending on the problem size. This is better on high-end workstations, where 1.9-2.1 is typical. The memory and cache architectures on Intel/AMD Linux boxes are simply not good enough to get comparable speedups.

Neale


Soren May 22, 2002 02:14

Re: Parallel speed up
 
Hi

Thanks for the reply.

I know that under Windows NT/2k/XP the parallel performance of a dual-processor computer is very bad.

The speed-up is about 1.1 to 1.2.

That's why I am looking at Linux.

Any comments?

Regards

Soren


Astrid May 22, 2002 03:41

Re: Parallel speed up
 
Using CFX-5.5 on a Pentium IV with WinNT, we obtained a speed-up of about 1.8-2.0. But we have only tested up to 4 PCs.

Astrid

Soren May 22, 2002 04:43

Re: Parallel speed up
 
Hi Astrid

Is the computer single or dual processor?

Regards

Soren

cfd guy May 22, 2002 07:44

Re: Parallel speed up
 
I use TASCflow and CFX-5.5 on a dual-PIII PC. I've noted a speed-up of about 1.4-1.6 in CFX-5 and 1.6-1.8 in TASCflow, depending on the problem size. I've only run local parallel with two partitions.
cfd guy

Neale May 22, 2002 15:46

Re: Parallel speed up
 
Linux generally seems to do a better job at dynamic process management (i.e., multitasking), so you usually see slightly better speedups there. I've typically seen on the order of 1.4-1.6 on NT, and 1.6-1.8 on Linux, for CFX-5.5.

Neale.


Neale May 22, 2002 15:49

Re: Parallel speed up
 
Astrid,

Do you mean you ran a 4-process job on 4 PCs and only got a 1.8-2.0 speedup?? What problem size were you running? For a 4-process job you would need at least 400,000-600,000 elements to see a decent speedup.

Neale

Jens May 23, 2002 02:11

Re: Parallel speed up
 
Hi

I am curious about these speed-ups. I am running indoor-airflow and HVAC problems with mesh sizes from 400k to 2,000k cells on a Win NT box with dual P4 processors.

The speed-up I am getting is below 1.2.

Are you applying something special?

Thanks

Regards

Jens

Robin May 23, 2002 11:22

Re: Parallel speed up
 
Hi Jens,

How much RAM are you using? For a 2-million-node problem, I'd be surprised if you were not running into swap space. In that case, you will see the best speed-up if you run on multiple systems, at least enough of them to get the problem entirely in RAM and out of swap.

Robin
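
A quick way to sanity-check Robin's point is to estimate the memory footprint per node. The figures Neale quotes later in the thread (120,000 nodes at about 180-200 MB for a uvwp/k-eps run) imply roughly 1.6 KB per node; the sketch below uses that as a ballpark only, since the real footprint depends on the physics and solver settings.

# Ballpark RAM check. The ~1.6 KB/node figure is inferred from Neale's
# 120,000-node / 180-200 MB estimate below; treat it as a rough guess.
KB_PER_NODE = 1.6

def estimate_ram_gb(n_nodes):
    """Rough RAM estimate in GB for a uvwp/k-eps CFX-5 run."""
    return n_nodes * KB_PER_NODE / (1024 * 1024)

# Robin's 2-million-node example:
needed = estimate_ram_gb(2_000_000)
print(f"~{needed:.1f} GB needed")  # ~3.1 GB, so a 1.2 GB box would swap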

Jens May 23, 2002 13:43

Re: Parallel speed up
 
Hi

I have benchmarked using an HVAC problem with 600,000 cells. The speed-up was 1.15 on a dual P4 with 1.2 GB RAM.

Any hints?

Regards

Jens

Neale May 24, 2002 10:05

Re: Parallel speed up
 
How were you calculating the speedup? You should use the CFD start and finish times in the output file.

600,000 cells means roughly 120,000 nodes (for a tet grid, I assume), which should only take about 180-200 MB for uvwp-k-eps. So swapping probably isn't an issue.

Make sure you do your performance measurements on a "clean" machine, i.e., one where you aren't running anything other than the CFD calculation.

Neale.
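
As a minimal sketch of the calculation Neale describes: take the solver wall-clock time from the output file of the serial run and of the parallel run, then divide. The numbers below are placeholders, not real CFX timings.

# Speed-up from solver wall-clock times (placeholder values, not from
# a real run). Take the start/finish times from the CFX output files.
serial_seconds = 7200.0    # hypothetical serial solver time
parallel_seconds = 4000.0  # hypothetical 2-process solver time
n_processes = 2

speedup = serial_seconds / parallel_seconds
efficiency = speedup / n_processes
print(f"speed-up {speedup:.2f}, efficiency {efficiency:.0%}")  # 1.80, 90%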

cfd guy May 27, 2002 11:56

Architectures Benchmark
 
Hi Jens,
As this discussion is very interesting, I'd like to propose the following benchmark so that users can share their speedup figures. I've built a very simple case (a rectangular channel) with approximately 960K cells (hybrid mesh with inflation). I've run this definition file on a Sun workstation under Solaris 8 with 4 processors. Here's some data about the case:
3D, turbulent (k-eps), incompressible (air), steady-state flow. Number of cells: almost 948,000.
Run       Speedup
Serial    1.00
2 proc.   2.08
3 proc.   3.03
4 proc.   4.02


Why don't you test it on your NT machine? I could send you the journal file so that you can easily regenerate the definition file. If anyone else wants the journal file, please feel free to mail me.


PS1: Make sure you're not running any other applications on your machine.
PS2: Rebuilding the journal file on my NT machine, the resulting mesh has 947,916 elements; rebuilding it on my UNIX system, it has 948,161 elements. I believe that will be no problem at all for benchmarking purposes.
PS3: I think it's the simplest case you could ever imagine: a simple geometry with no bad angles and no grid interfaces (monoblock). I believe the speedup also depends on such geometric factors.

Kind regards, cfd guy
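
One way to compare such numbers across machines is to convert the speed-ups into parallel efficiencies (speed-up divided by process count). A small sketch using the Sun results above:

# Parallel efficiency = speedup / processes, for the Sun results above.
results = {1: 1.00, 2: 2.08, 3: 3.03, 4: 4.02}
for procs, speedup in results.items():
    print(f"{procs} proc: speed-up {speedup:.2f}, "
          f"efficiency {speedup / procs:.0%}")

Efficiencies at or slightly above 100%, as here, usually point to cache effects: each partition fits into processor cache better than the undivided problem does.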

cfd guy May 28, 2002 13:02

Re: Architectures Benchmark
 
Following up on my previous post: I've tested two coarser grids for comparison with the first one. Here are the results:
Grid 2: 468,500 elements.
  2 processes: 1.92
  3 processes: 2.74
  4 processes: 3.41
Grid 3: 109,400 elements.
  2 processes: 1.49
  3 processes: 2.15
  4 processes: 2.80
I'm not trying to find the optimal mesh size for this problem, but it seems that, in this case, each processor needs more than 200k elements for the speedup to scale linearly with the number of processes.
cfd guy
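
cfd guy's threshold suggests a quick check before picking a partition count: divide the element count by the number of processes. A sketch using the three grids above (the 200k figure is empirical, taken from these results, not a documented CFX limit):

# Elements per process for the grids above, against the ~200k-element
# threshold suggested by cfd guy's results (empirical, not official).
grids = {"Grid 1": 948_000, "Grid 2": 468_500, "Grid 3": 109_400}
THRESHOLD = 200_000

for name, elements in grids.items():
    for procs in (2, 3, 4):
        per_proc = elements // procs
        flag = "ok" if per_proc >= THRESHOLD else "below threshold"
        print(f"{name}, {procs} proc: {per_proc:,} elements/proc ({flag})")

Run against the posted numbers, the grid/process combinations that fall below the threshold are exactly the ones whose speed-ups dropped off the linear trend.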

Astrid May 30, 2002 16:38

Re: Parallel speed up
 
Soren,

We used 4 PCs in distributed parallel on a 100BaseT network.

Astrid

Astrid May 30, 2002 16:44

Re: Parallel speed up
 
Sorry, I was wrong. I didn't mean to confuse you.

We ran a job with 1.5M elements on 1 standalone PC and on 2 distributed parallel PCs. The speed-up was 1.8-2.0. With 4 PCs, the speed-up was about 3.6.

Astrid

Robin May 30, 2002 16:47

Re: Architectures Benchmark
 
cfd guy,

The number of nodes is more relevant to parallel efficiency. Can you post the number of nodes in your mesh rather than elements?

Typically, the best efficiency is achieved when the number of nodes per partition is greater than 100k. At less than 20k per partition the trend may reverse, taking longer with added partitions (due to increased communication).

Robin
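
Robin's rules of thumb translate into a simple way to bound the partition count from the node count. A sketch (the 100k and 20k floors are just the figures from this post, and the mesh size is hypothetical):

# Bound the partition count using Robin's rules of thumb:
# >100k nodes/partition for best efficiency; below ~20k nodes/partition,
# adding partitions can make the run slower.
def max_partitions(n_nodes, nodes_floor):
    return max(1, n_nodes // nodes_floor)

n_nodes = 500_000  # hypothetical mesh
print(max_partitions(n_nodes, 100_000))  # 5  -> best-efficiency range
print(max_partitions(n_nodes, 20_000))   # 25 -> beyond this, expect slowdown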

Neale May 31, 2002 12:26

Re: Architectures Benchmark
 
Actually, it's OK to quote by elements as well. The two are related anyway (roughly 1:1 for hex grids, and 5-6:1 for tet/hybrid grids). In fact, the assembly really scales with the number of elements, as the CFX-5 solver uses an element-based assembly.

I'm not surprised by the results, though, as 50,000 vertices per partition translates into 200k elements on a tet/hybrid grid. This is what we see in our parallel results as well.

Neale.
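
Neale's ratios make it easy to move between Robin's node-based guideline and cfd guy's element-based one. A minimal sketch (the factors are the rough ratios quoted above and vary from mesh to mesh):

# Rough node -> element conversion using the ratios quoted above:
# ~1 element/node for hex grids, ~4-6 elements/node for tet/hybrid.
ELEMENTS_PER_NODE = {"hex": 1.0, "tet/hybrid": 5.0}  # approximate

def elements_from_nodes(n_nodes, grid_type):
    """Estimate element count from node count for a given grid type."""
    return int(n_nodes * ELEMENTS_PER_NODE[grid_type])

# Robin's 100k-nodes-per-partition guideline on a tet/hybrid mesh:
print(elements_from_nodes(100_000, "tet/hybrid"))  # ~500,000 elements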


