CFD Online Discussion Forums > CFX
Simulation speed increase (https://www.cfd-online.com/Forums/cfx/80559-simulation-speed-increase.html)

Attesz September 29, 2010 10:50

Simulation speed increase
 
Hi all,

I'm interested in the effect of the Memory Allocation Factor in the Solver Manager (Define Run, Advanced tab). Does setting higher values speed up the simulation?

My other question is about the number of cores. I'm using an HP workstation with a 4-core Intel Xeon and 16 GB of RAM. My simulation is a steady-state turbomachinery case with 2.5 million elements. I have found that running on 2 cores is faster than running on 4 cores. Is there a rule of thumb for choosing the number of cores correctly, or should I benchmark every case?

Thanks in advance,
Attila

TX_Air September 29, 2010 13:31

Attila,

What Xeon processor are you using? I use a Xeon 5650; it is a 6-core with 24 GB of RAM. I have noticed the same on my machine: there is no real speedup going from 4 to 6 cores. That has been the case for most of my simulations, ranging from 800k to 2 million cells.

Most of the time the speedup is bottlenecked by the bus speed. So even if you have good RAM and a good processor, adding extra cores does not help, simply because the bus is not capable of handling the I/O.

joey2007 September 29, 2010 15:43

1. Speedup from the memory allocation factor: support told me some time ago that it may give slightly better performance.

2. Much more impact: the multi-core issue. The cores share the bandwidth to the memory, and depending on the machine this may limit the scalability. It may even be much faster to use two cores on two machines than four cores on one machine.
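As a rough illustration of the shared-bandwidth point, here is a minimal toy model in Python; the bandwidth numbers are assumptions for illustration only, not measurements from any machine in this thread:

Code:

# Toy model of memory-bandwidth-limited parallel scaling.
# All numbers are illustrative assumptions, not benchmarks.

def effective_speedup(n_cores, peak_bw_gbs, bw_per_core_gbs):
    """Ideal speedup, throttled once the cores together saturate the memory bus."""
    demanded = n_cores * bw_per_core_gbs            # bandwidth the cores ask for
    usable_fraction = min(1.0, peak_bw_gbs / demanded)
    return n_cores * usable_fraction

# Hypothetical workstation: ~10 GB/s total memory bandwidth,
# with the solver wanting ~4 GB/s per core.
for n in (1, 2, 4):
    print(n, "core(s): speedup ~", round(effective_speedup(n, 10.0, 4.0), 2))
# prints roughly 1.0, 2.0 and 2.5 -- the extra cores stop paying off
# once the bus is saturated.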

joey2007 September 29, 2010 15:45

Oops, point two was already answered. I am sorry.

Attesz September 30, 2010 05:25

Quote:

Originally Posted by TX_Air (Post 277154)
Attila,

What Xeon processor are you using? I use a Xeon 5650; it is a 6-core with 24 GB of RAM. I have noticed the same on my machine: there is no real speedup going from 4 to 6 cores. That has been the case for most of my simulations, ranging from 800k to 2 million cells.

Most of the time the speedup is bottlenecked by the bus speed. So even if you have good RAM and a good processor, adding extra cores does not help, simply because the bus is not capable of handling the I/O.

Hello,
I'm using an X5450, 3 GHz, 4 cores. Thanks for your reply!

Attesz September 30, 2010 05:26

Quote:

Originally Posted by joey2007 (Post 277166)
1. Speedup from the memory allocation factor: support told me some time ago that it may give slightly better performance.

2. Much more impact: the multi-core issue. The cores share the bandwidth to the memory, and depending on the machine this may limit the scalability. It may even be much faster to use two cores on two machines than four cores on one machine.

Thanks Joey!

CFD in my blood October 7, 2010 06:51

Quote:

Originally Posted by TX_Air (Post 277154)
Attila,

What Xeon processor are you using? I use a Xeon 5650; it is a 6-core with 24 GB of RAM. I have noticed the same on my machine: there is no real speedup going from 4 to 6 cores. That has been the case for most of my simulations, ranging from 800k to 2 million cells.

Most of the time the speedup is bottlenecked by the bus speed. So even if you have good RAM and a good processor, adding extra cores does not help, simply because the bus is not capable of handling the I/O.

Hi..

What you are saying is true, but in the event that bus speed is not your limiting factor, there is another dimension to it. I use an 8-core, 32 GB RAM machine, and I use 6 cores for my solver, not all 8. Primarily, the CPU needs some "free" cores to help write the result files to the hard drive. If you allocate all the cores to your solver, the solver has to stop temporarily every time it writes a result file or updates the monitor point data in a file. This slows down the solver.

Also, I have found that for parallel runs you should always use an even number of cores. For some odd reason (I am not sure why), an even number of cores works faster than an odd number. For example, a solver running on 6 cores of an 8-core machine works faster than on 5 cores, or even 7 cores for that matter.

Meanwhile, using 6 cores on an 8-core machine is faster than using 2 or 4 cores. So as a rule of thumb I tend to use the highest even number of cores that is lower than the total number of cores in the system.

I hope this helps...

Good Luck..

Attesz October 7, 2010 07:08

--------------------

CFD in my blood October 7, 2010 07:36

Hi Attila,

I understand what you mean about running multiple processes, and your data is definitely helpful. Thanks! When I have a mesh of less than 1-2 million elements and multiple cases to run, I tend to use 2 cores for each run. That way I use a total of 6 cores to run 3 different load cases. This works best provided you have enough RAM and hard disk space.

However, I did not quite understand your other data. You mentioned that each iteration took more time (almost double) as you doubled the number of cores. For example, going from 1 core to 4 cores, a single iteration took 5.88 min on 1 core and 18 min on 4 cores?! Is that data correct, or is it because you have a moving mesh? I think moving mesh and parallelization don't gel very well.

Attesz October 7, 2010 08:20

Quote:

Originally Posted by CFD in my blood (Post 278258)
However, I did not quite understand your other data. You mentioned that each iteration took more time (almost double) as you doubled the number of cores. For example, going from 1 core to 4 cores, a single iteration took 5.88 min on 1 core and 18 min on 4 cores?! Is that data correct, or is it because you have a moving mesh? I think moving mesh and parallelization don't gel very well.

Yes, on 4 cores it takes about 3 times longer to solve a single iteration; the data is correct. My machine is a workstation, but it is not used only for computation, so I'm working on Windows XP. The simulation doesn't have a moving mesh, just a simple rotating frame. I think the slowdown is caused by the communication between the processors (just guessing, I'm not an IT expert). We also have a Linux cluster here, and the phenomenon is the same: when I use 4 cores for one process instead of 2 cores, the simulation time slightly increases.
On your machine, does the simulation accelerate when you use more cores with 1-2 million elements?

Regards,
Attesz

CFD in my blood October 7, 2010 08:35

Yes. In my experience, with 1-2 million elements the simulation runs faster if I use multiple cores as opposed to a single core. This is when all the cores belong to the same CPU. However, the slow-communication problem occurs when I use multiple CPUs in a cluster connected by LAN cables; then the communication between the different CPUs becomes the limiting factor. I have not yet seen the slow-communication problem when I use multiple cores in the same CPU.

Attesz October 7, 2010 08:40

Hmm, that's interesting. The communication slowdown over LAN cables is understandable, of course.
Maybe in my case it really is caused by the rotating frame. Anyway, it was useful to discuss this.

Regards,
Attesz

FoxTwo October 7, 2010 08:50

Just to mention: if you compare the CPU time between two separate iterations, you have to divide the time by the number of cores used! To compare the speedup/time saving, it is better to run a simulation with a fixed number of iterations and compare the duration of the whole simulation!

Attesz October 7, 2010 08:56

Alexander,
thanks for pointing that out; I did almost the same. The numbers I shared before are the time needed for one iteration, computed from the 99th and 100th iterations of a single run.

FoxTwo October 7, 2010 09:04

Well, then I don't understand why the simulation with two cores is slower than with only one core on a 4-core machine. In my experience, a parallel run (2 CPUs/cores) is always faster!

Attesz October 7, 2010 09:06

I don't understand it either :)

FoxTwo October 7, 2010 09:10

Can you please post here the header with the CPU time (CPU SECONDS = ...) of the two iterations you are comparing, and do that for all the core counts you have run so far? I have to see it with my own eyes.

Attesz October 7, 2010 09:19

One process on 4 cores:


======================================================================
OUTER LOOP ITERATION = 762 ( 99) CPU SECONDS = 2.366E+05 (1.085E+05)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 1.36 | 8.0E-04 | 5.1E-01 | 1.2E-02 OK|
| V-Mom | 1.19 | 2.6E-04 | 1.6E-01 | 1.5E-02 OK|
| W-Mom | 1.26 | 2.1E-04 | 1.4E-01 | 1.7E-02 OK|
| P-Mass | 1.70 | 1.4E-05 | 1.0E-02 | 9.1 3.5E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 0.74 | 1.2E-04 | 6.0E-02 | 6.1 5.6E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 1.13 | 2.6E-04 | 1.6E-01 | 6.0 1.8E-02 OK|
| O-TurbFreq | 1.22 | 2.2E-05 | 7.9E-03 | 12.5 4.6E-05 OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION = 763 ( 100) CPU SECONDS = 2.377E+05 (1.096E+05)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 0.70 | 5.6E-04 | 4.5E-01 | 2.0E-02 OK|
| V-Mom | 0.83 | 2.2E-04 | 1.3E-01 | 1.6E-02 OK|
| W-Mom | 0.84 | 1.7E-04 | 9.7E-02 | 1.8E-02 OK|
| P-Mass | 0.60 | 8.6E-06 | 5.3E-03 | 9.1 3.6E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 1.30 | 1.5E-04 | 8.6E-02 | 6.1 5.5E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 0.83 | 2.2E-04 | 1.2E-01 | 6.0 1.9E-02 OK|
| O-TurbFreq | 0.84 | 1.9E-05 | 7.4E-03 | 12.5 4.7E-05 OK|
+----------------------+------+---------+---------+------------------+


One process on 1 core (I've just started it now, so it will be much faster):

======================================================================
OUTER LOOP ITERATION = 930 ( 1) CPU SECONDS = 3.780E+05 (4.566E+01)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 7.31 | 3.6E-03 | 3.8E-01 | 1.5E-01 ok|
| V-Mom |14.93 | 3.6E-03 | 1.3E-01 | 1.3E-01 ok|
| W-Mom |99.99 | 2.2E-02 | 1.3E+00 | 1.5E-02 OK|
| P-Mass |99.99 | 1.6E-03 | 1.2E-01 | 4.9 5.8E-02 OK|
+----------------------+------+---------+---------+------------------+
+--------------------------------------------------------------------+
| ****** Notice ****** |
| A wall has been placed at portion(s) of an OUTLET |
| boundary condition (at 20.0% of the faces, 24.6% of the area) |
| to prevent fluid from flowing into the domain. |
| The boundary condition name is: Outlet. |
| The fluid name is: Air Ideal Gas. |
| If this situation persists, consider switching |
| to an Opening type boundary condition instead. |
+--------------------------------------------------------------------+
| H-Energy |58.25 | 6.0E-03 | 1.5E-01 | 5.8 7.7E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE |26.76 | 5.4E-03 | 1.9E-01 | 5.8 3.8E-02 OK|
| O-TurbFreq |99.99 | 3.0E-03 | 7.6E-02 | 12.3 1.6E-04 OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION = 931 ( 2) CPU SECONDS = 3.785E+05 (4.598E+02)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+


And one process on 2 cores:
======================================================================
OUTER LOOP ITERATION = 1174 ( 57) CPU SECONDS = 6.893E+05 (3.785E+04)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 1.00 | 1.4E-04 | 2.5E-02 | 2.2E-02 OK|
| V-Mom | 0.98 | 1.7E-04 | 7.0E-02 | 1.9E-02 OK|
| W-Mom | 0.98 | 9.0E-05 | 2.3E-02 | 2.2E-02 OK|
| P-Mass | 0.99 | 3.2E-06 | 4.6E-04 | 9.0 3.6E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 0.95 | 5.6E-05 | 2.3E-02 | 5.9 5.7E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 0.97 | 5.1E-05 | 4.6E-03 | 5.8 1.9E-02 OK|
| O-TurbFreq | 1.06 | 2.2E-05 | 3.3E-03 | 12.4 1.4E-04 OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION = 1175 ( 58) CPU SECONDS = 6.900E+05 (3.854E+04)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 1.06 | 1.5E-04 | 3.0E-02 | 2.2E-02 OK|
| V-Mom | 0.88 | 1.5E-04 | 3.8E-02 | 2.4E-02 OK|
| W-Mom | 1.09 | 9.8E-05 | 3.3E-02 | 2.2E-02 OK|
| P-Mass | 0.96 | 3.1E-06 | 3.2E-04 | 9.0 3.7E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 0.92 | 5.2E-05 | 1.4E-02 | 5.9 5.9E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 0.97 | 5.0E-05 | 4.5E-03 | 5.8 2.0E-02 OK|
| O-TurbFreq | 0.80 | 1.8E-05 | 2.5E-03 | 12.4 1.2E-04 OK|
+----------------------+------+---------+---------+------------------+

FoxTwo October 7, 2010 09:35

So if you are just showing us the same case, my calculator and I get the following results:

cores; simulation time per iteration (s)

1; 414
2; 345
4; 275

Well, it's a small speedup from increasing the number of cores, not great, but it is there!
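For reference, these figures follow directly from the CPU SECONDS headers quoted above: take the difference of the values in parentheses between two consecutive iterations and divide by the number of cores. A minimal sketch in Python:

Code:

# Per-iteration wall time estimated from the CPU SECONDS deltas in the
# solver output above, divided by the number of cores (CPU time is
# accumulated over all cores).
runs = {
    1: (4.566e+01, 4.598e+02),   # iterations 930 -> 931, 1 core
    2: (3.785e+04, 3.854e+04),   # iterations 1174 -> 1175, 2 cores
    4: (1.085e+05, 1.096e+05),   # iterations 762 -> 763, 4 cores
}
for cores, (t0, t1) in runs.items():
    print(cores, "core(s):", round((t1 - t0) / cores), "s per iteration")
# -> 1 core(s): 414 s, 2 core(s): 345 s, 4 core(s): 275 s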

Attesz October 7, 2010 09:48

Oh, I see now. The CPU time is the accumulated time over all the cores; I didn't know that. :S Thanks. Then my earlier values are meaningless.

FoxTwo October 7, 2010 09:54

And now the Earth is rotating in the right direction again ;)

Attesz October 7, 2010 10:09

:) :) thanks

Lance October 7, 2010 10:28

There is an expert parameter that lets you write the wall clock time into the output file; it should be useful in your case.
Code:

Wall clock time = t
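As a complement, the per-iteration wall time can also be estimated from the CPU SECONDS headers that the solver already writes to the .out file, without the expert parameter; a minimal sketch, where the file name and core count are placeholders to fill in:

Code:

import re

# Estimate wall-clock seconds per iteration from the "OUTER LOOP ITERATION"
# headers in a CFX .out file; CPU SECONDS is accumulated over all cores,
# so the result is divided by the number of cores used for the run.
HEADER = re.compile(
    r"OUTER LOOP ITERATION =\s*\d+\s*\(\s*\d+\)\s*"
    r"CPU SECONDS = [\d.E+-]+ \(\s*([\d.E+-]+)\)"
)

def seconds_per_iteration(out_file, n_cores):
    with open(out_file) as f:
        cpu = [float(m.group(1)) for m in HEADER.finditer(f.read())]
    if len(cpu) < 2:
        raise ValueError("need at least two iteration headers")
    return (cpu[-1] - cpu[0]) / (len(cpu) - 1) / n_cores

# Hypothetical usage:
# print(seconds_per_iteration("myrun_001.out", 4))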

Attesz October 7, 2010 11:14

OK, I'll try it, thanks!

Attesz October 13, 2010 09:55

======================================================================
OUTER LOOP ITERATION = 1662 ( 217) CPU SECONDS = 9.807E+05 (2.258E+05)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 1.16 | 5.1E-04 | 3.6E-01 | 4.8E-03 OK|
| V-Mom | 1.04 | 2.2E-04 | 1.2E-01 | 7.4E-03 OK|
| W-Mom | 1.18 | 1.8E-04 | 1.0E-01 | 7.3E-03 OK|
| P-Mass | 2.59 | 1.5E-05 | 1.1E-02 | 5.0 5.0E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 0.70 | 9.1E-05 | 7.8E-02 | 6.1 1.1E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 1.13 | 1.1E-04 | 6.1E-02 | 6.0 2.3E-03 OK|
| O-TurbFreq | 1.06 | 4.1E-05 | 2.4E-02 | 12.6 4.3E-05 OK|
+----------------------+------+---------+---------+------------------+

Execution terminating: STP file found.



CFD Solver finished: Wed Oct 13 14:21:43 2010
CFD Solver wall clock seconds: 7.5005E+04

Does the wall clock seconds here mean the time required for the solution only, or is the partitioning etc. also included?

