Simulation speed increase
Hi all,
I'm interested in the effect of the Memory Allocation Factor on the Advanced tab of Define Run in the Solver Manager. Does setting a higher value speed up the simulation? My other question is about the number of cores. I'm using an HP workstation with a 4-core Intel Xeon and 16 GB of RAM. My simulation is a steady-state turbomachinery case with 2.5 million elements. I found that it runs faster on 2 cores than on 4 cores. Is there a rule of thumb for choosing the number of cores, or should I benchmark every case? Thanks in advance, Attila
Attila,
Which Xeon processor are you using? I use a Xeon 5650, a 6-core with 24 GB of RAM, and I have noticed the same on my machine: there is hardly any speedup from 4 to 6 cores. That has been the case for most of my simulations, ranging from 800k to 2 million cells. Most of the time the speedup is bottlenecked by bus speed. So even if you have good RAM and a good processor, adding extra cores does not help, simply because the bus cannot keep up with the I/O.
1. Speedup by memory allocation: support told me some time ago that it may give slightly better performance.
2. Much more impact: the multiple-core issue. The cores share the bandwidth to the memory. Depending on the machine, this may limit the scalability. It may well be much faster to use two cores on each of two machines than four cores on one machine.
Oops, point two was already answered. Sorry.
I'm using an X5450, 3 GHz, 4 cores. Thanks for your reply!
What you are saying is true. But when bus speed is not your limiting factor, there is another dimension to it. I use an 8-core machine with 32 GB of RAM, and I use 6 cores for my solver, not all 8. The main reason is that the CPU needs some "free" cores to help write the result files to the hard drive. If you allocate all the cores to the solver, it has to pause every time it writes a result file or updates monitor point data in a file, which slows it down. Also, I have found that for parallel runs you should always use an even number of cores. For some reason I cannot explain, an even number of cores works faster than an odd number. For example, a solver running on 6 cores of an 8-core machine is faster than on 5 cores, or even 7 for that matter. Meanwhile, 6 cores on an 8-core machine is faster than 2 or 4. So as a rule of thumb I use the highest 'even' number of cores that is lower than the number of cores in the system. I hope this helps. Good luck.
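The rule of thumb in the post above can be written down as a tiny helper. This is only a sketch of that poster's heuristic (largest even core count strictly below the machine total); the function name `suggested_cores` is made up for illustration and is not part of any solver, and other posters in this thread saw different behaviour on their hardware.

```python
def suggested_cores(total_cores):
    """Largest even core count strictly below total_cores.

    Encodes the poster's rule of thumb only; it is not a solver feature.
    Falls back to 1 when no even count below the total exists.
    """
    if total_cores <= 2:
        return 1
    candidate = total_cores - 1
    if candidate % 2:  # odd, so step down to the next even number
        candidate -= 1
    return candidate


for total in (4, 6, 8):
    print(f"{total}-core machine: try {suggested_cores(total)} solver cores")
```

Treat the result as a starting point for a benchmark rather than a final answer.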
Hi Attila,
I understand what you mean about running multiple processes, and your data is definitely helpful. Thanks! When I have a mesh smaller than 1-2 million elements and multiple cases to run, I tend to use 2 cores for each run. This way I use a total of 6 cores for running 3 different load cases. This works best provided you have enough RAM and hard disk space. However, I did not quite understand your other data. You mentioned that every iteration took more time (almost double) as you doubled the number of cores. For example, when you went from 1 core to 4 cores, one iteration took 5.88 min on 1 core and 18 min on 4 cores?! Is the data correct, or is it because you have a moving mesh? I think that moving mesh and parallelization don't gel very well.
On your machine, does your simulation speed up when you use more cores for 1-2 million elements? Regards, Attesz
Yes. In my experience, with 1-2 million elements the simulation runs faster on multiple cores than on a single core. This is when all the cores belong to the same CPU. The slow communication problem occurs when I use multiple CPUs in a cluster connected by LAN cables; then the communication between the different CPUs becomes the limiting factor. I have not yet seen a slow communication problem when using multiple cores in the same CPU.
Hmm, that's interesting. The communication slowdown over LAN cables is understandable, of course.
Maybe in my case it really is caused by the rotating frame. Anyway, it was useful to discuss this. Regards, Attesz
Just to mention: if you compare the CPU time between two separate iterations, you have to divide the time by the number of cores used! To compare the speedup or time saving, it is better to run a simulation with a defined number of iterations and compare the duration of the whole simulation!
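The "fixed number of iterations, compare the whole run" advice can be automated with a small timing harness. The following is a minimal sketch, assuming each run can be launched as one command; the commands shown are placeholders that just start a Python process, and the real solver launch line (with its own core-count option and a fixed iteration limit) has to be substituted by hand.

```python
import subprocess
import sys
import time


def time_run(command):
    """Return the wall-clock seconds taken by one complete run of `command`."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def benchmark(commands_by_cores):
    """Map each core count to the wall-clock duration of its whole run."""
    return {cores: time_run(cmd) for cores, cmd in commands_by_cores.items()}


if __name__ == "__main__":
    # Placeholder command (an assumption for illustration): replace it with
    # the actual solver command for each core count.
    placeholder = [sys.executable, "-c", "pass"]
    results = benchmark({1: placeholder, 2: placeholder, 4: placeholder})
    for cores, seconds in sorted(results.items()):
        print(f"{cores} core(s): {seconds:.2f} s wall clock")
```

Because it measures elapsed time for the whole run, the result does not need to be divided by the number of cores.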
Alexander,
thanks for the note, I did almost the same. The numbers I shared before are the time needed for one iteration, computed from the 99th and 100th iterations of a single process.
Well, then I don't understand why the simulation with two cores is slower than with only one on a 4-core machine. In my experience, a parallel run (2 CPUs/cores) is always faster!
I don't understand it either :)
Can you please post the header with the CPU time (CPU SECONDS = ...) of the two iterations you are comparing, for all the core counts you have run so far? I have to see it with my own eyes.
1 process on 4 cores

======================================================================
OUTER LOOP ITERATION =  762 (   99)    CPU SECONDS = 2.366E+05 (1.085E+05)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.36 | 8.0E-04 | 5.1E-01 |       1.2E-02  OK|
| V-Mom                | 1.19 | 2.6E-04 | 1.6E-01 |       1.5E-02  OK|
| W-Mom                | 1.26 | 2.1E-04 | 1.4E-01 |       1.7E-02  OK|
| P-Mass               | 1.70 | 1.4E-05 | 1.0E-02 |  9.1  3.5E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 0.74 | 1.2E-04 | 6.0E-02 |  6.1  5.6E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 1.13 | 2.6E-04 | 1.6E-01 |  6.0  1.8E-02  OK|
| O-TurbFreq           | 1.22 | 2.2E-05 | 7.9E-03 | 12.5  4.6E-05  OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION =  763 (  100)    CPU SECONDS = 2.377E+05 (1.096E+05)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 0.70 | 5.6E-04 | 4.5E-01 |       2.0E-02  OK|
| V-Mom                | 0.83 | 2.2E-04 | 1.3E-01 |       1.6E-02  OK|
| W-Mom                | 0.84 | 1.7E-04 | 9.7E-02 |       1.8E-02  OK|
| P-Mass               | 0.60 | 8.6E-06 | 5.3E-03 |  9.1  3.6E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 1.30 | 1.5E-04 | 8.6E-02 |  6.1  5.5E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.83 | 2.2E-04 | 1.2E-01 |  6.0  1.9E-02  OK|
| O-TurbFreq           | 0.84 | 1.9E-05 | 7.4E-03 | 12.5  4.7E-05  OK|
+----------------------+------+---------+---------+------------------+

one process on 1 core (I've started it now, so it will be much faster)

======================================================================
OUTER LOOP ITERATION =  930 (    1)    CPU SECONDS = 3.780E+05 (4.566E+01)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 7.31 | 3.6E-03 | 3.8E-01 |       1.5E-01  ok|
| V-Mom                |14.93 | 3.6E-03 | 1.3E-01 |       1.3E-01  ok|
| W-Mom                |99.99 | 2.2E-02 | 1.3E+00 |       1.5E-02  OK|
| P-Mass               |99.99 | 1.6E-03 | 1.2E-01 |  4.9  5.8E-02  OK|
+----------------------+------+---------+---------+------------------+
+--------------------------------------------------------------------+
|                        ****** Notice ******                        |
| A wall has been placed at portion(s) of an OUTLET                  |
| boundary condition (at 20.0% of the faces, 24.6% of the area)      |
| to prevent fluid from flowing into the domain.                     |
| The boundary condition name is: Outlet.                            |
| The fluid name is: Air Ideal Gas.                                  |
| If this situation persists, consider switching                     |
| to an Opening type boundary condition instead.                     |
+--------------------------------------------------------------------+
| H-Energy             |58.25 | 6.0E-03 | 1.5E-01 |  5.8  7.7E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             |26.76 | 5.4E-03 | 1.9E-01 |  5.8  3.8E-02  OK|
| O-TurbFreq           |99.99 | 3.0E-03 | 7.6E-02 | 12.3  1.6E-04  OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION =  931 (    2)    CPU SECONDS = 3.785E+05 (4.598E+02)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+

and one process on 2 cores

======================================================================
OUTER LOOP ITERATION = 1174 (   57)    CPU SECONDS = 6.893E+05 (3.785E+04)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.00 | 1.4E-04 | 2.5E-02 |       2.2E-02  OK|
| V-Mom                | 0.98 | 1.7E-04 | 7.0E-02 |       1.9E-02  OK|
| W-Mom                | 0.98 | 9.0E-05 | 2.3E-02 |       2.2E-02  OK|
| P-Mass               | 0.99 | 3.2E-06 | 4.6E-04 |  9.0  3.6E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 0.95 | 5.6E-05 | 2.3E-02 |  5.9  5.7E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.97 | 5.1E-05 | 4.6E-03 |  5.8  1.9E-02  OK|
| O-TurbFreq           | 1.06 | 2.2E-05 | 3.3E-03 | 12.4  1.4E-04  OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION = 1175 (   58)    CPU SECONDS = 6.900E+05 (3.854E+04)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.06 | 1.5E-04 | 3.0E-02 |       2.2E-02  OK|
| V-Mom                | 0.88 | 1.5E-04 | 3.8E-02 |       2.4E-02  OK|
| W-Mom                | 1.09 | 9.8E-05 | 3.3E-02 |       2.2E-02  OK|
| P-Mass               | 0.96 | 3.1E-06 | 3.2E-04 |  9.0  3.7E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 0.92 | 5.2E-05 | 1.4E-02 |  5.9  5.9E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.97 | 5.0E-05 | 4.5E-03 |  5.8  2.0E-02  OK|
| O-TurbFreq           | 0.80 | 1.8E-05 | 2.5E-03 | 12.4  1.2E-04  OK|
+----------------------+------+---------+---------+------------------+
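For anyone who wants to pull these header values out of a solver output file instead of reading them by eye, here is a rough sketch with a regular expression. It assumes the header keeps the `OUTER LOOP ITERATION = n ( m) CPU SECONDS = x (y)` shape seen in the listings above, and that the parenthesised numbers belong to the current run; the names `HEADER` and `parse_headers` are made up for this example.

```python
import re

# Matches one iteration header; whitespace between the fields may vary.
HEADER = re.compile(
    r"OUTER LOOP ITERATION\s*=\s*(\d+)\s*\(\s*(\d+)\s*\)\s*"
    r"CPU SECONDS\s*=\s*([0-9.E+-]+)\s*\(\s*([0-9.E+-]+)\s*\)"
)


def parse_headers(text):
    """Return (total_iter, run_iter, total_cpu_s, run_cpu_s) for each header."""
    return [
        (int(a), int(b), float(c), float(d))
        for a, b, c, d in HEADER.findall(text)
    ]


sample = (
    "OUTER LOOP ITERATION = 762 ( 99) CPU SECONDS = 2.366E+05 (1.085E+05)\n"
    "OUTER LOOP ITERATION = 763 ( 100) CPU SECONDS = 2.377E+05 (1.096E+05)\n"
)
headers = parse_headers(sample)
for total_iter, run_iter, total_cpu, run_cpu in headers:
    print(total_iter, run_iter, total_cpu, run_cpu)
```

The two sample lines are the 4-core headers posted above; for a real file, read its text and pass it to `parse_headers`.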
So if you are showing us the same case each time, my calculator and I get the following results:
cores; simulation time per iteration (s)
1; 414
2; 345
4; 275
Well, it's a small speedup from increasing the cores, not very good, but it is there!
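The arithmetic behind those three numbers, as a short sketch: take the difference of the parenthesised CPU SECONDS values of two consecutive iterations and divide by the number of cores, since CPU time is summed over the cores. The values are copied from the headers posted above; the speedup and efficiency columns are an extra not given in the thread, and the function name is illustrative.

```python
def seconds_per_iteration(cpu_before, cpu_after, cores):
    """Approximate elapsed time of one iteration from summed CPU seconds."""
    return (cpu_after - cpu_before) / cores


# cores -> (CPU seconds at iteration n, CPU seconds at iteration n + 1)
runs = {
    1: (4.566e01, 4.598e02),
    2: (3.785e04, 3.854e04),
    4: (1.085e05, 1.096e05),
}

per_iter = {
    cores: seconds_per_iteration(before, after, cores)
    for cores, (before, after) in runs.items()
}

baseline = per_iter[1]
for cores, seconds in sorted(per_iter.items()):
    speedup = baseline / seconds
    efficiency = speedup / cores
    print(f"{cores} core(s): {seconds:6.1f} s/iter, "
          f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Keep in mind that the 1-core figure comes from the very first iterations of a freshly started run, so the comparison is only indicative.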
Oh, I see now. The CPU time is the cumulative time over all cores. I didn't know that. :S Thanks. Then my values are meaningless.
And now the earth is rotating in the right direction again ;)
:) :) thanks
There is an expert parameter that makes the solver print the wall clock time in its output, which should be useful in your case.
Code:
Wall clock time = t
OK, I'll try it, thanks!
======================================================================
OUTER LOOP ITERATION = 1662 (  217)    CPU SECONDS = 9.807E+05 (2.258E+05)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.16 | 5.1E-04 | 3.6E-01 |       4.8E-03  OK|
| V-Mom                | 1.04 | 2.2E-04 | 1.2E-01 |       7.4E-03  OK|
| W-Mom                | 1.18 | 1.8E-04 | 1.0E-01 |       7.3E-03  OK|
| P-Mass               | 2.59 | 1.5E-05 | 1.1E-02 |  5.0  5.0E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 0.70 | 9.1E-05 | 7.8E-02 |  6.1  1.1E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 1.13 | 1.1E-04 | 6.1E-02 |  6.0  2.3E-03  OK|
| O-TurbFreq           | 1.06 | 4.1E-05 | 2.4E-02 | 12.6  4.3E-05  OK|
+----------------------+------+---------+---------+------------------+
Execution terminating: STP file found.
CFD Solver finished: Wed Oct 13 14:21:43 2010
CFD Solver wall clock seconds: 7.5005E+04

Does the wall clock seconds value here cover the solution time only, or does it include partitioning etc. as well?
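A quick sanity check on those numbers, as a sketch only. It assumes the 217 in parentheses is the number of iterations in this run, that the parenthesised CPU seconds and the wall clock seconds refer to that same run, and it leaves open whether partitioning is included in the wall clock figure (the question asked above).

```python
wall_clock_seconds = 7.5005e04  # "CFD Solver wall clock seconds"
cpu_seconds_this_run = 2.258e05  # parenthesised CPU SECONDS at the last iteration
iterations_this_run = 217        # parenthesised iteration counter

# Average elapsed time per iteration over the whole run.
wall_per_iteration = wall_clock_seconds / iterations_this_run

# CPU time is summed over cores, so this ratio estimates how many cores
# were busy on average during the run.
cpu_to_wall_ratio = cpu_seconds_this_run / wall_clock_seconds

print(f"average wall clock per iteration: {wall_per_iteration:.1f} s")
print(f"CPU seconds / wall clock seconds: {cpu_to_wall_ratio:.2f}")
```

Under these assumptions the wall clock figure can be compared between runs directly, without dividing by the number of cores.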