CFD Online Discussion Forums


Traction May 13, 2014 14:05

Single Core vs. Multi Core Issue
 
Hey cfd-online community,

I have an issue regarding the difference in computational time between single-core and multi-core simulations (4 cores).
In general, multi-core processing (of course) needs less time per iteration than a single-core process.
But now I'm examining a case where my computer nearly runs out of memory: RAM usage is close to 100%, and the simulation time is dominated by reading, writing and transferring data rather than by the number of CPU cores (CPU usage is very low).
I noticed that in this case the time per iteration on a single core is about half the time per iteration on multiple cores.

Do you have an idea why this happens, or can someone explain it?


Regards
Traction

villager May 14, 2014 13:32

I don't clearly understand your situation: is it a personal computer or a cluster? If it is a personal computer, only "1" and "4" apply. "4" is the most probable in your case.
1) If you often read/write different case/data files (rather than continuing one calculation), the problem appears because the solution is partitioned across the cores (time expense) and then gathered back from the parts (time expense). Mesh partitioning is an operation in which one core (the host, main, head or master process) divides the work among many, with algorithmic load balancing. MPI is then used to send these parts through the network interface to the other cores. Finally, the main process gathers the results with some kind of MPI receive function.
Solution: small meshes don't need partitioning/parallelization by domain decomposition (the method widely used in mesh-based solvers).
2) The other issue could be the interconnect throughput/latency. Relatively low speed + large network traffic generated by FLUENT (small pieces of work on each core, so iterations finish very quickly) => bad performance. You could even get worse performance than on a single core.
Solution: choose a proper interconnect. InfiniBand is supported by FLUENT and is very fast, so use it instead of Ethernet if you have it.
Code:

-pinfiniband
This option at startup will help (the interconnect should be tuned).
See also the solution for "1" (on a personal computer, that is the only solution that applies to "2").
3) (for clusters only) The third thing to mention is the speed of your data storage system. Low speed of the storage system + frequent disk r/w => bad performance.
Solution: use a good data storage system.
4) If you are out of RAM, then your calculation proceeds partially in swap, which is hard disk drive space. With a single core, a single data stream is written to the HDD; with four cores, four data streams are written simultaneously. But your HDD can't read/write four streams simultaneously (assuming you don't have a parallel r/w storage system), so the drive heads go back and forth reading/writing pieces of data. So you can't exceed the HDD speed in that case and, with the partitioning issues from "1" on top, you would even slow down your solution by adding more cores.
Please correct me if I'm wrong somewhere.
Solution: increase RAM. Use distributed memory systems (clusters, supercomputers).
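
To make the reasoning in "1" and "4" concrete, here is a rough toy model in Python. It is only a sketch: the function name, the formula and every number in it are made-up assumptions for illustration, not measurements and not anything FLUENT reports. It assumes the CPU work scales ideally with the number of processes, that partitioning/communication overhead grows with each extra process, and that paging through a single HDD gets slower with every additional concurrent stream.

```python
def time_per_iteration(n_procs, compute_s=8.0, comm_s=0.2,
                       swapped_gb=0.0, disk_gb_per_s=0.1, seek_factor=0.5):
    """Toy estimate of wall-clock seconds per iteration.

    All default values are hypothetical and chosen only for illustration.
    """
    compute = compute_s / n_procs            # ideal speed-up of the CPU work
    comm = comm_s * (n_procs - 1)            # partition/exchange overhead, zero in serial
    transfer = swapped_gb / disk_gb_per_s    # time to page data through the disk
    # One HDD serving several streams spends extra time moving its heads.
    contention = 1.0 + seek_factor * (n_procs - 1)
    return compute + comm + transfer * contention


if __name__ == "__main__":
    for swapped_gb in (0.0, 2.0):
        for n_procs in (1, 4):
            t = time_per_iteration(n_procs, swapped_gb=swapped_gb)
            print(f"swapped={swapped_gb} GB, procs={n_procs}: {t:.1f} s/iteration")
```

With these assumed numbers, four processes win clearly while everything fits in RAM, but once part of the case lives in swap the disk term dominates and the single-process run comes out faster, which is the behaviour described in the question.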

Traction May 14, 2014 13:52

I think point 4 is the main problem in my calculation.

With the help of your explanation I am starting to better understand FLUENT and how it relates to the hardware it requires. Thank you very much!

