# Optimum way for running simulation in parallel


 March 8, 2021, 23:02 Optimum way for running simulation in parallel #1 Senior Member   krishna kant Join Date: Feb 2016 Location: Hyderabad, India Posts: 133 Rep Power: 10

Hello All,

I am running a multiphase flow simulation in parallel and there is a huge difference between the execution time and the clock time. I want to know the possible reasons for it. I am attaching an instance of my log data and my system info here.

Code:
```
PIMPLE: iteration 1
Selected 0 split points out of a possible 0.
Number of isoAdvector surface cells = 0
isoAdvection: Before conservative bounding: min(alpha) = 0, max(alpha) = 1 + -1
isoAdvection: After conservative bounding: min(alpha) = 0, max(alpha) = 1 + -1
isoAdvection: time consumption = 1%
Phase-1 volume fraction = 0  Min(alpha.water) = 0  1 - Max(alpha.water) = 1
solve the reinitialization equation
Interpolation routine for interface normal
Curvature Calculation
Creating isoSurface
Interpolating Curvature from iso-surface to cell centers
smoothSolver: Solving for Ux, Initial residual = 0.000593322427, Final residual = 1.70711359e-09, No Iterations 3
smoothSolver: Solving for Uy, Initial residual = 0.00260626766, Final residual = 6.52222249e-09, No Iterations 3
smoothSolver: Solving for Uz, Initial residual = 0.000199399075, Final residual = 1.1910404e-09, No Iterations 3
GAMG: Solving for p_rgh, Initial residual = 0.00639449018, Final residual = 3.29043924e-05, No Iterations 3
time step continuity errors : sum local = 3.53346848e-09, global = 4.19805384e-11, cumulative = 3.24236402e-08
GAMG: Solving for p_rgh, Initial residual = 0.000340716515, Final residual = 3.28157441e-06, No Iterations 3
time step continuity errors : sum local = 3.52358875e-10, global = -6.40922139e-12, cumulative = 3.2417231e-08
GAMG: Solving for p_rgh, Initial residual = 4.49729204e-05, Final residual = 7.00905977e-09, No Iterations 15
time step continuity errors : sum local = 7.51373052e-13, global = -1.42068391e-14, cumulative = 3.24172168e-08
smoothSolver: Solving for k, Initial residual = 0.000333755776, Final residual = 6.7287304e-07, No Iterations 1
ExecutionTime = 24298.66 s  ClockTime = 121287 s
```

Code:
```
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               1200.000
BogoMIPS:              4589.05
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39
```

I am running 10 simulations, each using 4 processors. The grid size is approx 22K cells for a 2D case.

 March 9, 2021, 06:00 #2 New Member   Icaro Amorim de Carvalho Join Date: Dec 2020 Posts: 24 Rep Power: 5

Hi Krishna,

I sometimes get confused by the output of `lscpu`, so I hope I am not saying something wrong. I would suggest you try running these 10 simulations with 2 processors each and compare the CPU time with the clock time. I say this because I suspect you actually have only 20 physical cores, and the way you are running now, you are also using virtual (hyper-threaded) cores, which OpenFOAM does not take advantage of. Hope that helps.
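To quantify the comparison Icaro suggests, the ratio of ExecutionTime (CPU time) to ClockTime (wall time) can be pulled straight from the log. A minimal sketch, using the ExecutionTime line from post #1 as sample data (in practice you would `grep ExecutionTime log.* | tail -1` on your own log file):

```shell
# Parse an OpenFOAM ExecutionTime/ClockTime line and compute their ratio.
# Sample line copied from the log in post #1.
line='ExecutionTime = 24298.66 s  ClockTime = 121287 s'
exec_t=$(echo "$line" | awk '{print $3}')
clock_t=$(echo "$line" | awk '{print $7}')
# A ratio near 1.0 means the solver is CPU-bound; much less than 1.0 means
# time is being lost outside computation (I/O, MPI waits, oversubscription).
efficiency=$(awk -v e="$exec_t" -v c="$clock_t" 'BEGIN {printf "%.2f", e/c}')
echo "efficiency = $efficiency"   # prints "efficiency = 0.20" for this log
```

A ratio of 0.20 means the process got the CPU only a fifth of the time it was nominally running, which is exactly the symptom being discussed.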

 March 9, 2021, 12:02 #3 Senior Member   Domenico Lahaye Join Date: Dec 2013 Posts: 736 Blog Entries: 1 Rep Power: 17

ClockTime might be higher due to reading/writing files and due to communication between processors.

 March 10, 2021, 10:56 #4 Senior Member   Klaus Join Date: Mar 2009 Posts: 261 Rep Power: 22

- Do I understand correctly that you have two nodes, each with two sockets, and each socket with 10 physical cores?
- Make sure you switch off SMT/Hyper-Threading and use only physical cores!
- How fast is the link between your two nodes (InfiniBand or something slower)?
- Maybe you use too many cores for your small test case and waste time on "unnecessary" communication (see discussion: MPIRun How many processors).
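The first two points above can be checked from the shell. A read-only sketch, assuming a Linux box with util-linux `lscpu` available (the SMT disable step is shown only as a comment because it needs root and a reasonably recent kernel):

```shell
# Derive the physical-core count from lscpu topology fields.
sockets=$(lscpu | awk -F: '/^Socket\(s\)/ {gsub(/ /, "", $2); print $2}')
cores=$(lscpu | awk -F: '/^Core\(s\) per socket/ {gsub(/ /, "", $2); print $2}')
physical=$((sockets * cores))
echo "physical cores: $physical"
# On the lscpu output in post #1 this would be 2 sockets x 10 cores = 20,
# i.e. half of the 40 logical CPUs the scheduler advertises.
# To disable SMT system-wide (root, kernel >= 4.19):
#   echo off | sudo tee /sys/devices/system/cpu/smt/control
```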

 March 10, 2021, 13:32 #5 Senior Member   Join Date: Apr 2020 Location: UK Posts: 670 Rep Power: 14

Have you tried running `top`? Just type it at the command line and it will tell you how busy the processors are, which also lets you check Domenico's suggestion. For example, if all is working smoothly, the processes for each run should be steaming away at 100% CPU. If they are always far below 100%, there is probably a bottleneck in the communication, or you are overloading the cores. If they sit at 100% for a while, then drop to something small before returning to 100%, there may be a disk-writing bottleneck, etc.
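For logging rather than watching interactively, the same information can be captured non-interactively with `ps`; a sketch (the column names are standard procps, nothing OpenFOAM-specific):

```shell
# Snapshot the busiest processes: %CPU plus the core (PSR) each process
# last ran on. Healthy parallel solver ranks sit near 100% on distinct
# cores; values stuck around 40-50% suggest oversubscription or waits.
ps -eo pid,psr,%cpu,comm --sort=-%cpu | head -8
```

Running this in a loop (e.g. under `watch` or with `sleep` in a script) gives a time history of the 100%-then-drop pattern described above.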

 April 11, 2021, 03:33 #6 Senior Member   krishna kant Join Date: Feb 2016 Location: Hyderabad, India Posts: 133 Rep Power: 10

Hello All,

Thank you for all the suggestions, and I apologize for my late reply; I was so involved with ICLASS those days that I missed the reply notifications in my email. I am now running another simulation with 1.125M cells, again using 4 processors each. The problem persists, and it seems to be one of multithreading and CPU utilization. The `top` command gives me this:

Code:
```
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16526 Rajesh    20   0 1822m 1.1g 7932 R 45.6  3.7  959:58.71 interFlowvAMR1
15585 Rajesh    20   0 1830m 993m 7788 R 43.8  3.1  945:34.24 interFlowvAMR1
15892 Rajesh    20   0 1797m 976m 8216 R 43.8  3.1  949:38.86 interFlowvAMR1
15893 Rajesh    20   0 1809m 1.0g 7600 R 43.8  3.2  945:08.07 interFlowvAMR1
16527 Rajesh    20   0 1813m 1.1g 8092 R 43.8  3.7  960:44.33 interFlowvAMR1
16524 Rajesh    20   0 1824m 1.2g 8120 R 42.0  3.7  958:22.57 interFlowvAMR1
15588 Rajesh    20   0 1826m 965m 7760 R 40.1  3.0  944:16.15 interFlowvAMR1
15894 Rajesh    20   0 1813m 1.0g 7844 R 40.1  3.2  947:49.50 interFlowvAMR1
16852 Rajesh    20   0 1815m 1.2g 7792 R 40.1  3.8  956:07.44 interFlowvAMR1
15586 Rajesh    20   0 1821m 1.0g 7824 R 38.3  3.3  946:58.00 interFlowvAMR1
16219 Rajesh    20   0 1808m 1.1g 8196 R 38.3  3.6  953:12.18 interFlowvAMR1
16221 Rajesh    20   0 1826m 1.1g 8180 R 38.3  3.6  954:00.33 interFlowvAMR1
15891 Rajesh    20   0 1817m 1.0g 8192 R 36.5  3.3  948:09.53 interFlowvAMR1
16218 Rajesh    20   0 1830m 1.1g 8220 R 36.5  3.4  947:10.34 interFlowvAMR1
16525 Rajesh    20   0 1795m 1.1g 8144 R 36.5  3.7  958:52.91 interFlowvAMR1
15587 Rajesh    20   0 1824m 1.1g 7600 R 34.7  3.5  945:53.87 interFlowvAMR1
16220 Rajesh    20   0 1830m 1.1g 8032 R 34.7  3.4  948:53.46 interFlowvAMR1
16851 Rajesh    20   0 1862m 1.3g 7760 R 34.7  4.0  955:12.77 interFlowvAMR1
16853 Rajesh    20   0 1830m 1.2g 7568 R 34.7  3.9  958:54.66 interFlowvAMR1
16854 Rajesh    20   0 1831m 1.2g 7772 R 31.0  3.9  956:42.15 interFlowvAMR1
```

CPU utilization is only around 40%, even though I am using only 20 CPUs. So I checked for multithreading with Code: `grep -i 'ht' /proc/cpuinfo` and got the following (the same flags line is printed once per logical CPU; the `ht` flag confirms Hyper-Threading is enabled):

Code:
```
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm invpcid_single ssbd pti retpoline ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm cqm_llc cqm_occup_llc flush_l1d
```

 April 11, 2021, 03:48 #7 Senior Member   krishna kant Join Date: Feb 2016 Location: Hyderabad, India Posts: 133 Rep Power: 10

Is there any command to switch off multithreading in OpenFOAM? It is using virtual CPUs even when I try to use only physical CPUs by limiting the run to 20 CPUs.
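A note on this last question: OpenFOAM itself has no switch for this; core placement is decided by the MPI launcher. A sketch assuming Open MPI (flag names differ for MPICH and Intel MPI); `interFlowvAMR1` is the solver name from the `top` output above, and the case is assumed to be already decomposed:

```shell
# Bind each MPI rank to its own physical core instead of letting the
# scheduler migrate ranks onto hyper-threaded siblings (Open MPI syntax).
mpirun --bind-to core --map-by core -np 4 interFlowvAMR1 -parallel

# Or restrict the run to an explicit core set; 0-9 are the NUMA node0
# physical cores in the lscpu output from post #1, so all ranks stay on
# one socket and avoid cross-socket traffic.
mpirun --cpu-set 0-9 --bind-to core -np 4 interFlowvAMR1 -parallel
```

With binding in place, `top` (press `f` to enable the `P` last-used-CPU column) should show each rank pinned to a distinct physical core at close to 100% CPU.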