CFD Online Discussion Forums

CFX forum thread: using core i7 cpu for parallel solving (https://www.cfd-online.com/Forums/cfx/72228-using-core-i7-cpu-parallel-solving.html)

feizaghaee January 29, 2010 07:13

using core i7 cpu for parallel solving
 
Hi. Does anybody know how I can use the whole capacity of my Core i7 CPU (I mean all 8 cores)? I can work with 4 cores of my CPU, but Task Manager shows 50% CPU usage. I want to know how many cores are working when Task Manager shows 50% usage, and whether I can run a problem at 100% CPU usage. When I run the CFX solver I choose MPICH Parallel for Windows with 4 partitions. Can I choose 8 partitions, and if so, how? My ANSYS CFX version is 11.

jbritton January 29, 2010 07:28

see link: http://www.cfd-online.com/Forums/cfx/72094-cfx-i7.html

feizaghaee January 29, 2010 08:32

That topic doesn't answer all of my questions. The 50% usage in Task Manager isn't discussed. I want to know how many cores are working.

Michiel January 29, 2010 10:08

50% of 8 is 4... The numbers 4 and 8 are mentioned in the other topic.

jbritton January 29, 2010 10:24

^^^ lol,

Thank you for saving me the effort.

If memory serves me, CFX 11 will only run on 2 cores unless you have additional parallel licenses enabling further processors to run. A core can be called a processor in this case.

So 50% load means you are running it on 2 cores (4/100*50 = 2, or 4/2 = 2); this may well show up as 4, but that is due to hyperthreading.

With hyperthreading the theoretical number of cores is 8 again (8/100*50 = 4), so yes, 4 is 50% of 8.

While writing this, I feel like my IQ has dropped 50%, because all of this was discussed in the other thread!
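A minimal sketch of the "50% of 8 is 4" arithmetic, assuming Task Manager's overall figure is the average load across all logical processors and that each CFX partition runs as one single-threaded process keeping one logical processor fully busy (an idealisation that ignores other load and OS overhead). The function name is hypothetical.

```python
def task_manager_load(busy_processes: int, logical_processors: int) -> float:
    """Overall CPU load (%) as an average over all logical processors,
    assuming each solver process keeps exactly one logical processor busy
    and nothing else is running (an idealisation)."""
    busy = min(busy_processes, logical_processors)
    return 100.0 * busy / logical_processors


# A 4-core i7 with hyperthreading enabled exposes 8 logical processors.
print(task_manager_load(4, 8))  # 50.0: 4 partitions on 8 logical processors
print(task_manager_load(8, 8))  # 100.0: 8 partitions, two per physical core
print(task_manager_load(4, 4))  # 100.0: hyperthreading disabled in the BIOS
```

Note that 50% on this scale does not mean half of the chip's compute capacity is idle: the two logical processors on a hyperthreaded core share that core's execution units.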

feizaghaee January 29, 2010 15:52

Thanks a lot. I asked this question because in that topic, 4 cores were described as the whole capacity of the Core i7, and 8 cores were described as a result of hyperthreading, which CFX doesn't take any advantage of. But you said 4 cores are half of the capacity, while using 8 cores was described as useless. How can I install a parallel license? Thanks.

ghorrocks January 30, 2010 06:31

Installation of parallel licenses is discussed in the CFX manual. If you currently have 4 parallel licenses but wish to waste money by running 8 processes on a single i7 CPU then talk to your CFX rep to purchase extra licenses.

I think I said that in the other thread didn't I? It's deja vu all over again. (That was meant to be a bad joke by the way)

feizaghaee January 30, 2010 08:45

Thanks a lot for your kind attention.

murx August 10, 2011 06:17

I want to get back to this topic again.
In my case licenses are not the constraint. I have enough licenses available to run CFX on 8 cores.

Glenn, in the other thread you said
Quote:

CFX has not been compiled to use hyperthreading. Don't use it. Just let the virtual cores sit idle, or even better, disable them in the BIOS.

I absolutely understand that using all 8 virtual cores does not give me a huge advantage. But if I let 4 of the virtual cores sit idle, isn't that a huge disadvantage compared to having hyperthreading disabled? Because, as somebody said in this thread, Windows Task Manager shows only 50% CPU load.

So if I have enough licenses available, isn't it better to use all 8 cores? And is there a rough estimate of how much worse using 8 cores with hyperthreading is compared to using 4 cores with hyperthreading disabled?

ghorrocks August 10, 2011 07:03

Do the test yourself then! Run a benchmark case with 4 processes and another with 8. I bet the 8 case is either barely faster than the 4, or more likely to be slower.

murx August 30, 2011 11:00

Ok, I did the test. Here's the results, for those who are interested:
http://img713.imageshack.us/img713/318/coresh.jpg



I ran a single-phase simulation on a mesh with approx. 2 million elements, using HP MPI Local Parallel for every run except the 1-core run, which was run in serial mode. I measured the time for 10 outer loops. The value in the chart is the inverse of this time, normalized by the 1-core value.
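As a worked equation for the chart's normalization, assuming $t_N$ denotes the measured time for the 10 outer loops on $N$ cores and $t_1$ the serial time (symbols introduced here for illustration only):

```latex
P_N = \frac{1/t_N}{1/t_1} = \frac{t_1}{t_N}
```

So the plotted value is the conventional parallel speedup; ideal linear scaling would give $P_N = N$.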

evcelica August 30, 2011 14:50

great, thanks for doing this. Do you have results with hyperthreading disabled to see how that affects performance?

murx August 31, 2011 04:27

Unfortunately, I don't think I am authorized to change any BIOS settings. But I will reboot and try to do so as soon as my current simulation is finished.

ghorrocks August 31, 2011 07:53

I assume you have a 4 core CPU - in that case the 4 core result will be close enough to the no-hyperthreading result.

A speedup factor of 2 at 4 cores on a modern CPU is not very good. You should be able to do better than that. What CPU do you have? Are you sure you are not running out of memory?

evcelica August 31, 2011 18:35

Glenn, are you saying having hyper-threading on is better for CFX?
(8 with HT is better than 4 without, and 4 with HT is equal to 4 without)? Roughly, of course.


I was somewhat disappointed by this scaling too, what are the specs on your memory?

ghorrocks August 31, 2011 19:28

CFX has not been compiled to make use of hyperthreading. On or off it will not make much difference.

My question about memory is because a possible explanation of the poor performance is you have run out of memory and the machine is paging. You will not get a good parallel speed up if it is paging.

murx September 1, 2011 06:56

The CPU is an i7-2600 and the memory size is 8 GB. I do not remember the exact memory usage during those test runs. But on similar runs, the amount of memory used is about 4.5 GB.

I expected a bigger speedup factor, too.

ghorrocks September 1, 2011 22:44

What time are you reporting? The setup and shutdown of a simulation do not scale with multiple processors; only the solver time does. If you look in the output file, the solver time is reported after the last iteration, and the total time is reported at the end.

You should only use solver time to work out speedup factors.
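A minimal sketch of pulling the solver-only time out of output-file text instead of reading it off a clock. It assumes the line reads exactly as quoted later in this thread (`CFD Solver wall clock seconds: 2.2300E+02`); the wording may differ between CFX versions, so check your own output file. The function name is hypothetical.

```python
import re

# Matches e.g. "CFD Solver wall clock seconds: 2.2300E+02" (wording assumed
# from the values quoted in this thread; verify against your own output file).
_SOLVER_TIME = re.compile(
    r"CFD Solver wall clock seconds:\s*([0-9]*\.?[0-9]+(?:[Ee][+-]?[0-9]+)?)"
)


def solver_wall_clock(out_text: str) -> float:
    """Return the solver-only wall clock time in seconds from output-file text."""
    matches = _SOLVER_TIME.findall(out_text)
    if not matches:
        raise ValueError("no 'CFD Solver wall clock seconds' line found")
    # Take the last match, assumed to be the report after the final iteration.
    return float(matches[-1])


print(solver_wall_clock(" CFD Solver wall clock seconds: 2.2300E+02\n"))  # 223.0
```

In practice you would pass it the full contents of each run's output file and compare the returned times between runs.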

murx September 2, 2011 05:49

I actually measured the time by hand using the windows clock, since I did not care for a very high precision :)

My start time was the time when the first iteration step started, and the stop time was the time when the last residuals of the 10th iteration step were plotted. So I did not count the time needed for mesh partitioning and so on...

ghorrocks September 2, 2011 07:44

OK, but it is easier to use the time recorded in the output file. Then you don't need to watch text files scroll past. It's more accurate too.

murx September 4, 2011 06:57

Of course, you are right. But a deviation of, let's say, 2 seconds from timing by hand, on measured times between 2 and 3 minutes, only means an error of about 1 to 2%.
So I guess this is not the reason for the poor performance improvement. Any other ideas what the reason could be?

ghorrocks September 4, 2011 21:11

I do not trust your measurement yet. Please recalculate based on CFD Solver wall clock seconds reported immediately after the iterations are complete.

murx September 5, 2011 08:39

Serial: CFD Solver wall clock seconds: 2.2300E+02

HP MPI Local Parallel, 4 Processes: CFD Solver wall clock seconds: 1.2000E+02 (factor 1.86)

HP MPI Local Parallel, 7 Processes: CFD Solver wall clock seconds: 1.1000E+02 (factor 2.03)

This time, I used a different case with a smaller mesh (~600 000 elements).
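The factors quoted above are the speedup S = t_serial / t_N; dividing by the process count gives the parallel efficiency E = S / N, a common way to read scaling results. A minimal sketch using the solver times from this post; the function name is hypothetical.

```python
from typing import Tuple


def speedup_and_efficiency(t_serial: float, t_parallel: float, n: int) -> Tuple[float, float]:
    """Speedup S = t_serial / t_parallel and parallel efficiency E = S / n."""
    s = t_serial / t_parallel
    return s, s / n


# CFD Solver wall clock seconds from the ~600 000 element case above.
t_serial = 223.0
for n, t in [(4, 120.0), (7, 110.0)]:
    s, e = speedup_and_efficiency(t_serial, t, n)
    print(f"{n} processes: speedup {s:.2f}, efficiency {e:.0%}")
# 4 processes: speedup 1.86, efficiency 46%
# 7 processes: speedup 2.03, efficiency 29%
```

For the 7-process run, E is computed per process; since the i7-2600 has only 4 physical cores, that figure understates what each physical core delivers.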

ghorrocks September 5, 2011 19:47

I see. Forget about the 7 processes model, you are always going to get terrible speedups with hyperthreading. A speedup of 1.86 at 4 processes is not very good. You should be over 3 for modern processors. Have you run the benchmark simulation? That is the reference I use to benchmark solver speed.

But I think you can be pretty sure something is wrong with your setup and is robbing you of multi processor speed.

murx September 8, 2011 04:43

First of all, thanks for your help!

Here are the results for the Benchmark run:
Serial: CFD Solver wall clock seconds: 3.0000E+01
HP MPI Local Parallel, 4: CFD Solver wall clock seconds: 1.6000E+01 (--> factor 1.88)

I checked the memory usage during my last runs and it was never fully used.
Also, I tested another machine. It's a Core i5-2500 (the one I usually use is a Core i7-2600). The speedup factor on this machine was only 1.9, too.

I don't know much about CPUs, but maybe there is some kind of automatic down-clocking when several cores are used...
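One rough way to read a speedup of 1.88 on 4 processes is to invert Amdahl's law, S = 1 / ((1 - p) + p / N), for an "effective parallel fraction" p. This is a swapped-in diagnostic, not something used in this thread: it lumps every non-scaling effect (memory-bandwidth contention, a higher single-process clock speed, communication) into one number, so p should not be read as the share of serial code in CFX. Function names are hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n processes when a fraction p of the work scales."""
    return 1.0 / ((1.0 - p) + p / n)


def effective_parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law for p, given a measured speedup on n processes."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


s_bench = 30.0 / 16.0  # benchmark: 30 s serial, 16 s on 4 processes
p = effective_parallel_fraction(s_bench, 4)
print(f"measured speedup {s_bench:.2f} -> effective parallel fraction {p:.2f}")
# Fraction that a speedup of 3 on 4 processes would correspond to:
print(f"{effective_parallel_fraction(3.0, 4):.2f}")
```

With the benchmark numbers this gives p of about 0.62, whereas a speedup of 3 on 4 processes would correspond to p of about 0.89.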

ghorrocks September 8, 2011 08:51

If the benchmark problem shows a similar speedup, then you definitely have a problem with your setup; it is not the simulation.

Recent Intel CPUs do run at a higher clock speed when running a single process (Turbo Boost), and this could explain it. To check this, run it on 1, 2, 3 and 4 processes. If the 1-process result is clearly different, then this probably explains it.

The benchmark in 30s and 4 processes in 16s is very fast - must be quite a new machine.

So I would recommend trying other multi processor implementations such as MPI, HPMPI, PVM, Intel etc. You may be able to get better speedups from them.
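A minimal sketch of how the 1, 2, 3 and 4 process runs suggested above might be compared, assuming you have the solver wall clock time for each. If the 1-process run is unusually fast (for example because Turbo Boost runs a lone core at a higher clock), efficiency measured against the 1-process run drops sharply at 2 processes, while efficiency measured against the 2-process run stays close to 100%. This is a heuristic only; a large 1-to-2 drop can also have other causes, such as memory-bandwidth contention. The function name and the example timings are hypothetical.

```python
from typing import Dict, Optional, Tuple


def efficiency_table(times: Dict[int, float]) -> Dict[int, Tuple[float, Optional[float]]]:
    """times maps process count -> solver wall clock seconds (must include 1 and 2).

    Returns, per process count n:
      e1 = t1 / (n * tn)      efficiency relative to the 1-process run
      e2 = 2 * t2 / (n * tn)  efficiency relative to the 2-process run (n >= 2)
    """
    t1, t2 = times[1], times[2]
    table = {}
    for n, tn in sorted(times.items()):
        e1 = t1 / (n * tn)
        e2 = 2.0 * t2 / (n * tn) if n >= 2 else None
        table[n] = (e1, e2)
    return table


# Hypothetical timings, for illustration only (not measured in this thread).
example = {1: 100.0, 2: 56.0, 3: 38.0, 4: 29.0}
for n, (e1, e2) in efficiency_table(example).items():
    rel2 = "-" if e2 is None else f"{e2:.0%}"
    print(f"{n} processes: vs 1 process {e1:.0%}, vs 2 processes {rel2}")
```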

oj.bulmer June 25, 2012 09:55

I have an i7-2860QM (4 cores) and 8 GB RAM.
Popular opinion in this thread is that 4 cores give optimum performance.

My question is: would keeping a simulation running for hours be harmful to the life of the system? I can feel the system heating up as I see the 4 processors loaded at ~100%.

oj.bulmer June 25, 2012 10:01


Quote:

Originally Posted by ghorrocks (Post 323400)
... So I would recommend trying other multi processor implementations such as MPI, HPMPI, PVM, Intel etc. You may be able to get better speedups from them.

Glenn, given my processor build, which of these (MPI, HPMPI, PVM, Intel ) would you recommend for best performance?

Thanks
OJ

ghorrocks June 25, 2012 20:01

When CPUs run hard they run hot. Well-designed systems can handle the load and should still function fine. Poorly designed systems will overheat and cause problems - I think the i7 chips sense their own temperature and, if they are too hot, slow down to stop overheating. This saves the CPU but means you are running at reduced speed.

As for which multi processor implementation is best, the simple answer is to benchmark them all on your system.


All times are GMT -4.