CFD Online Discussion Forums


pdp.aero June 6, 2013 10:12

Temperature Problem in Parallel Processing
 
Dear All,

I am trying to run SU2 on my Core i5 laptop, but after several iterations the system shuts down because of overheating. I know this may be off-topic for the SU2 forum, but perhaps others have already had a similar experience or have a hint for solving it.

Best Regards
Payam

knaik June 13, 2013 17:37

Hello Payam,

Unfortunately, this isn't a problem we have seen arise in the past.
As you say, it may well be an issue with your specific machine.
Perhaps other users have had similar experiences with temperature and will be able to offer their solutions here. If you are able to resolve the problem yourself, please do let us know - especially if it is related to running SU^2!

Many thanks,
Kedar

pdp.aero June 14, 2013 11:16

Quote:

Originally Posted by knaik (Post 433896)
Hello Payam,

Unfortunately, this isn't a problem we have seen arise in the past.
As you say, it may well be an issue with your specific machine.
Perhaps other users have had similar experiences with temperature and will be able to offer their solutions here. If you are able to resolve the problem yourself, please do let us know - especially if it is related to running SU^2!

Many thanks,
Kedar

Thanks for your kind response. Yes, I guessed it isn't a normal problem.

Actually, the problem has been more or less solved. It seems my OS (Ubuntu 12.04) is the cause. As I understand it, some dynamic CPU functions and a lack of hardware driver support give Ubuntu high CPU temperatures, especially under heavy load. So I guess there are two kinds of solution: 1- using a powerful cooling pad; 2- undervolting the CPU.
Since undervolting the CPU carries considerable risk, I tried the cooling pad first, but it wasn't very effective. My sensors show 101 °C on average when running in parallel without the cooling pad and 98 °C with it, which is still really high. Hence it seems I have no choice except undervolting the CPU with Linux Processor Hardware Control (PHC).
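
For anyone who wants to log such readings rather than watch a sensor indicator, here is a minimal Python sketch. It assumes a Linux kernel that exposes thermal zones under /sys/class/thermal, with each thermal_zone*/temp file holding millidegrees Celsius. Which zone corresponds to the CPU differs between machines, and the function names and the 95 °C warning threshold are only illustrative choices.

```python
from pathlib import Path


def read_zone_temperatures(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone.

    Assumes the usual sysfs layout, where each thermal_zone*/temp file
    holds an integer number of millidegrees Celsius.
    """
    temperatures = {}
    for temp_file in sorted(Path(base).glob("thermal_zone*/temp")):
        try:
            millidegrees = int(temp_file.read_text().strip())
        except (OSError, ValueError):
            continue  # zone not readable on this machine; skip it
        temperatures[temp_file.parent.name] = millidegrees / 1000.0
    return temperatures


def hottest(temperatures):
    """Return (zone_name, degrees_celsius) of the hottest zone, or None."""
    if not temperatures:
        return None
    return max(temperatures.items(), key=lambda item: item[1])


if __name__ == "__main__":
    WARN_AT_C = 95.0  # illustrative threshold, not a vendor limit
    reading = hottest(read_zone_temperatures())
    if reading is None:
        print("no thermal zones found")
    else:
        zone, temp_c = reading
        flag = " (too hot)" if temp_c >= WARN_AT_C else ""
        print(f"{zone}: {temp_c:.1f} C{flag}")
```

Running it once per second in a second terminal while the solver is working should give a rough temperature trace for comparing cooling options.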


Sincerely,
Payam

knaik June 14, 2013 13:48

Thanks for the update Payam!
What an interesting (albeit unfortunate) problem.
Hopefully you're able to find a solution in the near future that doesn't involve scaling down the voltage.
Best of luck!
-Kedar

shirazbj June 15, 2013 07:14

Interesting.

I have this problem on Windows 7 (32-bit). I was thinking maybe my CPU is overclocked a little. Now I know it is common.

pdp.aero June 15, 2013 19:18

Quote:

Originally Posted by shirazbj (Post 434122)
Interesting.

I have this problem on Windows 7 (32-bit). I was thinking maybe my CPU is overclocked a little. Now I know it is common.

Hi shirazbj!

You shouldn't have this problem on Windows; it isn't common there. Windows automatically reduces or optimizes the CPU voltage and prevents overheating, especially on machines whose CPUs have integrated graphics (e.g. i3, i5, i7), when their frequencies are overclocked. If your machine has a dual- or quad-core CPU and you had a thermal problem while loading all of your cores, the overheating is caused by some other problem. I guess you have .NET on your OS, which causes mscorsvw.exe to eat up your CPU, so when you load all your cores you run into the thermal problem. By the way, I will post my results on this issue very soon and explain more there. :)

Best Regards
Payam

shirazbj June 16, 2013 00:59

Hi Payam,

How could a Windows machine not have a .NET program? I can't see mscorsvw.exe listed in Task Manager.

My cpu is i7-870, quad core with 8 threads, but without integrated graphics.

Thanks

pdp.aero June 18, 2013 20:55

Quote:

Originally Posted by knaik (Post 434050)
Thanks for the update Payam!
What an interesting (albeit unfortunate) problem.
Hopefully you're able to find a solution in the near future that doesn't involve scaling down the voltage.
Best of luck!
-Kedar

Thank you Kedar, it is cool and challenging. I undervolted my CPU manually with success, then ran some speedup tests on my laptop and a survey of parallel processing performance using SU2. In other words, I tuned my CPU for running SU2 in parallel with temperature in mind. I got very interesting results. I thought others may face a similar issue in the future, so I decided to share my survey on this issue in general.
First of all, undervolting the CPU has risks. It may cause hardware or software damage, particularly under heavy computational load. But if you manage to do it correctly, your system can run at a lower temperature and save energy. However, if you don't know what you are doing, you shouldn't do this. If you decide to do it, you will find a very good thread here. I used it as a guide too.
Here is a summary of my procedure for undervolting my CPU. My operating system is Ubuntu 12.04 LTS. First, install the kernel PHC (Processor Hardware Control) patch to be able to control your CPU voltage and frequency. Then unload your old CPU driver and load the appropriate PHC driver for your CPU. Finally, find the lowest voltage at which your CPU can run at its lowest frequency without crashing, and gradually increase the voltage to find acceptable temperature results. During this step you need to stress-test your CPU by loading all of its cores and watching how the temperature changes. The stress test can be performed with CPUburn. My CPU's maximum frequency is 2534 MHz. I found 1199 MHz to be its lowest possible frequency and gradually increased it to the maximum while tuning the voltage. At every frequency and voltage I ran the stress test for 10-15 minutes. A summary of my results follows.
Code:

Test Number    Processor Frequency    VID (Voltage ID)    Max. Temperature
1              1199 MHz               9                   65 °C
3              1466 MHz               11                  68 °C
4              1599 MHz               12                  70 °C
7              1999 MHz               15                  79 °C
8              2133 MHz               16                  83 °C
9              2266 MHz               17                  89 °C
12             2534 MHz               20                  102 °C
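
The measurements above can be turned into a simple rule for choosing a frequency cap. The following Python sketch hard-codes the rows from the table and returns the highest tested frequency whose peak temperature stayed at or below a chosen limit. The function name and the 80 °C limit are assumptions made for illustration, not values recommended anywhere in this thread.

```python
# (test number, frequency in MHz, VID, max temperature in deg C), from the table
RESULTS = [
    (1, 1199, 9, 65),
    (3, 1466, 11, 68),
    (4, 1599, 12, 70),
    (7, 1999, 15, 79),
    (8, 2133, 16, 83),
    (9, 2266, 17, 89),
    (12, 2534, 20, 102),
]


def highest_frequency_within(results, limit_c):
    """Return the (frequency, VID, temperature) row with the highest
    frequency whose measured peak temperature is <= limit_c, or None."""
    allowed = [(freq, vid, temp) for _, freq, vid, temp in results if temp <= limit_c]
    # Tuples compare by their first element first, so max() picks by frequency.
    return max(allowed) if allowed else None


if __name__ == "__main__":
    print(highest_frequency_within(RESULTS, 80))  # (1999, 15, 79)
```

With an 80 °C limit this selects the 1999 MHz row. A limit below 65 °C returns None because no tested setting ran that cool.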

I set my CPU frequency based on the maximum temperature and then ran SU2 at that frequency. The temperature was the same as in the stress test, so SU2 itself didn't cause any thermal problems. My maximum speedup running in parallel on 2 physical cores and 4 threads is 2.04497, which is really good. Although acceleration methods such as multigrid or GMRES play a great role, parallel processing is always effective. Also, the aerodynamic coefficients for every test converged to exactly the same values in exactly the same number of iterations. Moreover, I found the answer to a question I always had about the computational performance of Intel and AMD processors. You can follow this link to find out how frequency affects computational performance. I also found that frequency could play a great role if someone develops dynamic partitioning for parallel processing.
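
As a quick check on that speedup figure, parallel efficiency is the speedup divided by the number of workers. The Python sketch below evaluates it for the reported 2.04497 against both the 2 physical cores and the 4 hardware threads. The function name is just an illustrative choice.

```python
def parallel_efficiency(speedup, workers):
    """Parallel efficiency E = S / p for speedup S on p workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


if __name__ == "__main__":
    speedup = 2.04497  # reported in the post
    print(f"per physical core (p=2): {parallel_efficiency(speedup, 2):.3f}")
    print(f"per hardware thread (p=4): {parallel_efficiency(speedup, 4):.3f}")
```

Measured against the two physical cores the efficiency comes out slightly above 1, and measured against the four threads it is about one half. That is consistent with the extra hardware threads adding only a little on top of the two cores.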
After all this work, I found out there is an easier way to do this, by just installing the indicator-cpufreq PPA package, but I think the manual way is always better.
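
Whichever route is taken, the resulting frequency limits can be inspected from Python. This is a minimal sketch assuming the standard Linux cpufreq sysfs layout (/sys/devices/system/cpu/cpu*/cpufreq/), where scaling_cur_freq and scaling_max_freq are reported in kHz. The function name is hypothetical, and some drivers may not expose every file.

```python
from pathlib import Path


def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: {"cur_mhz": ..., "max_mhz": ...}} from cpufreq sysfs.

    Values in sysfs are in kHz; missing or unreadable files become None.
    """
    def read_mhz(path):
        try:
            return int(path.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            return None

    info = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but not sibling dirs such as cpuidle
    for cpufreq_dir in sorted(Path(base).glob("cpu[0-9]*/cpufreq")):
        info[cpufreq_dir.parent.name] = {
            "cur_mhz": read_mhz(cpufreq_dir / "scaling_cur_freq"),
            "max_mhz": read_mhz(cpufreq_dir / "scaling_max_freq"),
        }
    return info


if __name__ == "__main__":
    for cpu, freqs in read_cpufreq().items():
        print(cpu, freqs)
```

Checking max_mhz before starting a long parallel run is a cheap way to confirm that the intended cap is actually in effect.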
The aerodynamic coefficient results for the ONERA test case were good, but I still have some discrepancy in the sectional Cp distributions. That will be posted as a new thread soon. Thanks.

Best Regards,
Payam

pdp.aero June 18, 2013 20:57

Quote:

Originally Posted by shirazbj (Post 434230)
Hi Payam,

How could a Windows machine not have a .NET program? I can't see mscorsvw.exe listed in Task Manager.

My cpu is i7-870, quad core with 8 threads, but without integrated graphics.

Thanks

Everything is always possible :). I meant that I guess your Microsoft .NET Framework is eating up your CPU. It's just a guess. If you are looking for mscorsvw.exe, you will find it in Task Manager > Resource Monitor > CPU.
You should be able to make it finish its pending work, so that it stops using the CPU, by navigating to C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 and entering
Code:

ngen.exe executequeueditems
in your command prompt.

Sincerely,
Payam

pdp.aero July 27, 2014 22:11

I am posting the final solution to my question here. The thermal problem I had during the simulation made my laptop shut down after about 20 iterations. At first, as you can see from the first solution described earlier in this thread, I got it working by undervolting the processor. However, when I upgraded my OS and re-installed SU2 3.2.0, it happened again while ATLAS was optimizing itself for the processor. I looked into the problem again: this time I opened the back of the laptop, removed the fan, cleaned out the dust, and replaced the old thermal paste on the CPU and GPU with new paste. This problem has already been posted here as a bug. Cleaning the fan and replacing the thermal paste worked perfectly for me. The maximum CPU temperature reached 86 °C during ATLAS configuration, compared with the previous 104 °C that forced an unexpected shutdown of my system.

All in all, if somebody is facing unexpected shutdowns following an overheat warning, my first advice is to clean the cooling system, remove the old thermal paste, and apply new paste to the CPU and GPU.


All times are GMT -4.