CFD Online Discussion Forums


BRG May 27, 2010 08:47

New 6-core processors
I would be interested to know whether anybody has tried CFX running in parallel on the new 6-core processors such as L5640? Does the speed-up scale OK when using all 6 cores on one processor?


ghorrocks May 27, 2010 19:00

I have no experience with them, but in my experience the SPEC website is a very good guide. Have a look at the floating point benchmarks. When you compare the normal and rate benchmarks you can get a guide as to speedup factors.

Make sure you understand what the SPEC benchmarks are before using them.

Jergen May 28, 2010 10:19


Originally Posted by BRG (Post 260508)
I would be interested to know whether anybody has tried CFX running in parallel on the new 6-core processors such as L5640? Does the speed-up scale OK when using all 6 cores on one processor?


Exactly. I have run CFX on the Intel Core i7 (4 actual cores and 4 virtual), and it is much faster than on a single core. But I usually hit a problem: while the Local MPI is starting up, my solver often freezes before getting to the first iteration. I don't know what the cause is. I also don't know why some simulations run smoothly on 8 cores.

Please suggest how I can fix this. :(

Michiel May 28, 2010 12:47

Don't run 8 parallel processes on the i7. The processor has only 4 actual cores. You had better switch off the virtual core mode and run 4 processes on 4 cores.

Recently I ordered a workstation with the new Intel six-core X5660. I am still waiting for delivery. When I have some results from this machine I will post them on this forum.

user June 1, 2010 12:49

I have a new Core i7-980X @ 3.33 GHz (6 real cores + 6 SMT cores, running Windows 7 64-bit) here and get speedup factors from 3.4 to 4.8 with all 6 cores. More tests will follow (deactivating SMT cores, etc.). For comparison: benchmark.def in the CFX examples directory needs about 62 s with 1 core and 18 s with 6 cores, but it seems not to be a good benchmark...
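As a rough illustration of how the timings quoted above translate into a speedup factor and a parallel efficiency, here is a minimal Python sketch. The function name `speedup_and_efficiency` is made up for this example and is not part of CFX; the only inputs are the 62 s and 18 s figures from the post.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, processes):
    """Return (speedup, efficiency) for one serial/parallel timing pair.

    speedup    = serial wall time / parallel wall time
    efficiency = speedup / number of parallel processes
    """
    speedup = serial_seconds / parallel_seconds
    efficiency = speedup / processes
    return speedup, efficiency


# benchmark.def timings quoted in the post: 62 s on 1 core, 18 s on 6 cores
speedup, efficiency = speedup_and_efficiency(62.0, 18.0, 6)
print(f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")
# prints: speedup 3.44, efficiency 57%
```

The result sits at the lower end of the 3.4 to 4.8 range reported above.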

scott June 1, 2010 21:23

Do you have any larger cases, say 10 million cells, that you could run a comparison on? Would people be interested in a standard benchmark .def file of decent size that we could test on our machines? Maybe a steady state bluff body type problem using SST, as a simple example.

I could put something together if that is the case.



ghorrocks June 1, 2010 22:40

There used to be a big thread on the CFX-Community website which used the benchmark.def that came with CFX to benchmark machines. It ended up with people from all over posting their results for lots of machines. It was a great thread but it got wiped out when ANSYS removed the forum.

It would be really good to get a thread like it running again. The benchmark.def file which comes with CFX, even though very small, is very good. I found it matched the results closely enough for benchmarks of up to 1M or 2M nodes, and that it was OK for up to about 8 cores. So unless you are looking at more than 8 cores and more than a few million nodes, the benchmark.def is fine.

But if you want to look at 10M nodes or 8+ processes then we will need a larger benchmark. If you do put one together, can you find somewhere to put the def files so other people can run it too? I don't know if this forum allows attachments that large; you will probably have to link to elsewhere.

Michiel June 16, 2010 10:14

I finally received the new workstation: an HP Z800 with an X5660 and 6 GB DDR3 RAM. I run ANSYS CFX 12.1 under Windows 7 64-bit.

I tried the benchmark.def from ANSYS. For serial computing it took about 43 seconds to complete. With all 6 cores it took 14 seconds. With 4 cores it was also 14 seconds. Total speedup for this case is around 3.
I also tried a larger calculation and got a speedup of slightly less than 3. For this case too, the difference between 4 and 6 cores was small.
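One way to look at the 4-core versus 6-core figures above is the Karp-Flatt metric, which estimates an apparent serial fraction from a measured speedup. This is only a sketch: the metric is not something the poster used, the function name `karp_flatt` is invented for this example, and the timings are rounded to the second, so the numbers are indicative at best.

```python
def karp_flatt(speedup, processes):
    """Experimentally determined serial fraction (Karp-Flatt metric).

    e = (1/S - 1/p) / (1 - 1/p), for measured speedup S on p processes.
    """
    if processes < 2:
        raise ValueError("the metric needs at least 2 processes")
    return (1.0 / speedup - 1.0 / processes) / (1.0 - 1.0 / processes)


serial_seconds = 43.0    # benchmark.def, 1 core (from the post)
parallel_seconds = 14.0  # same wall time reported for 4 and for 6 cores

for processes in (4, 6):
    speedup = serial_seconds / parallel_seconds
    fraction = karp_flatt(speedup, processes)
    print(f"{processes} cores: speedup {speedup:.2f}, "
          f"apparent serial fraction {fraction:.2f}")
```

An apparent serial fraction that grows as cores are added usually points to parallel overhead or a shared resource limit rather than a fixed amount of serial code, which would be consistent with the flat 4-to-6-core timings, though these two data points alone cannot say which.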

ckleanth July 6, 2010 06:15

You will find that speedup depends on the actual model, and the most speedup will occur on steady state problems. Transient problems will not scale up the same way as steady state problems.

Read/write access to the hard disk is important if you have, for example, moving meshes or a mesh refinement procedure.

I found that unless you use Windows HPC, Linux is much better for parallel performance.

Lastly, I personally think that the benchmark def file needs to be modified to increase the allowed number of simulated steps if you want to use it for benchmark comparisons.

ghorrocks July 6, 2010 18:46

In my experience the small benchmark file is a very good predictor of real performance. I think you will find that for real simulations the speedup from 4 to 6 cores will be small (if any speedup at all).

I also disagree that Win HPC or Linux is required for good performance. I have extensively tested the speed difference between WinXP and Linux and found no difference. The difference is that Linux scales much better to large clusters and is a more stable platform for running very long simulations (i.e. several days and longer). But for simulations with only a few machines and up to a few days of run time, any OS will give the same result.

ckleanth July 7, 2010 07:06

Well, I speak from personal experience, and perhaps the choice of hardware had some influence, but I scaled up across 2 recent quad-core chips and had much better results solving the same problem using 64-bit Linux (Red Hat in this case) rather than 64-bit Windows Server. I have seen plots of linear(ish) scale-up for large systems, but for us mere mortals I guess 8 cores will have to do. By the way, I have unofficially sent my results to ANSYS and they said they were looking into the MPICH libraries under Windows anyway. I am not sure if they have some known issues. As a note, Windows HPC is an entirely different story, and anyone who uses it must have excellent scale-up results.

ghorrocks July 7, 2010 08:01

Just goes to show you should not take anything for granted with distributed parallel. To get it running nicely you have to do your homework, set it up carefully and check you get what you expect.

All times are GMT -4.