CFD Online Discussion Forums

Joe May 31, 2007 17:27

CFX11 scaling on quad core Clovertown cpus?
CFX11 scales very well on Woodcrest dual core CPUs. As does Fluent.

However, Fluent scales very poorly from 2 to 4 cores within a single MCP / single socket Clovertown.

Does anyone have scaling data for CFX11 on Clovertowns or Xeon 32xx chips?

stu May 31, 2007 20:16

Re: CFX11 scaling on quad core Clovertown cpus?
I will run some comparisons for you on my QX6700 (non-Xeon). Do you want me to play with the application's processor affinity, or just let the XP32 OS deal with it?

Off the top of my head, a sim I did the other day took 1 hr 47 min on 4 cores and 2 hr 8 min on 2 cores. I think the problem is a memory bottleneck, as there is no on-chip memory controller. I will get back to you with hard figures.
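As a rough check on those figures, the 2-to-4-core speedup can be worked out directly from the two run times quoted above. This is only a minimal Python sketch; the helper name `to_seconds` is made up for illustration, and the numbers are the off-the-cuff ones from this post, not a controlled benchmark.

```python
def to_seconds(hours, minutes):
    """Convert an hours/minutes pair to seconds (hypothetical helper)."""
    return hours * 3600 + minutes * 60

t_2_cores = to_seconds(2, 8)    # 2 hr 8 min on 2 cores
t_4_cores = to_seconds(1, 47)   # 1 hr 47 min on 4 cores

# Speedup of the 4-core run relative to the 2-core run.
speedup = t_2_cores / t_4_cores
ideal = 4 / 2                   # perfect scaling would double the speed

print(f"2 -> 4 core speedup: {speedup:.2f}x (ideal {ideal:.1f}x)")
```

Doubling the core count here buys roughly a fifth more speed rather than a doubling, which is consistent with a shared bottleneck such as memory access.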


Joe June 1, 2007 05:19

Re: CFX11 scaling on quad core Clovertown cpus?
Much obliged. Just what I needed.

"Do you want me to play with the application's processor affinity, or just let the XP32 OS deal with it?" Letting the OS deal with it would be fine.

Also could you let me know what motherboard chipset and memory settings (FSB, memory timings, ...) you are using? Have you ever overclocked the FSB? As the FSB is theoretically the bottleneck keeping the two dual cores fighting for memory access, a mild FSB overclock could have significant benefits (e.g. upping the FSB 10-20%).

If I could suggest a geometry: Simple inlet/outlet tube with 500k cells per partition e.g. 2E6 total. This would eliminate partitioning as a potential issue.

The reason I am asking about this is that on July 22 quad core pricing is going to become very affordable, suggesting a possible application within clusters. However, this suitability will depend largely on whether the FSB contention issue is manageable.

stu June 2, 2007 07:39

Re: CFX11 scaling on quad core Clovertown cpus?
How about this, send me an email to with a .def file that you would like tested, and I will run it for you.


Joe June 2, 2007 10:19

Re: CFX11 scaling on quad core Clovertown cpus?
I could but it really just needs to be trivially simple e.g.

Tube: D 25 mm x 250 mm long.
2e6 uniformly distributed cells.
Steady state.
SST turbulence model.
Water.
Inlet: Fully developed pipe flow, 1 m/s uniform inlet velocity.
Outlet: Average pressure.
etc. etc.

i.e. the simplest possible case.

You only need to run each case for ~20 iterations to get a feel for the compute time per iteration.

Thanks again. Looking forward to your results!

Gert-Jan June 2, 2007 16:01

Re: CFX11 scaling on quad core Clovertown cpus?
I would suggest performing a calculation for an unsteady flow. We have a benchmark with 500k nodes. It is just a cylinder filled with water with a hot bottom and a cold top. Select buoyancy, 100 iterations, convergence criterion 1e-5 and off you go.


Charles June 3, 2007 04:28

Re: CFX11 scaling on quad core Clovertown cpus?
I think many of us would appreciate it if you could also post the results to this forum when you have completed this test. Like, I suspect, a lot of people in CFD, I'm seriously considering getting some quad-core hardware soon, but there is some doubt whether the performance on the Intel quad core hardware scales well enough to justify the additional parallel software license expense. I suspect that the scaling on AMD Barcelona will be much better, when that finally becomes available, but we don't actually KNOW. So any hard information on this topic will be very welcome right now!

stu June 3, 2007 08:47

I have a Dell Precision 390 with a QX6700 (2.66 GHz) quad core CPU, 4 GB of 667 MHz RAM, running XP32 SP2 with the 3 GB switch enabled, and CFX 11 with update 1.

Performed a multiphase transient sim with 306K elements

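| Run     | CPU time | Wall time | Remark |
|---------|----------|-----------|--------|
| Single  | 39m 26s  | 39m 49s   |        |
| MPICH 2 | 21m 16s  | 23m 01s   |        |
| MPICH 3 | 15m 45s  | 17m 53s   | Note 1 |
| MPICH 4 | 15m 26s  | 17m 22s   |        |

Note 1: The MPICH 3 result seems strange, but I was doing other things at the time.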
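For anyone who wants the scaling as numbers, the wall times above can be turned into speedup and parallel efficiency relative to the single-process run. The following is only a sketch in Python; the function name `scaling` is made up for illustration, and the usual definitions are assumed (speedup = T1 / Tn, efficiency = speedup / n).

```python
# Wall-clock times from the results above, converted to seconds.
wall = {
    1: 39 * 60 + 49,  # Single
    2: 23 * 60 + 1,   # MPICH, 2 processes
    3: 17 * 60 + 53,  # MPICH, 3 processes (machine was also in use)
    4: 17 * 60 + 22,  # MPICH, 4 processes
}

def scaling(times):
    """Return {n: (speedup, efficiency)} relative to the 1-process run."""
    base = times[1]
    return {n: (base / t, base / (t * n)) for n, t in times.items()}

results = scaling(wall)
for n, (speedup, eff) in sorted(results.items()):
    print(f"{n} process(es): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

The step from 2 to 4 processes is the one that matters for the FSB question. The 3-process figure was taken while the machine was in use, so it should be treated with caution.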

I hope this helps


Joe June 3, 2007 09:06

Re: Results
Thanks. This directly confirms the poor Intel quad core scaling seen under Fluent.

In practice, you only get to utilise 3 of the 4 cores effectively.

AMD's upcoming Barcelona will probably show better quad core scaling due to its native quad core design and shared L3 cache.

We will be sticking to dual core Intel CPUs in our cluster.

Gert-Jan June 4, 2007 07:01

Re: Results
Thanks stu. Can you provide more details? Number of nodes (instead of elements), type of mesh, memory usage in the run, number of iterations, cpu time and wall clock time.

Thanks in advance, Gert-Jan

stu June 4, 2007 07:46

Re: Results
No probs, I will do the best I can

Domain Name : Default Domain

Total Number of Nodes = 56596

Total Number of Elements = 306824

Total Number of Tetrahedrons = 306824

Total Number of Faces = 15938

The mesh was a CFX mesh, therefore pure tets. Iterations: 10 (just for benchmarking). Memory usage: I didn't notice, but it did not spool to disk, and all at 100% CPU usage. The times given above are CPU time and then wall time; the format of this forum predates DOS 6.22, hence it did not display correctly, even after previewing.
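To put this mesh in context against the 500k cells per partition suggested earlier in the thread, the per-partition load and the wall time per iteration can be estimated from the figures posted. This is a rough Python sketch that assumes a perfectly even split across partitions, which a real partitioner only approximates.

```python
nodes = 56596
elements = 306824
iterations = 10
wall_4_procs = 17 * 60 + 22   # seconds, 4-process wall time posted earlier

# Assumption: an even split; actual partition sizes will differ slightly.
per_partition = {n: (nodes // n, elements // n) for n in (2, 3, 4)}

for n, (nd, el) in per_partition.items():
    print(f"{n} partitions: ~{nd} nodes, ~{el} elements each")

# Average wall time per iteration for the 4-process run.
sec_per_iter = wall_4_procs / iterations
print(f"4 processes: {sec_per_iter:.1f} s per iteration")
```

With four partitions each one holds well under 100k elements, far below the 500k per partition proposed for the tube case, so partitioning and communication overhead may account for part of the result alongside any memory contention.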


Gert-Jan June 4, 2007 10:44

Re: Results
Hmmm. As I said, we use a benchmark which is quite large. It requires 1G of memory, 100 iterations. This is to check floating point operations. You can run it if you have time available. Shall I share this benchmark?


Stu June 4, 2007 18:43

Re: Results
Yeah, no probs, send it to and I will run it when I get time, just tell me what you want as a performance result, the .out file?


All times are GMT -4.