CFX11 scaling on quad core Clovertown cpus?

May 31, 2007, 17:27
CFX11 scaling on quad core Clovertown cpus?
  #1
Joe
Guest
 
Posts: n/a
CFX11 scales very well on Woodcrest dual core CPUs. As does Fluent.

However, Fluent scales very poorly from 2 to 4 cores within a single MCP / single socket Clovertown.

Does anyone have scaling data for CFX11 on Clovertowns or Xeon 32xx chips?

May 31, 2007, 20:16
Re: CFX11 scaling on quad core Clovertown cpus?
  #2
stu
Guest
 
Posts: n/a
I will run some comparisons for you on my QX6700 (non-Xeon). Do you want me to play with the application's processor affinity, or just let the XP32 OS deal with it?

Off the top of my head, a sim I did the other day took 1 hr 47 min on 4 cores and 2 hr 8 min on 2 cores. I think it is a memory bottleneck, as there is no on-chip memory controller. I will get back to you with hard figures.

Stu


June 1, 2007, 05:19
Re: CFX11 scaling on quad core Clovertown cpus?
  #3
Joe
Guest
 
Posts: n/a
Much obliged. Just what I needed.

"Do you want me to play with the application's processor affinity, or just let the XP32 OS deal with it?" Letting the OS deal with it would be fine.

Also could you let me know what motherboard chipset and memory settings (FSB, memory timings, ...) you are using? Have you ever overclocked the FSB? As the FSB is theoretically the bottleneck keeping the two dual cores fighting for memory access, a mild FSB overclock could have significant benefits (e.g. upping the FSB 10-20%).
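
To make the FSB argument concrete, here is a rough back-of-envelope sketch in Python (an illustration, not something from the thread). It assumes a 64-bit front-side bus (8 bytes per transfer) and takes 1066 MT/s as the stock QX6700 rate; stu's actual settings are not given here, so the default value and the function names are assumptions. It only computes theoretical peak bandwidth and ignores everything else that affects real memory throughput.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit wide front-side bus


def fsb_peak_gb_per_s(mega_transfers_per_s):
    """Theoretical peak FSB bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


def per_core_share(mega_transfers_per_s, cores):
    """Peak bandwidth per core if all cores compete equally for the bus."""
    return fsb_peak_gb_per_s(mega_transfers_per_s) / cores


if __name__ == "__main__":
    stock = 1066.0  # assumed stock FSB rate in MT/s, not a measured value
    for overclock in (0.0, 0.10, 0.20):
        rate = stock * (1.0 + overclock)
        print(
            f"FSB +{overclock:.0%}: "
            f"{per_core_share(rate, 2):.2f} GB/s per core on 2 cores, "
            f"{per_core_share(rate, 4):.2f} GB/s per core on 4 cores"
        )
```

Under these assumptions the per-core share halves when going from 2 to 4 active cores, and an FSB overclock raises it only in proportion to the overclock.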

If I could suggest a geometry: Simple inlet/outlet tube with 500k cells per partition e.g. 2E6 total. This would eliminate partitioning as a potential issue.

The reason I am asking about this is that on July 22 quad core pricing is going to become very affordable suggesting a possible application within clusters. However, this suitability will be largely dependent on whether the FSB contention issue is manageable.


June 2, 2007, 07:39
Re: CFX11 scaling on quad core Clovertown cpus?
  #4
stu
Guest
 
Posts: n/a
How about this, send me an email to cfdstu@gmail.com with a .def file that you would like tested, and I will run it for you.

Stu


June 2, 2007, 10:19
Re: CFX11 scaling on quad core Clovertown cpus?
  #5
Joe
Guest
 
Posts: n/a
I could but it really just needs to be trivially simple e.g.

Tube: D 25 mm x 250 mm long
2e6 uniformly distributed cells
Steady state
SST turbulence model
Water
Inlet: Fully developed pipe flow, 1 m/s uniform inlet velocity
Outlet: Average pressure
etc. etc.

i.e. the simplest possible case.

You only need to run each case for ~20 iterations to get a feel for the compute time per iteration.

Thanks again. Looking forward to your results!

June 2, 2007, 16:01
Re: CFX11 scaling on quad core Clovertown cpus?
  #6
Gert-Jan
Guest
 
Posts: n/a
I would suggest performing a calculation for an unsteady flow. We have a benchmark with 500k nodes. It is just a cylinder filled with water with a hot bottom and a cold top. Select buoyancy, 100 iterations, convergence criterion 1e-5 and off you go.

Gert-Jan

www.bunova.nl

June 3, 2007, 04:28
Re: CFX11 scaling on quad core Clovertown cpus?
  #7
Charles
Guest
 
Posts: n/a
I think many of us would appreciate it if you could also post the results to this forum when you have completed this test. Like a lot of people in CFD, I suspect, I'm seriously considering getting some quad-core hardware soon, but there is some doubt whether the performance on the Intel quad-core hardware scales well enough to justify the additional parallel software license expense. I suspect that the scaling on AMD Barcelona will be much better, when that finally becomes available, but we don't actually KNOW. So any hard information on this topic will be very welcome right now!


June 3, 2007, 08:47
Results
  #8
stu
Guest
 
Posts: n/a
I have a Dell Precision 390 with a QX6700 (2.66 GHz) quad core CPU, 4 GB of 667 MHz RAM, running XP32 SP2 with the 3 GB switch enabled, and CFX 11 with update 1.

Performed a multiphase transient sim with 306K elements

Run         CPU time   Wall time
Single      39m 26s    39m 49s
MPICH 2     21m 16s    23m 01s
MPICH 3     15m 45s    17m 53s    (Note 1)
MPICH 4     15m 26s    17m 22s

Note 1: The MPICH 3 result seems strange but I was doing other things at the time.
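
For reference, the wall-clock times above can be turned into speedup and parallel efficiency figures. The following is a minimal Python sketch, not part of the original post; the helper names (`to_seconds`, `scaling_table`) are made up for illustration, and it assumes "Single" is a 1-partition run and "MPICH N" an N-partition run.

```python
import re

# Wall-clock times copied from the results above, keyed by number of
# partitions (assumption: "Single" = 1, "MPICH N" = N partitions).
WALL_TIMES = {1: "39m 49s", 2: "23m 01s", 3: "17m 53s", 4: "17m 22s"}


def to_seconds(text):
    """Convert a string such as '23m 01s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = (int(g) for g in match.groups())
    return 60 * minutes + seconds


def scaling_table(wall_times):
    """Return (partitions, seconds, speedup, efficiency) rows vs the 1-partition run."""
    base = to_seconds(wall_times[1])
    rows = []
    for n in sorted(wall_times):
        t = to_seconds(wall_times[n])
        speedup = base / t
        rows.append((n, t, speedup, speedup / n))
    return rows


if __name__ == "__main__":
    for n, t, s, e in scaling_table(WALL_TIMES):
        print(f"{n} partition(s): {t:5d} s  speedup {s:4.2f}  efficiency {e:4.0%}")
```

With these times the speedup comes out at roughly 1.7 on 2 partitions, 2.2 on 3 and 2.3 on 4, so the fourth core adds very little.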

I hope this helps

Stu

June 3, 2007, 09:06
Re: Results
  #9
Joe
Guest
 
Posts: n/a
Thanks. This directly confirms the poor Intel quad core scaling seen under Fluent.

In practice, you only get to utilise 3 of the 4 cores effectively.
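
One way to put a number on this is the Karp-Flatt metric, which estimates an experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p) from the measured speedup S on p cores. The Python sketch below applies it to stu's wall times converted to seconds; it is an illustrative addition, not something anyone in the thread ran, and the function names are hypothetical. If e grows with p, overhead such as memory contention is increasing with core count, rather than the code simply having a fixed serial portion.

```python
# Wall-clock times in seconds from stu's results
# (39m 49s, 23m 01s, 17m 53s, 17m 22s for 1 to 4 partitions).
WALL_SECONDS = {1: 2389, 2: 1381, 3: 1073, 4: 1042}


def karp_flatt(speedup, cores):
    """Experimentally determined serial fraction for a run on `cores` cores."""
    if cores < 2:
        raise ValueError("the metric is only defined for 2 or more cores")
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


def serial_fractions(wall_seconds):
    """Map core count -> Karp-Flatt fraction, relative to the 1-core run."""
    base = wall_seconds[1]
    return {
        p: karp_flatt(base / t, p)
        for p, t in sorted(wall_seconds.items())
        if p > 1
    }


if __name__ == "__main__":
    for p, e in serial_fractions(WALL_SECONDS).items():
        print(f"{p} cores: serial fraction ~ {e:.2f}")
```

For these times the fraction rises from about 0.16 on 2 cores to about 0.25 on 4, which is consistent with growing contention. The 3-core figure should be treated with caution, since stu noted the machine was busy with other things during that run.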

AMD's upcoming Barcelona will probably show better quad core scaling due to its native quad core design and shared L3 cache.

We will be sticking to dual core Intel CPUs in our cluster.

June 4, 2007, 07:01
Re: Results
  #10
Gert-Jan
Guest
 
Posts: n/a
Thanks stu. Can you provide more details? Number of nodes (instead of elements), type of mesh, memory usage in the run, number of iterations, cpu time and wall clock time.

Thanks in advance, Gert-Jan

June 4, 2007, 07:46
Re: Results
  #11
stu
Guest
 
Posts: n/a
No probs, I will do the best I can

Domain Name : Default Domain

Total Number of Nodes = 56596

Total Number of Elements = 306824

Total Number of Tetrahedrons = 306824

Total Number of Faces = 15938

Mesh was CFX-Mesh, therefore pure tets. Iterations: 10 (just for benchmarking). Memory usage: didn't notice, but it did not spool to disk, and everything ran at 100% CPU usage. The times given above are CPU time and then wall time; it is just that the format of this forum predates DOS 6.22, hence it did not display correctly, even after previewing.

Stu

June 4, 2007, 10:44
Re: Results
  #12
Gert-Jan
Guest
 
Posts: n/a
Hmmm. As I said, we use a benchmark which is quite large. It requires 1 GB of memory and runs 100 iterations. This is to check floating point operations. You can run it if you have time available. Shall I share this benchmark?

Gert-Jan

June 4, 2007, 18:43
Re: Results
  #13
Stu
Guest
 
Posts: n/a
Yeah, no probs, send it to cfdstu@gmail.com and I will run it when I get time, just tell me what you want as a performance result, the .out file?

Stu