Intel Nehalem's performance with CFX-12?

Old   April 22, 2010, 04:51
Default Intel Nehalem's performance with CFX-12?
  #1
New Member
 
Join Date: Apr 2010
Posts: 4
Rep Power: 7
Aero75 is on a distinguished road
Hi all,

I have been searching the forum for an up-to-date thread on this topic.

I am currently investigating what hardware I should get for a larger HPC upgrade. We use ANSYS CFX 12.

I am specifically looking at Intel X5560 quad cores, and I am wondering whether anyone on the forum has experience with how well they (or other CPUs of the 55xx family) scale, especially for larger calculations with, say, 50-100 cores in parallel or even more. I am also planning to acquire Infiniband, as we have had good experience with it on our existing compute grid of AMD dual core Opterons. Our typical problem size would probably be about 20 million hexas, but maybe even higher, as more hardware usually pulls in that direction.

Best regards,

Aero75

Old   April 22, 2010, 05:24
Default
  #2
Member
 
james britton
Join Date: Jan 2010
Posts: 38
Rep Power: 7
jbritton is on a distinguished road
I'm not too up on the server chips, but with the new 6 core chips that have come out from both AMD and Intel, would it be worth looking at those?

If I remember rightly, AMD may also have a 12 core server chip. I can't remember exactly, but I vaguely remember hearing something about it.

Old   April 22, 2010, 07:14
Default
  #3
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,672
Rep Power: 84
ghorrocks has a spectacular aura about
The choice of machines and of how many parallel licenses to buy depends on a few things. You want as much speed (i.e. low run times) for as little money as possible. When you estimate the speed you get with the various parallel options against the cost of implementing them (keeping in mind that the biggest cost is usually the parallel licenses), the optimum setup is a large number of stand-alone machines, each with its CPU only partly loaded - that is, a quad core CPU with only 1 or 2 processes on it. Do the maths - fully loading a CPU with 4 or 6 cores just does not pay.

Things have changed with the new 55xx and 56xx chips. They scale far better to 4 or 6 cores than the old Intel CPUs ever did, so the equation may have changed - but I doubt it. Software license costs far exceed hardware costs, so the normal advice is to save money by buying fewer licenses but get the best hardware you can.
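
The "do the maths" step can be sketched roughly as below. This is only an illustration: the machine price, the license price, the one-license-per-process rule and both speedup values are hypothetical placeholders, not quotes or measurements. Put in your own numbers before drawing any conclusion.

```python
from dataclasses import dataclass

# All figures below are HYPOTHETICAL placeholders, not measurements or quotes.
MACHINE_COST = 3_000            # assumed price of one quad core machine
LICENSE_COST_PER_PROC = 10_000  # assumed price of one parallel license


@dataclass
class Setup:
    name: str
    machines: int
    procs_per_machine: int
    speedup: float  # speedup over a single process (assumed value)


def total_cost(setup: Setup) -> int:
    """Hardware cost plus one license per solver process (an assumption)."""
    procs = setup.machines * setup.procs_per_machine
    return setup.machines * MACHINE_COST + procs * LICENSE_COST_PER_PROC


def cost_per_speedup(setup: Setup) -> float:
    """Money spent per unit of speedup; lower is better."""
    return total_cost(setup) / setup.speedup


setups = [
    # Partly loaded CPUs: more machines, fewer processes on each.
    Setup("4 machines x 2 processes", machines=4, procs_per_machine=2, speedup=7.0),
    # Fully loaded CPUs: fewer machines, every core busy.
    Setup("2 machines x 4 processes", machines=2, procs_per_machine=4, speedup=5.0),
]

for s in setups:
    print(f"{s.name}: cost {total_cost(s)}, cost per unit speedup {cost_per_speedup(s):.0f}")

best = min(setups, key=cost_per_speedup)
print("best value:", best.name)
```

With these placeholder numbers the extra hardware is cheap next to the licenses, so the partly loaded option wins. If the newer chips scale well enough at full load, the result can flip.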

The CPU2006 benchmark on the spec.org website (http://www.spec.org/cpu2006/results/) has lots of machines benchmarked, including many on the new 55xx and 56xx chips. The fp and fp_rate benchmarks are very good guides to performance - make sure you understand what they are measuring before using them.

Old   April 22, 2010, 07:33
Default
  #4
Member
 
james britton
Join Date: Jan 2010
Posts: 38
Rep Power: 7
jbritton is on a distinguished road
Do you have any experience with SSDs? I have seen the ANSYS literature on them, and the speed increase appears to grow as the number of cores increases, but considering the size and cost of the drives, are they worth looking at at the moment?

Old   April 22, 2010, 07:44
Default
  #5
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,672
Rep Power: 84
ghorrocks has a spectacular aura about
CFD loads the hard drives lightly in general, meaning little speed is gained by using faster hard drives. CFD needs fast CPUs and fast memory. Hard drives and graphics don't make much of a difference.

I note that in your case (due to the large number of nodes) you will need a fast network, and I see you are already looking at Infiniband. For most smaller clusters ethernet is fine, but yours is too big for ethernet.

Also, 20M node simulations are not very big for a system such as you describe. Either you plan to run a few jobs of that size at the same time, or your physics is tricky, which slows things down. About 0.5-1M nodes per CPU is pretty normal.
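
As a rough check of that rule of thumb, the small sketch below turns a mesh size into a core count and back. It assumes "per CPU" means per solver process (one per core), which is an interpretation and not something stated above. The function names are made up for the illustration.

```python
import math


def core_range(mesh_nodes, low_per_core=0.5e6, high_per_core=1.0e6):
    """Return (fewest, most) cores suggested by the 0.5-1M nodes per core rule."""
    fewest = math.ceil(mesh_nodes / high_per_core)
    most = math.ceil(mesh_nodes / low_per_core)
    return fewest, most


def nodes_per_core(mesh_nodes, cores):
    """Mesh nodes each core has to handle for a given core count."""
    return mesh_nodes / cores


mesh = 20e6  # the 20M mesh discussed in this thread

print("suggested cores:", core_range(mesh))
for cores in (50, 100):
    print(f"{cores} cores -> {nodes_per_core(mesh, cores):,.0f} nodes per core")
```

By this rule a single 20M job sits at roughly 20 to 40 cores. Spread over 50-100 cores it drops to 0.2-0.4M nodes per core, below the usual range.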

Old   April 22, 2010, 12:55
Default
  #6
New Member
 
Join Date: Apr 2010
Posts: 4
Rep Power: 7
Aero75 is on a distinguished road
Hi,

Thanks for your answers.

Yes, the physics is a bit tricky in a way. We run simulations with very large outer domains, while we still need to resolve the flow on very large structures with a y+ = 1 boundary layer resolution. Typically we need to convect the flow through the domain a few times to obtain sufficient convergence, which means on the order of 5000 iterations for a steady run...

We also plan to move further into transient calculations - both URANS and DES/SAS types of modelling - which will only increase the demand for compute power...

Regarding the Nehalems, I have a pretty good feeling. I have found some nice benchmark results by googling. On our older cluster I didn't dare go for quad cores, for the reason you described, Glenn. I feel more secure now with the Nehalem chips. Having said that, I still appreciate all the comments and experience I can get from this forum.

Regards,

Aero75

Old   April 22, 2010, 21:55
Default
  #7
Senior Member
 
George
Join Date: Mar 2009
Location: Birmingham, UK
Posts: 257
Rep Power: 9
ckleanth is on a distinguished road
I'm in the process of scouting for hardware to upgrade our old machine, which comprises 2 quad core X3350s. I got hold of a loan machine fitted with two quad core X5580s and, unless I'm doing something wrong, I only get 25-30% faster solution times with the new chips. Furthermore, the OS (Windows Server 2008) reports that when a simulation runs on all 8 cores of the new machine, the maximum CPU load reaches only 50%. Does anyone know why this is happening? Is it maybe due to hyperthreading, and is there a way to rectify it and use the full CPU potential? Anyway, I have some numbers, and although I was expecting a much better speedup with the newer chip (somewhere around 6 using all 8 cores), unless I find the problem these results are poor. The older machine has the better speedup curve, FFS.

Speedup relative to 1 core:

cores   X5580   X3350
1       1.00    1.00
2       1.49    1.49
3       1.93    1.92
4       1.74    2.09
5       1.94    2.19
6       2.15    2.42
7       2.18    2.47
8       2.29    2.58
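
To put the figures above in perspective, here is a minimal sketch that converts them into parallel efficiency (speedup divided by core count) and the Karp-Flatt experimentally determined serial fraction. The Karp-Flatt metric is a standard diagnostic chosen for this illustration; it is not something CFX reports. The speedup values are copied from the table, everything else is just arithmetic.

```python
# Speedup figures copied from the table in this post (index 0 = 1 core).
speedup = {
    "X5580": [1.00, 1.49, 1.93, 1.74, 1.94, 2.15, 2.18, 2.29],
    "X3350": [1.00, 1.49, 1.92, 2.09, 2.19, 2.42, 2.47, 2.58],
}


def efficiency(s, n):
    """Parallel efficiency: 1.0 would be perfect scaling on n cores."""
    return s / n


def karp_flatt(s, n):
    """Karp-Flatt serial fraction for speedup s on n > 1 cores."""
    return (1.0 / s - 1.0 / n) / (1.0 - 1.0 / n)


def summarise(values):
    """Return (cores, speedup, efficiency, serial fraction or None) rows."""
    rows = []
    for n, s in enumerate(values, start=1):
        frac = karp_flatt(s, n) if n > 1 else None
        rows.append((n, s, efficiency(s, n), frac))
    return rows


for chip, values in speedup.items():
    print(chip)
    for n, s, eff, frac in summarise(values):
        frac_text = "-" if frac is None else f"{frac:.2f}"
        print(f"  {n} cores: speedup {s:.2f}, efficiency {eff:.0%}, serial fraction {frac_text}")
```

A serial fraction that stays large already at 2 cores suggests the bottleneck is not simply core count. It does not say what the bottleneck is.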
__________________
Top 4 tips
1. Knowledge is everything and Ignorance is dangerous.
2. Understand your limitations and try to eliminate them.
3. Get yerself a bike and hoon the chuffer. You will soon learn why dogs like to hang their heads out the car window.
4. Please, before asking any questions on how to run simulations in CFX, go through all the tutorials

Old   April 22, 2010, 22:01
Default
  #8
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,672
Rep Power: 84
ghorrocks has a spectacular aura about
Are you REALLY sure it is a dual quad core system? Task Manager will report a single quad core CPU as having 8 processors because of hyperthreading.

Hyperthreading should not harm the results significantly. The difference between hyperthreading enabled and disabled for 4 processes on a quad core chip should be very small.

But still, the results you have got are disappointing. I suspect something is wrong - can you run the CFX benchmark to see how that goes? Are you sure all the components (memory, motherboard, etc etc) are high quality? Virus checker not doing silly things? Other processes?

Old   April 22, 2010, 22:12
Default
  #9
Senior Member
 
George
Join Date: Mar 2009
Location: Birmingham, UK
Posts: 257
Rep Power: 9
ckleanth is on a distinguished road
Yes Glenn, this is a Supermicro server with two quad core 3.2 GHz CPUs and 24 GB RAM. I ran many cases: the benchmark, a longer benchmark run to convergence, and the gear tutorial with saving disabled, as I just want to test the CPU. All tests give similar results. The figures I posted are from the immersed gear tutorial. The only things installed on the machine are the OS and CFX; there is no antivirus. This is a loan machine that I only got yesterday, and everything is freshly installed. I'm not that familiar with Server 2008 - maybe there's an option I am not aware of - but the OS environment seems OK. I switched the option to favour running programs rather than background services, but it had no effect anyway... I'm a bit stuck at the moment (and puzzled).

Old   April 22, 2010, 23:58
Default
  #10
New Member
 
Daniel Paukner
Join Date: Apr 2010
Posts: 17
Rep Power: 7
Pocket is on a distinguished road
If hyperthreading is enabled, you should see 16 cores in Task Manager:
[4 real + 4 virtual] * 2 processors

Check the BIOS whether HT is enabled or not.

If HT is enabled and you only see 8 Cores in the task manager, you are only using one CPU.
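
The counting above can be written down as a small sketch. The helper names are invented for the illustration, and the load estimate assumes each solver process keeps exactly one logical CPU busy, which is an idealisation. Under that assumption, 8 processes on 16 logical CPUs would show as 50% load, which may be one explanation for the 50% figure reported earlier in the thread.

```python
import os


def expected_logical_cpus(sockets, cores_per_socket, hyperthreading):
    """Logical CPUs the OS should list: real cores, doubled if HT is on."""
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_socket * threads_per_core


def reported_load_percent(busy_processes, logical_cpus):
    """Overall load shown if each process fully occupies one logical CPU."""
    return 100.0 * min(busy_processes, logical_cpus) / logical_cpus


# Dual quad core machine with hyperthreading enabled, as discussed above.
expected = expected_logical_cpus(sockets=2, cores_per_socket=4, hyperthreading=True)
print("expected logical CPUs:", expected)
print("logical CPUs seen by this OS:", os.cpu_count())
print("load with 8 solver processes:", reported_load_percent(8, expected))
```

Run on the machine in question, a mismatch between the expected count and what `os.cpu_count()` reports would point at a BIOS or hardware detection problem of the kind described above.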

Old   April 23, 2010, 02:35
Default
  #11
Member
 
Ali Torbaty
Join Date: Jul 2009
Location: Sydney, Australia
Posts: 71
Rep Power: 8
AliTr is on a distinguished road
For a large HPC you can look into Dell CX1 or Cray CX1 computers. They come with an option of 8 blades (each with 2x 6 core Xeon 5670 2.93 GHz), so in total you can go up to 96 cores, with a minimum of 24 GB memory per blade.
To give you an idea, the price for a 3 blade system with 36 cores and 72 GB RAM is about $48000 (OS: Windows HPC Server 2008).
The important point about this system is that it uses InfiniBand to connect the blades, which significantly reduces the latency between motherboards and can boost performance severalfold compared with the same number of cores spread across the network.

Cloud computing is also available for CFX (from GridCore) but the $/core/hour is not yet attractive.

Old   April 23, 2010, 05:17
Default
  #12
Senior Member
 
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 368
Rep Power: 10
arjun is on a distinguished road
Quote:
Originally Posted by Aero75 (post #6)
I wonder if something like this would be useful:

"Wall-layer models for large-eddy simulations". U. Piomelli and E. Balaras.

http://terpconnect.umd.edu/~balaras/...ications.shtml

Do you think this type of approach might be useful for your case?

Old   April 23, 2010, 05:38
Default
  #13
Senior Member
 
George
Join Date: Mar 2009
Location: Birmingham, UK
Posts: 257
Rep Power: 9
ckleanth is on a distinguished road
Quote:
Originally Posted by Pocket (post #10)
It is, mate.

Quote:
Originally Posted by AliTr (post #11)
The system I'm aiming for is equivalent in CPU power to one single blade (same processors and memory as used in a blade rack) but in a server unit that can run on its own. The spec I'm aiming for should cost less than 10K for hardware, and we don't really need anything more than that, in either hardware or software costs. My only problem is that the damn thing won't scale up as it should. BTW, I looked at the power saving settings and changed them to always on, with 100% for both minimum and maximum CPU capacity. I am in the process of redoing the benchmarks, but from what I can see it didn't change much.
Does anyone have suggestions as to what the problem might be?

Old   April 23, 2010, 10:30
Default
  #14
New Member
 
Dr. Lee Axon
Join Date: Mar 2009
Posts: 3
Rep Power: 8
pointzeroconsulting is on a distinguished road
Quote:
Originally Posted by ckleanth (post #7)
Hi,

I have a twin X5550 processor workstation and get a speedup of 7.45 using all 8 cores, so something seems very odd with those X5580 numbers.

In conjunction with my hardware provider I have done quite a lot of research into finding the sweet spot with the latest technologies - I can provide their details if you want.

Lee.

Old   April 23, 2010, 17:11
Default
  #15
Senior Member
 
George
Join Date: Mar 2009
Location: Birmingham, UK
Posts: 257
Rep Power: 9
ckleanth is on a distinguished road
Hi Lee,

If you don't mind sharing your info, that would be great, mate.

Thanks

Old   April 26, 2010, 06:22
Default
  #16
Senior Member
 
George
Join Date: Mar 2009
Location: Birmingham, UK
Posts: 257
Rep Power: 9
ckleanth is on a distinguished road
Update: with hyperthreading switched off I get a 3.36 speedup using 8 cores, which is slightly better but still not 6-7ish. I'm looking for issues with the hardware configuration, but nothing flags up at the moment. I might go for AMD chips, though, and try my luck with those.

Old   April 27, 2010, 07:43
Default
  #17
Senior Member
 
George
Join Date: Mar 2009
Location: Birmingham, UK
Posts: 257
Rep Power: 9
ckleanth is on a distinguished road
Update 2:

ANSYS has reports of reduced performance on Nehalem architecture CPUs under Windows and is currently looking for a solution to the issue. Basically, all they are saying at the moment is that you need to use Linux to get the best out of your hardware.

Old   April 27, 2010, 13:48
Default
  #18
New Member
 
Dr. Lee Axon
Join Date: Mar 2009
Posts: 3
Rep Power: 8
pointzeroconsulting is on a distinguished road
Quote:
Originally Posted by ckleanth (post #17)

Hi George,

For reference my systems all use Red Hat Enterprise Linux.

Lee.
