CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   CFX (http://www.cfd-online.com/Forums/cfx/)
-   -   Intel Nehalem's performance with CFX-12? (http://www.cfd-online.com/Forums/cfx/75328-intel-nehalems-performance-cfx-12-a.html)

Aero75 April 22, 2010 04:51

Intel Nehalem's performance with CFX-12?
 
Hi all,

I have been searching the forum a bit to find an updated thread on this topic.

I am right now investigating what hardware I should get for a larger HPC upgrade. We use ANSYS CFX 12.

I am specifically looking at the Intel X5560 quad cores, and I am wondering if anyone on the forum has experience with how well they (or other CPUs of the 55xx family) scale, especially for larger calculations with say 50-100 cores in parallel, or even higher core counts. I am also planning to acquire InfiniBand, as we have good experience with it on our existing compute grid of dual-core AMD Opterons... Our typical problem size would probably be about 20 million hexas, but maybe even higher, as more hardware usually pulls in that direction ;)

Best regards,

Aero75

jbritton April 22, 2010 05:24

I'm not too up on the server chips, but with the new 6-core chips that have come out from both AMD and Intel, it could be worth looking at them.

I think, if I remember rightly, AMD may also have a 12-core server chip. I can't remember exactly, but I vaguely remember hearing something about it.

ghorrocks April 22, 2010 07:14

The choice of which machines to use and how many parallel licenses depends on a few things. You want as much speed (i.e. low run times) for as little money as possible. When you estimate the speed you get with various parallel options and the cost of implementing them (keeping in mind the biggest cost is usually parallel licenses), the optimum setup is a large number of stand-alone machines with each CPU only partly loaded - that is, a quad-core CPU running only 1 or 2 processes. Do the maths - fully loading a CPU with 4 or 6 cores just does not pay.

This has changed with the new 55xx and 56xx chips. They scale far better to 4 or 6 cores than the old Intel CPUs ever did, so the equation may have changed - but I doubt it. The software licenses far exceed the hardware costs, so the normal advice is to save money with fewer licenses but get the best hardware you can.

The CPU2006 benchmark on the spec.org website (http://www.spec.org/cpu2006/results/) has lots of machines benchmarked, including many with the new 55xx and 56xx chips. The fp and fp_rate benchmarks are very good guides to performance - make sure you understand what they are measuring before using them.
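Glenn's "do the maths" point can be sketched as a quick cost-per-throughput comparison. All the prices and speedup figures below are made-up placeholders (real ANSYS license and hardware quotes vary widely), so treat this as the shape of the calculation only:

```python
# Rough cost-per-throughput comparison for parallel CFX setups.
# All prices and speedup figures are illustrative placeholders,
# NOT real ANSYS license or hardware quotes.

def cost_per_throughput(n_cores, speedup, license_cost_per_core, hw_cost):
    """Total cost divided by effective speedup (lower is better)."""
    total = n_cores * license_cost_per_core + hw_cost
    return total / speedup

# One fully loaded quad core vs. two half-loaded boxes (placeholder numbers;
# the half-loaded boxes scale better per process but need extra hardware):
full = cost_per_throughput(n_cores=4, speedup=2.6,
                           license_cost_per_core=5000, hw_cost=3000)
half = cost_per_throughput(n_cores=4, speedup=3.6,
                           license_cost_per_core=5000, hw_cost=6000)
print(f"fully loaded box     : {full:8.0f} per unit of speedup")
print(f"two half-loaded boxes: {half:8.0f} per unit of speedup")
```

Because the license term dominates the hardware term, the half-loaded configuration comes out cheaper per unit of speedup despite doubling the hardware cost, which is exactly the trade-off described above.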

jbritton April 22, 2010 07:33

Do you have any experience with SSDs? I have seen the ANSYS literature on them, and the speed increase appears to help more as the number of cores is increased, but considering the size and cost of the drives, are they worth looking at at the moment?

ghorrocks April 22, 2010 07:44

CFD loads the hard drives lightly in general, meaning little speed is gained by using faster hard drives. CFD needs fast CPUs and fast memory. Hard drives and graphics don't make much of a difference.

I note that in your case (due to the large number of nodes) you will need a fast network, and I see you are already looking at InfiniBand. For most smaller clusters Ethernet is fine, but yours is too big for Ethernet.

Also, 20M-node simulations are not very big for a system such as you describe. Either you plan to run a few jobs that size at the same time, or your physics is tricky, which slows things down. About 0.5-1M nodes per CPU is pretty normal.
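Glenn's 0.5-1M nodes-per-process rule of thumb gives a quick sizing sketch (nothing CFX-specific, just ceiling division):

```python
def partitions_for_mesh(total_nodes, nodes_per_process):
    """Number of parallel processes needed so each partition stays
    near the target nodes-per-process count (rounded up)."""
    return -(-total_nodes // nodes_per_process)  # ceiling division

# A 20M-node mesh at the suggested 0.5-1M nodes per process:
print(partitions_for_mesh(20_000_000, 1_000_000))  # -> 20
print(partitions_for_mesh(20_000_000, 500_000))    # -> 40
```

So a single 20M-node job only keeps 20-40 processes busy at normal loadings, which is why a 50-100 core system implies either several concurrent jobs or expensive physics.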

Aero75 April 22, 2010 12:55

Hi,

Thanks for your answers.

Yes, the physics is in a way a bit tricky. We run simulations with very large outer domains, while we still need to resolve the flow over very large structures with y+ = 1 boundary layer resolution. Typically we need to convect the flow through the domain a few times in order to obtain sufficient convergence. This means we have to do on the order of 5000 iterations for a steady run...

We also have plans to go more into transient calculations - both URANS and DES/SAS types of modelling - which will only place even higher demands on compute power...

Regarding the Nehalems, I have a pretty good feeling. I have googled my way to some nice benchmark results. On our older cluster, I didn't dare to go for quad cores for the reason that you described, Glenn. I feel more secure now with the Nehalem chips. Having said that, I still appreciate all the comments and experience I can get from this forum.

Regards,

Aero75

ckleanth April 22, 2010 21:55

I'm in the process of scouting for appropriate hardware to upgrade our old machine, which comprises two quad-core X3350s. I got hold of a loan machine fitted with two quad-core X5580s, and unless I'm doing something wrong I only get 25-30% faster solution times with the new chip. Furthermore, when running a simulation using all 8 cores on the new machine, the OS (Win Server 2008) reports that the max CPU loading reaches only 50%. Does anyone know why this is happening? Is it maybe due to hyper-threading, and is there a way to rectify this and use the CPU's full potential? Anyway, I have some numbers, and although I was expecting a much better speedup with the newer chip (somewhere around 6 using all 8 cores), unless I find the problem these results are poor. The older machine has the better speedup curve FFS :mad:

cores, X5580, X3350
1, 1.00, 1.00
2, 1.49, 1.49
3, 1.93, 1.92
4, 1.74, 2.09
5, 1.94, 2.19
6, 2.15, 2.42
7, 2.18, 2.47
8, 2.29, 2.58
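The speedup figures above translate directly into parallel efficiency (speedup divided by core count), which makes the X5580's drop-off beyond 3 cores easy to see; the numbers below are copied straight from the table:

```python
# Parallel efficiency = speedup / cores, from the table above.
cores = [1, 2, 3, 4, 5, 6, 7, 8]
x5580 = [1.00, 1.49, 1.93, 1.74, 1.94, 2.15, 2.18, 2.29]
x3350 = [1.00, 1.49, 1.92, 2.09, 2.19, 2.42, 2.47, 2.58]

for n, s_new, s_old in zip(cores, x5580, x3350):
    print(f"{n} cores: X5580 {s_new / n:.0%}  X3350 {s_old / n:.0%}")
```

At 4 cores the X5580 is running at roughly 44% efficiency against the X3350's 52%, and the newer chip's speedup actually drops going from 3 to 4 cores - a strong hint of a configuration problem rather than a hardware limit.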

ghorrocks April 22, 2010 22:01

Are you REALLY sure it is a dual quad-core system? The Task Manager will report a quad-core CPU as having 8 processors because of hyperthreading.

Hyperthreading should not harm the results significantly. The difference between hyperthreading enabled and disabled for 4 processes on a quad core chip should be very small.

But still, the results you have got are disappointing. I suspect something is wrong - can you run the CFX benchmark to see how that goes? Are you sure all the components (memory, motherboard, etc.) are high quality? Is the virus checker doing silly things? Any other processes running?

ckleanth April 22, 2010 22:12

Yes Glenn, this is a Supermicro server with two quad-core 3.2 GHz CPUs and 24 GB of RAM. I ran many cases - the benchmark, the longer benchmark until convergence, and the gear tutorial with saving disabled, as I just want to test the CPU - and all tests have similar results. The last figures are from the immersed gear tutorial. The only things installed on the machine are the OS and CFX, no antivirus. This is a loan machine, I only got it yesterday, and everything is freshly installed. I'm not that familiar with Server 2008 - maybe there's an option I am not aware of, but the OS environment seems OK. I switched the option to focus on running programs rather than background services - it had no effect anyway... I'm a bit stuck at the moment :( (and puzzled :mad:)

Pocket April 22, 2010 23:58

If hyperthreading is enabled, you should see 16 cores
[4 real + 4 virtual] * 2 Processors
in your taskmanager.

Check the BIOS whether HT is enabled or not.

If HT is enabled and you only see 8 Cores in the task manager, you are only using one CPU.

AliTr April 23, 2010 02:35

For a large HPC system you can look into the Dell CX1 or Cray CX1 computers. They come with an option of 8 blades (each with 2x 6-core Xeon 5670, 2.93 GHz), so in total you can go up to 96 cores, with a minimum of 24 GB memory per blade.
To give you an idea, the price for a 3-blade system (36 cores, 72 GB RAM) is about $48,000 (OS: Windows HPC Server 2008).
The important fact about this system is its use of InfiniBand for the blade interconnect, which significantly reduces the latency between motherboards and can boost performance severalfold compared to the same number of cores spread across an ordinary network.

Cloud computing is also available for CFX (from GridCore) but the $/core/hour is not yet attractive.
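AliTr's quote works out to a simple per-core hardware figure, which is the number to hold against any cloud $/core/hour rate. The cloud rate below is a made-up placeholder, not a GridCore price:

```python
# Per-core hardware cost for the quoted 3-blade CX1 configuration.
system_price = 48_000   # USD, from the quote above
n_cores = 36
per_core = system_price / n_cores
print(f"${per_core:,.0f} per core")  # -> $1,333 per core

# Break-even utilisation vs. a hypothetical cloud rate
# (placeholder figure, not an actual GridCore quote):
cloud_rate = 0.50  # $/core/hour
print(f"break-even after {per_core / cloud_rate:,.0f} core-hours per core")
```

At sustained utilisation a cluster like this pays for itself within a few months against any per-hour rate in this range, which is why the cloud option looks unattractive for steady workloads.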

arjun April 23, 2010 05:17

Quote:

Originally Posted by Aero75 (Post 255945)
Hi,

Thanks for your answers.

Yes, the physics is in a way a bit tricky. We run simulations with very large outer domains, while we still need to resolve the flow over very large structures with y+ = 1 boundary layer resolution. Typically we need to convect the flow through the domain a few times in order to obtain sufficient convergence. This means we have to do on the order of 5000 iterations for a steady run...

We also have plans to go more into transient calculations - both URANS and DES/SAS types of modelling - which will only place even higher demands on compute power...

Regarding the Nehalems, I have a pretty good feeling. I have googled my way to some nice benchmark results. On our older cluster, I didn't dare to go for quad cores for the reason that you described, Glenn. I feel more secure now with the Nehalem chips. Having said that, I still appreciate all the comments and experience I can get from this forum.

Regards,

Aero75

I wonder if something like this would be useful:

"Wall-layer models for large-eddy simulations". U. Piomelli and E. Balaras.

http://terpconnect.umd.edu/~balaras/...ications.shtml

Do you think this type of approach might be useful for your case?

ckleanth April 23, 2010 05:38

Quote:

Originally Posted by Pocket (Post 256009)
If hyperthreading is enabled, you should see 16 cores
[4 real + 4 virtual] * 2 Processors
in your taskmanager.

Check the BIOS whether HT is enabled or not.

If HT is enabled and you only see 8 Cores in the task manager, you are only using one CPU.

It is, mate.

Quote:

Originally Posted by AliTr (Post 256026)
For a large HPC system you can look into the Dell CX1 or Cray CX1 computers. They come with an option of 8 blades (each with 2x 6-core Xeon 5670, 2.93 GHz), so in total you can go up to 96 cores, with a minimum of 24 GB memory per blade.
To give you an idea, the price for a 3-blade system (36 cores, 72 GB RAM) is about $48,000 (OS: Windows HPC Server 2008).
The important fact about this system is its use of InfiniBand for the blade interconnect, which significantly reduces the latency between motherboards and can boost performance severalfold compared to the same number of cores spread across an ordinary network.

Cloud computing is also available for CFX (from GridCore) but the $/core/hour is not yet attractive.

The system I'm aiming for is equivalent in CPU power to a single blade (the same processors and memory as used in a blade rack) but in a server unit that can run on its own. The spec I'm aiming for should be less than 10K for hardware, and we don't really need anything more than that, be it hardware or software costs. My only problem is that the damn thing won't scale up as it should. BTW, I looked at the power saving settings and changed them to always on, 100% for both min and max CPU capacity. I am in the process of redoing the benchmarks, but from what I can see it didn't change much.
Anyone with some suggestions as to what the problem could be? :confused:

pointzeroconsulting April 23, 2010 10:30

Quote:

Originally Posted by ckleanth (Post 255990)
I'm in the process of scouting for appropriate hardware to upgrade our old machine, which comprises two quad-core X3350s. I got hold of a loan machine fitted with two quad-core X5580s, and unless I'm doing something wrong I only get 25-30% faster solution times with the new chip. Furthermore, when running a simulation using all 8 cores on the new machine, the OS (Win Server 2008) reports that the max CPU loading reaches only 50%. Does anyone know why this is happening? Is it maybe due to hyper-threading, and is there a way to rectify this and use the CPU's full potential? Anyway, I have some numbers, and although I was expecting a much better speedup with the newer chip (somewhere around 6 using all 8 cores), unless I find the problem these results are poor. The older machine has the better speedup curve FFS :mad:

cores, X5580, X3350
1, 1.00, 1.00
2, 1.49, 1.49
3, 1.93, 1.92
4, 1.74, 2.09
5, 1.94, 2.19
6, 2.15, 2.42
7, 2.18, 2.47
8, 2.29, 2.58

Hi,

I have a twin X5550 processor workstation and get a speedup of 7.45 using all 8 cores, so something seems very odd with those X5580 numbers.

In conjunction with my hardware provider I have done quite a lot of research into finding the sweet spot with the latest technologies - I can provide their details if you want.

Lee.

ckleanth April 23, 2010 17:11

Hi Lee,

If you don't mind sharing your info, that would be great, mate.

Thanks

ckleanth April 26, 2010 06:22

Update: with hyperthreading switched off I get a 3.36 speedup using 8 cores, which is slightly better but not the 6-7 ish I was hoping for. I'm looking for issues with the hardware configuration, but nothing flags up at the moment. I might go for AMD chips instead and try my luck with those.

ckleanth April 27, 2010 07:43

update 2

ANSYS have reports of reduced parallel performance using Nehalem-architecture CPUs with the Windows OS and are currently seeking a solution to the issue. Basically, all they are saying at the moment is that you need to use Linux to get the best out of your hardware :mad:

pointzeroconsulting April 27, 2010 13:48

Quote:

Originally Posted by ckleanth (Post 256445)
update 2

ANSYS have reports of reduced parallel performance using Nehalem-architecture CPUs with the Windows OS and are currently seeking a solution to the issue. Basically, all they are saying at the moment is that you need to use Linux to get the best out of your hardware :mad:


Hi George,

For reference my systems all use Red Hat Enterprise Linux.

Lee.

