CFD Online Discussion Forums


Dorit February 15, 2013 19:57

Advice on the technical requirements for a new Fluent Workstation
Hi everyone,

I'm currently looking into buying a new workstation to run Fluent 12 and 14. I would appreciate it if someone could give me some information or suggestions on what kind of workstation to buy to reduce the computational time of Fluent.
Here is a rough outline of the simulations I’m planning to do:
- Hybrid mesh of 5 – 7 million cells used in the sliding mesh configuration in Fluent
- Transient simulation that is required to run for up to 30 seconds flow time with time steps of approximately 0.001 – 0.0005 seconds
- Coupled solver, 2nd order discretisation scheme (spatial & temporal)
- Mainly 2 equation turbulence models, i.e. k-ɛ and k-ω models
- Computational time of up to a month which cannot be interrupted due to the nature of the transient study
- Parallel computing with Fluent
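
For a sense of scale, the flow time and time-step range above fix the number of time steps, and the one-month limit then sets a wall-clock budget per step. A minimal sketch of that arithmetic follows; taking "a month" as 30 days is an assumption, and the function names are only illustrative.

```python
# Rough time-step count and wall-clock budget for the transient run outlined above.
FLOW_TIME_S = 30.0               # flow time to simulate, from the post
TIME_STEPS_S = (0.001, 0.0005)   # time-step range, from the post
WALL_CLOCK_DAYS = 30             # "up to a month", taken as 30 days (assumption)


def step_count(flow_time_s, dt_s):
    """Number of time steps needed to cover the flow time."""
    return round(flow_time_s / dt_s)


def seconds_per_step(wall_clock_days, n_steps):
    """Wall-clock seconds available per time step within the run-time limit."""
    return wall_clock_days * 24 * 3600 / n_steps


for dt in TIME_STEPS_S:
    n = step_count(FLOW_TIME_S, dt)
    budget = seconds_per_step(WALL_CLOCK_DAYS, n)
    print(f"dt = {dt} s: {n} steps, about {budget:.1f} s of wall clock per step")
```

So the machine has to finish each time step, including all inner iterations, in well under a minute and a half at the coarser time step, and in about half that at the finer one.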

For choosing a suitable processor that reduces the computational time, is it more important to invest in a better technology (AMD, Intel Core or Intel Xeon) or should I increase the number of cores (quad, six or eight)? For a given number of cores is it better to have all of the cores in one CPU (a six or eight core machine) or to have a dual CPU system (two quad-core CPUs)?
Is the speed of the CPU or the cache of a CPU more critical for a fast simulation? So, between a CPU of 3.3GHz and 15MB of cache or a CPU of 3.6GHz and 10MB of cache, which one is more favourable?
To what degree is the simulation performance dictated by the RAM? Is an increase of the RAM always associated with an increase in the simulation speed or is there a maximum amount of RAM after which the CPU power will limit the simulation speed? Is a 16GB RAM memory sufficient for what I'm trying to do or should I go for 32GB RAM? Is there a difference whether I get ECC or non-ECC RAM for Fluent? Can an ECC RAM reduce the number of crashes and potential simulation errors, such as AMG solver divergences?

I know it's a lot of questions that I've just posted but it'd be great if someone could give me some suggestions on a few of the points I've mentioned.


CapSizer February 16, 2013 13:41

OK, it's a quiet Saturday evening, so I will bite ;-)

First of all, before you start agonizing about the hardware, it is necessary to address the question of the software licenses at your disposal. CFD software is way more expensive than the hardware you will be running on, so this needs to be sorted out first. There is no merit in getting a 16-core workstation if you can only run 8-way parallel. AFAIK, the way that Ansys markets parallel Fluent these days, you buy the "HPC" facility in steps of 8, 32, 128, 512 cores. If you have only paid for one HPC pack, you can only run 8-way parallel, so you have to figure out the best hardware configuration for 8 cores. The next step is the ability to run on 32 cores, which is a really big step. There's not much sense in having 32-core capability but only running say 16 cores. Sort this question out first before committing to any hardware. In my experience, 8 million cells will definitely be much better on 32-way parallel than 8-way, but there may be only relatively small gains if you try to go for more parallel cores than that.
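
The licensing point can be sketched as a simple lookup. The core limits are the ones quoted in the post above ("AFAIK"), not checked against ANSYS documentation, and mapping them to 1, 2, 3 and 4 HPC packs is an assumption made for illustration.

```python
# Core limit per number of HPC packs, using the steps quoted in the post.
# The pack-count keys are an assumption; only "one pack = 8 cores" is stated outright.
HPC_PACK_CORES = {1: 8, 2: 32, 3: 128, 4: 512}


def usable_cores(packs, hardware_cores):
    """Cores a job can actually use: the smaller of the licence limit and the hardware."""
    return min(HPC_PACK_CORES[packs], hardware_cores)


def idle_cores(packs, hardware_cores):
    """Hardware cores that the licence leaves unused."""
    return hardware_cores - usable_cores(packs, hardware_cores)


# A 16-core workstation with a single pack runs 8-way and leaves 8 cores idle.
print(usable_cores(1, 16), idle_cores(1, 16))
```

With two packs the same 16-core machine uses all its cores, but 16 of the 32 licensed processes go unused, which is the other half of the mismatch described above.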

8 million cells, coupled solver, 2-equation TM .... That would fit (I think!) fine in 16 GB of RAM if you are running single precision, but you may run out of memory if you need to use double precision. RAM is inexpensive, so I would say go for 32 GB rather. As far as the total amount of memory goes, you need to have enough, but anything beyond that buys you nothing. You can't gain speed by adding more RAM, provided that you have enough to start with. If you run out of memory, everything stops. The computer will try to swap memory to disk, but that is so slow and unresponsive that you will be tempted to pull the plug in order to stop things. Not a good idea, by the way.

I have been advised by hardware experts to use ECC rather than ordinary RAM, but frankly I cannot say that I have ever seen a benefit from using ECC RAM on a single socket machine. Many (most, all?) multi-socket systems require ECC however.

The two characteristics of memory that do make a huge difference to CFD speed are the actual memory clock speed (get the fastest supported by the chipset) and the number of memory channels. Inexpensive single socket systems (AMD FX, Intel i5) use two parallel memory channels (typically, but not always, 4 slots in total). By contrast the current Intel i7 uses four (either 4 or 8 slots in total). When you measure CFD performance, you find that this is what really makes the difference. The server CPUs (Intel Xeon E5 and AMD Socket G34) use 4 parallel channels per socket. So the nice thing about a dual socket server board is that you have a total of 8 memory channels feeding the CPUs, instead of the 4 that you would get from a single Core i7. Two Core i7 systems linked with Gigabit Ethernet will probably be competitive with a dual socket workstation (i.e. 8 memory channels for both systems), and probably cost a bit less, but distributed parallel is always just a little bit of a pain to deal with.
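
To put rough numbers on the channel argument: the theoretical peak bandwidth of DDR3 is the number of channels times the transfer rate times 8 bytes per transfer (each channel is 64 bits wide). The sketch below is only that arithmetic; the configuration labels are illustrative, and sustained bandwidth in a real solver will be lower than these peak figures.

```python
BYTES_PER_TRANSFER = 8  # one DDR3 channel is 64 bits wide


def peak_bandwidth_gb_s(channels, mega_transfers_per_s):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1000 MB)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


# All three at DDR3-1600, the standard speed mentioned in this thread.
configs = {
    "dual-channel desktop (AMD FX, Core i5)": 2,
    "quad-channel Core i7": 4,
    "dual-socket server, 4 channels per socket": 8,
}

for name, channels in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(channels, 1600):.1f} GB/s")
```

The same function shows what faster modules buy: four channels at 2133 MT/s come to roughly 68 GB/s against about 51 GB/s at 1600 MT/s.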

Neither CPU clock speed, nor cache size, nor even architecture, is as significant as the memory system when it comes to performance in CFD. For example, an AMD FX-8150 running either 4 or 8 cores will be close in performance to the very different Intel Core i5 (4 cores), because both can use the same memory system (two channel 1600 MHz DDR3 as standard, although there are overclocking options). Neither can match the Core i7, with its 4 memory channels. The same effect is likely to be seen when comparing Opterons and Xeons. Yes, you can get 16 cores in an Opteron CPU, but these are fed by 4 memory channels, just like the 8 core Xeon, so don't expect it to be any quicker.

This is not to say, however, that clock speed, core count and cache are insignificant, but sort the software license and memory questions out first. If your software license requires you to pay per parallel process, get a smaller number of the fastest cores that you can get. If parallel licensing is a flat fee (like Adapco power session) it starts making sense to go for more cores and more memory channels.

If all that you can afford is an 8-process HPC license, think in terms of two linked Core i7s, or a Xeon workstation with two E5-2643 CPUs.

Dorit February 17, 2013 21:33

Wow, great reply! I'll need a bit of time to sort through everything and start making sense of it all, before I can reply properly.

But thanks for the answer, it's really helpful!! :)

Dorit February 19, 2013 09:02

Hi Charles,

The workstation that I want to buy will be connected to the University’s network; hence licenses shouldn’t be a problem. And yes, I’ll be using single precision solver. Based on your suggestions I tried to come up with the following specs for a potential Workstation:

CPU: Intel Core i7 3930K
CPU Cooler: be quiet! Dark Rock 2 high end CPU Cooler
Motherboard: Asus P9X79 Pro, Intel X79
RAM: 32GB (8x4GB) Corsair DDR3 Dominator Platinum, PC3-12800 (1600), Non-ECC, Unbuffered, CAS 9-9-9-24, DHX, XMP, 1.5V
Graphics: 512MB NVIDIA Quadro NVS 400 (an upgrade as we don't have the 300 in stock)
Power supply: 750W Corsair Enthusiast Series TX750M, Modular, 85% Eff', 80 PLUS Bronze

Does this set-up make sense to you, without wasting money on one part of the PC while ignoring a potential performance bottleneck? Would it make any sense to buy the above machine with a CPU with 8 cores (Xeon) rather than 6 (Core i7), or is the RAM still the limiting factor which won’t allow an increase in performance?
In the description for the 6 core version it says that the machine has 6 physical cores and 6 logical cores. Does this require me to use 6 or 12 Fluent licenses for best performance?
And finally, would it be possible to increase the simulation speed of the suggested setup by over-clocking the CPU and/or RAM? The machine will be running 24/7 for a few weeks so I want to make sure that the cooler is good enough.

Thanks for your suggestions!!

CapSizer February 19, 2013 09:30

Well, it is difficult to comment on the cooler and power supply, as I am unfamiliar with those particular models. However, as a general principle it is a good idea to invest in a very good cooler for the sake of your hearing, and likewise in a power supply that is nice and quiet. If you want to try overclocking (generally well worth it), you definitely need the extra cooling. I have Corsair liquid cooling in one of my machines, and I'm very glad I made that decision. It is quiet and it is cool. Make sure that you blow the cooling fins clean regularly, it can make a big difference.

You should probably look for faster RAM. You can easily get up to 2133 MHz DDR3, and these overclocking boards can easily support that. In fact, the board can go up to 2400, but it may be difficult to find appropriate modules.

I'm not sure the Quadro card makes much sense. You might do better with a mid-range enthusiast's card than with a low-end pro card.

Get a chassis with good airflow.

From what I've read, there is not much to be gained in CFD from trying to use hyper-threading (the extra 6 logical cores). Advice is normally to switch off hyper-threading in the BIOS. But hey, somebody else is paying for the extra 6 licenses, so try it and let us know what you find.

I think the Xeon CPU will require a different motherboard? I would imagine that you could get more performance from a single 8-core Xeon than a single 6-core i7, but the price difference is likely to be substantial.

Otherwise, looks good to me.

Dorit February 20, 2013 19:03


I've been doing lots more research on what to get and it turns out to be a complete mess.... :s

Of the following CPU's which one would you get, and why?

- Intel Core i7 6 cores @ 3.8 GHz (256KB/core L2 Cache, 12MB L3 Cache), 32GB RAM @ 2400 MHz
- AMD FX-8350 8 cores @ 4.0 GHz (1MB/core L2 Cache, 8MB L3 Cache), 32GB RAM @ 2400 MHz
- AMD Opteron 6272 server CPU 16 cores @ 2.1 GHz (1MB/core L2 Cache, 16MB L3 Cache), 32GB RAM @ 1600 MHz

Thanks again for your opinion!

evcelica February 20, 2013 22:46

absolutely get the Intel one out of those three. no contest

CapSizer February 21, 2013 01:49

Erik is right in broad terms, but you can qualify that statement:

Get the i7 if you want the fastest single socket workstation for this task; you get 4 memory channels with it.

Get the FX8350 if you want to really minimise the amount of money that you spend, while still getting reasonable performance. You only get 2 memory channels here, so the performance will be much lower than the i7.

Get two Opteron CPUs in a dual processor workstation, if somebody else is paying for the extra parallel software licences, and you can afford to spend this much on the hardware. This way you get a total of 8 memory channels, but it comes at a price. Around USD 3800 for a reasonably configured workstation.

In terms of speed, you can reasonably assume that the performance will be roughly proportional to the number of memory channels at your disposal, so it will be something like 2:1:4 (i7 : FX-8350 : dual Opteron). You can probably expect the cost ratio of the three systems to be something like 2:1:4 as well, but of course you can go and get more accurate numbers for that quite easily. Remember that you may want to factor the cost of power consumption into this as well, if the system is going to run 24/7.
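
The 2:1:4 figure is just the memory channel counts of the three options reduced to their simplest ratio. A minimal sketch, assuming the rule of thumb above (performance scales with channel count) and ignoring differences in memory speed between the systems; the labels are only illustrative:

```python
from functools import reduce
from math import gcd

# Memory channels per option, as given in this post.
channels = {"Core i7": 4, "FX-8350": 2, "2 x Opteron 6272": 8}


def relative_ratio(values):
    """Reduce a list of positive integers to their simplest whole-number ratio."""
    divisor = reduce(gcd, values)
    return [v // divisor for v in values]


ratio = relative_ratio(list(channels.values()))
print(":".join(str(r) for r in ratio))
```

Swapping in real prices for the three systems would give the cost ratio in the same way, which makes it easy to see whether speed per unit of money actually differs between them.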

evcelica February 21, 2013 20:53

Yeah you are right, sorry for the short answer.
I believe universities usually have the capability to run up to 4 core parallel, I've seen ANSYS academic licenses set up that way. Maybe you have something different, you might want to find that out. I would be really surprised if you had the capability to run 32-way parallel, if so I'd like to take a few classes there!!!

Dorit February 22, 2013 14:53

Thanks guys. I think I'll go with the core i7. I wanna try hyperthreading to get 12 cores and see what effect it has. As the i7 has 4 memory channels and I want 32GB of RAM (2,400MHz) should I go for 8 x 4GB or 4 x 8GB, or does it even make a difference?

Ps. we can run simulations with however many cores we want, you'll just have to wait a loooong time for your job to be processed if you ask for like 30. 6 or 8 works within a few hours usually :)

evcelica February 22, 2013 18:41

As many cores as you want! Dang you are lucky!
I wouldn't try using all 12, your computer will be unresponsive; 11 would be the most I would go. But I always disable hyperthreading because any speedup is very small and it can sometimes be even slower than without. You gain zero benefit in CFX, and probably not that much in Fluent, especially if you are going to have to wait longer for your job to be submitted.

I've heard 4x8GB may be slightly more stable, but I've been using 8x4GB with no problems, so I don't think it matters that much. Just make sure you get RAM that's qualified for your motherboard.

Dorit March 1, 2013 17:46

Right, I just found out that we may be able to stretch our budget a little for the workstation. So I am now considering an 8 and a 12 core machine. One of the workstations has more cores, but has a lower CPU clock speed and a lower CPU performance score in benchmark tests, whereas the other workstation has a lower number of cores but has a higher speed and a better performance score.
  • Xeon E5-2665: 8 core @ 2.4GHz - CPU performance score 12.8
  • Xeon E5-2620 (dual): 2 x 6 core @ 2.0GHz - CPU performance score 8.2
Which of these two workstations would you recommend and why?

To increase the performance I was thinking of overclocking the system. With the two systems I picked, can I overclock CPU, motherboard and/or RAM? If it makes sense to overclock the CPU, will I need a CPU cooler?

Thanks for any thoughts.

evcelica March 1, 2013 20:12

Like Capsizer said earlier:


Originally Posted by CapSizer (Post 409217)
In terms of speed, you can reasonably assume that the performance will be roughly equivalent to the number of memory channels at your disposal.

So the dual CPU system would be much better since it will have 8 memory channels instead of only 4.

Overclocking the Memory:
Every dual socket server board I've seen only supports 1600 MHz RAM. There is one ASUS Z9PE-D8 WS Dual 2011 motherboard that will support up to 2133MHz, but reviews of that board don't seem too great.

Overclocking the CPU:
The Xeon CPUs have locked multipliers, so they will be very difficult to overclock, unless you really know what you are doing and have a motherboard that will allow you to adjust the base clock.

This is why some of us chose to use multiple single socket systems in a cluster, so we can overclock memory and CPU to our liking. But running multiple machines is not for everyone, and it can be a bit of a pain to set everything up.

Dorit March 2, 2013 05:18


Originally Posted by evcelica (Post 410958)
Like Capsizer said earlier:

So the dual CPU system would be much better since it will have 8 memory channels instead of only 4.

Thank you for your reply. Does that mean I should get 8 RAM modules now that we have 8 memory channels, or will 4 modules still do the job?

Basically for 32GB, in a dual socket system, 8 x 4GB or 4 x 8GB?


CapSizer March 2, 2013 06:57

Mate, you are going to have a tough time fitting 4 modules into 8 slots. 8 channels, 8 slots, 8 modules.


Originally Posted by Dorit (Post 410991)
Thank you for your reply. Does that mean I should get 8 RAM modules now that we have 8 memory channels, or will 4 modules still do the job?

Basically for 32GB, in a dual socket system, 8 x 4GB or 4 x 8GB?


Daveo643 March 5, 2013 18:13


evcelica March 6, 2013 16:59

I don't think CFX can use GPU computing at all, and Fluent can only use it for ray tracing in certain radiation models, maybe they've updated it since I last checked.

Your slides are for Ansys Mechanical, and there it has to be a Tesla or Quadro card. If you try to use a GeForce card it won't let you, and gives you a message about your unsupported card. I've tried it in Mechanical with my GTX680, and nope, it doesn't work.

Daveo643 March 6, 2013 17:42


evcelica March 6, 2013 22:46

I looked at your slides from your other post. Didn't see the slides about Fluent.

You have to understand that ANSYS didn't just find that these GPUs work exceptionally well and are promoting them so that you can get your work done faster. It's a partnership with the GPU manufacturers, which is why ANSYS WILL NOT let you use a perfectly capable cheap GeForce card on their simulations, and is the same reason Nvidia artificially reduces the double precision compute performance on their non-workstation cards, even though hardware wise they are pretty much identical. You think ANSYS is going to prohibit you from using anything but a Tesla or Quadro card in Mechanical, but let you use whatever you want in Fluent? Wishful thinking. They and Nvidia would be losing money that way, and that's usually not what businesses set out to do. I can see OpenFOAM letting you use whatever GPU you want, but not greedy old ANSYS. I've never asked explicitly, but I WILL GUARANTEE you that you can't use your GeForce card, just because I know how ANSYS works.

You've got to look into these charts a bit, and think about it on your own, not just look at the graphs ANSYS shows and go "WOW LOOK HOW MUCH FASTER IT IS!!!! We should go out and buy a $3000-$6000 GPU and the $20K ANSYS licences to run it!!!"

You think they are showing the average performance gain here? Of course not, they have hand picked the most favorable solution speedups for problems custom built for the GPUs. You won't see this kind of speedup in your everyday work. There are not a lot of independent ANSYS benchmarks out there, the only one I've seen is here:
It has a CPU only based computer solving in roughly half the time of a computer running a Tesla GPU. Now why doesn't ANSYS show that benchmark? In the Fluent benchmark they compare the GPU to an old and slow Xeon X5650. Why not to some new generation hardware? Why not make that benchmark available for others to run on their own hardware?

In addition, the memory on these GPUs is usually quite limited. That Fluent benchmark is 1.2M cells, which would take me no time to solve, so what do I care if it takes me 10 minutes to solve instead of 15 (even though I bet a modern generation CPU only based computer would be very close)? I'm interested in solving BIG models that actually take more time to solve than it took me to set up, 30+ million cells, which would be much too large for the GPU to handle.

Maybe for some people it might be beneficial, but don't be a lemming and drop a ton of cash on something with very limited advantages. Please, I'm not trying to be a jerk here, just trying to save people from dumping money unnecessarily because companies are trying to hype you up to access your wallet.

If anyone has a Tesla GPU and would like to benchmark against my CPU only based system, I'd be happy to run some benchmarks just so we can see an actual real world test, not a marketing oriented sales pitch by ANSYS and NVIDIA.


Daveo643 March 6, 2013 23:57

