CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Hardware (https://www.cfd-online.com/Forums/hardware/)
-   -   Advice on the technical requirements for a new Fluent Workstation (https://www.cfd-online.com/Forums/hardware/113263-advice-technical-requirements-new-fluent-workstation.html)

evcelica March 7, 2013 04:22

Geez, a little sensitive, are we? I didn't go crying and running away when you gave me your pompous attitude and acted like I knew nothing:

Quote:

Originally Posted by Daveo643 (Post 412103)
It's unfortunate that it doesn't work for you, but I don't run Mechanical and neither does the original poster of this thread. I'm going to keep trying and waiting for an authoritative answer on this. Thank you anyway.

That slightly overclocked machine shouldn't account for it being 60% faster than the Tesla. Any of the upper-end Sandy Bridge-E based CPUs should beat the Tesla in that benchmark.

Some guys at my work run Teslas with ANSYS, so I've seen their real-world performance, or lack thereof, in certain cases. I'm sorry, but you come here and recommend something very expensive based on no personal experience, just some marketing pamphlets you found. Then you get mad and stomp off when I urge people to think about real-world performance and applications before making such a purchase? Fine, sayonara, buddy.

kyle March 7, 2013 15:55

For what it's worth, I agree with evcelica, and WOW, Daveo643 is a sensitive one.

The software to make GPUs a good choice just is not there yet. It seems to me that they show well in certain simulations on structured meshes, where you can organize the simulation efficiently in memory, but they are a poor investment for the types of simulations most of us are doing. You may be able to eke out a 1.5x speedup on certain cases, but it often comes at 3x the cost. If you are writing research code for DNS of flow in a square channel, then it might make sense to invest in a GPU for the computations. I don't see how it makes sense for anyone solving industrial problems.
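
To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in Python. The 1.5x speedup and 3x cost figures are just the illustrative ones from the paragraph above, not measurements:

Code:

# Back-of-the-envelope throughput-per-dollar comparison.
# The speedup and cost factors are illustrative, not measured.
cpu_cost = 1.0     # baseline workstation cost (normalized)
gpu_cost = 3.0     # assumed cost once a Tesla is added
cpu_rate = 1.0     # baseline throughput (normalized)
gpu_rate = 1.5     # assumed GPU-accelerated throughput

print(f"CPU-only: {cpu_rate / cpu_cost:.2f} throughput per cost unit")
print(f"With GPU: {gpu_rate / gpu_cost:.2f} throughput per cost unit")
# 1.00 vs. 0.50: for throughput work, a second CPU box wins.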

The fastest machines for traditional CFD on unstructured meshes use the i7 CPUs with the X79 chipset and high-speed memory. As stated earlier in this thread, that is because this combination offers the most memory bandwidth per core available.
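
For anyone who wants to see why bandwidth per core is the figure of merit, here is a small sketch; the configurations and DDR3 speeds are examples, and peak bandwidth (channels x transfer rate x 8 bytes) is a theoretical ceiling, not a measured number:

Code:

# Theoretical peak memory bandwidth per core for example systems.
# Peak GB/s = channels * transfer rate (MT/s) * 8 bytes / 1000.
systems = {
    # name: (memory channels, DDR3 MT/s, cores) -- illustrative
    "i5, dual channel":     (2, 1600, 4),
    "i7 X79, quad channel": (4, 1866, 6),
    "2x Xeon E5, 8 ch":     (8, 1600, 16),
}
for name, (ch, mts, cores) in systems.items():
    peak = ch * mts * 8 / 1000.0   # GB/s, theoretical peak
    print(f"{name:22s} {peak:6.1f} GB/s total, "
          f"{peak / cores:5.2f} GB/s per core")
# The six-core X79 i7 comes out ahead per core, which is why it
# benchmarks so well on unstructured-mesh CFD.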

HMN March 12, 2013 17:55

Thanks evcelica for linking me to this discussion. :)

I had exactly the same question in another post and now it's clear: there's no support at all for the GeForce GPUs and probably never will be. I don't really understand why. They are not as powerful and have less memory (2+ GB) than the Quadro or Tesla cards, but they could be a solution in cases where there's not much money to invest.

Perhaps there are agreements to support only that hardware... so they can sell it at a higher price.

evcelica March 20, 2013 20:52

I thought this was hilarious and would like to share it.

I was reading one of the marketing pamphlets Daveo643 posted: Boost your productivity through HPC (pdf)

On slide 19, one "customer" of ANSYS and their HPC and GPU solutions is analyzing 3D glasses and says:


By optimizing our solver selection and workstation configuration,
and including GPU acceleration, we’ve been able to dramatically
reduce turnaround time from over two days to just an hour. This
enables the use of simulation to examine multiple design ideas and
gain more value out of our investment in simulation.



I'm thinking, what kind of piece-of-crap computer did they have before this 77x speedup? They say they upgraded "solver selection and workstation configuration" and added GPU acceleration, so I'm thinking these results are total B.S. and aren't comparable at all. They obviously compared a single core of the biggest P.O.S. computer, solving out-of-core and thrashing the disk, against a high-performance cluster, just to make the comparison look good... then I look at who the quote is from:

-Berhanu Zerayohannes, Senior Mechanical Engineer, NVIDIA

Ha Ha HA..
An NVIDIA engineer saying that NVIDIA Tesla GPUs gave him a 77x speedup!!!!

REALLY?!? Marketing at its worst!


schwermetall July 11, 2013 10:44

Hyper-Threading
 
Hi everyone,
just a comment on the aforementioned Hyper-Threading. As far as I know, CFD runs don't benefit from Hyper-Threading. The reason is that the idea of Hyper-Threading is to make unused CPU power accessible.
Say you have a dual-core machine with Hyper-Threading; then you can run four processes that each need half the performance of a single core. Without Hyper-Threading you would have difficulty with such a task. So Hyper-Threading is a kind of process management that lets you use 100% of your performance across multiple processes. The point with CFD runs is that a single solver process already uses 100% of a core's performance, so there is no speedup from Hyper-Threading.
I'm not sure whether this description of Hyper-Threading is technically correct, but I am sure that Hyper-Threading doesn't speed up CFD runs. I worked on different projects at university with OpenFOAM and CFX, and this was confirmed by every PhD I asked. On our 12-core cluster no one uses the 24 logical cores (it does have Hyper-Threading capability).
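
To illustrate the physical-versus-logical distinction, here's a minimal sketch. It uses psutil (a third-party package) to count real cores; one solver process per physical core is the usual CFD practice:

Code:

# Count logical vs. physical cores; for CFD you generally want one
# solver process per *physical* core, ignoring the hyper-threads.
import os
import psutil  # third-party: pip install psutil

logical = os.cpu_count()                    # includes hyper-threads
physical = psutil.cpu_count(logical=False)  # real cores only

print(f"Logical CPUs:   {logical}")
print(f"Physical cores: {physical}")
print(f"Solver processes to launch: {physical}")
# On the 12-core cluster above this would report 24 logical and
# 12 physical cores, and you would launch 12 processes.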

Hope it helps

anvaloy July 17, 2013 09:54

I am running FLOW-3D Cast for HPDC on a BOXX Technologies Extreme 3D with an i7 3940 overclocked to 4.5 GHz and 32 GB of RAM, with an NVIDIA Quadro 8000 video card, and I can run a 134 GB flow simulation within 24 hours.

schwermetall July 17, 2013 10:03

Hi anvaloy,
that's interesting, can you give a little more detail on your simulation? Number of cells, and which kind of simulation you're running (steady, unsteady, RANS...)?

Just out of curiosity: I ran OpenFOAM on an i7-2600K with 4 cores at 3.8 GHz, and after 1.5 years it died and fried the mainboard as well. How long have you been running that system overclocked?

Cheers

Anna Tian February 11, 2014 15:20

Quote:

Originally Posted by CapSizer (Post 408196)
OK, it's a quiet Saturday evening, so I will bite ;-)

First of all, before you start agonizing about the hardware, it is necessary to address the question of the software licenses at your disposal. CFD software is way more expensive than the hardware you will be running it on, so this needs to be sorted out first. There is no merit in getting a 16-core workstation if you can only run 8-way parallel. AFAIK, the way Ansys markets parallel Fluent these days, you buy the "HPC" facility in steps of 8, 32, 128, 512 cores. If you have only paid for one HPC pack, you can only run 8-way parallel, so you have to figure out the best hardware configuration for 8 cores. The next step is the ability to run on 32 cores, which is a really big step. There's not much sense in having 32-core capability but only running, say, 16 cores. Sort this question out first before committing to any hardware. In my experience, 8 million cells will definitely run much better on 32-way parallel than 8-way, but there may be only relatively small gains if you try to go for more parallel cores than that.

8 million cells, coupled solver, 2-equation turbulence model... That would fit (I think!) fine in 16 GB of RAM if you are running single precision, but you may run out of memory if you need to use double precision. RAM is inexpensive, so I would say go for 32 GB instead. The thing about total memory is that you need to have enough, but no more: you can't gain speed by adding more RAM, provided you have enough to start with. If you run out of memory, everything stops. The computer will try to swap memory to disk, but that is so slow and unresponsive that you will be tempted to pull the plug to stop things. Not a good idea, by the way.

I have been advised by hardware experts to use ECC rather than ordinary RAM, but frankly I cannot say that I have ever seen a benefit from using ECC RAM on a single socket machine. Many (most, all?) multi-socket systems require ECC however.

The two characteristics of memory that do make a huge difference to CFD speed are the actual memory clock speed (get the fastest supported by the chipset) and the number of memory channels. Inexpensive single-socket systems (AMD FX, Intel i5) use two parallel memory channels (typically, but not always, 4 slots in total). By contrast, the current Intel i7 uses four (either 4 or 8 slots in total). When you measure CFD performance, you find that this is what really makes the difference. The server CPUs (Intel Xeon E5 and AMD Socket G34) use 4 parallel channels per socket. So the nice thing about a dual-socket server board is that you have a total of 8 memory channels feeding the CPUs, instead of the 4 that you would get from a single Core i7. Two Core i7 systems linked with GB Ethernet will probably be competitive with a dual-socket workstation (i.e. 8 memory channels in both cases), and probably cost a bit less, but distributed parallel is always just a little bit of a pain to deal with.

Neither CPU clock speed, nor cache size, nor even architecture, is as significant as the memory system when it comes to CFD performance. For example, an AMD FX-8150 running either 4 or 8 cores will be close in performance to the very different Intel Core i5 (4 cores), because both use the same kind of memory system (two-channel 1600 MHz DDR3 as standard, although there are overclocking options). Neither can match the Core i7, with its 4 memory channels. The same effect is likely to be seen when comparing Opterons and Xeons. Yes, you can get 16 cores in an Opteron CPU, but these are fed by 4 memory channels, just like the 8-core Xeon, so don't expect it to be any quicker.

This is not to say, however, that clock speed, core count and cache are insignificant, but sort the software license and memory questions out first. If your software license requires you to pay per parallel process, get a smaller number of the fastest cores you can. If parallel licensing is a flat fee (like Adapco's Power Session), it starts making sense to go for more cores and more memory channels.

If all that you can afford is an 8-process HPC license, think in terms of two linked Core i7s, or a Xeon workstation with two E5-2643 CPUs.

Your post is very helpful and interesting. Thanks!

You mentioned that memory bandwidth is usually what matters. I'm wondering: is there an upper limit, beyond which further increasing memory bandwidth becomes very costly and no longer improves CFD running speed significantly?
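
For what it's worth, one rough way to think about that limit is a roofline-style estimate: an iteration takes as long as the slower of the arithmetic and the memory traffic, so once bandwidth exceeds what the cores can consume, extra bandwidth buys nothing. Here is a sketch with made-up numbers (the bytes and flops per cell are assumptions, not measurements):

Code:

# Roofline-style sketch: iteration time is limited by whichever is
# slower, arithmetic or memory traffic.  All numbers are assumed.
cells = 8e6              # mesh size from the quote above
bytes_per_cell = 2000    # assumed memory traffic per cell per iteration
flops_per_cell = 4000    # assumed arithmetic per cell per iteration
compute_rate = 100e9     # FLOP/s the cores can sustain (assumed)

for bw in (25.6, 51.2, 102.4, 204.8):        # candidate GB/s
    t_mem = cells * bytes_per_cell / (bw * 1e9)
    t_cpu = cells * flops_per_cell / compute_rate
    bound = "memory" if t_mem > t_cpu else "compute"
    print(f"{bw:6.1f} GB/s -> {max(t_mem, t_cpu):.3f} s/iter ({bound}-bound)")
# Past ~50 GB/s the cores become the bottleneck here, so doubling
# bandwidth again changes nothing; where that crossover sits depends
# on the solver's arithmetic intensity.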

Rachita February 18, 2014 05:40

technical requirements for a new Fluent Workstation
 
ANSYS Mechanical is used for mechanical and structural engineering analysis/simulation. The solution is used to compute the response of a structural system, and the equation solvers that drive the simulation are computationally intensive. The equation solvers run on central processing unit (CPU) cores and in addition can run on graphics processing units (GPUs); the GPU hardware is a parallel computer architecture. The CPU cores will continue to be used for all other computations in and around the equation solver when GPU hardware is used.

The large arrays of the equation solvers and the datasets used in the simulation require a large, fast memory system, and the data storage files accessed during simulation benefit from dedicated, fast storage I/O systems. Use as much memory as possible to minimize the I/O required. The application has the ability to use parallel computing (both shared memory and distributed memory). The distributed memory model can run on a single machine or across machines/nodes (a cluster) connected via a high-speed interconnect.
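
As a rough illustration of the "use as much memory as possible" advice above, here is a sketch that estimates whether a case will run in-core; the GB-per-million-cells figure is a common rule of thumb, not a vendor specification:

Code:

# Rough in-core memory check for a CFD case.  The default of
# 2 GB per million cells is a common rule of thumb (double
# precision or a coupled solver can need more), not a vendor figure.
def fits_in_ram(cells_millions, ram_gb, gb_per_mcell=2.0):
    """Return True if the case should run without swapping."""
    need = cells_millions * gb_per_mcell
    print(f"~{need:.0f} GB needed vs. {ram_gb} GB installed")
    return need <= ram_gb

fits_in_ram(8, 32)    # ~16 GB vs. 32 GB -> fits in core
fits_in_ram(30, 32)   # ~60 GB vs. 32 GB -> expect heavy I/O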


reza.sayareh July 16, 2018 14:29

CPU speed
 
Hi guys. I have a Core i7-6700K that has 4 cores and runs at 4.2 GHz stock. I am planning to overclock it to 4.5 GHz, which will lead to higher temps. Since I am going to use ANSYS Fluent for long periods of time, do you think it's worth it? Or will 4.5 GHz not make much of a difference over 4.2?
Thanks

Daveo643 July 16, 2018 14:43

At the very least you need a good watercooling system. I topped out at 4.3 GHz on all cores of an i7-5820. It was quite stable at that point, but if I tried to push it further I would get crashes, and it's simply not worth the wasted time restarting simulations to push the system to the edge of stability for a small increase in clock speed.

reza.sayareh July 17, 2018 00:40

Well, I am using a MasterLiquid Lite 240 from Cooler Master. When I run the Intel Extreme Tuning Utility stability test for 3 hours at full load, my temps stay below 70 °C. Is that good?

CapSizer July 17, 2018 02:48

I think you can save yourself the trouble. If anything, consider clocking it down a bit for lower temperatures and better reliability. The speed of calculation is likely to be dominated by the speed of your memory system rather than the CPU clock speed.
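
To put a bound on the possible gain: even if the run were purely CPU-bound, going from 4.2 to 4.5 GHz is at most about a 7% speedup, and any memory-bound time shrinks it further. A quick sketch (the 60% memory-bound fraction is an assumption for illustration):

Code:

# Upper bound on overclocking gains when part of the runtime is
# memory-bound and does not scale with the clock.  The 60%
# memory-bound fraction is an assumption, not a measurement.
base_clock, oc_clock = 4.2, 4.5    # GHz
mem_frac = 0.6                     # assumed memory-bound share
cpu_frac = 1.0 - mem_frac

best_case = oc_clock / base_clock
realistic = 1.0 / (mem_frac + cpu_frac * base_clock / oc_clock)
print(f"Best case (pure CPU-bound): {best_case:.1%}")
print(f"With 60% memory-bound time: {realistic:.1%}")
# ~107% vs. ~103%: a few percent, for a lot of extra heat.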

