Workstation for Ansys Fluent

October 27, 2022, 16:28
#21
Member
 
Matt
Join Date: May 2011
Posts: 36
Rep Power: 13
the_phew is on a distinguished road
Quote:
Originally Posted by evcelica View Post
I couldn't agree more with this. I spent so much time configuring and maintaining the Infiniband clusters I've built in the past, mainly because I thought it was fun at the time. But now I just use a single large workstation, as the extra effort just wasn't worth it in my case.
I didn't know anything about Infiniband networking when I stood up a 2-node cluster I now use for CFD, I just bought some Mellanox adapters and copper interconnects and hooked 'em up. I don't know about other OSs, but RHEL 8.4 pretty much took care of everything automatically. I just had to enable OpenSM on the head node, which you don't even have to do if you have an IB switch.

That said, I discovered after the fact that 100Gbit IB was MASSIVELY overkill for a 2-node cluster, probably even for a 16-node cluster. RHEL's port counter reports less than 10Mbit of traffic over the IB interface even when running the most intensive simulations. I'm sure the lower latency of IB helps simulation speed a bit, but if I had to do it over again I'd probably just stick with 10Gbit ethernet for such a small cluster (or old/cheap QDR IB hardware).

But with the soon-to-be-released EPYC Genoa CPUs having up to 192 cores per 2P node, you can go awfully far with a single node nowadays.
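For anyone who wants to repeat that traffic measurement, here is a minimal sketch. It assumes a Linux host that exposes the standard InfiniBand sysfs counters, where `port_xmit_data` counts transmitted octets divided by 4. The device name `mlx5_0` and port `1` are placeholders, not values from the post above, and the post does not say which counter tool was actually used.

```python
from pathlib import Path
import time

WORD_BYTES = 4  # port_*_data counters report octets divided by 4


def rate_mbit(count_start: int, count_end: int, seconds: float) -> float:
    """Convert the difference between two counter samples to Mbit/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    bits = (count_end - count_start) * WORD_BYTES * 8
    return bits / seconds / 1e6


def read_counter(device: str, port: int, name: str) -> int:
    """Read one counter from sysfs (device and port are placeholders)."""
    path = (Path("/sys/class/infiniband") / device / "ports" / str(port)
            / "counters" / name)
    return int(path.read_text().strip())


def sample_rate(device: str = "mlx5_0", port: int = 1,
                seconds: float = 10.0) -> float:
    """Sample port_xmit_data twice and return the mean transmit rate in Mbit/s."""
    start = read_counter(device, port, "port_xmit_data")
    time.sleep(seconds)
    end = read_counter(device, port, "port_xmit_data")
    return rate_mbit(start, end, seconds)
```

Calling `sample_rate()` while a simulation is running gives a figure that can be compared against the link speed; the same approach works for `port_rcv_data`.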

November 30, 2022, 10:09
#22
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Motherboard
I'm not 100% sure on this, but the older revisions 1 and 2 might support Milan CPUs after a BIOS update. That would be a question for Gigabyte support.
But yeah, I would probably want revision 3-4. Just contact the seller in advance and ask them about this. The information they put on their website might just be outdated or a placeholder. Most sellers don't specify revisions at all, so you have to ask anyway.

What I liked about the Gigabyte board in particular:
1) Sufficient cooling for VRMs to work in a workstation case without much hassle. Supermicro are server boards first, relying on server-grade airflow (=loud) to cool the components.
2) Built-in fan control. Supermicro boards are always a pain in the ass to use for workstations. The default fan thresholds will identify normal fans as faulty. And dialing in a fan curve requires some serious effort with 3rd party tools.
Gigabyte's solution is just much more elegant, easier to use, and works better.

RAM
You need registered ECC memory. There aren't many degrees of freedom here. Maybe you were looking at UDIMM instead?
https://geizhals.de/?cat=ramddr3&xf=..._RDIMM+mit+ECC

CASE
Fractal Torrent is not a bad case for air cooling and will likely work fine on account of brute force.
However, you will probably end up using Noctua CPU coolers on these SP3 sockets. They blow bottom-to-top. The top, where you would want to exhaust the heat from CPUs+RAM, is closed off in the Fractal Torrent.
The ideal case for air cooling dual-socket Epyc is the Phanteks Enthoo Pro 2. It has plenty of room for fans in the bottom and the top. It doesn't come with any fans installed though. Arctic F12 PWM (3x bottom) and Arctic F14 PWM (3x top) are a great low-cost option at 5 a piece.

POWER SUPPLY
CORSAIR 1000RMe will work.
Hello,

I am in the process of receiving the parts (with 2x AMD EPYC 7573X). I have noticed that the motherboard has exactly 6 fan connectors: 2 CPU fan headers and 4 system fan headers. With the proposed fan configuration, should I follow any guidelines? In particular:

Should I connect two fans to the CPU headers and four to the system headers (chosen randomly, or, for example, one from the top and one from the bottom as CPU fans), or buy some kind of 1-to-3 splitter so that all six fans run as CPU fans?

Thank you

November 30, 2022, 14:17
#23
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
flotus1 has a spectacular aura about
If you have the Gigabyte board, it doesn't matter a whole lot where you plug in the fans.
Having the CPU fans connected to the corresponding CPU fan header is still a good idea.
Beyond that, you can set curves for any fan based on any sensor value you want.

What I did was use the rest of the fan headers for whatever fans were closest. And then set a single fan curve for all fans simultaneously, based on memory and CPU temperature.
I.e. same as plugging all fans into a single header. Just without the risk of burning a fan header.
Having 3-4 of these low-power fans on a single header is no problem though, they are rated for much higher currents.

December 1, 2022, 04:41
#24
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
If you have the Gigabyte board, it doesn't matter a whole lot where you plug in the fans.
Having the CPU fans connected to the corresponding CPU fan header is still a good idea.
Beyond that, you can set curves for any fan based on any sensor value you want.

What I did was use the rest of the fan headers for whatever fans were closest. And then set a single fan curve for all fans simultaneously, based on memory and CPU temperature.
I.e. same as plugging all fans into a single header. Just without the risk of burning a fan header.
Having 3-4 of these low-power fans on a single header is no problem though, they are rated for much higher currents.
Yes, the motherboard will be the GIGABYTE MZ72-HB0.

Do the fans need adjustment, or is the default configuration okay? If not, I understand that I should use Smart Fan 5, right? And what should the curve be?

Thank you

December 1, 2022, 14:20
#25
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
flotus1 has a spectacular aura about
I don't know what smart fan is.
You get access to fan controls and tons of other useful stuff with the board's IPMI interface. Gigabyte calls it "management console" if you want to search for the manual.
Maybe just check whether you are OK with the default settings before you tinker with that.

December 29, 2022, 11:13
#26
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I don't know what smart fan is.
You get access to fan controls and tons of other useful stuff with the board's IPMI interface. Gigabyte calls it "management console" if you want to search for the manual.
Maybe just check whether you are OK with the default settings before you tinker with that.
Thank you for your answer.

I have received everything except for the motherboard and the processors, so I am still waiting.

I have watched some videos and noticed that a heat sink does not seem to be included with the processor. Am I right? If so, do I need a heat sink? Are there any recommendations for the 7573X?

Thank you

December 29, 2022, 14:07
#27
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
flotus1 has a spectacular aura about
Yes, you need CPU coolers. Make sure they fit into whatever case you picked in the end.
Noctua NH-U14S TR4-SP3 would be my first choice for air cooling.

December 30, 2022, 17:28
#28
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Yes, you need CPU coolers. Make sure they fit into whatever case you picked in the end.
Noctua NH-U14S TR4-SP3 would be my first choice for air cooling.
I bought the Phanteks Enthoo Pro 2. No problem, right? (I checked that the thickness of the cooler is less than the thickness of the case)

December 30, 2022, 18:07
#29
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
flotus1 has a spectacular aura about
That will fit.

January 2, 2023, 07:57
#30
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
That will fit.
Thank you!

February 6, 2023, 12:25
#31
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
After some time, I have the workstation running. I ran a benchmark case in Ansys Fluent under the following conditions:

Mesh: 11.4M
Solver: SIMPLEC, incompressible LES
Reference metric: seconds to perform 10 iterations in a time step

Remainder of the components:

2x AMD EPYC 7573X, 16x16 GB of RAM, running Windows 10

The (disappointing) results are as follows:


Cores   Wall time (s)
1       340
2       160
4       87
8       46
16      28
32      20
64      17
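To put numbers on the scaling, the wall times can be converted into speedup and parallel efficiency relative to the 1-core run. This is only a sketch of the arithmetic; the function name `scaling` is made up for illustration, and the times are copied from the table above.

```python
# Wall times in seconds for 10 iterations, copied from the table above.
wall_time = {1: 340, 2: 160, 4: 87, 8: 46, 16: 28, 32: 20, 64: 17}


def scaling(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {cores: (speedup, efficiency)} relative to the 1-core run."""
    t1 = times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in sorted(times.items())}


print(f"{'cores':>5} {'speedup':>8} {'efficiency':>10}")
for cores, (speedup, efficiency) in scaling(wall_time).items():
    print(f"{cores:>5} {speedup:>8.1f} {efficiency:>10.0%}")
```

By this measure the 64-core run reaches a speedup of 20 (340 s / 17 s), which is about 31% parallel efficiency.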

Some comments:

- I compared the 4-core run with a 4-core workstation that has an Intel(R) Xeon(R) E5-1630 v3 (3.8 GHz), and the 2x AMD EPYC 7573X machine runs twice as fast.
- The scaling is worse than I would expect based on the benchmarks of similar processors in OpenFOAM.
- I have noticed that the choice of MPI (Ansys Fluent offers three options: the "default", intelmpi, and msmpi) strongly affects the results. The ones shown above correspond to the fastest one (msmpi).
- It seems that the workstation has virtualization turned on, although in any case I selected a maximum of 64 solver processes (not 128, which is the maximum that Ansys Fluent detects). Would disabling virtualization help?
- I checked that during the simulations the CPU is running at 3.55 GHz (the maximum boost clock shown in the manual is 3.6 GHz).

In any case, is there anything that I can do to improve the performance? I plan to run a similar test case in OpenFOAM, but 1-2 months from now.

February 6, 2023, 13:09
#32
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
flotus1 has a spectacular aura about
I guess you mean SMT? Virtualization settings won't change anything here.
Best practice is to turn SMT off in the BIOS, i.e. one hardware thread per core.
For CFD workloads, using NPS=4 mode is also advised. Auto settings will probably be NPS=1.
With this board, you can go one step further by enabling "ACPI SRAT L3 Cache As NUMA Domain". In my testing, there were low single-digit performance improvements from enabling that, compared to just NPS=4. Your mileage may vary, especially on Windows.

This board also comes with some presets, that may or may not be better than setting individual values. You can find that under workload tuning in the AMD CBS category. I think there is a preset with HPC in its name. Not sure which settings it touches specifically, there is no documentation available.
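To check whether these BIOS changes took effect, it helps to know how many NUMA nodes the operating system should report afterwards. Below is a minimal sketch of that arithmetic, assuming NPS splits each socket into that many nodes and that the L3-as-NUMA option instead creates one node per L3 cache (CCX). The default of 8 CCXs per socket for the EPYC 7573X is an assumption based on AMD's public specifications, not something stated in this thread, and the function name is hypothetical.

```python
def expected_numa_nodes(sockets: int, nps: int, l3_as_numa: bool = False,
                        ccx_per_socket: int = 8) -> int:
    """Estimate how many NUMA nodes the OS should report.

    ccx_per_socket=8 is an assumption for the EPYC 7573X; adjust for other CPUs.
    """
    if nps not in (1, 2, 4):
        raise ValueError("this sketch only covers NPS=1, 2 or 4")
    per_socket = ccx_per_socket if l3_as_numa else nps
    return sockets * per_socket


print(expected_numa_nodes(sockets=2, nps=1))                   # NPS=1 (likely the Auto setting)
print(expected_numa_nodes(sockets=2, nps=4))                   # NPS=4
print(expected_numa_nodes(sockets=2, nps=4, l3_as_numa=True))  # L3 cache as NUMA domain
```

On Linux the reported count can be compared against `lscpu` or `numactl --hardware`; on Windows, Task Manager's CPU view can be switched to show NUMA nodes.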

February 6, 2023, 18:36
#33
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 236
Rep Power: 11
wkernkamp is on a distinguished road
Quote:
Originally Posted by the_phew View Post
I didn't know anything about Infiniband networking when I stood up a 2-node cluster I now use for CFD, I just bought some Mellanox adapters and copper interconnects and hooked 'em up. I don't know about other OSs, but RHEL 8.4 pretty much took care of everything automatically. I just had to enable OpenSM on the head node, which you don't even have to do if you have an IB switch.

That said, I discovered after the fact that 100Gbit IB was MASSIVELY overkill for a 2-node cluster, probably even for a 16-node cluster. RHEL's port counter reports less than 10Mbit of traffic over the IB interface even when running the most intensive simulations. I'm sure the lower latency of IB helps simulation speed a bit, but if I had to do it over again I'd probably just stick with 10Gbit ethernet for such a small cluster (or old/cheap QDR IB hardware).

But with the soon-to-be-released EPYC Genoa CPUs having up to 192 cores per 2P node, you can go awfully far with a single node nowadays.

Setting up a small cluster I found relatively easy. At first I built a cluster with 5x r810 and blew the fuses because the power draw exceeded what I had for the room. The room can get warm too when you are running at full performance. I was able to run four at a time and achieve a 4x speedup compared to one.

Later, I learned that the Dell r810 is actually a bad choice, because it has only two memory channels per CPU when equipped with four CPUs.

Right now 16 and 18 core Xeon 26xx v4 CPUs are very cheap. The power efficiency and performance per node are much better than the r810. You can put together a four node (128 core total) cluster for about $2500 and achieve performance equal to a top of the line EPYC system. The peak power draw of the cluster will be a manageable ~1500W.

For a very budget-limited student or hobbyist, this is attractive, but probably not for a professional environment, due to the billable hours associated with set-up and maintenance of four computers instead of one.

February 7, 2023, 06:36
#34
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I guess you mean SMT? Virtualization settings won't change anything here.
Best practice is to turn SMT off in the BIOS, i.e. one hardware thread per core.
For CFD workloads, using NPS=4 mode is also advised. Auto settings will probably be NPS=1.
With this board, you can go one step further by enabling "ACPI SRAT L3 Cache As NUMA Domain". In my testing, there were low single-digit performance improvements from enabling that, compared to just NPS=4. Your mileage may vary, especially on Windows.

This board also comes with some presets, that may or may not be better than setting individual values. You can find that under workload tuning in the AMD CBS category. I think there is a preset with HPC in its name. Not sure which settings it touches specifically, there is no documentation available.
Thank you for your help.


Yes, I was referring to the SMT. I have experience with Intel in the past, and I wrote the wrong name.

I applied the modifications proposed (the same as the ones that appear on a guide that I found on the AMD official website) and now the results are as follows:

Cores   Wall time (s)   Wall time, HPC optimized (s)
1       340             353
2       160             173
4       87              87
8       46              45
16      28              25
32      20              14
64      17              10

Still a little bit worse than what I would expect, but the improvement with 32 and 64 cores is significant.
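The effect of the BIOS changes at each core count can be read off by dividing the original wall time by the tuned one. This is a small sketch using the two columns above; the names `default_s`, `tuned_s` and `tuning_gain` are chosen only for illustration.

```python
# Wall times in seconds, copied from the two result columns above.
default_s = {1: 340, 2: 160, 4: 87, 8: 46, 16: 28, 32: 20, 64: 17}
tuned_s = {1: 353, 2: 173, 4: 87, 8: 45, 16: 25, 32: 14, 64: 10}


def tuning_gain(before: dict[int, float], after: dict[int, float]) -> dict[int, float]:
    """Return {cores: before/after}; values above 1 mean the tuned run is faster."""
    return {n: before[n] / after[n] for n in sorted(before)}


for cores, gain in tuning_gain(default_s, tuned_s).items():
    print(f"{cores:>3} cores: {gain:.2f}x")
```

The tuned settings are slightly slower at 1 and 2 cores (340 s vs 353 s at 1 core) but make the 64-core run 1.7x faster (17 s vs 10 s).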

I will post an OpenFOAM benchmark in the corresponding forum topic once I have it installed and running.
wkernkamp likes this.

February 8, 2023, 04:20
#35
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
flotus1 has a spectacular aura about
That's a pretty huge improvement from just configuring the system properly.
No need to be disappointed by poor scaling. You get a 35x speedup between 1 and 64 threads. That is within expectation for a CFD workload, and also lines up with similar hardware in the OpenFOAM benchmark thread.

February 8, 2023, 14:52
#36
New Member
 
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Beans8 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
That's a pretty huge improvement from just configuring the system properly.
No need to be disappointed by poor scaling. You get a 35x speedup between 1 and 64 threads. That is within expectation for a CFD workload, and also lines up with similar hardware in the OpenFOAM benchmark thread.
Now I am not disappointed, since the performance increase at high core counts has been substantial. In any case, there is always the hope that the speedup would be 64x instead of 35x.
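One way to quantify the gap between 35x and 64x is the Karp-Flatt metric, which turns a measured speedup into an effective non-scaling fraction. The sketch below assumes the simple Amdahl model; in a real CFD run that fraction lumps together everything that does not scale (communication, memory-bandwidth contention, load imbalance), so it is a rough indicator rather than a statement about the solver's serial code.

```python
def karp_flatt(speedup: float, cores: int) -> float:
    """Effective non-scaling fraction e = (1/S - 1/p) / (1 - 1/p)."""
    if cores < 2:
        raise ValueError("need at least 2 cores")
    return (1 / speedup - 1 / cores) / (1 - 1 / cores)


def amdahl_speedup(fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given non-scaling fraction."""
    return 1 / (fraction + (1 - fraction) / cores)


speedup_64 = 353 / 10  # tuned wall times at 1 and 64 cores from post #34
e = karp_flatt(speedup_64, 64)
print(f"measured speedup: {speedup_64:.1f}x, effective non-scaling fraction: {e:.2%}")
```

Under these assumptions, an effective non-scaling fraction of roughly 1.3% is already enough to cap the 64-core speedup near 35x.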

All times are GMT -4.