#21
Member
Matt
Join Date: May 2011
Posts: 36
Rep Power: 13
Quote:
That said, I discovered after the fact that 100 Gbit IB was MASSIVELY overkill for a 2-node cluster, probably even for a 16-node cluster. RHEL's port counter reports less than 10 Mbit of traffic over the IB interface even when running the most intensive simulations. I'm sure the lower latency of IB helps simulation speed a bit, but if I had to do it over again I'd probably just stick with 10 Gbit Ethernet for such a small cluster (or old/cheap QDR IB hardware). But with the soon-to-be-released EPYC Genoa CPUs having up to 192 cores per 2P node, you can go awfully far with a single node nowadays.
#22
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Quote:
I am in the process of receiving the parts (with 2x AMD EPYC 7573X). I have noticed that the motherboard has exactly six fan connectors: two CPU fan headers and four system fan headers. With the proposed fan configuration, should I follow any guidelines? In particular, should I connect two fans to the CPU headers and four to the system headers (chosen randomly or, for example, one from the top side and one from the bottom side as the CPU fans), or should I buy a 1-to-3 splitter so that all six fans run as CPU fans? Thank you
#23 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
If you have the Gigabyte board, it doesn't matter a whole lot where you plug in the fans.
Having the CPU fans connected to their corresponding CPU fan headers is still a good idea. Beyond that, you can set curves for any fan based on any sensor value you want. What I did was use the rest of the fan headers for whatever fans were closest, and then set a single fan curve for all fans simultaneously, based on memory and CPU temperature. I.e. the same as plugging all fans into a single header, just without the risk of burning out a fan header. Having 3-4 of these low-power fans on a single header is no problem though; the headers are rated for much higher currents.
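For illustration, here is a minimal sketch of the kind of single fan curve described above, mapping the hottest CPU/memory sensor reading to one duty cycle for all fans. The breakpoints and percentages are made-up placeholders, not values from the Gigabyte manual:

```python
# Hypothetical fan curve: map the hottest CPU/memory sensor reading to a
# PWM duty cycle. Breakpoints are illustrative, not from the board manual.
CURVE = [(40, 25), (55, 40), (70, 70), (80, 100)]  # (temperature in C, duty %)

def fan_duty(temps_c):
    """Return a fan duty cycle (%) for the highest sensor temperature."""
    hottest = max(temps_c)
    duty = CURVE[0][1]           # minimum duty below the first breakpoint
    for threshold, pct in CURVE:
        if hottest >= threshold:
            duty = pct
    return duty

print(fan_duty([48, 62]))  # CPU at 48 C, memory at 62 C -> 40 (% duty)
```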
#24
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Quote:
Do the fans need adjustments, or is the default configuration okay? If not, I understand that I should use Smart Fan 5, right? And what should the curve be? Thank you
#25 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
I don't know what Smart Fan is.
You get access to fan controls and tons of other useful stuff through the board's IPMI interface. Gigabyte calls it the Management Console if you want to search for the manual. Maybe just check whether you are okay with the default settings before you tinker with that.
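If you would rather query the BMC from a script than through the web console, something like the following works with a standard IPMI setup. This is only a sketch; the BMC address and credentials are placeholders for your own values:

```python
import subprocess

# Query all sensor readings (temperatures, fan RPMs, voltages) from the BMC
# over the network. Replace host, user, and password with your BMC's values.
cmd = [
    "ipmitool", "-I", "lanplus",
    "-H", "192.168.1.100",   # placeholder BMC address
    "-U", "admin",           # placeholder username
    "-P", "password",        # placeholder password
    "sensor",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```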
#26
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Quote:
I have received everything except for the motherboard and the processors, so I am still waiting. I have watched some videos and noticed that a heat sink does not seem to be included with the processor. Am I right? If so, do I need a heat sink? Are there any recommendations for the 7573X? Thank you
#27 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
Yes, you need CPU coolers. Make sure they fit into whatever case you picked in the end.
Noctua NH-U14S TR4-SP3 would be my first choice for air cooling.
#28 |
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
I bought the Phanteks Enthoo Pro 2. No problem, right? (I checked that the cooler height is less than the case's CPU cooler clearance.)
#29 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
That will fit.
#31 |
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
After some time, I have the workstation running. I performed a benchmark test case with Ansys Fluent under the following conditions:

Mesh: 11.4M cells
Number taken as a reference: seconds to perform 10 iterations within a time step
Solver: SIMPLEC, incompressible LES
Rest of the components: 2x AMD EPYC 7573X, 16x 16 GB of RAM, running Windows 10

The (disappointing) results are as follows:

Cores   Wall time (s)
1       340
2       160
4       87
8       46
16      28
32      20
64      17

Some comments:
- I compared the speed in the 4-core case with a 4-core workstation that has an Intel Xeon E5-1630 v3 (3.8 GHz), and the 2x AMD EPYC 7573X runs twice as fast.
- The scaling is worse than I would expect based on benchmarks of similar processors in OpenFOAM.
- I have noticed that the choice of MPI (Ansys Fluent offers three options: the default, Intel MPI, and MS-MPI) has a strong influence on the results. The numbers shown above correspond to the fastest one (MS-MPI).
- It seems that the workstation has virtualization turned on, although I selected a maximum of 64 solver processes in any case (not 128, which is the maximum that Ansys Fluent detects). Would suppressing virtualization help?
- I checked that during the simulations the CPU is running at 3.55 GHz (the boost speed given in the manual is 3.6 GHz).

In any case, is there anything I can do to improve the performance? I plan to run a similar test case in OpenFOAM, but 1-2 months later.
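As a quick sanity check on those numbers, the wall times above translate into the following speedup and parallel efficiency figures (a small sketch; the times are copied straight from the table):

```python
# Wall times (s) for 10 iterations, copied from the table above.
times = {1: 340, 2: 160, 4: 87, 8: 46, 16: 28, 32: 20, 64: 17}

serial = times[1]
for cores, t in times.items():
    speedup = serial / t
    efficiency = speedup / cores
    print(f"{cores:3d} cores: speedup {speedup:5.1f}x, efficiency {efficiency:4.0%}")
# 64 cores come out at roughly 20x speedup, i.e. ~31% parallel efficiency.
```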
#32 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
I guess you mean SMT? Virtualization settings won't change anything here.
Best practice is to turn SMT off in the BIOS, i.e. one hardware thread per core. For CFD workloads, NPS=4 mode is also advised; the auto setting will probably be NPS=1. With this board, you can go one step further by enabling "ACPI SRAT L3 Cache As NUMA Domain". In my testing, that gave low single-digit performance improvements compared to just NPS=4. Your mileage may vary, especially on Windows. This board also comes with some presets that may or may not be better than setting individual values. You can find them under workload tuning in the AMD CBS category; I think there is a preset with HPC in its name. I'm not sure which settings it touches specifically, since there is no documentation available.
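A quick way to confirm from the operating system that SMT really is off after changing the BIOS setting is to compare physical and logical core counts, for example with psutil (a sketch, assuming psutil is installed):

```python
import psutil

physical = psutil.cpu_count(logical=False)
logical = psutil.cpu_count(logical=True)

# With SMT disabled both counts should match (128 on a 2x EPYC 7573X system);
# with SMT enabled, the logical count is twice the physical count.
print(f"physical cores: {physical}, logical processors: {logical}")
print("SMT appears to be", "off" if physical == logical else "on")
```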
#33
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 236
Rep Power: 11
Quote:
Setting up a small cluster I found relatively easy. At first I built a cluster with 5x Dell R810 and blew the fuses because the power draw exceeded what the room's wiring could supply. The room also gets warm when you are running at full load. I was able to run four at a time and achieve a 4x speedup compared to one. Later, I learned that the Dell R810 is actually a bad choice, because it has only two memory channels per CPU when equipped with four CPUs. Right now, 16- and 18-core Xeon 26xx v4 CPUs are very cheap. The power efficiency and performance per node are much better than the R810's. You can put together a four-node (128 cores total) cluster for about $2500 and achieve performance equal to a top-of-the-line EPYC system. The peak power draw of the cluster will be a manageable ~1500 W. For a very budget-limited student or hobbyist this is attractive, but probably not for a professional environment, due to the billable hours associated with setting up and maintaining four computers instead of one.
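To put the running cost of such a cluster in perspective, a rough back-of-the-envelope estimate (the duty cycle and electricity price below are assumptions, not figures from the post):

```python
# Rough annual energy cost for the ~1500 W four-node cluster described above.
peak_power_kw = 1.5       # peak draw quoted above
hours_per_day = 8.0       # assumed utilisation, purely illustrative
price_per_kwh = 0.30      # assumed electricity price in $/kWh

annual_cost = peak_power_kw * hours_per_day * 365 * price_per_kwh
print(f"~${annual_cost:.0f} per year")  # about $1300/year with these assumptions
```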
#34
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Quote:
Yes, I was referring to SMT. I have experience with Intel from the past, and I wrote the wrong name. I applied the proposed modifications (the same ones that appear in a guide I found on the official AMD website) and now the results are as follows:

Cores   Wall time (s)   Wall time, HPC optimized (s)
1       340             353
2       160             173
4       87              87
8       46              45
16      28              25
32      20              14
64      17              10

Still a little worse than I would expect, but the improvement at 32 and 64 cores is substantial. I will post an OpenFOAM benchmark in the corresponding forum topic once I have it installed and running.
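Repeating the earlier speedup calculation for both columns makes the gain from the BIOS tuning explicit (times copied from the table above):

```python
# Wall times (s): default BIOS settings vs. the HPC-optimized settings.
default = {1: 340, 2: 160, 4: 87, 8: 46, 16: 28, 32: 20, 64: 17}
tuned   = {1: 353, 2: 173, 4: 87, 8: 45, 16: 25, 32: 14, 64: 10}

for cores in default:
    print(f"{cores:3d} cores: {default[1] / default[cores]:5.1f}x default, "
          f"{tuned[1] / tuned[cores]:5.1f}x tuned")
# At 64 cores the tuned configuration reaches ~35x speedup vs. ~20x before.
```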
#35 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,262
Rep Power: 44
That's a pretty huge improvement from just configuring the system properly.
No need to be disappointed by poor scaling. You get a 35x speedup between 1 and 64 threads. That is within expectation for a CFD workload, and also lines up with similar hardware in the OpenFOAM benchmark thread.