#1 |
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
I am looking for a workstation with the following conditions:
- Software: Ansys Fluent
- License constraints: none
- Type of simulations: LES, single-phase incompressible flow, SIMPLEC solver
- Cell count: between 25 and 50 million
- Budget: around 10,000 €
- GPU: only for post-processing; no plans for GPU-accelerated simulations
- Location: Europe
- Plan for the workstation: new parts, and if possible assembled on delivery

I am looking for maximum performance at the maximum number of cores; that is, I am not planning to use the workstation for cases with lower cell counts, RANS, and so on. I have done some research and noticed the importance of the balance between CPU speed and memory, but I do not feel confident enough to make a decision. There are some specific recommendations in old posts, but the processors in them are old as well. In any case, I have some specific questions:

- What number of cores should I target? Based on what I have read, I estimate 40-48.
- How much RAM should I get? I estimate between 128 GB and 384 GB (see the sizing sketch at the end of this post).
- What family of CPUs should I target? Is a dual Intel Xeon 6342 good? It is from the same family as other recommended processors.
- For the GPU, an NVIDIA A4000 seems to be the general recommendation. Is this okay?

I have asked different computer shops, but the requirements of FVM seem not to be well known there. In particular, I have received offers with processors such as AMD Threadripper that are not well regarded here.

Thank you
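To show how I got my RAM bracket, here is my back-of-the-envelope sizing in Python. The GB-per-million-cells figures are assumptions taken from rules of thumb in old posts, not official Fluent numbers:

```python
# Rough RAM sizing for a Fluent LES case.
# gb_per_mcell is an assumed rule of thumb (~2-8 GB per million
# cells depending on models and solver), not a Fluent specification.
def ram_estimate_gb(cells_million: float, gb_per_mcell: float) -> float:
    return cells_million * gb_per_mcell

for cells in (25, 50):
    low, high = ram_estimate_gb(cells, 2), ram_estimate_gb(cells, 8)
    print(f"{cells}M cells: {low:.0f}-{high:.0f} GB")
# 25M cells: 50-200 GB
# 50M cells: 100-400 GB   -> hence my 128-384 GB estimate
```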
#2
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,263
Rep Power: 44
On the core count: 48-64 would be a reasonable target for dual-socket systems.

On the CPU family: if you can get them from your suppliers, AMD Epyc Milan CPUs would give you a little higher performance than the Xeon 6342. In order of how fast they are: Epyc 7413 (24-core), Epyc 7513 (32-core), Epyc 7543 (32-core, twice the L3 cache).
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,263
Rep Power: 44
I'd ask what you mean by "parallel performance":
- Scaling, i.e. how fast this runs with 64 threads compared to 1 thread
- Total performance: how long it takes to solve your models

But unfortunately, I can't answer either of these questions. It depends... The important thing is that this is the best you can get for around 10000€.
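To make the two meanings concrete, a small sketch with made-up timings (not benchmark results):

```python
# "Scaling" vs. "total performance" from wall-clock timings.
# T1 and T64 are invented example numbers.
def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n: int) -> float:
    return speedup(t_serial, t_parallel) / n

T1, T64 = 6400.0, 180.0                              # seconds for a fixed workload
print(f"speedup:    {speedup(T1, T64):.1f}x")        # ~35.6x -> "scaling"
print(f"efficiency: {efficiency(T1, T64, 64):.0%}")  # ~56% of linear
# "Total performance" is simply T64 itself: how long your model takes.
```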
#6 |
Member
Matt
Join Date: May 2011
Posts: 36
Rep Power: 13
For a single workstation, dual Epyc 7473X would probably be the best bang for the buck within your price range; just make sure every DIMM slot is populated. The GPU isn't too important if it is only used for post-processing: just get whatever is cheapest from the last couple of generations with enough VRAM.

Although, since you have no core-count licensing limit, a cluster of prior-gen dual-socket nodes (dual Epyc Rome or dual 2nd-gen Xeon Scalable) with 80-96 total cores would be pretty hard to beat. But you'd be limited to the used/surplus market and all the risks therein.
#7
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Based on your comments and other feedback I got, it seems clear that I should focus on AMD Epyc instead of Intel Xeon.

These two parts I am 100% sure about:
- SSD: Kingston KC3000 PCIe 4.0 NVMe M.2, 1 TB (I have other servers to store data)
- GPU: NVIDIA RTX A2000 12 GB

Regarding the processors, considering that I may be able to buy the components separately (and then a colleague builds the workstation), I have some additional doubts:

1) Your suggestions point to the 7003 series: https://www.amd.com/en/processors/epyc-7003-series

EPYC 7543: 32 cores, 2.8 GHz, 256 MB
EPYC 7473X: 24 cores, 2.8 GHz, 768 MB

What about these other options (assuming I can afford them)?

EPYC 7773X: 64 cores, 2.2 GHz, 768 MB
EPYC 7573X: 32 cores, 2.8 GHz, 768 MB

Maybe the answer is "they perform better, of course, but they are more expensive", but it is not clear to me what the differences are in terms of real performance. If paying 20% more buys 15% more performance, I can consider it a real option. In any case, does the upgrade from 256 MB to 768 MB of L3 cache make an important difference? Or does it depend on the processor?

2) For the RAM, I have found different motherboards for the 7003 series: for example the Gigabyte MZ72, the MSI D4020, and the H11SSL-i. However, the Gigabyte has 16 slots while the other two have only 8; that is, in the first case I would have 16x16 GB, and in the second 8x32 GB. Is there any difference? Or is the performance the same as long as I fill all of the channels?

Thank you in advance
#8
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,263
Rep Power: 44
There are reasons why I did not include any Milan-X CPUs in my list of recommendations:
1) They do not fit the budget. The 24-core version retails for 4200€; there is no way to have a workstation built with two of those in the 10000€ range, not even if you bought the parts and assembled it yourself.

2) They are the right tool when constrained by parallel licenses. Due to higher performance per core, they help you make the most out of the limited licenses you have. But the first post explicitly stated that licenses are not an issue.

Either way, a list of motherboards that would work: https://geizhals.eu/?cat=mbsp3&xf=16...C7003%7E4921_2
I can recommend the Gigabyte MZ72-HB0 for a workstation.

All CPUs mentioned so far have 8 memory channels, and you need to populate them all for maximum performance, i.e. 16 DIMMs for 2 CPUs. Which is why I specified that the 256GB of RAM need to be populated as 16x16GB. Leaving half the memory channels empty can cost you as much performance as using 1 CPU instead of 2, and even more if you ignore the memory population guidelines in the motherboard manual.
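Some rough numbers on why the channel count matters so much. This is theoretical peak bandwidth; sustained numbers are lower, but the ratios hold:

```python
# Theoretical peak memory bandwidth, dual Epyc with DDR4-3200:
# 8 channels per socket x 3200e6 transfers/s x 8 bytes per transfer.
channels, transfers_per_s, bytes_per_transfer = 8, 3200e6, 8
bw_socket_gbs = channels * transfers_per_s * bytes_per_transfer / 1e9
print(f"1 socket,   8 DIMMs: {bw_socket_gbs:.0f} GB/s")      # ~205 GB/s
print(f"2 sockets, 16 DIMMs: {2 * bw_socket_gbs:.0f} GB/s")  # ~410 GB/s
print(f"2 sockets,  8 DIMMs: {bw_socket_gbs:.0f} GB/s")      # half wasted
# With only 8 DIMMs on 2 CPUs, half the channels sit empty, and a
# bandwidth-bound solver loses performance accordingly.
```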
#9
Member
Matt
Join Date: May 2011
Posts: 36
Rep Power: 13
Listed as backordered, but they still list a 3-day lead time, so who knows. I have never ordered from this e-tailer.

Heck, you could almost swing 2x32 cores of Milan-X: https://www.ebay.com/itm/17534318207...Bk9SR-yr-sT_YA

Depending on grid size, the 3D cache seems to offer more speedup than adding cores beyond about 3 cores per memory channel.
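For reference, where the CPUs mentioned in this thread sit relative to that threshold (all of them have 8 memory channels):

```python
# Cores per memory channel for the Epyc CPUs discussed above.
cpus = {"Epyc 7413": 24, "Epyc 7473X": 24, "Epyc 7543": 32,
        "Epyc 7513": 32, "Epyc 7573X": 32, "Epyc 7773X": 64}
for name, cores in cpus.items():
    print(f"{name}: {cores / 8:.0f} cores/channel")
# The 64-core part lands at 8 cores/channel, far past the point
# where extra cores stop helping a bandwidth-bound solver.
```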
#10
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Thank you very much for your comments. I was considering 1x64 as an alternative to 2x32 (or 2x28). My reasoning was: "even if the clock is lower in the 1x64 case, maybe the communication is better within a single CPU than between two CPUs". But I understand that this is not the case.

Between the EPYC 7473X and the EPYC 7573X, can I expect a performance increase proportional to the core count (32 vs. 24 cores)?

Thank you
#11 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,263
Rep Power: 44
I don't know any exact numbers; there are very few detailed benchmarks for the Milan-X CPUs, and the impact of the increased L3 cache is very dependent on the workload.

But... the 7473X and 7573X now retail for about the same amount of money in my part of the world, around 4200€. So if you are going to buy parts, definitely get the 7573X. Considering the cost of the whole computer, it is probably still worth it even when buying from a system integrator. If you can stretch your budget far enough, the 7573X is the best CPU for CFD, after all. At least until the end of the year, when the next generation launches.
#12
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
I think I can afford to pay a little more than 10.000 € if this allows me to have a decent workstation. I have done some additional research, and I have some questions:

MOTHERBOARD
If I understand correctly, there are two variants of the Gigabyte MZ72-HB0: the 1.x and the 3.0/4.0. Based on the information on the website, the former was built for the 7002 series and the latter for the 7003 series. Am I correct? If so, should I only consider the 3.0/4.0 variant? I mention this because all of the stores I have found in Europe carry the 1.x version. I can contact more companies in any case. As an alternative, the Supermicro H12DSi-NT6 costs more or less the same and can be used with both the 7002 and 7003 series. Is this a good option?

RAM
With the specification 16 GB DDR4-3200, there are different models whose prices range from 50 € to 100 €. Is there any additional parameter I should take into account when buying the RAM? The voltage varies depending on the model, but I am not sure whether this makes a difference.

CASE
The Fractal Torrent seems to fit the Supermicro and Gigabyte motherboards, and it includes five fans. I have read some reviews and it seems to be a good option in terms of cooling. What is your opinion of this case?

POWER SUPPLY
Based on the calculators available on the internet, I have deduced that I need a 1000 W power supply. Would something like the Corsair 1000RMe work?

I think that with all of these components I would be ready to assemble the workstation (a colleague can do that, but I have to provide all of the components).

Thank you
#13 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,263
Rep Power: 44
Motherboard
I'm not 100% sure about this, but the older revisions 1 and 2 might support Milan CPUs after a BIOS update. That would be a question for Gigabyte support. But yes, I would want revision 3 or 4. Just contact the seller in advance and ask about this; the information on their website might be outdated or a placeholder. Most sellers don't specify revisions at all, so you have to ask anyway.

What I liked about the Gigabyte board in particular:
1) Sufficient VRM cooling to work in a workstation case without much hassle. Supermicro boards are server boards first, relying on server-grade airflow (= loud) to cool the components.
2) Built-in fan control. Supermicro boards are always a pain to use in workstations: the default fan thresholds will identify normal fans as faulty, and dialing in a fan curve requires serious effort with 3rd-party tools. Gigabyte's solution is much more elegant, easier to use, and works better.

RAM
You need registered ECC memory (RDIMMs). There aren't many degrees of freedom here. Maybe you were looking at UDIMMs instead? https://geizhals.de/?cat=ramddr3&xf=..._RDIMM+mit+ECC

CASE
The Fractal Torrent is not a bad case for air cooling and will likely work fine on account of brute force. However, you will probably end up using Noctua CPU coolers on these SP3 sockets, and they blow bottom-to-top. The top, where you would want to exhaust the heat from the CPUs and RAM, is closed off in the Fractal Torrent.
The ideal case for air-cooling dual-socket Epyc is the Phanteks Enthoo Pro 2. It has plenty of room for fans in the bottom and the top. It doesn't come with any fans installed, though; Arctic F12 PWM (3x bottom) and Arctic F14 PWM (3x top) are a great low-cost option at about 5€ apiece.

POWER SUPPLY
The Corsair 1000RMe will work.
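For what it's worth, a rough power budget for the parts discussed so far. The CPU and GPU figures are from public spec sheets; the rest are my estimates:

```python
# Rough worst-case power budget for the proposed dual-7573X build.
cpus  = 2 * 280   # Epyc 7573X: 280 W TDP each (spec sheet)
ram   = 16 * 5    # ~5 W per RDIMM (estimate)
gpu   = 70        # RTX A2000: 70 W board power (spec sheet)
misc  = 80        # SSD, fans, motherboard, margin (estimate)
total = cpus + ram + gpu + misc
print(f"~{total} W")   # ~790 W -> a quality 1000 W unit has headroom
```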
#14
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
Regarding the RAM, I was missing the ECC part. I have decided to buy the Kingston DDR4-3200 ECC Reg KSM32RS8/16MFR, and I will follow your suggestions: I will try to buy revision 3 or 4 of the Gigabyte motherboard, plus the recommended case and fans.

I do not know how much time it will take to build the workstation and start using it, but once I have it I will post a reply with the performance results in Ansys Fluent.

Thank you very much for your help, it has been very useful.
#15 |
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
By the way, even after reading the post about performance in OpenFOAM and other posts similar to mine, I still do not grasp where the key to CFD performance lies. This is what I understand:

Loosely speaking, there are three types of memory: the SSD, the RAM, and the cache, and the differences between them (ordered from slower to faster) are about an order of magnitude each. When solving the NS equations in parallel, the SSD only plays the role of saving data pre- or post-simulation, and the RAM stores intermediate values in the calculation, such as matrices. (A small numerical illustration of these tiers is at the end of this post.)

Based on the above, I have some questions:
- What does the cache memory store?
- What causes the limited performance of CFD solvers at high core counts, in particular the poor scaling above roughly 16 cores (the exact value depends on the processor)? I was thinking the reason was slow memory, but I read in another post that the L3 cache only increases performance at all core counts, not the scalability. So would an L3 cache on the order of the RAM size not improve scalability? If not, what would have to change (in an ideal world) to get 100% scaling at any number of cores?
- OpenFOAM and Fluent run (mainly) implicit solvers. Is the scaling problem solved with explicit solvers, such as the ones used in high-order methods?
- I have read that running with more than around 2x32 cores does not improve performance (at least on a limited budget). So what is the procedure followed in supercomputers? (Maybe the answer is the same as for the second question.)

Thank you
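To put my mental model into numbers, here is a little estimate. The bandwidth figures are rough assumptions, and of course no L3 cache can hold a 50M-cell case; the last line only compares bandwidths:

```python
# Time to stream the working set of a 50M-cell case once from each
# memory tier. Working-set size and bandwidths are rough assumptions.
working_set_gb = 50.0   # ~1 GB per million cells (assumed)
tiers_gbs = {"NVMe SSD": 7.0, "RAM (2 sockets)": 400.0, "L3 cache": 3000.0}
for tier, bw in tiers_gbs.items():
    print(f"{tier:>16}: {working_set_gb / bw * 1e3:.1f} ms per sweep")
# -> NVMe SSD: ~7143 ms; RAM: ~125 ms; L3: ~17 ms
```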
#16 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,263
Rep Power: 44
CPU caches are...complicated.
One easy way to look at them: they are just another tier of even faster storage. Before fetching data from RAM (slow compared to CPU cache), the CPU checks whether the values can be found in one of the caches. Two of the reasons such a cache hit might occur:
1) The value was copied into cache previously because it was recently used in a calculation.
2) The CPU predicted that the value might be needed soon, and pre-fetched it from RAM into cache.
Such cache hits save the time it would take to fetch the values from memory when they are needed, which in terms of CPU cycles is an extremely long time. Bigger caches mean a higher probability of cache hits, and thus reduced dependence on slow RAM.

Less-than-ideal intra-node scaling in CFD is mostly caused by a memory bandwidth bottleneck: the CPU cores can chew through the computations faster than the memory can provide the data required for them. In addition, memory latency increases when the memory interface is utilized close to maximum bandwidth, making memory access even slower. Search terms if you want to know more: "roofline model", "arithmetic intensity", "loaded memory latency". (A numerical sketch is at the end of this post.)

Supercomputers, i.e. clusters, just add more CPUs. The limitations here are caused by so-called shared CPU resources. A CPU has a finite amount of them (last-level cache, memory bandwidth...), and the CPU cores compete for them. They share them. Adding more CPU cores doesn't help when the shared resources are already fully utilized. That's where clusters come in: each compute node is an additional set of shared CPU resources, and only the CPU cores within that node compete for them. Intra-node scaling, the type of scaling we discussed up to this point, is limited by the shared CPU resources. Inter-node scaling is completely unrelated: adding more compute nodes adds shared CPU resources at the same rate. Which is why CFD codes that stop scaling within a node at around 4 cores per memory channel can still scale linearly to thousands of cores across several compute nodes.

There are no "ideal" CPUs for memory-bound workloads, because they are a relatively small niche in computing; development will always focus on maximum density for compute-bound applications first. Caches are expensive, both in terms of the area they occupy on a die and in terms of power consumption. And a rule of thumb is: bigger caches are slower. AMD kind of got around this rule with "3D V-Cache", but one of the reasons is that their regular L3 caches are relatively high-latency to begin with.

Same for memory bandwidth: just adding more channels isn't easy. Not only, but also because a fairly large percentage of the ~4000 pins on current-gen server CPUs is allocated to memory, and more channels mean even more pins. Larger sockets, smaller pins, or no socket at all because the CPU gets soldered to the board directly: all expensive solutions, or things customers aren't willing to accept yet. And you can always use a cluster if you need more performance for memory-bound applications. Well, not always, but at least for what we usually need.
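A minimal roofline estimate, with made-up but plausible numbers, to show what "memory bound" means in practice:

```python
# Roofline model: attainable throughput is capped either by peak
# compute or by memory bandwidth x arithmetic intensity (FLOPs/byte).
# All numbers below are illustrative assumptions, not measurements.
peak_gflops = 2000.0   # rough FP64 peak of a 32-core server CPU
peak_bw_gbs = 205.0    # one socket, 8x DDR4-3200
ai_cfd      = 0.25     # FLOPs per byte, ballpark for sparse solvers

attainable = min(peak_gflops, peak_bw_gbs * ai_cfd)
print(f"attainable: {attainable:.0f} GFLOP/s")             # ~51
print(f"fraction of peak: {attainable / peak_gflops:.1%}") # ~2.6%
# Adding cores raises peak_gflops, which is not the binding limit here.
# Bigger caches (or more channels) raise the effective bandwidth instead.
```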
#17
New Member
Join Date: Oct 2022
Posts: 19
Rep Power: 2
If the way clusters work (with multiple nodes) is so much more effective, is it not possible to build a small cluster? Why are there no clusters of, let's say, 16 nodes with 4 cores each? Or is that solution much more expensive than a 2x32-core workstation?
#18 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,263
Rep Power: 44
Clusters consisting of nodes with relatively cheap commodity hardware are a thing, yes.
There are downsides, of course:
1) The setup is not as trivial as for a single computer.
2) Multiplied costs. You need a motherboard, PSU and case for each node. The components can be cheaper, but it still adds up.
3) Node interconnects. Good old Ethernet can work for very small clusters, but at some point inter-node scaling becomes limited by the interconnect (a toy model is sketched below). That's when you need to look into Infiniband and the like. This hardware is prohibitively expensive when bought new; if you put in the research effort, you can buy used adapters and switches fairly cheap on ebay.
4) Other limitations: what if your meshing process requires a large amount of memory in a single shared-memory system, but all you have is 16 nodes with 16GB of memory each? You might get around this by giving one of the nodes more memory than the others, but that is yet another added cost.

Long story short: a cluster can be cheaper and/or faster than a single node, but you have to put in a lot of research effort first, and you give up some quality-of-life features. I would only recommend it for enthusiasts: people who enjoy tinkering with hardware and software, and don't consider it a waste of time that could be spent on more productive tasks.
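To illustrate point 3, a toy model with invented constants: per-iteration time has a compute part that divides across nodes and a communication part that grows with the node count:

```python
# Toy inter-node scaling model. t_compute divides across nodes;
# communication cost grows with node count. Constants are invented.
def iter_time(nodes: int, t_compute: float = 10.0,
              t_comm_per_node: float = 0.05) -> float:
    return t_compute / nodes + t_comm_per_node * nodes

for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} nodes: {iter_time(n):5.2f} s/iter")
# Scaling stalls (and eventually reverses) once communication
# dominates; a faster interconnect shrinks t_comm_per_node and
# pushes that point out to more nodes.
```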
Similar Threads:

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| choose AMD CPU of workstation for ANSYS fluent | jonswap | Hardware | 1 | October 28, 2021 15:50 |
| Fluent Workstation for online rent | hares | FLUENT | 4 | December 13, 2016 13:32 |
| 32 CPUs Workstation V.S. Cluster for Fluent | Anna Tian | FLUENT | 40 | July 17, 2014 00:10 |
| Fluent and Silicon Graphics workstation | Swati Mohanty | FLUENT | 0 | September 24, 2006 23:02 |
| workstation for Fluent | burley | FLUENT | 1 | January 9, 2000 07:59 |