September 27, 2020, 09:02 |
HPC system setup
#1
Member
SM
Join Date: Dec 2010
Posts: 97
Rep Power: 15
I am trying to put together an HPC system for USD 12,000, with a bias towards EPYC processors.
The configuration I have in mind:

Compute node
Dual AMD EPYC 7452, each with 32 cores, 2.35 GHz, 128 MB cache
Memory: 256 GB (16 GB x 16) ECC DDR4-3200
240 GB enterprise SSD

Master node
Single AMD EPYC Rome 7252, 8 cores, 3.1 GHz
Memory: 128 GB (16 GB x 8) ECC DDR4-3200
4x 4 TB SATA enterprise hard disks
CentOS

I intend to start with 1-2 compute nodes and add more later as I get budget. Due to budget constraints: Gigabit Ethernet instead of Infiniband.

I have a few questions, some specific, some generic:
1. Is Ethernet really a bottleneck if I have to run across two compute nodes?
2. In that case, is it better to go for a workstation if I don't plan to use more than 64 cores for a single run?
3. Should I have an SSD or HDD on the compute nodes?
4. Is a single processor on the master node a problem if I just want to launch and manage jobs on the compute nodes?
5. Neglecting the price factor, is 1 CPU with 64 cores better than 2 CPUs with 32 cores?

Any other comments/suggestions will be highly appreciated.
September 27, 2020, 10:15 |
#2
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
2) Once more: depends on the application. If your application is compute-bound and all you need is 64 cores, there is definitely no need to distribute these cores across multiple machines. If your applications are more on the memory-bound side of the fence, splitting the cores across multiple machines/CPUs could increase performance, despite having to deal with node interconnects.
3) If these drives only need to hold the operating system for each node, it doesn't matter too much. If these drives are supposed to double as fast local storage for each node, SSDs are definitely the way to go.
4) Not a problem at all. In fact, if the master node only needs to handle node access and a central storage system, an Epyc CPU and 128 GB of RAM are total overkill, and thus a waste of budget.
5) And again: depends on the type of HPC application you want to run. Compute-bound: a single CPU is fine. Memory-bandwidth-bound: multiple CPUs are better, because memory bandwidth is a shared resource that scales with the number of CPUs.
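Point 5 can be made concrete with a back-of-the-envelope calculation. A minimal sketch, assuming theoretical peak numbers for 8-channel DDR4-3200; real sustained bandwidth is lower on any system, so treat these as upper bounds:

```python
# Back-of-the-envelope: theoretical peak memory bandwidth per core.
# Assumes 8 channels of DDR4-3200 per Epyc socket, 8 bytes per transfer.
# These are peak numbers, not measured sustained bandwidth.

def socket_bandwidth_gbs(channels: int = 8, mt_per_s: int = 3200) -> float:
    """Theoretical peak bandwidth of one socket in GB/s."""
    return channels * mt_per_s * 8 / 1000

per_core_1x64 = socket_bandwidth_gbs() / 64       # one 64-core CPU
per_core_2x32 = 2 * socket_bandwidth_gbs() / 64   # two 32-core CPUs, 64 cores total

print(f"1x64 cores: {per_core_1x64:.1f} GB/s per core")
print(f"2x32 cores: {per_core_2x32:.1f} GB/s per core")
```

The second socket doubles the available channels, so a memory-bandwidth-bound code gets twice the bandwidth per core for the same core count.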
September 27, 2020, 10:47 |
#3
Member
SM
Join Date: Dec 2010
Posts: 97
Rep Power: 15
Many thanks for the fast reply.
Very sorry for forgetting the application! I intend to do LES, and eventually DNS, of reacting (i.e. combustion) flows. The software I intend to use is FLUENT (academic license), OpenFOAM, and a Fortran MPI code.
What would be the cheapest alternative (in case the master node is doing the I/O)?
September 27, 2020, 11:53 |
#4
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
Both Ansys Fluent and OpenFOAM tend to become memory-bandwidth-bound somewhere between 2-4 cores per memory channel. In that case, two 32-core Epyc CPUs are way better than a single 64-core CPU.
Since I don't know anything about your MPI-parallel Fortran code, I cannot comment on its computational intensity. Side note on computational intensity (roughly: floating-point operations per byte moved from and to memory): it does not correlate with the amount of memory allocated per core. Let alone the amount of memory available per core, which is a hardware metric.
I don't know about OpenFOAM, but it can surely be configured to write its output to a central storage system. Otherwise, a 256 GB SSD seems way too small. While we are on the topic of storage: for LES, you definitely want a fast interconnect between the nodes and storage. Don't go below 10 Gigabit Ethernet, and get a storage solution that can saturate the bandwidth here, which is around 1 gigabyte/s. With only Gigabit Ethernet, frequent writes of result files could slow down the calculation significantly.
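To put the interconnect numbers in perspective, here is a small sketch of how long one result file takes to write over different links. The snapshot size and the 80% link-efficiency factor are assumptions for illustration, not measurements:

```python
# Rough estimate: time to push one LES result file to central storage
# over different network links. File size and efficiency factor are
# hypothetical placeholders; plug in numbers for your own case.

def write_time_seconds(file_size_gb: float, link_gbit_per_s: float,
                       efficiency: float = 0.8) -> float:
    """Time in seconds to transfer one file over the link.

    efficiency accounts for protocol overhead (TCP/NFS); 0.8 is a
    rule-of-thumb assumption, not a measured value.
    """
    usable_gbyte_per_s = link_gbit_per_s / 8 * efficiency
    return file_size_gb / usable_gbyte_per_s

snapshot_gb = 20.0  # assumed size of one LES snapshot
print(f"1 GbE:  {write_time_seconds(snapshot_gb, 1.0):.0f} s per snapshot")
print(f"10 GbE: {write_time_seconds(snapshot_gb, 10.0):.0f} s per snapshot")
```

With frequent snapshots, a factor-of-ten difference in write time is the difference between an annoyance and a calculation that spends a large share of its wall time on I/O.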
September 28, 2020, 08:45 |
#5
Member
SM
Join Date: Dec 2010
Posts: 97
Rep Power: 15
Alex, thanks a lot for your time and for sharing from your vast experience!
Or something else? If so, any literature/links?
And I just found that 10 Gbps is not a big price difference from 1 Gbps.
How does it work? Can I submit a job and still use the node? Can I run PBS, SLURM, etc. on one of the compute nodes?
September 28, 2020, 14:15 |
#6
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
https://moodle.rrze.uni-erlangen.de/...ew.php?id=7173
For central storage, you should really consult an IT professional, who can also help you set it up. There are many things to consider, like data integrity, redundancy, backups, speed... On the topic of speed: you won't get the sequential transfer speed needed to saturate 10 Gigabit Ethernet out of 5 spinning hard drives. Unless you put them into RAID0. And I cannot stress enough how bad of an idea that would be for the only central storage pool in a cluster. Here is what I did: 6x 8 TB hard drives in RAID6 for mass storage with low performance, plus 8 TB of NVMe storage without redundancy for when higher performance is needed. There are much better ways to do it; that's just all I could do with the restrictions I had. Again: ask an expert on storage solutions. Maybe your server vendor has some hints for you. They really should.
You can use any queuing system you like. As far as I know, none of these systems has an inherent restriction against computing on the "head" node, i.e. having no dedicated head node at all. Maybe you are thinking of larger commercial and academic clusters, where the head node is excluded from the queuing system.
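To illustrate the point about not needing a dedicated head node: in SLURM, for example, this comes down to listing the controller's host as a regular compute node. A minimal slurm.conf sketch only; the hostnames, CPU counts, and memory sizes below are hypothetical placeholders, not a tested configuration:

```
# Minimal slurm.conf fragment (sketch; node01/node02, CPUs and
# RealMemory values are hypothetical placeholders).
SlurmctldHost=node01                  # controller daemon runs on node01
NodeName=node01 CPUs=64 RealMemory=250000
NodeName=node02 CPUs=64 RealMemory=250000
# node01 appears in the partition like any other node, so jobs run on it too
PartitionName=batch Nodes=node01,node02 Default=YES State=UP
```

On larger clusters, admins typically leave the controller host out of the partition's node list, which is exactly the "excluded head node" setup mentioned above.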
September 30, 2020, 01:46 |
#7
Member
SM
Join Date: Dec 2010
Posts: 97
Rep Power: 15
Will go through and see if I can understand it!
In addition to the SSD for the OS & applications, I had planned 5x 4 TB disks in RAID5 for the central storage. But I have to rethink and figure out what to do after these suggestions.
I plan to use Ganglia / PBS / SLURM. Also, is it worth paying for an enterprise OS?
Just found this at https://www.ozeninc.com/ansys-system...ents/#tab-id-5:
Quote:
A headless server is a specialized machine meant for the sole purpose of computation. The server form factor, as well as the removed need for graphics capability, allows for maximization of computational ability. This form factor generally requires workstations capable of pre- and postprocessing models. While users can manually remote into the machine, copy files over and press solve, the setup and usage of Remote Solve Manager is highly recommended to automate this process. Network speed is an important consideration, especially for transferring large result files.
Looks like another option if I want to restrict myself to one node (i.e. 64 cores) at a time.
September 30, 2020, 12:02 |
#8
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
October 7, 2020, 14:52 |
#9
Member
SM
Join Date: Dec 2010
Posts: 97
Rep Power: 15
Just found that the AMD EPYC 7551 is half the price of the 7542.
Both of them have 32 cores. But:

                    AMD Epyc 7542    AMD Epyc 7551
Frequency           2.35 GHz         2.00 GHz
Turbo (1 core)      3.40 GHz         3.00 GHz
Turbo (all cores)   3.20 GHz         2.55 GHz
Architecture        Rome (Zen 2)     Zen
Memory              DDR4-3200        DDR4-2666
Memory channels     8                8
ECC                 Yes              Yes
L3 cache            128 MB           64 MB
PCIe version        4.0              3.0

So the flip side of the 7551 is lower CPU speed, lower RAM speed, and half the L3 cache.
Q. How much difference will the L3 cache make?
Q. What is the all-core turbo speed? Do we get that when all cores are loaded heavily?
Finally, considering half the cost but the same number of cores, is it worth switching to the 7551? Or am I missing something?
October 7, 2020, 16:36 |
#10
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
You can get 1st gen Epyc CPUs with 32 cores for $600 and less on ebay. Retail, not engineering samples.
But there is a good reason why 1st gen suddenly dropped in price when 2nd gen was launched, and prices are still falling: almost nobody wants them, because 2nd gen is just so much better.
1) Instructions per clock are significantly higher. Meaning that at the same core frequency, 2nd gen will be faster. How much depends on the code.
2) Core clock speeds are higher.
3) Support for faster memory, which definitely impacts the codes you intend to run.
4) Last but not least: a simpler NUMA topology.
The last one is the main reason why most people don't want 1st gen any more. For software that isn't NUMA-aware (or for operating systems with an over-zealous scheduler), this can have a huge performance impact. For your applications, it is not a deal-breaker. In fact, configuring 2nd gen Epyc CPUs in NPS4 mode (which results in 4 NUMA nodes per CPU, just like 1st gen had natively) is recommended for software that uses MPI+DD. The performance increase vs NPS1 can be around 10%. The downside: operations like mesh generation can run slower with this many NUMA nodes. I definitely see that on my workstation (2x Epyc 7551): as soon as my grid generator uses more memory than one NUMA node provides, performance drops by about 50%. It's a tradeoff you need to make: longer mesh generation times for large meshes vs longer simulation run times.
In practice, the actual clock speed can be higher or lower than the advertised all-core turbo frequency, depending on the type of code that is run, the motherboard, BIOS settings, cooling... I have seen a few reports from people with 2nd gen Epyc CPUs that run faster than the advertised all-core turbo frequency. But don't count on that. It's not that important anyway, because with 32 cores, memory bandwidth starts to become a limiting factor. So higher clock speeds do not translate 1:1 into higher performance.
You can take a look at the pinned thread in this sub-forum. For the OpenFOAM test case used there, 2nd gen beats 1st gen Epyc by about 25%. Maybe 30% with NPS4.
I know the price difference between the CPUs looks like a lot, but in my opinion, it is worth it when you are not on an ultra-tight budget. Look at it this way: the CPUs may cost 100% more, but the total price increase for a whole workstation or cluster node is much less.
Last edited by flotus1; October 9, 2020 at 14:15.
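The memory-speed difference alone (point 3 above) is easy to quantify from theoretical peak numbers. Sustained bandwidth will be lower on both generations, but the ratio is indicative:

```python
# Theoretical peak memory bandwidth: 8-channel DDR4-3200 (2nd gen Epyc)
# vs 8-channel DDR4-2666 (1st gen), 8 bytes per transfer. Peak numbers
# only; sustained bandwidth is lower on both, the ratio is what matters.

def peak_bw_gbs(mt_per_s: int, channels: int = 8) -> float:
    """Peak bandwidth in GB/s for 8-byte-wide DDR4 channels."""
    return channels * mt_per_s * 8 / 1000

bw_rome = peak_bw_gbs(3200)     # 2nd gen Epyc
bw_naples = peak_bw_gbs(2666)   # 1st gen Epyc
print(f"DDR4-3200 vs DDR4-2666: +{(bw_rome / bw_naples - 1) * 100:.0f}% peak bandwidth")
```

For a memory-bandwidth-bound CFD code, that ~20% difference in peak bandwidth lines up with a good chunk of the 25-30% gap seen in the benchmark thread, before counting the IPC and clock-speed advantages.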
October 14, 2020, 14:06 |
#11
Member
SM
Join Date: Dec 2010
Posts: 97
Rep Power: 15
For the 10G interconnect, what type of switch would be better: a managed one or unmanaged?
What other factors should one consider when choosing a switch? I assume the number of ports should be at least equal to master + compute nodes. Any suggestions for a budget 8-port switch?