CFD Online Discussion Forums - Hardware
HPC server for a start-up (https://www.cfd-online.com/Forums/hardware/229922-hpc-server-start-up.html)

killian153 August 31, 2020 17:35

HPC server for a start-up
 
Hello everyone,

We need to build a small HPC server for our start-up to perform (mostly) CFD calculations. I will try to be as clear as possible:

Our needs:
  • We will run simulations with up to 5 million cells (2D/3D steady and DES/LES simulations)
  • There will be 2 intensive users (ANSYS Fluent)
  • We have a maximum budget of 7000-8000€
  • We have 4 HPC packs, so up to 512 cores
What I've learned so far (tell me if I'm wrong):
  • We should prioritize total memory bandwidth and total cache over core count and frequency
  • 4-8 GB of RAM per core is recommended
  • RAM scale is really important, and RAM should be distributed equally across the channels
  • ECC RAM is not necessary if we are okay with losing one or two calculations per year
  • For CFD-Post, the main thing is to maximize VRAM
What we think (but we might be wrong):

CPU: 2x AMD EPYC 7302 16C
-> Why? Good price/performance, good total memory bandwidth, 128 MB of L3 cache, and a reasonable TDP.

GPU: RTX 2070
-> Why? 8 GB of VRAM, and prices have dropped on this generation now that the new RTX 3xxx cards are coming.

RAM: 128 GB (64 GB per CPU) at 3200 MHz, but how should we allocate it? Would 2x 8x8 GB (i.e., eight 8 GB DIMMs per CPU) be a good choice?

Storage: SSDs (no QLC) would be better, but how much space should we allow?

OS: I read that CentOS is a good choice for an HPC server.

Server hardware: rack, but what model would you recommend? Something like a Lenovo ThinkSystem SR665?


Thank you for your answers! :)

flotus1 September 1, 2020 13:14

How do you intend to access this PC? Do you need both users working in a graphical environment at the same time? Is this just the first one of many compute nodes to come, hence 4 HPC packs?

killian153 September 1, 2020 15:48

Hello flotus1,


First, thank you for all your answers on this forum, I learned a lot from you!

Quote:

How do you intend to access this PC?
We will access it via RDP (the HPC server will be joined to our domain, and we'll have a Windows Server machine acting as the gateway).

Quote:

Do you need both users working in a graphical environment at the same time?
Very good question. My answer would be yes: at first there will only be two of us, so we can easily coordinate so we don't connect at the same time, but once we add more users, that will become complicated to manage.

Quote:

Is this just the first one of many compute nodes to come, hence 4 HPC packs?
The 4 HPC packs are not a choice; they come with our start-up offer from ANSYS, and we don't need that many. The server will be the only one for at least 2-3 years, I think. I can't say at the moment whether we'll add more nodes (it's expensive and more complicated to manage).


Thanks!

flotus1 September 1, 2020 16:51

That's way outside of my area of expertise, so I might be completely wrong here. But you should definitely check whether Nvidia's consumer GPUs allow several sessions at the same time. The keyword here might be SR-IOV. But again, a total shot in the dark on my side :confused:

Anyway, back to basics:
Quote:

We should prioritize total memory bandwidth and total cache over core count and frequency
More last-level cache is nice to have, but it depends on what CPU you have exactly, and there is definitely a point of diminishing returns. One of the reasons for the huge L3 caches on AMD's Zen architecture is to mask rather high memory latency. So maybe don't buy an Epyc 7532 just to get more cache per core.
Quote:

4-8 GB of RAM per core is recommended
I know quite a few software vendors make this kind of recommendation. But for CFD in particular, all you need is enough memory to fit your largest model. GB per core is a rather useless metric.
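
For a very rough idea, a back-of-the-envelope estimate like the one below is usually good enough. The GB-per-million-cells figure is just a ballpark assumption on my side, not a vendor number; complex physics (combustion, multiphase, lots of extra models) can need considerably more.

Code:

# Back-of-envelope RAM estimate from the cell count.
# The GB-per-million-cells value is a rough rule of thumb (assumption),
# not an official Fluent figure; adjust it for your own cases.
def estimate_ram_gb(cells, gb_per_million_cells=2.0, safety_factor=1.5):
    """Return a rough RAM estimate in GB for a single CFD case."""
    return cells / 1e6 * gb_per_million_cells * safety_factor

# Two users each running a 5M-cell case at the same time:
print(estimate_ram_gb(5e6) * 2)   # ~30 GB, so 128 GB leaves plenty of headroom
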
Quote:

RAM scale is really important, and RAM should be distributed equally across the channels
No idea what you mean by RAM scale, but I agree that you should aim for a balanced memory population across all channels for maximum performance.
Quote:

ECC RAM is not necessary if we are okay with losing one or two calculations per year
A topic for endless debates, but luckily for us, we don't have to deal with it. Epyc CPUs only run with RDIMM, and there is really no point in sourcing exotic RDIMM without ECC. So just buy reg ECC memory.
Quote:

For CFD-Post, the main thing is to maximize VRAM
Enough VRAM is definitely more important than having the absolute highest GPU performance. But again, make sure your GPU can handle two instances at the same time.
Quote:

CPU: 2x AMD EPYC 7302 16C
Either that, or since you have more than enough HPC licenses: Epyc 7352. That's a small increase to the total system cost, but 50% more cores.
Quote:

RAM: 128 GB (64 GB per CPU) at 3200 MHz, but how should we allocate it? Would 2x 8x8 GB (i.e., eight 8 GB DIMMs per CPU) be a good choice?
With two Epyc Rome CPUs, you need 16 DIMMs total. So probably 16x8GB DDR4-3200 reg ECC. Or 16x16GB if you expect your model sizes to increase in the future. Do my simulations run faster when I upgrade to faster hardware? No, I just run larger models ;)
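
For reference, the theoretical peak bandwidth you get from filling all channels is easy to work out; the sketch below assumes 8 memory channels per socket (Epyc Rome) and DDR4-3200.

Code:

# Theoretical peak memory bandwidth for a dual-socket Epyc Rome system
# with all channels populated. Assumes DDR4-3200 and a 64-bit channel.
channels_per_socket = 8
sockets = 2
transfers_per_second = 3200e6   # DDR4-3200 = 3200 MT/s
bytes_per_transfer = 8          # 64-bit memory channel

per_socket = channels_per_socket * transfers_per_second * bytes_per_transfer / 1e9
total = per_socket * sockets
print(f"{per_socket:.1f} GB/s per socket, {total:.1f} GB/s total")
# -> 204.8 GB/s per socket, 409.6 GB/s total (theoretical peak)
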
Quote:

OS: I read that CentOS is a good choice for an HPC server
It seems to be the standard choice for many HPC clusters. But with effectively one node, you can probably run whatever you are most comfortable with.
Quote:

Server hardware: rack, but what model would you recommend? Something like a Lenovo ThinkSystem SR665?
No hints on this one. Tell your system integrator what kind of hardware you need in the system, and they will probably make that choice for you.

trampoCFD September 2, 2020 00:47

64 Cores workstation
 
Hi Killian,
I would suggest:
- AMD Epyc Rome CPUs, because they are the best value for money in your price range.
- 16x8GB DDR4-3200 ECC, for maximum memory bandwidth. That fills every memory slot with the smallest 3200 MHz DDR4 module. 5M cells is a small model; even when distributed over many cores, 128 GB should be sufficient.
- Run your DES/LES in the cloud if possible. STAR-CCM+ scales well down to roughly 10,000 cells per core, and I would hope Fluent is fairly similar, so you could run your 5M-cell LES on about 500 cores for roughly a 10x speed-up compared to your workstation.
- CentOS is a good option for CFD HPC.
- Not sure about GPU sharing; last time I looked into it, Nvidia had a fantastic but incredibly expensive solution, and it only worked through a VM/container.
You should be running most of your simulations in batch mode. A simple script is probably OK for 2 users; for more users, a batch job manager is a good idea, and PBS works very well.
You'll most probably have to manage access for interactive CFD work.
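
As a rough illustration of batch mode, a submission could look something like the sketch below. The job name, core count and journal file are placeholders, and it assumes PBS Pro plus a standard Fluent installation on the node.

Code:

import subprocess
from pathlib import Path

# Placeholder job parameters -- adjust to your case and queue setup.
job_name = "les_5M"
ncpus = 32
journal = "run.jou"   # Fluent journal file that reads the case and iterates

pbs_script = f"""#!/bin/bash
#PBS -N {job_name}
#PBS -l select=1:ncpus={ncpus}:mpiprocs={ncpus}
#PBS -l walltime=48:00:00
#PBS -j oe
cd $PBS_O_WORKDIR
# 3ddp = 3D double precision, -g = no GUI, -t = process count, -i = journal
fluent 3ddp -g -t{ncpus} -i {journal} > {job_name}.log 2>&1
"""

Path(f"{job_name}.pbs").write_text(pbs_script)
subprocess.run(["qsub", f"{job_name}.pbs"], check=True)   # submit to PBS
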

Please have a look at our 64 Cores workstation:
https://trampocfd.com/collections/wo...es-workstation
If you like what you see, please contact us via PM or the contact form on our home page, and we'll send you a questionnaire to make sure we address all your needs thoroughly.

Best regards,
Gui

killian153 October 8, 2020 14:48

Quote:

But you should definitely check whether Nvidias consumer GPUs allow several sessions at the same time. The keyword here might be SR-IOV. But again, total shot in the dark on my side :confused:
I found out on other forums that NVIDIA doesn't allow consumer-grade GPUs to be deployed in servers (according to their driver EULA)? Is that true?

Hmm, SR-IOV seems to be aimed at virtualization. We won't virtualize our server; it will be bare-metal Linux with multiple users running simulations. So SR-IOV isn't necessary, is it?

Quote:

More last level cache is nice to have, but it depends on what CPU you have exactly, and there is definitely a point of diminishing returns. One of the reasons for huge L3 caches on AMDs Zen architecture is to mask rather high memory latency. So maybe don't buy Epyc 7532 just to get more cache per core.
Indeed, I was aware of this high memory latency, which is the "bad part" about AMD.

Quote:

I know quite a few software vendors do this kind of recommendation. But with CFD in particular, all you need is enough memory to fit your largest model. GB per core is a rather useless metric.
But how do I know how much memory I need to fit my largest model?

Quote:

A topic for endless debates, but luckily for us, we don't have to deal with it. Epyc CPUs only run with RDIMM, and there is really no point in sourcing exotic RDIMM without ECC. So just buy reg ECC memory.
Oh yes, you are right, it's ECC only. OK, we'll go with ECC memory.

Quote:

Enough VRAM is definitely more important than having the absolute highest GPU performance. But again, make sure your GPU can handle two instances at the same time.
Would you recommend an RTX 4000 over an RTX 2070? I'm also not really sure we can fit a consumer-grade card in a server chassis (it's a big, heavy card!).

Quote:

Either that, or since you have more than enough HPC licenses: Epyc 7352. That's a small increase to the total system cost, but 50% more cores.
I think we'll go with 2x EPYC 7352. It will be useful to have more cores for other users.

I'm also really lost about what kind of storage we should put in the server: SSDs vs. HDDs? 500 GB, 1 TB?

killian153 October 8, 2020 14:58

Quote:

Originally Posted by trampoCFD (Post 781815)
Hi Killian,
I would suggest:
- AMD Epyc Rome CPUs, because they are the best value for money in your price range.
- 16x8GB DDR4-3200 ECC, for maximum memory bandwidth. That fills every memory slot with the smallest 3200 MHz DDR4 module. 5M cells is a small model; even when distributed over many cores, 128 GB should be sufficient.
- Run your DES/LES in the cloud if possible. STAR-CCM+ scales well down to roughly 10,000 cells per core, and I would hope Fluent is fairly similar, so you could run your 5M-cell LES on about 500 cores for roughly a 10x speed-up compared to your workstation.
- CentOS is a good option for CFD HPC.
- Not sure about GPU sharing; last time I looked into it, Nvidia had a fantastic but incredibly expensive solution, and it only worked through a VM/container.
You should be running most of your simulations in batch mode. A simple script is probably OK for 2 users; for more users, a batch job manager is a good idea, and PBS works very well.
You'll most probably have to manage access for interactive CFD work.

Please have a look at our 64 Cores workstation:
https://trampocfd.com/collections/wo...es-workstation
If you like what you see, please contact us via PM or the contact form on our home page, and we'll send you a questionnaire to make sure we address all your needs thoroughly.

Best regards,
Gui

Hello Guillaume,

Thank you for your answer.

Unfortunately, we can't buy your products, as we need an EU warranty; it would be too complicated to buy from Australia.

Do you think we should build the server in a tower case, or is a rack better?

I see you only put a 500 GB NVMe drive for the primary partition and a 1 TB SSD for the main data partition in your CFD workstation; is that really enough?

I also have no idea about sharing GPUs. The server will run Linux, which manages multiple users, so if userA runs a CFD-Post render, will userB be able to run a CFD-Post render too?

Best regards

trampoCFD October 8, 2020 19:13

Hi Killian
1/ Our warranty is the part manufacturers' international warranty: the manufacturer ships a new part to you after a remote diagnostic shows the part is defective. Your location makes no difference.
2/ Rack vs. tower: do you have rack space? Otherwise, a tower is the default.
3/ SSD capacity: that totally depends on your usage. How much data will your largest simulation produce? Be careful if you run transient simulations and generate data for making videos; you could easily produce TBs of data with a 10M-cell mesh (a rough estimate is sketched below).
4/ I can't help with shared usage on a standard (consumer) GPU. I looked at NVIDIA GRID for our cloud solution but never went ahead with it.
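
To give a rough feel for how quickly transient output adds up, here is a small sketch; every number in it is an assumption for illustration, not a measurement.

Code:

# Rough size of saved transient field data for an animation.
cells = 10e6            # 10M-cell mesh
bytes_per_value = 8     # double precision
fields_saved = 6        # e.g. pressure + 3 velocity components + 2 extras
snapshots = 2000        # number of saved time steps

size_tb = cells * bytes_per_value * fields_saved * snapshots / 1e12
print(f"~{size_tb:.1f} TB of raw field data")   # ~1.0 TB before any compression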

Best regards,
gui@trampoCFD

