
HPC server for a start-up

Old   August 31, 2020, 18:35
Default HPC server for a start-up
  #1
New Member
 
Killian
Join Date: Nov 2017
Posts: 26
Hello everyone,

We need to build a small HPC server for our start-up to perform (mostly) CFD calculations. I will try to be as clear as possible:

Our needs:
  • We will run simulations with up to 5 million cells (2D/3D steady and DES/LES simulations)
  • We will be 2 intensive users (ANSYS Fluent)
  • We have a maximum budget of 7000-8000€
  • We have 4 HPC packs so up to 512 cores
What I've learned so far (tell me if I'm wrong):
  • We should prioritize total memory bandwidth and total cache over core count and frequency
  • 4-8 GB RAM/core is recommended
  • RAM scale is really important and RAM should be applied equally to each channel
  • ECC RAM is not necessary if we are OK with losing one or two calculations per year
  • For CFD-Post, the main thing is to maximize VRAM
What we think (but we might be wrong):

CPU: 2x AMD EPYC 7302 16C
-> why? Good price/performance, good total memory bandwidth, 128 MB L3 cache and a reasonable TDP

GPU: RTX 2070
-> why? 8 GB of VRAM, and prices have dropped on this generation as the new RTX 3xxx cards arrive

RAM: 128 GB (64 GB/CPU) DDR4-3200, but how should we allocate it? Would 2x 8x8 GB (eight modules per CPU) be a good choice?

Storage: SSD (no QLC) would be better, but how much space should we allow?

OS: I read CentOS is a good choice for an HPC server

Server hardware: a rack server, but which model would you recommend? Something like the Lenovo ThinkSystem SR665?


Thank you for your answers!

Old   September 1, 2020, 14:14
Default
  #2
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,397
How do you intend to access this PC? Do you need both users working in a graphical environment at the same time? Is this just the first one of many compute nodes to come, hence 4 HPC packs?

Old   September 1, 2020, 16:48
Default
  #3
New Member
 
Killian
Join Date: Nov 2017
Posts: 26
Hello flotus1,


First, thank you for all your answers on this forum, I learned a lot from you!

Quote:
How do you intend to access this PC?
We will access it via RDP (the HPC server will be joined to our domain, and a Windows Server machine will act as the gateway).

Quote:
Do you need both users working in a graphical environment at the same time?
Very good question. My answer would be yes: at first there will only be two of us, so we can easily coordinate to avoid connecting at the same time, but once we add more users that becomes complicated to manage.

Quote:
Is this just the first one of many compute nodes to come, hence 4 HPC pack?
The 4 HPC packs are not a choice, they come with our ANSYS start-up offer and we don't need that many. This server will be the only one for at least 2-3 years, I think. I can't say at the moment whether we'll add more nodes (it's expensive and more complicated to manage).


Thanks!

Old   September 1, 2020, 17:51
Default
  #4
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,397
That's way outside of my area of expertise, so I might be completely wrong here. But you should definitely check whether Nvidia's consumer GPUs allow several sessions at the same time. The keyword here might be SR-IOV. But again, that's a total shot in the dark on my side.

Anyway, back to basics:
Quote:
We should prioritize total memory bandwidth and total cache over core count and frequency
More last level cache is nice to have, but it depends on what CPU you have exactly, and there is definitely a point of diminishing returns. One of the reasons for the huge L3 caches on AMD's Zen architecture is to mask rather high memory latency. So maybe don't buy an Epyc 7532 just to get more cache per core.
Quote:
4-8 GB RAM/core is recommended
I know quite a few software vendors make this kind of recommendation. But with CFD in particular, all you need is enough memory to fit your largest model. GB per core is a rather useless metric.
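As a rough sanity check, here is a minimal sketch of that kind of estimate. The GB-per-million-cells figure and the safety factor are assumptions on my side; the real footprint depends on the solver, the models and the precision, so calibrate it against your own cases.
Code:
# Rough RAM estimate for a CFD case from its cell count (Python sketch).
# The per-million-cells figure below is an assumed rule of thumb,
# not a Fluent specification.
def estimate_ram_gb(n_million_cells, gb_per_million_cells=3.0, safety_factor=1.5):
    """Very rough RAM requirement in GB for a CFD case."""
    return n_million_cells * gb_per_million_cells * safety_factor

# The 5-million-cell case from this thread:
print(f"~{estimate_ram_gb(5):.0f} GB")  # ~22 GB, so 128 GB leaves plenty of headroom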
Quote:
RAM scale is really important and RAM should be applied equally to each channel
No idea what you mean by RAM scale, but I agree that you should aim for a balanced memory population across all channels for maximum performance.
Quote:
ECC RAM is not necessary if we are OK with losing one or two calculations per year
A topic for endless debates, but luckily for us, we don't have to deal with it. Epyc CPUs only run with RDIMM, and there is really no point in sourcing exotic RDIMM without ECC. So just buy reg ECC memory.
Quote:
For CFD-Post, the main thing is to maximize VRAM
Enough VRAM is definitely more important than having the absolute highest GPU performance. But again, make sure your GPU can handle two instances at the same time.
Quote:
CPU: 2x AMD EPYC 7302 16C
Either that, or since you have more than enough HPC licenses: Epyc 7352. That's a small increase to the total system cost, but 50% more cores.
Quote:
RAM: 128 GB (64 GB/CPU) DDR4-3200, but how should we allocate it? Would 2x 8x8 GB (eight modules per CPU) be a good choice?
With two Epyc Rome CPUs you want 16 DIMMs in total, one per memory channel. So probably 16x8 GB DDR4-3200 reg ECC, or 16x16 GB if you expect your model sizes to increase in the future. "Do my simulations run faster when I upgrade to faster hardware? No, I just run larger models."
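To put a number on why all 16 channels should be populated, here is a quick back-of-the-envelope calculation; these are theoretical peak figures only, real solvers reach a fraction of them.
Code:
# Theoretical peak memory bandwidth of a dual-socket Epyc Rome system (Python sketch).
MT_PER_S = 3200          # DDR4-3200 transfer rate in MT/s
BYTES_PER_TRANSFER = 8   # 64-bit data bus per memory channel
CHANNELS_PER_SOCKET = 8  # Epyc Rome has 8 memory channels per socket
SOCKETS = 2

per_channel = MT_PER_S * BYTES_PER_TRANSFER / 1000   # 25.6 GB/s per channel
total = per_channel * CHANNELS_PER_SOCKET * SOCKETS  # 409.6 GB/s across 16 channels
print(f"{per_channel:.1f} GB/s per channel, {total:.1f} GB/s total")
# Populating only half the channels halves this figure, which directly
# hurts memory-bandwidth-bound CFD solvers.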
Quote:
OS: I read CentOS is a good choice for an HPC server
It seems to be the standard choice for many HPC clusters. But with effectively one node, you can probably run whatever you are most comfortable with.
Quote:
Server hardware: a rack server, but which model would you recommend? Something like the Lenovo ThinkSystem SR665?
No hints on this one. Tell your SI what kind of hardware you need in your system, then they will probably make that choice for you.

Old   September 2, 2020, 01:47
Default 64 Cores workstation
  #5
Member
 
Guillaume Jolly
Join Date: Dec 2015
Posts: 63
Hi Killian
I would suggest:
-AMD Epyc Rome CPUs, because they are the best value for money in your price range.
-16x8 GB DDR4-3200 ECC, for maximum memory bandwidth. That fills every memory channel with the smallest DDR4-3200 module. 5M cells is a small model; even when distributed over many cores, 128 GB should be sufficient.
-Run your DES/LES in the cloud if possible. STAR-CCM+ scales well down to roughly 10,000 cells per core, and I would hope Fluent is fairly similar, so you could run your 5M-cell LES on about 500 cores for roughly a 10x speed-up compared to your workstation (see the rough estimate below).
-CentOS is a good option for CFD HPC.
-Not sure about sharing a GPU; last time I looked into it, Nvidia had a fantastic and incredibly expensive solution, and it only worked through a VM/container.
You should run most of your simulations in batch mode. A simple script is probably OK for 2 users; for more users, a batch job manager is a good idea. PBS works very well.
You'll most probably have to manage access for interactive CFD work.
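Here is the rough estimate behind the cloud suggestion above. The 10,000 cells/core figure is the STAR-CCM+ one; whether Fluent behaves the same, and the assumed workstation core count, are my assumptions.
Code:
# Back-of-the-envelope scaling estimate for the cloud run (Python sketch).
n_cells = 5_000_000            # LES case from this thread
cells_per_core_limit = 10_000  # assumed scaling limit (STAR-CCM+ figure)
workstation_cores = 48         # e.g. 2x Epyc 7352

cloud_cores = n_cells // cells_per_core_limit       # 500 cores
ideal_speedup = cloud_cores / workstation_cores     # ~10x, assuming near-linear scaling
print(f"{cloud_cores} cores in the cloud -> ~{ideal_speedup:.0f}x vs the workstation")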

Please have a look at our 64 Cores workstation:
https://trampocfd.com/collections/wo...es-workstation
If you like what you see, please contact us through PM or the contact form on our home page, and we'll send you a questionnaire to make sure we address all your needs thoroughly.

Best regards,
Gui

Old   October 8, 2020, 15:48
Default
  #6
New Member
 
Killian
Join Date: Nov 2017
Posts: 26
Quote:
But you should definitely check whether Nvidia's consumer GPUs allow several sessions at the same time. The keyword here might be SR-IOV. But again, that's a total shot in the dark on my side.
I found out on forums that NVIDIA doesn't allow consumer-grade GPUs to be deployed in servers (according to their EULA)? Is that true?

Hm, SR-IOV seems to be for virtualization. We won't virtualize our server; it will run bare-metal Linux with multiple users for simulations. So SR-IOV isn't necessary, is it?

Quote:
More last level cache is nice to have, but it depends on what CPU you have exactly, and there is definitely a point of diminishing returns. One of the reasons for the huge L3 caches on AMD's Zen architecture is to mask rather high memory latency. So maybe don't buy an Epyc 7532 just to get more cache per core.
Indeed, I was aware of this high memory latency, which is the "bad part" about AMD.

Quote:
I know quite a few software vendors make this kind of recommendation. But with CFD in particular, all you need is enough memory to fit your largest model. GB per core is a rather useless metric.
But how do I know how much memory I need to fit my largest model?

Quote:
A topic for endless debates, but luckily for us, we don't have to deal with it. Epyc CPUs only run with RDIMM, and there is really no point in sourcing exotic RDIMM without ECC. So just buy reg ECC memory.
Oh yes, you are right, Epyc is RDIMM/ECC only. OK, we'll go with ECC memory.

Quote:
Enough VRAM is definitely more important than having the absolute highest GPU performance. But again, make sure your GPU can handle two instances at the same time.
Would you recommend an RTX 4000 over an RTX 2070? I'm also not really sure we can fit a consumer-grade card in a server (it's heavy!).

Quote:
Either that, or since you have more than enough HPC licenses: Epyc 7352. That's a small increase to the total system cost, but 50% more cores.
I think we'll go with 2x EPYC 7352. It will be useful to have more cores for other users.

I'm also really lost about what kind of storage we should put in the server. SSDs vs HDDs? 500 GB, 1 TB?

Old   October 8, 2020, 15:58
Default
  #7
New Member
 
Killian
Join Date: Nov 2017
Posts: 26
Quote:
Originally Posted by trampoCFD View Post
Hello Guillaume,

Thank you for your answer.

Unfortunately, we can't buy your products as we need an EU warranty. It would be too complicated to buy from Australia.

Do you think we should build the server in a tower case, or is a rack better?

I see you only put a 500 GB NVMe drive for the primary partition and a 1 TB SSD for the main partition in your CFD workstation; is that really enough?

I also have no idea about sharing GPUs. The server will run Linux, which manages multiple users, so if user A runs a CFD-Post session, will user B be able to run a CFD-Post render too?

Best regards

Old   October 8, 2020, 20:13
Default
  #8
Member
 
Guillaume Jolly
Join Date: Dec 2015
Posts: 63
Hi Killian
1/ Our warranty is the parts manufacturers' international warranty: the manufacturer ships a new part to you after a remote diagnostic shows the part is defective. Your location makes no difference.
2/ Rack vs tower: do you have rack space? Otherwise a tower is the default.
3/ SSD capacity: that totally depends on your usage. How much data will your largest simulation produce? Be careful if you run transient simulations and generate data for making videos; you could be generating TBs of data with a 10M-cell mesh (see the rough estimate below).
4/ I can't help with shared usage on a standard GPU. I looked at Nvidia GRID for our cloud solution but never went ahead with it.
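To give an order of magnitude for point 3, here is a quick estimate; every number in it is illustrative and should be replaced with your own export settings.
Code:
# Rough estimate of the data volume a transient run can generate (Python sketch).
n_cells = 10_000_000     # 10M-cell mesh from the example above
fields_exported = 6      # e.g. pressure, temperature, 3 velocity components, 1 scalar
bytes_per_value = 8      # double precision
snapshots = 2000         # saved time steps for an animation

total_tb = n_cells * fields_exported * bytes_per_value * snapshots / 1e12
print(f"~{total_tb:.1f} TB")   # ~1.0 TB with these assumptions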

Best regards,
gui@trampoCFD
