
CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Hardware (https://www.cfd-online.com/Forums/hardware/)
-   -   Home workstation for large memory demand CFD (https://www.cfd-online.com/Forums/hardware/224230-home-workstation-large-memory-demand-cfd.html)

yutsumi February 11, 2020 04:48

Home workstation for large memory demand CFD
 
I need a workstation for shape optimization using SU2. Since I'm running SU2, I believe there is no limit on the number of cores I can use. SU2 requires a lot of memory for shape optimization, so I need at least 512GB of RAM, or 1TB if possible, because I want to refine the mesh further. CFD is my personal weekend project, so I don't mind waiting a week for one case to complete, and I don't have a fixed budget, although going well over 3,000USD starts to make me uncomfortable. With that in mind, I have come up with 3 options so far.

Option 1: 4,000USD
- Supermicro MBD-H11DSI-NT-B 637USD
- 2xEPYC 7252 3.1GHz 8-core 1100USD
- 16x32GB DDR4-2666 RDIMM dual-ranked (21.3GB/s) 1600USD
- Fan: 2x Noctua NH-U14s TR4-SP3 160USD
- Case: Phanteks Enthoo Pro 100USD
- Power: Seasonic Focus Plus Platinum 750W 140USD
- 500GB NVMe-SSD, e.g. Samsung 970 Evo Plus 120USD
- AMD RX 570 120USD
(Adjusted from flotus1's recommendation (https://www.cfd-online.com/Forums/ha...tml#post744418) for my needs, with rough prices from newegg.com. Since I'm in Japan, shipping will probably add a few hundred dollars. I could not find any local shops selling EPYC at a reasonable price.)

Option 2: 3,090USD
- Dell Precision T7910
- 2x E5-2667v3 3.2GHz 8-core
- 16x 32GB 2400MHz DDR4 ECC memory (19.2GB/s)
- 480GB SSD
- NVIDIA Quadro K4200 4GB GDDR5
(https://www.theserverstore.com/dell-...ion-t7910.html)

Option 3: 2,184USD
- Dell Precision T7610
- 2x E5-2667v2 3.3GHz 8-core
- 16x 32GB DDR3 1866MHz ECC RDIMM (14.9GB/s)
- 512GB SSD
- NVIDIA Quadro K4200 4GB GDDR5
(https://www.theserverstore.com/dell-...ion-t7610.html)

Question 1
Are these options reasonable for my needs? I'm leaning towards option 2 or 3, but is there anything wrong with these configurations?

Question 2
What would be the difference in speed? Reading past threads, memory bandwidth seems to be the limiting factor in CFD. If so, is option 1 slightly more than twice as fast as option 2, considering the number of memory channels (8 vs 4) and the per-channel memory bandwidth (21.3GB/s vs 19.2GB/s)? And is option 2 about 29% faster than option 3 (19.2GB/s vs 14.9GB/s)? That roughly matches the comparison between the EPYC 7301, E5-2643 v3 and E5-2695 v2 in the second figure of the thread below. Is my understanding correct?
https://www.cfd-online.com/Forums/ha...-hardware.html
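
To show where my numbers come from, here is the back-of-the-envelope arithmetic as a standalone Python sketch. It assumes the solver is purely limited by aggregate memory bandwidth and that every populated channel delivers its full theoretical per-channel rate; please correct me if those assumptions are wrong.

Code:

# Back-of-the-envelope comparison, assuming the solver is purely memory-bandwidth-bound
# and every populated channel delivers its full theoretical per-channel bandwidth.
def aggregate_gbs(sockets, channels_per_socket, gbs_per_channel):
    return sockets * channels_per_socket * gbs_per_channel

opt1 = aggregate_gbs(2, 8, 21.3)  # 2x EPYC 7252, octa-channel DDR4-2666  -> 340.8 GB/s
opt2 = aggregate_gbs(2, 4, 19.2)  # 2x E5-2667 v3, quad-channel DDR4-2400 -> 153.6 GB/s
opt3 = aggregate_gbs(2, 4, 14.9)  # 2x E5-2667 v2, quad-channel DDR3-1866 -> 119.2 GB/s

print(f"Option 1 vs Option 2: {opt1 / opt2:.2f}x")  # ~2.22x, slightly more than twice
print(f"Option 2 vs Option 3: {opt2 / opt3:.2f}x")  # ~1.29x, about 29% faster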

Question 3
Is there a way to get a 1TB RAM workstation in a similar price range to the options above? I didn't list 1TB configurations because 16x64GB RAM seems to be significantly more expensive, but I would like 1TB if it is reasonable. Even if I choose a 512GB option for now, is it better to go with the DDR4 option so that I can replace the RAM with 16x64GB once it becomes cheaper?


[Background]
For the discrete adjoint solver, SU2 consumes roughly 160GB of RAM per 1 million nodes, which is roughly ten times more than a direct simulation. I have been trying to run it on my i7-6700K with 64GB of RAM. Since that is not enough, I'm using 2 NVMe SSDs as swap (I know this is not a good approach, but I've been trying to do my best with what I have). I haven't finished my 1.4 million node case yet; it will probably take more than 3 weeks at the current pace, and I realized the TBW of the SSDs will be used up very quickly, so I think it is time to buy a workstation. I also considered the cloud, but it seems to cost a lot in the long run, and I considered building a cluster, but I don't have space for one in my apartment.
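
For sizing, this is the simple estimate I am working from, as a standalone Python sketch. It only extrapolates linearly from the ~160GB per million nodes I observed myself, which is not an official SU2 figure.

Code:

# Rough RAM sizing for the SU2 discrete adjoint, assuming ~160 GB per million mesh nodes
# (my own observation on my cases, not an official SU2 figure).
GB_PER_MILLION_NODES = 160

def adjoint_ram_gb(million_nodes):
    return GB_PER_MILLION_NODES * million_nodes

print(f"1.4M-node case:        ~{adjoint_ram_gb(1.4):.0f} GB")              # ~224 GB, far beyond my 64 GB
print(f"Largest mesh in 512GB: ~{512 / GB_PER_MILLION_NODES:.1f}M nodes")   # ~3.2M nodes
print(f"Largest mesh in 1TB:   ~{1024 / GB_PER_MILLION_NODES:.1f}M nodes")  # ~6.4M nodes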

flotus1 February 11, 2020 11:01

Now that's ...challenging.

There is an issue with the Epyc configuration. Epyc Rome CPUs with less than 128MB of L3 cache (like the Epyc 7252) effectively only have 4 memory channels. The short(er) version: the interconnect between the I/O die and the chiplets has limited bandwidth, and Rome CPUs with less than 128MB of L3 cache do not have enough chiplets to saturate more than 4 memory channels. The memory interface technically remains 8-channel, because it is handled by the I/O die, but there is a severe bottleneck between the cores/chiplets and the I/O die. You can see it in AMD's official specs: https://www.amd.com/de/products/cpu/amd-epyc-7252 - the per-socket memory bandwidth is much lower than what octa-channel DDR4 can deliver.
The long version: https://www.servethehome.com/amd-epy...ory-bandwidth/
In conclusion: the cheapest Epyc Rome CPU that would kind of make sense is the 7262.
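
For rough orientation, the theoretical DDR4 peak is just the transfer rate times the 8-byte bus width per channel. A quick standalone sketch, nothing official; compare the 8-channel number with the per-socket figure AMD lists on the page above.

Code:

# Theoretical peak bandwidth of a DDR4 memory interface:
# transfer rate (MT/s) x 8 bytes per transfer x number of channels.
def ddr4_peak_gbs(mt_per_s, channels):
    return mt_per_s * 8 * channels / 1000  # GB/s

print(f"DDR4-3200, 8 channels: {ddr4_peak_gbs(3200, 8):.1f} GB/s")  # 204.8 GB/s
print(f"DDR4-3200, 4 channels: {ddr4_peak_gbs(3200, 4):.1f} GB/s")  # 102.4 GB/s
print(f"DDR4-2666, 8 channels: {ddr4_peak_gbs(2666, 8):.1f} GB/s")  # ~170.6 GB/s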

Option 2 and especially 3 are definitely good value for 512GB of RAM.


What are your other options for more than 512GB in a shared-memory system?
Staying with dual-socket and 16 DIMM slots: LRDIMMs. 64GB LRDIMMs exist for both DDR3 and DDR4. But as you can imagine, they don't fit into your budget of 3000$ unless you strike the deal of the decade.

Which leaves one last option: quad-socket 2011(-3) with 32 DIMM slots. Disclaimer: this requires an amount of research and tinkering that I would shy away from. The upsides: these CPUs can be found pretty cheap, and you can stay with relatively cheap 32GB DIMMs.
The downsides: motherboards. They are hard to find, only come in proprietary form factors, and don't necessarily have ATX power connectors. Unless you stumble upon a complete server, you will have to make one fit into a regular PC case. Software support for these low-volume parts might be an issue too. The CPUs are cheap for a reason ;)

If distributed memory is an option, I would rather go with that. 2-4 dual-socket systems don't necessarily take up a huge amount of space, and you could go with really cheap components, while still getting much more performance than with a single dual-socket system.

Edit: by the way, if this solver offers out-of-core computing capabilities, better configure your SSDs as a scratch drive. This should yield better performance overall, compared to abusing them as swap space.

yutsumi February 12, 2020 09:40

Thank you for your reply, flotus1. There is a lot for me to learn here.

I thought the 2nd generation EPYC would be better, but at a similar price the 1st generation seems to be the better choice. In any case, the EPYC option will be too expensive for me, especially if I consider a future upgrade to 1TB.

Assuming there is only a 29% or so difference in speed between options 2 and 3, I think it is better to choose option 3 for now to save money, so that when I need 1TB of RAM I can get another similar machine and build a cluster (I assume distributed memory means a cluster?). I briefly looked for quad-socket 2011(-3) systems, but I could only find motherboards, not a complete server. I'm not confident about assembling a workstation from lots of specialized parts.

For option 3, I'm thinking of using the E5-2687W v2 instead of the E5-2667 v2, because the price difference is small. There is nothing wrong with this change, like the issue you mentioned for EPYC, right? Also, a 10GbE Ethernet card (Intel X520-T2) is available as an option. Is it better to get one if I am considering building a cluster later?

I did some research on the out-of-core computing capabilities of SU2, but I haven't been able to find anything so far. I'll do more research to see whether my current PC can do what I want in a more reasonable time frame before I buy a workstation.

flotus1 February 12, 2020 10:14

There is indeed a pitfall with the Xeon E5-2687W v2: at least according to Intel's official specs, it supports a maximum of 256GB of RAM per CPU, i.e. 512GB in a dual-socket system. https://ark.intel.com/content/www/de...-3-40-ghz.html
That is fine for 512GB in a single system, but you lock yourself out of the option of even higher memory capacity in a single node.
I can tell you from experience that this CPU is not worth the price premium over the cheaper E5-2667 v2, which also happens to support 768GB of RAM per CPU.

You are correct, when I say distributed memory I mean cluster.
For connecting two nodes together, you could use 10Gigabit Ethernet. But the better option would be used Infiniband gear sourced on ebay. It is dirt-cheap, and for only two nodes, you don't even need a switch. Two cards and a cable are enough.

One of the quad-socket platforms you could look into is Dell R910: https://www.ebay.com/itm/Dell-PowerE...UAAOSwA2hd1j1b
But keep in mind that it is server-grade hardware and cooling. You would not want to be in the same room when this thing is running ;)

You probably already checked (also here in the SU2 forum), but just to make sure: is there really no way to reduce the outrageous memory consumption of your simulations? I have no idea about the inner workings of SU2, but when you say shape optimization: could it be that, with default settings, it processes several candidate shapes simultaneously? And that there is an option to process them sequentially instead, which should reduce memory consumption? Just a random thought, I could be completely on the wrong track.

yutsumi February 13, 2020 09:46

I was somehow assuming that the maximum RAM supported by the manufacturer is just what they tested, and that in reality more RAM can be used. Also, I wasn't sure how 768GB could be reached: with 64GB DIMMs, only 12 slots would be filled. Can that still make use of all 4 memory channels?

Since you mentioned caveats about rack-type servers in other threads, I had not been considering that. I did some investigating, and some people say it sounds like a (slightly quieter) vacuum cleaner. I don't want that in my apartment... Thank you for letting me know about the option, though.

Now I'm wondering about the expected life of an old workstation with E5 v2 CPUs and DDR3. I will probably order one once I figure that out.

Yeah, I did a lot of investigating into how to reduce the memory consumption of the discrete adjoint solver in SU2. According to the issue report below, my observation is typical, and an improvement still seems to be a work in progress.
https://github.com/su2code/SU2/issues/594
The functionality that I need was dropped from the latest version 7 of SU2, so I need to stick with the old version 6. Unfortunately, I cannot expect any improvement in memory consumption.

Thank you again for your help. I had very limited knowledge of hardware, but reading past threads here over the last couple of weeks, together with your comments, has given me a lot of very useful information.

flotus1 February 13, 2020 11:49

You are right, Intel's maximum memory capacity is usually a mere suggestion, and outdated even before release, mostly because larger DIMMs become available.
But this case is a bit different. Both CPUs were released at roughly the same time, yet they have different maximum memory capacities. Maybe the "W" version does not support LRDIMMs.
Anyway, it's a moot point. The E5-2667 v2 is cheaper, equally fast and guarantees higher memory capacity.

Quote:

Also, I wasn't sure how 768GB can be filled in. If you use 64GB RAM, only 12 slots will be filled in. Can that make use of 4 memory channels?
There are servers with 3 DIMM slots per channel for this, the Dell R720 for example. As long as you fill all 4 channels, you get quad-channel operation.
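
Spelled out, just the population arithmetic in a small standalone sketch, nothing more:

Code:

# DIMM population for 768 GB per CPU using 64 GB modules on a 4-channel controller.
capacity_gb, dimm_gb, channels = 768, 64, 4
dimms = capacity_gb // dimm_gb     # 12 DIMMs per CPU
per_channel = dimms // channels    # 3 DIMMs per channel (3DPC)
print(f"{dimms} DIMMs per CPU -> {per_channel} per channel, all {channels} channels populated")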

Quote:

Now I'm wondering about expected life of an old workstation with E5 v2 and DDR3. I will probably order one once I figure it out.
Expected life as in "way too slow" or as in "went up in smoke"?
It's old hardware, there is always the risk of something breaking. Usually motherboards go first, followed by power supplies. Depending on what you bought, replacements are rather cheap. Components like CPUs and RAM rarely fail before they become completely obsolete.

yutsumi February 14, 2020 08:49

I wonder why Intel released the "W" version to begin with...



Expected life as in "went up in smoke". You are right that it should be cheap to replace parts. I hope I don't need to deal with that any time soon.

As for the speed, it won't change as long as I don't change the mesh size.


I think I will go with option 3 for now. I hope it will provide performance that I need.

yutsumi March 1, 2020 03:26

So, I went with option 3. The workstation has arrived, and I have run a test case. It is amazingly faster than my previous PC (i7-6700K with 64GB RAM). For a direct simulation that fits within the old machine's RAM, it is 3 times faster. For an adjoint simulation using 220GB of RAM, which exceeds the RAM capacity of the old machine but not the new one, it is 20 times faster! I was a bit worried that DDR3 might be too slow, but it turned out to be fast enough for me. Thank you for your help.

One concern is that the CPU temperature sometimes gets close to 90degC, which seems a bit higher than what is reported on this forum. I'm asking the seller whether they replaced the thermal grease. If they haven't, I will replace it and see how much it helps.

flotus1 March 1, 2020 04:00

While that should be part of any refurbishing job, not many resellers go beyond wiping the outside of the case with a wet cloth.

I had to do some maintenance on a Dell T7600 recently, which is very similar. Surprisingly, renewing the thermal paste between the CPUs and coolers made the temperature and noise issues worse at first.
The problem was fan control. For some reason, the fan on CPU0 would not ramp up. With CPU1 running much cooler thanks to the new thermal paste, the system would instead ramp up all the other case fans in an attempt to cool the second CPU. No, the connectors were not swapped and the fan wasn't dead; the fan control on Dell's workstations just sucks.
What solved the issue was adding two 80mm fans with built-in thermal control, one between the two CPU heatsinks and one behind the second one, with the temperature probes attached to the second heatsink.
https://www.arctic.ac/de_en/arctic-f8-tc.html
Plus some adapters to get to a 12V line inside the case.

yutsumi March 8, 2020 05:27

The seller said they had reapplied the thermal paste, but one of the CPUs had only half of its surface covered.


I had difficulty finding a fan with built-in thermal control in my country, so I ordered the ones you suggested from a seller in Germany. It looks like most fans with thermal control come with a separate control unit. I hope this will solve the issue. In the meantime, leaving the case open and blowing air across it with a room fan seems to be working pretty well.

flotus1 March 8, 2020 06:11

I should have been clearer on this: you can use any fan you want, as long as it fits.
The thermal sensor and speed control on these are just a very convenient way to keep them as quiet as possible during low CPU load. And they are literally 5€ here in Germany, so adding any kind of external control would be more expensive.
If you find a regular fan with a noise level you are comfortable with at 100% speed all the time, that's fine too.

Quote:

The seller said they had reapplied thermal paste but one of the CPUs had only half of area covered by thermal paste.
Is that what they found initially, or how they left it after their refurbishing job???

yutsumi March 8, 2020 09:06

The seller in Germany ships internationally for free, so that is fine. I want my workstation to be as quiet as possible, so your recommendation looked good to me. I also had a hard time finding an adapter from PCI-E 6-pin to a fan connector, so I decided to use an adapter from a SATA power cable instead. There are several unused PCI-E 6-pin connectors in my workstation, but are they not meant to power anything other than a video card?

Quote:

Originally Posted by flotus1 (Post 760876)
Is that what they found initially, or how they left it after their refurbishing job???

That's how they left it, and what I found out myself.

flotus1 March 8, 2020 09:41

Yes, those 6-pin PCIe connectors are just for GPUs. These OEM workstations are not meant to be expandable like normal PCs, so the power supply only has spare connectors for graphics cards. I also ended up using a SATA power cable for the fans.

So you re-applied a proper amount of thermal paste?

yutsumi March 9, 2020 09:37

Yes, but I only had a small amount left over from my last PC build, so I added it on top of the paste that was already applied. I have ordered new paste, so I will remove everything and re-apply it properly.

yutsumi April 18, 2020 06:12

After a long wait, probably due to COVID-19, I finally got the case fans. I installed them as you suggested, one between the two CPU heatsinks and one behind the heatsink at the back of the case. I also reapplied the thermal paste. The temperature has dropped, but it still reaches 82-84degC, which is almost 10degC higher than the Tcase shown on the website below. Should I try to reduce the temperature further?
https://ark.intel.com/content/www/us...-3-30-ghz.html

I tried some other fan arrangements too, for example one between the heatsinks and one in front of the heatsink near the front of the case. That reduced the temperature of the front CPU to 60-70degC, but it increased the temperature of the rear CPU. I guess the rear CPU receives hot air from the one in front in that arrangement. I did some research on the web, and other people seem to have similar issues with this type of workstation.

flotus1 April 18, 2020 06:31

After all, it's an OEM workstation with tiny CPU coolers. Their only goal is to keep CPU temperatures out of thermal-throttling territory. You can't expect the very low temperatures that would be possible with aftermarket coolers in a well-ventilated case.

Intel's Tcase spec refers to the temperature on the outside of the heat spreader. You can't really measure that; the figure matters mostly for OEMs designing their cooling solutions.
CPU core temperatures in the 80-90°C range are fine. Not great, but fine as in the CPUs won't thermal throttle, and they won't die from it in the foreseeable future.

It is in fact the CPU in the back that tends to run hotter, and should be the focus of your efforts to find the optimal fan setup.

yutsumi April 18, 2020 06:53

I see. Thank you for all your advice.

I didn't realize a workstation generates this much heat. I don't think I will need a heater in this room, even in winter.

Duke711 May 5, 2020 09:13

Quote:

Originally Posted by flotus1 (Post 757885)

You are correct, when I say distributed memory I mean cluster.
For connecting two nodes together, you could use 10Gigabit Ethernet. But the better option would be used Infiniband gear sourced on ebay. It is dirt-cheap, and for only two nodes, you don't even need a switch. Two cards and a cable are enough.


It is not the better option: the dirt-cheap Infiniband gear on ebay is old technology, mostly QDR, and up to 96 cores it is not really faster than 10G Ethernet. The drivers for newer operating systems are also out of date.



https://www.hpcadvisorycouncil.com/p...is_AMD_617.pdf
http://www.hpcadvisorycouncil.com/pd..._E5_2697v3.pdf

