Hardware recommendation for combustion solvers - Forte - Converge

Old   February 11, 2020, 22:03
Default Hardware recommendation for combustion solvers - Forte - Converge
  #1
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
I'm beginning some ICE simulations for my PhD research - and I'm looking for a hardware recommendation.

Right now I am using a $300 Acer with Win 10, an i5-8400, a single stick of 8gb memory, and some Optane thing. Yes, I know, a second stick in the other memory channel will perk it up quite a bit, and I have one for it.

I have been running Ansys Forte R19.3 Student. I've run a few tutorials, one of which was a single cylinder port injected spark ignition engine. It was around 400k cells, with chemistry only enabled during combustion/expansion, and it took about 30 hrs. Ansys says it should be 20hrs "on a cluster with 16 nodes with dual intel xeon E5-2690 at 2.9ghz (8 total cores)" - I think there is a typo by ansys in there...

AFAIK, the student version of Ansys lets me run 16 cores. I have also secured an academic license for Converge 3.0, and will likely be using that to do the actual work. I do not know what the limits on that are.

What sort of computing power will I need to get this done? I'll be doing simulations of a 3 cylinder port injected spark ignition engine. Based on what I've seen using Forte, it will be about 1.5mil cells. Will my existing desktop get it done? 3-4 days of running is ok, 2 weeks is not ok.

I started a Forte multi-cylinder tutorial that is more complicated than my research engine. It looks like it was on track to take about 15 days to complete on my existing desktop, which has scared me into looking at other options.

I've been poking around on here, looking through the benchmark thread for faster cheap hardware, but it's hard to place my machine in the mix, as I have not run the benchmark, nor will I have time to for a few days.

I've considered a lot of options to speed things up. Also, my budget is about $0.
1. Buying another unit like I have, at least then I could run 2 at a time

2. Buying some $100 i5-4590 desktops from craigslist and building a Beowulf cluster like this one: A low-cost Beowulf Cluster (grad student style)

3. Buying an older multi-processor server. I've found a few that were interesting: a Dell R910 with 4x E7-4850 and 32x16GB memory for $500, an R720 with 2x E5-2609v2 and 16gb for $300, and an R420 with 2x E5-2420v2 and 16gb for $200.

4. Dept chair suggests trying to get time on a big computer somewhere. IDK if that is possible with Converge or not. I'm trying to find that out. It is certainly the cheapest option, but if I can buy a $200 server and get a run done in a few days with it, I would rather go that route.

Old   February 12, 2020, 10:44
Default
  #2
Senior Member
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,503
Rep Power: 35
flotus1 will become famous soon enoughflotus1 will become famous soon enough
Quote:
1. Buying another unit like I have, at least then I could run 2 at a time
Not if you have access to dual-socket Ivy Bridge workstations for $300. They are faster.

Quote:
2. Buying some $100 i5-4590 desktops from craigslist and building Beowulf like this https://www.cfd-online.com/Forums/ha...ent-style.html
Same thing here: a dual-socket Xeon workstation outperforms two of these. And if you want to connect more than two, you will need to start looking into faster interconnects. Sure, Infiniband cards, cables and switches are cheap on ebay. But you will need a lot more of them when you connect lower-end machines, compared to dual-socket Xeons.
And since you seem to be limited to 16 cores max, a single workstation is really all you need.

Quote:
3. Buying an older multi processor server. I've found a few that were interesting. A Dell R910 with 4x E7- 4850, and 32X16GB memory - $500. An R720 with 2x E5-2609v2 and 16gb - $300, and a R420 2x E5-2420v2 16gb for $200.
I will leave it up to you if you have a separate room for that R910 to sit in. But it is probably not worth the money, energy cost and headache compared to a more pedestrian dual-socket workstation.
When you shop for used machines with Xeon E5-2xxx v2 keep a few things in mind:
1) CPUs with numbers E5-24xx v2 only have three memory channels, compared to 4 memory channels on the E5-26xx v2 models. The fourth channel is worth the extra investment.
2) CPUs with numbers E5-260x v2 don't have turbo boost and run at low frequencies overall. In other words, they are really slow. Also, the E5-2609 v2 only has 4 cores.
Overall, maybe don't go lower than two E5-2650 v2.
3) Since these CPUs have 4 memory channels, you need to populate at least 8 identical DIMMs in a dual-socket system to get decent performance. No need to buy it fully decked out from a retailer though, you can buy cheap used memory (DDR3 reg ECC) and install it yourself.
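The DIMM-count rule in point 3 can be written down as a small calculation. This is only a minimal sketch in Python, assuming that one DIMM per memory channel is the minimum needed to use every channel; the function name `min_dimms` is made up for illustration, and the channel counts are the ones quoted in points 1 and 3.

```python
def min_dimms(sockets: int, channels_per_socket: int) -> int:
    """Smallest DIMM count that puts one module on every memory channel."""
    return sockets * channels_per_socket


# Dual-socket E5-26xx v2 (4 channels per CPU) vs. E5-24xx v2 (3 channels per CPU)
dual_26xx = min_dimms(sockets=2, channels_per_socket=4)
dual_24xx = min_dimms(sockets=2, channels_per_socket=3)
print(dual_26xx, dual_24xx)
```

Buying fewer modules than this leaves channels empty, which is the situation the advice above is meant to avoid.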

Quote:
4. Dept chair suggests trying to get time on a big computer somewhere. IDK if that is possible with Converge or not. I'm trying to find that out. It is certainly the cheapest option, but if I can buy a $200 server and get a run done in a few days with it, I would rather go that route.
It should not be too difficult to convince the admins to install your software, as long as there is a Linux version. On clusters used mostly by universities, chances are that lots of Ansys products are already installed.
crestang likes this.
__________________
Please do not send me CFD-related questions via PM

Old   February 12, 2020, 15:51
Default
  #3
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
Thanks for your insight Alex.

When I was digging through the benchmark thread, it appeared to me that on these dual socket machines, by the time you get 4 cores on each socket going, they don't get much faster after that, and certainly by 6 it has leveled off. Also, I recall seeing that v2 stuff was not terribly faster than the v1 stuff.

That sort of excited me about the 4 socket machine: lots of memory channels, and only run 4, 5, or 6 cores on each socket.

I suppose that is also what drew me to the 2609v2 and the 2420v2: don't pay for the extra cores that won't speed things up.

I was sort of thinking about buying 2 of the 2420v2 machines and coupling them with Ethernet. I'd have 24 cores (even if I only used 16) and 12 memory channels for less than $500, vs. a 2650v2 machine with 16 cores and 8 memory channels for $400.
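The core and channel totals in that comparison can be checked with a few lines of Python. This is a rough sketch, not a performance model: the `Option` class and its field names are invented for illustration, $500 is used as the upper bound for the two-machine option, and the per-socket figures (6 cores and 3 channels for the E5-2420v2, 8 cores and 4 channels for the E5-2650v2) are the ones implied by the totals above.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    machines: int
    sockets_per_machine: int
    cores_per_socket: int
    channels_per_socket: int
    price_usd: float  # total price for all machines in this option

    @property
    def cores(self) -> int:
        return self.machines * self.sockets_per_machine * self.cores_per_socket

    @property
    def channels(self) -> int:
        return self.machines * self.sockets_per_machine * self.channels_per_socket


options = [
    Option("2x dual E5-2420v2 over Ethernet", 2, 2, 6, 3, 500.0),
    Option("1x dual E5-2650v2", 1, 2, 8, 4, 400.0),
]

for o in options:
    print(f"{o.name}: {o.cores} cores, {o.channels} channels, "
          f"${o.price_usd / o.channels:.2f} per memory channel")
```

Counting channels this way ignores the cost of the interconnect between the two machines, so it only shows why the two-machine option looks attractive on paper.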


So in light of all that, how about a few alternate ideas.

1. Local to me, a pile of 6 HP servers. 2 are E5-2420, the others are x55, with a bunch of RAM. $250. The plan would be to put a couple together, hook them up, and sell off the rest. Probably not a good idea: more work, slow, etc. https://www.facebook.com/marketplace...5010249749173/


2. How about a 2630v3 machine? 16 cores, DDR4 $330, but I'll have to buy DDR4. Is the speed of DDR4 worth the extra cost? https://www.ebay.com/itm/Gigabyte-R1...D/193326850686

3. 2637v2? Needs memory and drives but $300. Looks like really high frequency, but 4 cores and ddr3. Cheap to populate though.
https://www.ebay.com/itm/Dell-PowerE...d/324021469917

4. 2650v2 https://www.ebay.com/itm/HP-ProLiant...ry!61330!US!-1

Lastly, say like a 2650v2 vs. a 2420v2. The 26 is going to cost 50% more, is it going to be 50% faster?

Old   February 13, 2020, 11:38
Default
  #4
Senior Member
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,503
Rep Power: 35
flotus1 will become famous soon enoughflotus1 will become famous soon enough
Quote:
When I was digging through the benchmark thread, it appeared to me that on these dual socket machines by the time you get 4 processors on each socket going, they don't get much faster after that, and certainly by 6, it has leveled off. Also I recall seeing that v2 stuff was not terribly faster than the v1 stuff.
Yes, scaling with that particular OpenFOAM benchmark is less than linear with more than 1 core per memory channel. But keep in mind, this is a different solver, and most full scaling runs I can recall are with rather high-end CPUs. Lower end CPUs with lower frequency will see better scaling.
And the benchmark results are not really comparable. Different memory configurations, different kernel versions, different OpenFOAM versions, different people running the benchmarks. All else being equal, similar v2 CPUs should be around 15-20% faster than v1, assuming both use the rated memory frequency.

Quote:
Lastly, say like a 2650v2 vs. a 2420v2. The 26 is going to cost 50% more, is it going to be 50% faster?
Probably not, and that's rarely the case with hardware. You decide if 30% more performance is worth 50% more money to you.

Quote:
Anyway, how about a new set of ideas. [...]
Again, you decide if an incremental performance increase is worth a more substantial price increase to you. You can always drop lower to save some money, but at some point, you have to stop. The price/performance ratio of v2 Xeons is really good, and I personally would not go lower. Your threshold might be different.
That being said, all these options are 1U server blades. Filled to the brim with infernally loud 40mm fans. I really recommend buying a workstation platform instead, even if it might be a bit more expensive.

Old   February 13, 2020, 17:36
Default
  #5
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
Quote:
Yes, scaling with that particular OpenFOAM benchmark is less than linear with more than 1 core per memory channel. But keep in mind, this is a different solver, and most full scaling runs I can recall are with rather high-end CPUs. Lower end CPUs with lower frequency will see better scaling.
I assume this implies that they are under similar memory frequency - meaning it will take 2x as many cores at half the frequency to use up the memory bandwidth?

Say we had a 2643 and a 2648L both on DDR3-1600, would they have similar performance?

Quote:
With all being equal, similar v2 CPUs should be around 15-20% faster than v1, assuming both use the rated memory frequency.
Thanks, that alone helps a lot.


Quote:
Probably not, and that's rarely the case with hardware. You decide if 30% more performance is worth 50% more money to you.
Why is it not 55%? That's 33% more memory channels and 17% faster memory (1.33 x 1.17 is about 1.55). Not enough processor to utilize it?
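The 55% figure is the ratio of theoretical peak memory bandwidths, which can be reproduced as channels x transfer rate x bus width. The sketch below assumes DDR3-1600 for the E5-2420v2 and DDR3-1866 for the E5-2650v2 (consistent with the "17% faster memory" above) and a 64-bit (8-byte) channel; the function name is made up for illustration. It gives an upper bound per socket, not a prediction of solver speed.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


bw_2420v2 = peak_bandwidth_gbs(channels=3, mega_transfers_per_s=1600)
bw_2650v2 = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=1866)
ratio = bw_2650v2 / bw_2420v2

print(f"E5-2420v2: {bw_2420v2:.1f} GB/s per socket")
print(f"E5-2650v2: {bw_2650v2:.1f} GB/s per socket")
print(f"ratio: {ratio:.3f}")
```

How much of that ratio shows up as wall-time improvement depends on the solver and on how many cores are fighting over each channel.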


Quote:
Again, you decide if an incremental performance increase is worth a more substantial price increase to you. You can always drop lower to save some money, but at some point, you have to stop. The price/performance ratio of v2 Xeons is really good, and I personally would not go lower. Your threshold might be different.
I've built race cars and such for years, and I've come to a realization. There are 2 ways to do things: as cheap as possible, and as good as possible. Anything in the middle is a waste. I think that applies here. I really just need to figure out the minimum I need to meet the goal.

Right now on my current desktop (i5-8400 1x8gb 2666mhz) it's going to take 2-3 weeks per simulation. I need this to be like 3-4 days, 5-6 tops.
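Put as a speedup target, those run times translate into the range below. This is plain arithmetic on the numbers in the paragraph above; the variable names are just for illustration.

```python
current_days = (14, 21)    # 2-3 weeks per simulation on the i5-8400
target_days = (3, 4)       # preferred turnaround
tolerable_days = (5, 6)    # "5-6 tops"

# Best case: shortest current run vs. longest acceptable target, and vice versa.
speedup_needed_min = current_days[0] / target_days[1]
speedup_needed_max = current_days[1] / target_days[0]
speedup_tolerable_min = current_days[0] / tolerable_days[1]

print(f"target: {speedup_needed_min:.1f}x to {speedup_needed_max:.1f}x")
print(f"bare minimum: {speedup_tolerable_min:.1f}x")
```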

Quote:
That being said, all these options are 1U server blades. Filled to the brim with infernally loud 40mm fans. I really recommend buying a workstation platform instead, even if it might be a bit more expensive.
Is there a real technical reason not to use one, besides the noise and electrical use? I live on a 10-acre farm with many outbuildings, so noise/heat is not an issue. In fact, my well shed needs a little heat to keep the switch from freezing, lol.

Last edited by kstuart; February 19, 2020 at 02:01.

Old   February 17, 2020, 08:45
Default
  #6
New Member
 
Leo Natan
Join Date: Dec 2019
Posts: 7
Rep Power: 2
crestang is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Not if you have access to dual-socket Ivy bridge workstations for 300$. They are faster.
I completely agree! I would recommend using a regular PC instead of a laptop.

Old   February 19, 2020, 01:58
Default
  #7
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
I have myself semi-convinced to buy an R820 with 4x E5-4640v1, 16x 4gb DDR3-1600, and a 1tb hard drive for about $500 shipped. I'm planning on using Converge, and I think I will be able to use all 32 cores with it.

Is there anything odd about running a 4 processor machine? Can I install Ubuntu on it and run it just like my desktop, just bigger and noisier? If it is just that easy, it really seems like the solution based on cost and some results from the benchmarking thread.

My other thought is buying 2 DL160s with 2x E5-2637v2, 8x 2gb DDR3-1866, and a 1tb HD, and connecting them with a 1 Gbit network. It would cost just a bit more, but would probably be the faster option if I am stuck running Forte on 16 cores. I have no idea how to set this up though. From the little bit of information I've found, it looks like it might be quite the pain to get set up.

Old   February 20, 2020, 04:29
Default
  #8
Senior Member
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,503
Rep Power: 35
flotus1 will become famous soon enoughflotus1 will become famous soon enough
I guess you could do that. Just make sure you have a graphics card that fits into the R820 and can be supplied with power. Unless you plan to run it as just a compute node.

Old   February 20, 2020, 14:51
Default
  #9
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I guess you could do that. Just make sure you have a graphics card that fits into the R820 and can be supplied with power. Unless you plan to run it as just a compute node.
All I plan on doing with this is running the solvers. I plan on preparing my cases on my desktop, and then transferring them over to the R820 to run.

I'm going to get 16 x 4gb pc3 12800, is there an alternate memory configuration I should get to improve things?

I have checked into it, and with my license for Converge I will be able to use all the cores on this machine, and many more as well.

Old   February 20, 2020, 14:56
Default
  #10
Senior Member
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,503
Rep Power: 35
flotus1 will become famous soon enoughflotus1 will become famous soon enough
16 DIMMs is the best memory configuration for this. 4GB DIMMs will most likely be single-rank, you could gain a few percent of performance by using dual-rank. But it is probably going to cost more.

Old   February 20, 2020, 14:59
Default
  #11
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
16 DIMMs is the best memory configuration for this. 4GB DIMMs will most likely be single-rank, you could gain a few percent of performance by using dual-rank. But it is probably going to cost more.
How do I achieve that?

Also comparing to this result

OpenFOAM benchmarks on various hardware

This setup I'm looking at should perform close to this one, being that it should be handicapped by the 10600 memory?


thanks!

Old   February 20, 2020, 15:09
Default
  #12
Senior Member
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,503
Rep Power: 35
flotus1 will become famous soon enoughflotus1 will become famous soon enough
You would have to look at the specifications/exact model of the memory you buy. For example, 1Rx4 indicates single rank, 2Rx4 is dual-rank.
You are getting older CPUs with lower core count, but faster memory. So it is safe to assume that both systems will be pretty close in terms of parallel performance in CFD.
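Reading the rank out of a listing can be automated when comparing many used-memory offers. The following is a small sketch, assuming the label contains a token of the form `1Rx4` or `2Rx4` as described above; the example label strings and the function name `parse_rank` are hypothetical.

```python
import re

# Matches tokens such as "1Rx4" or "2Rx8": <ranks>R x <device width>
RANK_PATTERN = re.compile(r"\b(\d)Rx(\d+)\b")


def parse_rank(label: str):
    """Return (ranks, device_width) from a DIMM label, or None if absent."""
    match = RANK_PATTERN.search(label)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical listing titles, for illustration only
for label in ["4GB 1Rx4 PC3-12800R", "4GB 2Rx8 PC3-12800R", "4GB DDR3 ECC"]:
    print(label, "->", parse_rank(label))
```

A result starting with 1 means single-rank and 2 means dual-rank; listings without such a token need a look at the module's part number instead.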

Old   February 21, 2020, 01:25
Default
  #13
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
Thanks for the advice, Alex. Finally ended up ordering it: $520 shipped, from this techmikeny place. R820, 4x E5-4640, 32x 2gb PC3-12800R, 1tb SATA, DVD-RW, 1100w PSU, Ubuntu Server installed.

I was doing some digging trying to figure out what memory I needed to buy to get dual rank, and found some interesting posts. https://web.archive.org/web/20131102...ance-guide.pdf

I'm pretty sure I would have needed to get 8 or 16gb to get dual rank, which would have added over $100 to the cost, and I don't need that much. The dell paper shows 2 single rank dimms are almost as good as a single dual rank. At least in the 2 proc machines. Hope it's the same for a 4.

Anyway, looking forward to testing it out. Hope it's a "holy cow it's fast" not a "why is it so slow" post lol.
flotus1 likes this.

Old   February 29, 2020, 02:14
Default
  #14
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
Got it running tonight. They ended up sending it with 18x 4gb PC3-12800, just sort of randomly placed. Turns out they are dual-rank sticks, so it worked out ok after I put 16 of them where they needed to be. It's right about where I figured it would be, but I was kinda hoping it would be a little better. I think it's pretty awesome for a $500 setup. I'm so tempted to order another one to hook up to it. My i5-8400 desktop ($350ish, with 2x8gb 2666mhz DDR4) is at 342s on 6 cores, and I thought that was a pretty fast machine for the money. This is almost 5x faster!

Plan to run a test case tomorrow with forte, and see how it really does there.

# cores   Wall time (s)
------------------------
 1        1137.55
 2         619.25
 4         264.80
 6         187.70
 8         142.35
10         125.07
12         105.16
14          96.76
16          85.26
18          85.96
20          77.75
22          78.91
24          71.71
26          75.19
28          69.97
32          72.90
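To see where the scaling flattens out, the wall times above can be turned into speedup and parallel efficiency relative to the single-core run. This is a minimal sketch in Python using only the posted numbers; the 342 s figure is the 6-core desktop result mentioned in the same post.

```python
# (cores, wall time in seconds) from the benchmark run posted above
results = [
    (1, 1137.55), (2, 619.25), (4, 264.8), (6, 187.7),
    (8, 142.35), (10, 125.07), (12, 105.16), (14, 96.76),
    (16, 85.26), (18, 85.96), (20, 77.75), (22, 78.91),
    (24, 71.71), (26, 75.19), (28, 69.97), (32, 72.9),
]

t1 = results[0][1]  # single-core reference time

print("cores   time(s)  speedup  efficiency")
for cores, t in results:
    speedup = t1 / t
    efficiency = speedup / cores
    print(f"{cores:>5} {t:>9.2f} {speedup:>8.2f} {efficiency:>11.2f}")

# Fastest configuration and its ratio to the desktop's 6-core result
best_cores, best_time = min(results, key=lambda r: r[1])
desktop_time = 342.0
desktop_ratio = desktop_time / best_time
print(f"fastest: {best_cores} cores, {desktop_ratio:.2f}x the desktop")
```

The small differences between neighbouring core counts above 16 may simply be run-to-run noise, so the exact best core count should not be over-interpreted.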

Old   March 30, 2020, 03:49
Default
  #15
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
I've been running a few Forte cases on it. A 2-cylinder premixed SI engine with 500k cells, over 2 cycles, takes about 6 days.

On Forte I can only run 16 cores. I've noticed it doesn't run 16 full on, it runs all 32 about halfway. Is that bad? Any tips I can try to make this thing faster? I'm running Windows Server on it for Forte.

Old   March 30, 2020, 04:09
Default
  #16
Senior Member
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,503
Rep Power: 35
flotus1 will become famous soon enoughflotus1 will become famous soon enough
That's rather weird.
Which operating system are you using? And how exactly do you check CPU load on individual cores?
My first shot in the dark with this kind of issue is core binding or the lack thereof, as is often the case with NUMA systems. I assume you have Hyperthreading turned off in the BIOS?
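To illustrate what core binding means here, the sketch below works out one possible placement of 16 solver processes on a 4-socket, 32-core machine so that each socket (and its memory channels) gets an equal share. It is only an illustration under loud assumptions: Hyperthreading off, core IDs numbered contiguously per socket (0-7 on socket 0, 8-15 on socket 1, and so on), and a hypothetical helper name `spread_ranks`. Whether Forte or its MPI launcher accepts such a list is not established in this thread, and the real core numbering has to be checked on the machine.

```python
from collections import Counter


def spread_ranks(n_ranks: int, sockets: int, cores_per_socket: int) -> list:
    """Assign one core ID per rank, dealing ranks round-robin over sockets."""
    if n_ranks > sockets * cores_per_socket:
        raise ValueError("more ranks than physical cores")
    placement = []
    for rank in range(n_ranks):
        socket = rank % sockets       # which socket this rank lands on
        slot = rank // sockets        # which core within that socket
        placement.append(socket * cores_per_socket + slot)
    return placement


cores = spread_ranks(n_ranks=16, sockets=4, cores_per_socket=8)
per_socket = Counter(core // 8 for core in cores)
print(cores)
print(dict(per_socket))
```

Without binding, the operating system is free to migrate the 16 processes across all 32 cores, which would look like every core being busy about half the time.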

Old   March 31, 2020, 03:07
Default
  #17
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
I'm running Windows Server 2019 Standard Evaluation. I'm looking at "Resource Monitor". I'm pretty sure I turned off hyper-threading, but I've kinda wondered if it really is off.
Attached Images
File Type: jpg server processors.jpg (128.4 KB, 9 views)

Old   April 27, 2020, 23:01
Default weird findings...
  #18
New Member
 
Kurt Stuart
Join Date: Feb 2020
Location: Southern illinois
Posts: 17
Rep Power: 2
kstuart is on a distinguished road
So, since I am running the student version of Ansys and am limited to 16 of the 32 cores, I thought I would see what happens if I tried to run 2 jobs at the same time. So I did, and it ran them, but weirdly. It was an Ansys Forte tutorial, and normally it runs in just a tick over 6hrs on my system. So I started it 2 times with different names, about 1min apart. The first one I started (19) finished in about 12hrs. The second one I started (2) finished in about 6hrs, like normal. Looking at the outputs, it looked like it ran the first submitted job at about half normal speed. It didn't just queue it until the other job finished. I posted a screenshot of the 2 folders.

Doing more digging, it looks like the first submitted job was slow since it didn't have enough memory allocated to it or something.

Any ideas what that's all about, and how to improve that?
Attached Images
File Type: jpg zach.jpg (133.0 KB, 3 views)
File Type: jpg zach2.jpg (91.6 KB, 3 views)
File Type: jpg zach3.jpg (82.8 KB, 2 views)

All times are GMT -4.