3960x vs 12900k - Need a rig for my postgrad

Old   September 14, 2022, 07:04
Default 3960x vs 12900k - Need a rig for my postgrad
  #1
Member
 
Join Date: Aug 2021
Posts: 59
Rep Power: 4
cons013 is on a distinguished road
Hello all,

I am doing my thesis soon, which will involve a lot of CFD, I think somewhere between 20 and 40 million elements. Right now I have our FSAE team's 6-core Xeon Silver workstation, but it's built for CAD and, for reference, took about 26 hours to run a sim with 17 million elements. This isn't feasible for me and I need to get my own rig. We have access to a Threadripper PC from one of our sponsors, but this is for team stuff, and since it's their company computer I can't just use it whenever I want, let alone for my personal thesis.

I have been looking online and had my eye on a 3960X, but motherboards for it look hard to find, and from the benchmarks I've seen the 12900K is actually better and half the cost. Can anyone advise me on this? I know it only has dual-channel memory and 8 usable cores (I assume the efficiency cores won't be used for CFD?). Buying an 8-year-old system is not ideal for me because I want to use this as a gaming and personal desktop from time to time as well, so I'd like to make one big purchase that I can rely on for years to come.

Thanks!
cons013 is offline   Reply With Quote

Old   September 14, 2022, 12:29
Default
  #2
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Rep Power: 12
wkernkamp is on a distinguished road
Quote:
Originally Posted by cons013 View Post
Hello all,

I am doing my thesis soon, which will involve a lot of CFD, I think somewhere between 20 and 40 million elements. Right now I have our FSAE team's 6-core Xeon Silver workstation, but it's built for CAD and, for reference, took about 26 hours to run a sim with 17 million elements. This isn't feasible for me and I need to get my own rig.

What is your target run time? It looks to me that your plan may get you about a factor two speedup over the xeon silver. Is that enough? If not, what is your budget?
wkernkamp is offline   Reply With Quote

Old   September 14, 2022, 19:55
Default
  #3
Member
 
Join Date: Aug 2021
Posts: 59
Rep Power: 4
cons013 is on a distinguished road
Quote:
Originally Posted by wkernkamp View Post
What is your target run time? It looks to me that your plan may get you about a factor two speedup over the xeon silver. Is that enough? If not, what is your budget?
I would prefer maybe a 4x speed up, budget is an issue right now until I begin work over the holidays. Ideally no more than $5000 AUD, if there is a bargain then I can go 6000.
cons013 is offline   Reply With Quote

Old   September 15, 2022, 03:51
Default
  #4
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
A 4x improvement in run time might be difficult to achieve. But it mostly depends on the setup of the unknown "Xeon Silver" CPU.
If it was an older model running on dual-channel memory, it might be possible without making any compromises towards your other requirements.
If it was a recent model with memory populated correctly, not a chance.

I don't know the current prices for TR 3960X CPUs. A quick glance at ebay tells me people still want 1000$ for it. Combined with expensive motherboards and the fact that it won't get good scaling on all 24 cores, I don't think it's a good idea.
Intel's 12th gen CPUs paired with fast DDR5 memory will most likely be faster for CFD, and definitely cheaper. You don't really need a 12900k, a 12700k has the same amount of P-cores and will be just as fast.
The other option you have is an Epyc 7313P. This one will be faster than a TR 3960X and any 12th gen Intel CPU thanks to 8-channel memory. But it's a tradeoff in terms of gaming performance: it clocks lower and can only use registered ECC memory. Part of that is compensated by being Zen 3 instead of Zen 2, but it just won't run games as fast when CPU-limited.
flotus1 is offline   Reply With Quote

Old   September 15, 2022, 05:30
Default
  #5
Member
 
Join Date: Aug 2021
Posts: 59
Rep Power: 4
cons013 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
A 4x improvement in run time might be difficult to achieve. But it mostly depends on the setup of the unknown "Xeon Silver" CPU.
If it was an older model running on dual-channel memory, it might be possible without making any compromises towards your other requirements.
If it was a recent model with memory populated correctly, not a chance.

I don't know the current prices for TR 3960X CPUs. A quick glance at ebay tells me people still want 1000$ for it. Combined with expensive motherboards and the fact that it won't get good scaling on all 24 cores, I don't think it's a good idea.
Intel's 12th gen CPUs paired with fast DDR5 memory will most likely be faster for CFD, and definitely cheaper. You don't really need a 12900k, a 12700k has the same amount of P-cores and will be just as fast.
The other option you have is an Epyc 7313P. This one will be faster than a TR 3960X and any 12th gen Intel CPU thanks to 8-channel memory. But it's a tradeoff in terms of gaming performance: it clocks lower and can only use registered ECC memory. Part of that is compensated by being Zen 3 instead of Zen 2, but it just won't run games as fast when CPU-limited.

Will the epyc be fast enough over the 12700k to warrant being 4x the price? The intel is about $650 and the 7313 is 1500 used, over 2500 new for me. If the i7 beats a 3960x then I doubt I could go any tier higher. So even with 3x the cores, the 3960x will do models slower because it's not using ddr5? Despite double the memory channels?
cons013 is offline   Reply With Quote

Old   September 15, 2022, 12:24
Default
  #6
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
Let's see...

TR3960X result here
Quote:
Here is my result. Newly configured workstation with Threadripper 3960X, 3.8 GHz 24C, 64 GB memory (4 channel)

# cores Wall time (s):
------------------------
1 550.49
2 299.15
4 161.65
6 120.55
8 101.56
12 99.13
16 93.74
20 93.71
24 93.65
And an I5-12600 with memory overclocking here
Quote:
12th Gen Intel(R) Core(TM) i5-12600 DDR5 @ 4800

# cores Wall time (s):
------------------------
1 427.45
2 234.12
4 149.91
6 125.75



12th Gen Intel(R) Core(TM) i5-12600 DDR5 @ 5600

# cores Wall time (s):
------------------------
1 410.79
2 219.95
4 137.39
6 112.58

12th Gen Intel(R) Core(TM) i5-12600 DDR5 @ 6000

# cores Wall time (s):
------------------------
1 399.94
2 213.75
4 131.87
6 107.09
Add another 2 P-cores and a bit higher core frequency for an I7-12700k, and the peak performance is too close to be significant. Again, only with fast DDR5 memory.

As for the Epyc 7313P: we only have results for a dual-socket 7313 system:
Quote:
# cores Wall time (s):
------------------------
1 590.39
2 316.51
4 123.64
6 79.28
8 60.54
12 46.89
16 38.5
20 35.74
24 30.97
28 30.61
32 28.88
Simply doubling execution time on all 32 cores is a good enough estimate for the performance of a single 7313P: 58s
5000 sounded like plenty of budget at first, I completely overlooked that it is AUD.
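
To make the comparison above concrete, here is a minimal Python sketch. It only reuses the best wall times quoted in this post and the "double the dual-socket time" approximation; the function names are made up for illustration, and the single-socket figure is an estimate, not a measurement.

```python
# Best (lowest) wall times in seconds, copied from the results quoted in this post.
best_wall_time_s = {
    "TR 3960X, 24 cores": 93.65,
    "i5-12600, DDR5-4800, 6 cores": 125.75,
    "i5-12600, DDR5-6000, 6 cores": 107.09,
    "2x Epyc 7313, 32 cores": 28.88,
}


def estimate_single_socket(dual_socket_time_s: float) -> float:
    """Estimate a single-socket time by doubling the dual-socket time.

    This is the rough approximation used in the post, not a measured value.
    """
    return 2.0 * dual_socket_time_s


def relative_speed(reference_time_s: float, candidate_time_s: float) -> float:
    """Return how many times faster the candidate is than the reference."""
    return reference_time_s / candidate_time_s


epyc_7313p_est_s = estimate_single_socket(best_wall_time_s["2x Epyc 7313, 32 cores"])

for name, t in best_wall_time_s.items():
    if name.startswith("2x Epyc"):
        continue  # the dual-socket result is only the basis for the estimate
    ratio = relative_speed(t, epyc_7313p_est_s)
    print(f"Estimated 7313P ({epyc_7313p_est_s:.1f} s) vs {name} ({t} s): {ratio:.2f}x")
```

The ratios are only as good as the doubling assumption, so treat them as a rough ranking rather than a prediction.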
flotus1 is offline   Reply With Quote

Old   September 15, 2022, 15:06
Default
  #7
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Rep Power: 12
wkernkamp is on a distinguished road
For gaming: Unless you are a professional gamer, your gaming experience will be governed by the Graphics Card. So just make sure your power supply can handle the power draw.


Your current run times are so long that I think you should completely focus on bringing those times down. Flotus has the right experience to advise you. However, right now he is suggesting what I would call "compromise" systems. He is doing that because of the way you are engaging in this thread.


You should start by listing the exact specs of the Xeon Silver system. That will firmly establish where you are now. This is important, because the configuration might have defects that can be remedied. For example, if your system has just two memory channels active instead of six, you might already speed up the calculations by a factor of three (at very small cost).
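
The factor of three follows from theoretical peak memory bandwidth, which scales with the number of populated channels. A small Python sketch, assuming DDR4-2400 memory (the actual memory speed of the Xeon Silver machine is not known in this thread) and the standard 64-bit (8-byte) channel width:

```python
def peak_memory_bandwidth_gbs(channels: int, transfer_rate_mts: float,
                              bus_width_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    return channels * transfer_rate_mts * bus_width_bytes / 1000.0


# ASSUMPTION: DDR4-2400. Replace with the real memory speed once the specs are known.
ASSUMED_TRANSFER_RATE_MTS = 2400

two_channels = peak_memory_bandwidth_gbs(2, ASSUMED_TRANSFER_RATE_MTS)
six_channels = peak_memory_bandwidth_gbs(6, ASSUMED_TRANSFER_RATE_MTS)

print(f"2 channels: {two_channels:.1f} GB/s")
print(f"6 channels: {six_channels:.1f} GB/s")
print(f"Ratio: {six_channels / two_channels:.1f}x")
```

This is an upper bound on the gain. It only translates into run time to the extent that the solver is limited by memory bandwidth, which is commonly the case for CFD on many cores.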


One very effective way to speed up CFD is to simply add machines. If you were to add another xeon silver server identical to the one you have, that would double your speed.


There is an advantage to using server hardware for your mission. Servers are designed for continuous operation near their performance limit. Gaming PCs not as much, because the CPU is either used to the max on one core or lightly used on all cores (with exceptions).


The disadvantage of server hardware is the noise level. So it matters where the hardware is deployed.


Remember that your PhD work is very important for your future.
wkernkamp is offline   Reply With Quote

Old   September 15, 2022, 15:31
Default
  #8
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Rep Power: 12
wkernkamp is on a distinguished road
This configuration completes the benchmark in 15.91 seconds and might just fit in your budget:


Quote:
Originally Posted by Novel View Post
We just bought a new Workstation for our department. Thanks to this Thread we were able to find a good configuration.

The following setup was done:
OpenFOAM was compiled with the tag "-march=znver1". Also SMT was switched off and all processors were set to performance mode using "cpupower frequency-set -g performance" from the HPC Tuning Guide provided by AMD ( http://developer.amd.com/wp-content/resources/56420.pdf).

CPU:

2x AMD EPYC 7532 (Zen2-Rome) 32-Core CPU, 200W, 2.4GHz, 256MB L3 Cache, DDR4-3200
RAM:
256GB (16x 16GB) DDR4-3200 DIMM, REG, ECC, 2R

OpenFOAM v7

cores time (s) speedup
1 677.34 1.00
2 363.04 1.87
4 161.42 4.20
6 101.82 6.65
8 77.16 8.78
12 52.28 12.96
16 39.4 17.19
20 32.01 21.16
24 27.31 24.80
28 24.15 28.05
32 21.53 31.46
36 21.32 31.77
40 20.46 33.11
44 18.99 35.67
48 18.12 37.38
52 17.45 38.82
56 17.06 39.70
60 16.5 41.05
64 15.91 42.57
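
For anyone who wants to check the speedup column or see where scaling flattens out, here is a short Python sketch. It uses a subset of the wall times from the table; "parallel efficiency" (speedup divided by core count) is an extra metric added here for illustration and is not part of the original post.

```python
# Subset of the wall times (seconds) from the table, keyed by core count.
wall_time_s = {1: 677.34, 2: 363.04, 4: 161.42, 8: 77.16,
               16: 39.4, 32: 21.53, 48: 18.12, 64: 15.91}


def speedup(t_single_s: float, t_parallel_s: float) -> float:
    """Speedup relative to the single-core run."""
    return t_single_s / t_parallel_s


def efficiency(t_single_s: float, t_parallel_s: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores used."""
    return speedup(t_single_s, t_parallel_s) / cores


t1 = wall_time_s[1]
for cores, t in wall_time_s.items():
    print(f"{cores:>2} cores: speedup {speedup(t1, t):6.2f}, "
          f"efficiency {efficiency(t1, t, cores):.2f}")
```

Efficiency stays close to 1 up to 32 cores and drops after that, which matches the pattern in the table: the second half of the cores adds comparatively little.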
wkernkamp is offline   Reply With Quote

Old   September 15, 2022, 17:44
Default
  #9
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
Quote:
Originally Posted by wkernkamp View Post
This configuration completes the benchmark in 15.91 seconds and might just fit in your budget:
That seems a bit excessive. 5000AUD is around 3300USD/€. I don't see a way to get a top-tier Epyc CPU with that, let alone two of them?

I also overlooked the 20-40M cell requirement. That means at least 64GB of RAM, definitely with the option to upgrade to 128GB if/when it becomes necessary.
That's where 12th gen Intel falls behind again. Getting this much memory to run at DDR5-6000 might not even be possible today.
flotus1 is offline   Reply With Quote

Old   September 15, 2022, 18:58
Default
  #10
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Rep Power: 12
wkernkamp is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
That seems a bit excessive. 5000AUD is around 3300USD/€. I don't see a way to get a top-tier Epyc CPU with that, let alone two of them?

That is less money than I thought. I saw the 7532 on Ebay for under $1000. Motherboard maybe $600, memory $600. So definitely over $3300, but not that far out. Performance about 6x better than the ryzen or i5 configs. Maybe you know a cheaper EPYC Rome that gives some performance at improved price?
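
A quick Python tally of these numbers, as a sketch only. The exchange rate is the one implied by "5000 AUD is around 3300 USD" earlier in the thread, the CPU price is the "under $1000" eBay figure taken as an upper bound, and `other_parts` (case, power supply, coolers, storage) is a hypothetical parameter left at zero because no prices for those parts are given here.

```python
AUD_TO_USD = 0.66  # implied by "5000 AUD is around 3300 USD" in this thread


def build_cost_usd(cpu_price: float, cpu_count: int, motherboard: float,
                   memory: float, other_parts: float = 0.0) -> float:
    """Sum the part prices for a build, all in USD."""
    return cpu_price * cpu_count + motherboard + memory + other_parts


budget_usd = 5000 * AUD_TO_USD
core_parts_usd = build_cost_usd(cpu_price=1000, cpu_count=2,
                                motherboard=600, memory=600)
remaining_usd = budget_usd - core_parts_usd

print(f"Budget:             {budget_usd:.0f} USD")
print(f"CPUs + board + RAM: {core_parts_usd:.0f} USD")
print(f"Left for the rest:  {remaining_usd:.0f} USD")
```

With these inputs the CPUs, motherboard and memory alone use up nearly the whole budget, so the complete system ends up above it once the remaining parts are added.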
wkernkamp is offline   Reply With Quote

Old   September 17, 2022, 01:20
Default
  #11
Member
 
Join Date: Aug 2021
Posts: 59
Rep Power: 4
cons013 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Let's see...

TR3960X result here


And an I5-12600 with memory overclocking here


Add another 2 P-cores and a bit higher core frequency for an I7-12700k, and the peak performance is too close to be significant. Again, only with fast DDR5 memory.

As for the Epyc 7313P: we only have results for a dual-socket 7313 system:

Simply doubling execution time on all 32 cores is a good enough estimate for the performance of a single 7313P: 58s
5000 sounded like plenty of budget at first, I completely overlooked that it is AUD.
So if I'm understanding correctly, the 7313 is about 35% faster than the 3960 and about twice as fast as the consumer Intels on all cores? As you said, DDR5 is so expensive, so maybe the added cost of something like a 3960 or 7313 will roughly balance out against DDR5 costs?

I'm not sure about the memory channel setup on our Xeon. I suspect we only have 2 memory channels in use. We actually have a second Xeon PC with 2 CPUs, but they're very old and I think DDR3; again, not sure about channel use. Since this is my thesis I want to do it properly, and again I'm looking for a machine to use for long-term gaming and to continue doing my own CFD and for FSAE, so I could increase my budget. In your opinion, is the 7313 the best overall value? I'm having trouble finding solid benchmarks online and I'm not too sure what I'm even looking for.

I have access to a 3955 threadripper pro machine from a team sponsor, I will run a model this weekend to gauge the time. Any idea how a 3960 or 7313 would compare roughly to this? Let's assume 128gb on 4 memory channels.

Cheers for all your help!
cons013 is offline   Reply With Quote

Old   September 17, 2022, 13:15
Default
  #12
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
Quote:
I have access to a 3955 threadripper pro machine from a team sponsor, I will run a model this weekend to gauge the time. Any idea how a 3960 or 7313 would compare roughly to this? Let's assume 128gb on 4 memory channels.
TR PRO 3955WX will be slower than a 7313P
While its spec sheet boasts 8 memory channels which is technically correct, it only has 4 memory channels worth of bandwidth thanks to only 2 active CCDs. As a result, it only has half the L3 cache (64MB).
It is an all-around worse version of the non-pro TR 3960x.
flotus1 is offline   Reply With Quote

Old   September 17, 2022, 20:58
Red face
  #13
Member
 
Join Date: Aug 2021
Posts: 59
Rep Power: 4
cons013 is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
TR PRO 3955WX will be slower than a 7313P
While its spec sheet boasts 8 memory channels which is technically correct, it only has 4 memory channels worth of bandwidth thanks to only 2 active CCDs. As a result, it only has half the L3 cache (64MB).
It is an all-around worse version of the non-pro TR 3960x.
My apologies!! I just checked, it's actually a 3975wx
cons013 is offline   Reply With Quote

Old   September 18, 2022, 05:13
Default
  #14
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
That should be about on par with an Epyc 7313P when running on 16 cores. Assuming those 16 threads are evenly distributed across all 4 CCDs.
flotus1 is offline   Reply With Quote

Old   September 18, 2022, 15:02
Default
  #15
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Rep Power: 12
wkernkamp is on a distinguished road
Quote:
Originally Posted by cons013 View Post
So if I'm understanding correctly.....

I'm not sure about the memory channel setup on our Xeon. I suspect we only have 2 memory channels in use. We actually have a second Xeon PC with 2 CPUs, but they're very old and I think DDR3; again, not sure about channel use.
Cheers for all your help!

You should check whether the old 2-CPU server has E5-26xx v1 or v2 CPUs. If so, you could upgrade it at very low cost to 12-core CPUs and 1866 MHz memory and achieve 84 seconds on the benchmark. I think this would be your fastest option so far.
wkernkamp is offline   Reply With Quote

Old   September 18, 2022, 15:48
Default
  #16
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
It would definitely have its upsides.
Using a cheap dual-socket machine just for the number crunching part would get rid of all the compromises with the other requirements.
Though I would not go lower than Xeon E5-26xx v3 these days. The CPUs themselves are barely a cost factor in such a setup. You can get 12-core CPUs for around 100USD/EUR.

This one can run simulations 24/7, and you get another purpose-built PC for all the other stuff like gaming.

Edit: probably also worth noting that with e.g. a 24-core CPU, you can't just run a CFD simulation on 16 threads and do anything else than basic tasks while it runs. Stuff like playing modern games will be a stuttery mess. Shared CPU resources are a thing.

Last edited by flotus1; September 19, 2022 at 02:59.
flotus1 is offline   Reply With Quote

Old   September 18, 2022, 16:00
Default
  #17
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Rep Power: 12
wkernkamp is on a distinguished road
Right now the E5-2680 v4 (14 core) and E5-2683 v4 (16 core) would be preferred because of their DDR4-2400 memory (versus DDR4-2133 for v3) and low current prices. Your benchmark will be around 66 seconds, so better than 12 core xeon v2. However, an old server with DDR3 is likely xeon E5 v1/v2. There are also DDR3 servers with older (not E5) xeons that are a lot slower.



A few v3 CPUs, such as the E5-2678 v3, can handle either DDR3 or DDR4 memory. I have not seen a server implementation of a v3 CPU with DDR3 memory, but there are Chinese motherboards that do this.


The results posted here show the numbers:


Quote:
Originally Posted by nmc1988 View Post
I have just bought a used 2 x Xeon E5-2678 v3 + X99 dual Jingsha + 8x8GB 2400 MHz on AliExpress for around 620 USD. After doing some benchmarks I realized that the E5-2678 v3 does not support 2400 MHz DDR4, so I changed the CPUs to 2 x E5-2680 v4. Those two used CPUs (2678 v3 and 2680 v4) are at the same price now (under about 100 USD), and the 2680 v4 is even slightly cheaper.
Here are some benchmark results

Case 1. 2 x Xeon E5-2678 v3 + X99 dual Jingsha + 8x8GB 2400 MHz; Hyper threading OFF
CentOS 8.5, OpenFOAM 8 on Docker:
# cores Wall time (s)
16 93.02
20 83.81
24 80.22

Case 2. 2 x Xeon E5-2678 v3 + X99 dual Jingsha + 8x8GB 2400 MHz; Hyper threading OFF
Linux Mint 20.3, OpenFOAM 9; HT OFF

# cores Wall time (s)
16 87.9
20 80.89
24 77.37

HT ON

# cores Wall time (s)
16 90.92
20 81.59
24 80.66

Case 3. 2 x Xeon E5-2680 v4 + X99 dual Jingsha + 8x8GB 2400 MHz; Hyper threading OFF
Linux Mint 20.3, OpenFOAM 9; HT OFF
# cores Wall time (s)
16 81.64
20 74.62
24 70.79
28 68.11

So with a limited budget, I think 2 x Xeon E5-2680 v4 + X99 dual + 8x8GB 2400 MHz is a good choice.
wkernkamp is offline   Reply With Quote

Old   September 20, 2022, 00:22
Default
  #18
Member
 
Join Date: Aug 2021
Posts: 59
Rep Power: 4
cons013 is on a distinguished road
Quote:
Originally Posted by wkernkamp View Post
This configuration completes the benchmark in 15.91 seconds and might just fit in your budget:
I'm having trouble understanding these times and translating them to my use case. From what I can see, these are 2 million cells, 100 iterations? How can I try to match, say, a single core 7443P, 30 million cells at 500 iterations for comparing times? I can't find any good data online.
cons013 is offline   Reply With Quote

Old   September 20, 2022, 13:03
Default
  #19
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Rep Power: 12
wkernkamp is on a distinguished road
The benchmark times are used to compare the proposed systems to each other. The assumption is that a machine that is twice as fast on the benchmark will also be twice as fast on your use case. In general that is a reasonable assumption for CFD.

In order to estimate the expected performance for your use case, we would need the exact Xeon Silver configuration that you ran your test on. Based on similar systems, we can estimate what your Xeon Silver would do on the benchmark. The estimation of run time on a proposed system would then be:

t_proposed = t_silver * t_bm_proposed / t_bm_silver

t_silver = Your run time for your use case on the Xeon Silver
t_bm_proposed = The run time on the benchmark for the proposed system
t_bm_silver = The estimated run time for your Xeon Silver on the benchmark

Rather than estimating the benchmark performance of the Xeon Silver, you might run the benchmark on that machine. Note that the benchmark has gone through slight input file changes consistent with newer OpenFOAM versions. If you run into a problem, don't waste time; post it here.
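
The scaling rule above can be written as a few lines of Python. This is a sketch under stated assumptions: the 26 hours is the run time reported in the first post (for a 17 million element case), the candidate benchmark times are taken from results quoted in this thread, and `T_BM_SILVER_S` is a pure placeholder because the Xeon Silver has not been benchmarked or specified yet.

```python
def estimate_run_time(t_silver: float, t_bm_proposed: float,
                      t_bm_silver: float) -> float:
    """Scale a known run time by the ratio of benchmark times.

    t_silver       run time of the real case on the Xeon Silver (any time unit)
    t_bm_proposed  benchmark wall time of the proposed system (s)
    t_bm_silver    benchmark wall time of the Xeon Silver (s)
    The result has the same unit as t_silver.
    """
    if t_bm_silver <= 0:
        raise ValueError("t_bm_silver must be positive")
    return t_silver * t_bm_proposed / t_bm_silver


T_SILVER_H = 26.0      # hours, from the first post
T_BM_SILVER_S = 200.0  # PLACEHOLDER, not a measurement: run the benchmark to get this

# Benchmark wall times (s) quoted earlier in this thread.
candidates_bm_s = {
    "Epyc 7313P (estimated)": 58.0,
    "2x Epyc 7532": 15.91,
    "2x Xeon E5-2680 v4": 68.11,
}

for name, t_bm in candidates_bm_s.items():
    hours = estimate_run_time(T_SILVER_H, t_bm, T_BM_SILVER_S)
    print(f"{name}: about {hours:.1f} h (valid only for the placeholder t_bm_silver)")
```

Once the real benchmark time for the Xeon Silver is known, replacing the placeholder gives estimates that can be compared against the target run time.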
wkernkamp is offline   Reply With Quote

Old   September 20, 2022, 13:18
Default
  #20
Member
 
Join Date: Aug 2021
Posts: 59
Rep Power: 4
cons013 is on a distinguished road
Quote:
Originally Posted by wkernkamp View Post
The benchmark times are used to compare the proposed systems to each other. The assumption is that a machine that is twice as fast on the benchmark will also be twice as fast on your use case. In general that is a reasonable assumption for CFD.

In order to estimate the expected performance for your use case, we would need the exact Xeon Silver configuration that you ran your test on. Based on similar systems, we can estimate what your Xeon Silver would do on the benchmark. The estimation of run time on a proposed system would then be:

t_proposed = t_silver * t_bm_proposed / t_bm_silver

t_silver = Your run time for your use case on the Xeon Silver
t_bm_proposed = The run time on the benchmark for the proposed system
t_bm_silver = The estimated run time for your Xeon Silver on the benchmark

Rather than estimating the benchmark performance of the Xeon Silver, you might run the benchmark on that machine. Note that the benchmark has gone through slight input file changes consistent with newer OpenFOAM versions. If you run into a problem, don't waste time; post it here.
Sorry if this is obvious, but where is the link to the test used? I can't see it in that thread, also I'm using Ansys so could I even run it?
cons013 is offline   Reply With Quote

All times are GMT -4. The time now is 00:51.