#821
New Member
DS
Join Date: Jan 2022
Posts: 15
Rep Power: 5
Asus ROG STRIX G713PV, Ryzen 7945HX, 2 x 48 GB DDR5 5600 MHz,
OpenFOAM v2406 (precompiled), Ubuntu 24.04.1, Motorbike_bench_template.tar.gz (default settings). Meshing (real) and solver wall times:
Code:
# cores | Meshing wall time (s) | Solver wall time (s)
------------------------------------------------------
      1 |                   399 |                  513
      2 |                   278 |                  278
      4 |                   177 |                  167
      6 |                   141 |                  143
      8 |                   120 |                  142
     12 |                   112 |                  139
     16 |                   108 |                  139
#823
Senior Member
Sultan Islam
Join Date: Dec 2015
Location: Canada
Posts: 145
Rep Power: 11
It seems a lot of people (on Reddit) and some here praise Apple Silicon for CFD because of its memory bandwidth. However, I have only ever seen OpenFOAM numbers, none from Fluent (built-in solvers and/or custom solvers with UDFs) or STAR-CCM+. I can use the OpenFOAM numbers (I use OpenFOAM too) to guesstimate the performance, but I'm not seeing what the craze over Apple Silicon is about. I looked into older posts in this thread and saw that someone with both an M1 (Pro? Max?) and a 13900K got 81 seconds on Apple Silicon, but the 13900K beat it with a run time in the high 70s. Most EPYCs here beat out Apple Silicon; however, I'm not seeing any numbers for the latest desktop or laptop AMD parts (most newer ones should beat or come close to the 13900K easily).

I believe I also saw a post with results from a non-Apple ARM system, which seemed to show ARM is not as strong in floating point versus x86_64; is this true? I know I'd need to stick with x86_64 for CAD/pre-processing, but is ARM really good for CFD at this point, or is it all just fluff?
#824
Senior Member
Sultan Islam
Join Date: Dec 2015
Location: Canada
Posts: 145
Rep Power: 11
I was really looking into Apple Silicon. I use OpenFOAM, and I also have access to a COMSOL license. However, I haven't seen any numbers for Fluent or STAR-CCM+ running on Apple Silicon (I know it would need a Linux VM with box64 to work). I was also looking for numbers for CAD running in a VM, but found none.

One thing to note: I feel Apple Silicon is also overhyped (for CFD, at least). Here is a good video about it: https://youtu.be/fdvzQAWXU7A?feature=shared It seems the reported bandwidth is mostly for the GPU; the CPU gets less. This shouldn't matter much for LLMs and AI tasks, but it's interesting. For OpenFOAM here, I've seen some desktop CPUs still beat it (even older Intel ones); not much on laptop CPUs here. Here is a COMSOL one: https://forum.level1techs.com/t/cfd-...256/185?page=5 It seems that with memory tuning and PBO, x86_64 chips are still faster in CFD (both in the pure CFD benchmark and the coupled EM one). It also seems that ARM might not be as strong in double-precision floating point, and it lacks AVX-512. I can't argue with the efficiency of ARM chips, though. I'm curious how well the upcoming AMD Strix Point APUs will handle this; that could actually mean a good x86_64 laptop for CFD. Also, a lot of older server NVIDIA cards have superior FP64 performance and decent memory bandwidth too, so I guess I can't really understand the Apple Silicon hype for CFD. That said, I am still tempted by the M4 mini for everything else haha.
#825
New Member
Join Date: Dec 2024
Posts: 1
Rep Power: 0
Nothing new, just to complete the picture. Running with OF10.

OS: Ubuntu 24.04 (needed to compile OF10 for that)
Board: Gigabyte MZ73-LM0-000
CPU: 2x EPYC 9634 84-core processor (168 cores total), L3 384 MB, TDP 290 W
Mem: 24 x 16 GB DDR5 4800 MHz
BIOS settings: default (except NPS4 for NUMA) and no additional mpirun options such as binding or ranking, meaning SMT = ON and not the maximum-performance settings. The latter (max. performance) seems to be a disadvantage for longer runs lasting hours/days; the simple BIOS default is faster and less noisy.
Code:
# cores   Wall time (s)
-----------------------
      1       640.831
      6        76.3422
     12        37.8541
     24        19.9972
     32        16.1443
     48        11.6251
     64         9.70698
     72         9.21079
     96         8.28556
    120         8.4395
    144         8.13012
    168         8.16584
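For comparison, here is a minimal sketch of what an explicitly pinned run could look like with Open MPI. The binding/mapping flags are Open MPI's generic options and the solver invocation is the standard one from the benchmark template; treat it as an untested example, not the settings used for the numbers above.
Code:
# plain run, as used for the numbers above (no explicit binding)
mpirun -np 168 simpleFoam -parallel > log.simpleFoam

# assumed Open MPI syntax for explicit mapping/binding (untested on this box)
mpirun -np 168 --map-by numa --bind-to core --report-bindings simpleFoam -parallel > log.simpleFoam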
#826
Senior Member
Join Date: May 2012
Posts: 563
Rep Power: 16
I would say that MacBooks are S-tier for CFD while still maintaining the traits that are important in a laptop. While there is no report on the M4 Max here, we have an M3 Max laptop doing the benchmark in 63 seconds, running on battery. I doubt the x86_64 camp has any similar offering. If you need something with wide compatibility, though, then macOS is the lowest tier.

Unfortunately, old Macs retain their value too well. Otherwise I am pretty sure I would have a few Mac Studio M1 Ultras (about 40 s on this benchmark) sitting on my desk with a Thunderbolt interconnect: dead silent and each drawing approximately 100 W at full load. The fans on my 13900K ramp up just from doing a regular system update. As soon as I get some spare time, that space heater will go into the server room.

My 2c
#827
New Member
Kaissar Nabbout
Join Date: Feb 2022
Posts: 2
Rep Power: 0
I am not an expert in this area, so I would like to ask anyone who knows better to help me understand.

I have seen that the new AMD AI chips will have around 256 GB/s of memory bandwidth, which is pretty good for an x86 consumer product that can natively run Linux (I know Apple offers more, but it is not native Linux and it is also extremely pricey in my opinion). Since OpenFOAM performance scales directly with memory bandwidth (according to all results I have seen so far), I would like to know whether the bandwidth they announce is really what we can get for our simulations, or whether there is something behind it that lowers the performance. I am asking also because these chips are announced with LPDDR5 memory, and according to my research those memories have a narrower bus, and I don't know if this affects performance. Besides that, I got suspicious about this bandwidth figure, because it would mean a consumer product basically having better performance than a single EPYC 7003 CPU. Thanks a lot in advance to anyone who can help me understand this better.
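For reference, my understanding is that the headline number is simply bus width times data rate. Assuming the 256 GB/s claim corresponds to a 256-bit LPDDR5X interface at 8000 MT/s (my assumption, not confirmed anywhere in this thread), the arithmetic works out as:
Code:
peak bandwidth = bus width (bytes) x transfer rate
               = (256 bit / 8) x 8000 MT/s
               = 32 B x 8000 M/s = 256 GB/s
Whether the CPU cores can actually sustain that in a solver is a different question, which is exactly what the benchmark numbers in this thread try to answer.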
#829
New Member
Kevin Nolan
Join Date: Nov 2012
Posts: 18
Rep Power: 14
Apple Mac mini with M4 Pro, 12 cores (4E + 8P), 48 GB of RAM, macOS 15.3.
OpenFOAM v2412 compiled natively.
Code:
cores   MeshTime (s)   RunTime (s)
-----------------------------------
  1        407.23        334.47
  2        292.06        196.36
  4        181.5         101.12
  6        135.4          79.75
  8        128.51         62.74
 12        162.66         95.73
#830
Senior Member
Join Date: May 2012
Posts: 563
Rep Power: 16
How are the thermals and noise when you run it for an extended period of time? The case size seems silly at this point; I'd rather have a low-noise fan and a larger heat sink while retaining the old form factor. Perhaps it is still a non-issue, though?
#831
New Member
Kevin Nolan
Join Date: Nov 2012
Posts: 18
Rep Power: 14
It's very quiet; my 8-bay NAS next to me hums louder.
#832
Senior Member
Join Date: May 2012
Posts: 563
Rep Power: 16
Nice to hear! Do you know the power draw when you run on the 8 performance cores? (I use iStats for monitoring; not sure how accurate it is though, as it relies on the internal sensors.)

Now, if we can get some results on how well Thunderbolt works as an interconnect with these, then there is a clear upgrade path: just purchase another unit.
#833
Senior Member
andy
Join Date: May 2009
Posts: 349
Rep Power: 19
Interesting result, showing a parallel efficiency above 50% with 8 cores. There's no increase in efficiency from cache effects of the kind one would expect from a server with a better memory system, but neither is there the rapid collapse in parallel efficiency with core count that one usually sees on consumer hardware.
Code:
Apple M4 Mac mini, M4 Pro, 12 cores (4E + 8P)
Cores     Time     Efficiency
-----------------------------
  1      334.47       1.0
  2      196.36       0.85
  4      101.12       0.83
  8       62.74       0.67
Code:
Apple M4 Mac mini, base model, 4P + 6E
Cores     Time     Efficiency
-----------------------------
  1      315.54       1.0
  2      191.29       0.82
  4      118.64       0.66
  8      111.61       0.35
Code:
Apple MacBook Pro with M1 Max
Cores     Time     Efficiency
-----------------------------
  1      433.18       1.0
  2      240.02       0.90
  4      135.12       0.80
  8       85.57       0.63
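For anyone reading along: the efficiency column is the usual parallel efficiency, i.e. the speed-up relative to a single core divided by the core count, which is consistent with the times above. A worked example:
Code:
efficiency(N) = T(1) / (N * T(N))
M4 Pro, 8 cores: 334.47 / (8 * 62.74) ≈ 0.67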
#834
Senior Member
Join Date: May 2012
Posts: 563
Rep Power: 16
Interesting comparison. Perhaps the 8-core base M4 Mac mini result is not so representative, though, as it only has 4 performance cores; running on E-cores naturally tanks the scaling. I wonder how much power draw the 4 E-cores add, given the minimal gain from 4 P-cores (118 s) to 4P + 4E cores (111 s).
#835
Member
M3 also has a larger L3 compared with M2.
#836
Senior Member
Join Date: May 2012
Posts: 563
Rep Power: 16
Not sure about the cache. The L2 did not change between M2 and M3, and it seems the M4 also uses a 16 MB shared L2.

I cannot find a source on the CPU-addressable memory bandwidth across generations, but I remember the M3 having higher bandwidth than the M1/M2. https://www.anandtech.com/show/21387...ts-on-ipad-pro
#837
Member
It's very possible that the L3 (called the SLC?) of the M3 is larger than on the M2 and M1.
https://forums.macrumors.com/threads.../post-32845968

The L3 sometimes plays an important role for realistic memory bandwidth; at least this is true for AMD's Zen processors. I have no idea about the hardware design, but all the benchmark results here seem to agree with it. Hopefully the M4 will enlarge the L3 again.
#838
Senior Member
andy
Join Date: May 2009
Posts: 349
Rep Power: 19
Code:
Cores     Time     Eff.
------------------------
   1     546.46    1.00
   4     110.53    1.26
   8      51.49    1.32
  16      27.53    1.24
  32      15.38    1.11
  64       8.67    0.98
 128       6.49    0.65
 192       6.43    0.44
Here the efficiency rises above 1 at low core counts from cache effects and only falls off at high core counts. In the Apple examples the efficiency drops steadily, indicating that moving data between processors is the limiting factor. It is unusual, though, in that the memory restriction is significant with 2 cores but only grows modestly at 4 and 8 cores, rather than rapidly once memory bandwidth becomes insufficient. It would be interesting to know what they are doing.
#839
New Member
Join Date: Feb 2025
Location: Germany
Posts: 1
Rep Power: 0
Server: HP ProLiant DL385 Gen11 with 2 x AMD EPYC 9684X (2 x 96 = 192 physical cores, 1152 MB L3 cache per CPU), 24 x 64 GB RAM 4800 MHz.
Code:
# cores   snappyHexMesh (s)   simpleFoam (s)
---------------------------------------------
   192         95.092              7.14
   160         68.389              7.23
   128         61.932              7.54
    96         56.130              8.09
    64         55.193              9.45
    56         62.801             10.62
    48         56.863             12.01
    40         56.618             13.84
    32         58.931             16.50
    28         57.512             18.33
    24         60.457             20.70
    20         65.944             24.24
    16         73.347             29.12
    12         83.844             37.81
     8        103.656             56.16
     4        179.401            110.04
     1        480.004            526.93
Software configuration: Ubuntu 24.04 LTS, OpenFOAM v2412. Base case from "OpenFOAM benchmarks on various hardware", post #504, without any changes except processor counts.

Last edited by Nolcera; February 7, 2025 at 02:13. Reason: Software configuration added
#840
Member
I have no idea why cache (at least L3) plays such an important role here; maybe, as you said, it is related to MPI overhead.

But this was especially true in the Zen 2 era: a lot of posts back then in this thread show that the L3 = 256 MB parts (e.g. EPYC 7532) are about 30% faster than their L3 = 128 MB variants when the other specs are almost the same. The -X models with even larger L3 should be faster still, but the price curve for the -X models is typically quite steep.