|
November 13, 2022, 16:29 |
|
#601 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
@naffrancois
The charts appear to be very low resolution, at least on my end. If that's a problem with the file size limits enforced by the forum software, you can upload them to an image sharing site instead. Quote:
Thanks to memory bandwidth increasing by more than 2x, scaling will be better. Whether that actually results in 2x the maximum performance for CFD remains to be seen. I'll probably do a small writeup/buyers guide once I've wrapped my head around some of Genoa's intricacies. |
||
November 13, 2022, 16:37 |
|
#602 |
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17 |
@flotus1
Yes it seems there is some compression when attaching them. Here are the links: https://ibb.co/MsQh94V https://ibb.co/GVnbYP5 |
|
November 13, 2022, 16:55 |
|
#603 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Thanks, I added your files at the bottom of the first post.
At some point, we might want to think about a successor to this thread. But that should be done by someone who knows more about operating OpenFOAM than I do. |
|
November 17, 2022, 09:26 |
|
#604 |
New Member
Johann
Join Date: Oct 2022
Posts: 13
Rep Power: 4 |
Hello, here are the first numbers from my freshly delivered system. I used the old script from the first post. Version 2 is still calculating...
OpenFOAM version 10 on Ubuntu in WSL inside Windows 11
Epyc 7373X 16x3.8GHz w/ 8x16GB DDR4-3200 RAM Code:
# cores   Wall time (s):
------------------------
 1        15.8124
 2        8.22782
 4        5.41716
 6        3.86773
 8        3.1326
12        2.92129
16        2.87439
Version 2 fits better with the single-core result, but the rest seems to be the same as before - what am I missing? Code:
# cores   Wall time (s):
------------------------
 1        363.166
 2        8.34771
 4        4.59407
 8        3.31006
16        2.943
Last edited by hurd; November 17, 2022 at 10:48. |
|
November 17, 2022, 10:44 |
|
#605 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Which exact version of OpenFOAM are you trying to run?
There are 3 scripts, two of which I added under "Moderator note" in the first post for more recent versions. Each of them is supposed to work out of the box with a different version. And check the logs for error messages: something isn't right here. Your CPU is certainly fast, but not that fast. |
|
November 17, 2022, 11:00 |
|
#606 |
New Member
Johann
Join Date: Oct 2022
Posts: 13
Rep Power: 4 |
I edited my post to add that I am using OpenFOAM 10. I have now found the 3rd version of the script, which seems to be what everybody else used. It won't run (yet), but let's see.
|
|
November 17, 2022, 12:39 |
|
#607 |
New Member
Johann
Join Date: Oct 2022
Posts: 13
Rep Power: 4 |
Sorry for my inexperience with OpenFOAM; I wanted to use it as a benchmark and to share the results, as the 7373X seems to be a new data point for the list.
Is there some kind of output file that I can run a hash on, to check whether the end result after 100 iterations is correct? For reference, here is the final step in the log.simpleFoam of the plausible run (the 363 s single-core run): Code:
smoothSolver:  Solving for Ux, Initial residual = 0.00119047, Final residual = 0.000102912, No Iterations 9
smoothSolver:  Solving for Uy, Initial residual = 0.022928, Final residual = 0.00183307, No Iterations 9
smoothSolver:  Solving for Uz, Initial residual = 0.0198999, Final residual = 0.00164319, No Iterations 9
GAMG:  Solving for p, Initial residual = 0.00900837, Final residual = 8.37447e-05, No Iterations 4
time step continuity errors : sum local = 0.000120813, global = -2.75464e-06, cumulative = -0.000127849
smoothSolver:  Solving for omega, Initial residual = 0.00019487, Final residual = 1.44885e-05, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.00192524, Final residual = 0.000173624, No Iterations 3
ExecutionTime = 363.166 s  ClockTime = 364 s

End
Code:
smoothSolver:  Solving for Ux, Initial residual = 0.549225, Final residual = 0.297086, No Iterations 1000
smoothSolver:  Solving for Uy, Initial residual = 0.466708, Final residual = 0.0464898, No Iterations 1
smoothSolver:  Solving for Uz, Initial residual = 0.44934, Final residual = 0.0430763, No Iterations 1
GAMG:  Solving for p, Initial residual = 0.0585246, Final residual = 0.000331004, No Iterations 2
time step continuity errors : sum local = 3.17385e-15, global = 8.78889e-18, cumulative = 5.96915e-16
smoothSolver:  Solving for omega, Initial residual = 7.92638e-09, Final residual = 7.92638e-09, No Iterations 0
smoothSolver:  Solving for k, Initial residual = 6.04358e-09, Final residual = 6.04358e-09, No Iterations 0
ExecutionTime = 2.943 s  ClockTime = 3 s

End

Finalising parallel run |
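There is no canonical checksum file for this benchmark, but a practical sanity check is to parse the last reported residuals and the ExecutionTime out of the solver log and compare them between runs. A minimal sketch, assuming the log format shown in the excerpt above (the parsing function is my own, not part of the benchmark scripts):

```python
import re

def parse_final_step(log_text):
    """Extract the last initial residual per field and the final
    ExecutionTime from a simpleFoam log."""
    residuals = {}
    for field, res in re.findall(
            r"Solving for (\w+), Initial residual = ([\d.eE+-]+)", log_text):
        residuals[field] = float(res)  # later time steps overwrite earlier ones
    times = re.findall(r"ExecutionTime = ([\d.eE+-]+) s", log_text)
    return residuals, (float(times[-1]) if times else None)

# Example on a two-line excerpt; normally: open("log.simpleFoam").read()
sample = (
    "smoothSolver:  Solving for Ux, Initial residual = 0.00119047, "
    "Final residual = 0.000102912, No Iterations 9\n"
    "ExecutionTime = 363.166 s  ClockTime = 364 s\n"
)
res, t = parse_final_step(sample)
print(res["Ux"], t)  # -> 0.00119047 363.166
```

Runs that solved the same case should end with residuals of similar magnitude; residuals like the 7.9e-09 in the fast run above indicate the solver converged almost immediately, i.e. a different (or broken) case.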
|
November 17, 2022, 15:57 |
|
#608 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
We will have to wait for someone more knowledgeable with OpenFOAM to get to the bottom of this. In the meantime, you should upload the log files. Not only from the solver run, but especially from the meshing stage.
As for showing off a brand new toy: I'm all for that. But you lose a lot of performance in WSL. If you want impressive numbers, you will have to run Linux natively. |
|
November 21, 2022, 03:27 |
|
#609 |
Member
Join Date: Sep 2010
Location: Leipzig, Germany
Posts: 96
Rep Power: 16 |
@hurd: Could you please upload the complete (compressed) logfile for one of the superfast runs or at least the complete output for the last time step?
|
|
November 21, 2022, 12:42 |
|
#610 |
New Member
Johann
Join Date: Oct 2022
Posts: 13
Rep Power: 4 |
Thank you for your help. I think I solved it by using the openfoam-dev package from openfoam.org.
Now the script from the bench_template_v2 archive runs, and I get these results (still using WSL Ubuntu 22.04 on a Windows 11 OS): Epyc 7373X 16x3.8GHz w/ 8x16GB DDR4-3200 RAM Code:
#cores   time[s]    inverse[it/s]
 1       398.798    0.251
 2       195.208    0.512
 4       107.312    0.932
 6        73.4123   1.362
 8        56.0352   1.785
10        45.3033   2.207
12        39.625    2.524
14        38.8043   2.577
16        34.4127   2.906 |
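As a quick way to judge scaling quality, speedup and parallel efficiency can be computed from a table like this one. A short sketch using the numbers just posted (the calculation is generic, not part of the benchmark scripts):

```python
# Wall times from the table above: cores -> time in seconds
times = {1: 398.798, 2: 195.208, 4: 107.312, 6: 73.4123, 8: 56.0352,
         10: 45.3033, 12: 39.625, 14: 38.8043, 16: 34.4127}

t1 = times[1]
for n, t in sorted(times.items()):
    speedup = t1 / t            # >1 means faster than the single-core run
    efficiency = speedup / n    # 1.0 would be perfect linear scaling
    print(f"{n:2d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

For this table, 16 cores give a speedup of roughly 11.6x, i.e. about 72% parallel efficiency, which is in the expected range for a memory-bandwidth-bound workload.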
|
December 3, 2022, 05:58 |
|
#611 |
New Member
Richard Moser
Join Date: Aug 2009
Posts: 29
Rep Power: 17 |
Could you elaborate on this a little, please (apologies if it has already been discussed earlier in the thread)? What causes you to not trust the numbers on openbenchmarking.org?
|
|
December 3, 2022, 06:14 |
|
#612 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Not in this thread right here, but the topic came up from time to time.
Relative positions of CPUs make no sense, benchmark numbers are reported for systems that should not have enough memory to run it, and the "variance" in results is impossibly low, just to name a few of the issues. Personally, I consider this benchmark pretty much useless. And it actively does harm because it is so prevalent when you search for OpenFOAM benchmarks.
5800X3D - The new budget king of CFD?
Thoughts on Openbenchmarking.org |
|
December 3, 2022, 06:45 |
|
#613 |
New Member
Richard Moser
Join Date: Aug 2009
Posts: 29
Rep Power: 17 |
Thanks for coming back so quickly. I can understand your points. It is disappointing, as a good benchmark comparison would be very useful for me at the moment as I am deciding on some new hardware which is specifically for OpenFOAM.
|
|
December 19, 2022, 15:48 |
|
#614 | |
New Member
Join Date: Dec 2022
Posts: 1
Rep Power: 0 |
Quote:
These results seem better than others posted before in terms of scaling, don't they? Which version of WSL did you use? WSL2? |
||
January 4, 2023, 18:33 |
Xeon Max vs EPYC 7773X
|
#615 |
Member
dab bence
Join Date: Mar 2013
Posts: 48
Rep Power: 13 |
Intel have released a slide and config data for an OpenFOAM comparison between the Xeon Max with HBM2 and the EPYC 7773X.
This is the slide claiming a 2.5x speedup. Interesting that Fluent is only 1.2x: http://www.nextplatform.com/wp-conte...erformance.jpg
The test setup was also published here: https://edc.intel.com/content/www/us...rcomputing-22/
which is...
AMD EPYC 7773X: Test by Intel as of 9/2/2022. 1-node, 2x AMD EPYC, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.19.1.el8_6.crt1.x86_64, OpenFOAM 8, Motorbike 20M @ 250 iterations, Motorbike 42M @ 250 iterations
Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_next.1.x86_64+server, OpenFOAM 8, Motorbike 20M @ 250 iterations, Motorbike 42M @ 250 iterations |
|
January 5, 2023, 22:48 |
|
#616 | |
Member
|
I guess the HBM is integrated with the CPU, or at least with the motherboard, and is not as large as DDR5, so the speedup might be influenced by grid size.
Quote:
|
||
January 6, 2023, 20:45 |
|
#617 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
The selected EPYC is not the latest 9654 "Genoa", the 12-channel, high-cache CPU.
|
|
January 10, 2023, 11:40 |
|
#618 | ||||
New Member
Eduardo
Join Date: Feb 2019
Posts: 9
Rep Power: 7 |
Hello,
I am facing some trouble with the performance of OpenFOAM on my machine. These are the details of my setup: Quote:
I have downloaded the case 'bench_template_v02.zip' from the first post (I only had to make a tiny modification, substituting 'surfaceFeatures' with 'surfaceFeatureExtract', since the former was only introduced in OpenFOAM v2112). Other than this, the case is the same. My setup looks rather similar to Yannick's (same processor) reported in THIS post (only a couple of months ago). The only difference is that he has 2x Epyc 7742 whilst I only have one. Quote:
(1) Regarding single-core time, ym92 reports 936.35 s, whereas my case runs in 598.44 s, i.e. about 1/3 faster. I assume this may be caused by the 'high performance' settings we apply to our machine. However, just to make sure (and hoping that Yannick sees this post), I paste here the last few lines of my log file and the mesh count (given by checkMesh), to fully ensure the compared cases are the same: Log file: Code:
Time = 99

smoothSolver:  Solving for Ux, Initial residual = 0.000910672, Final residual = 7.07286e-05, No Iterations 9
smoothSolver:  Solving for Uy, Initial residual = 0.0219921, Final residual = 0.00194097, No Iterations 8
smoothSolver:  Solving for Uz, Initial residual = 0.0192765, Final residual = 0.00173756, No Iterations 8
GAMG:  Solving for p, Initial residual = 0.0107949, Final residual = 8.33928e-05, No Iterations 4
time step continuity errors : sum local = 0.000124784, global = 6.76048e-06, cumulative = -0.000372448
smoothSolver:  Solving for omega, Initial residual = 0.000140106, Final residual = 1.01603e-05, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.00179223, Final residual = 0.000172271, No Iterations 3
ExecutionTime = 592.51 s  ClockTime = 592 s

Time = 100

smoothSolver:  Solving for Ux, Initial residual = 0.000897164, Final residual = 6.96645e-05, No Iterations 9
smoothSolver:  Solving for Uy, Initial residual = 0.0215208, Final residual = 0.00191335, No Iterations 8
smoothSolver:  Solving for Uz, Initial residual = 0.0188435, Final residual = 0.00171037, No Iterations 8
GAMG:  Solving for p, Initial residual = 0.0106305, Final residual = 8.19107e-05, No Iterations 4
time step continuity errors : sum local = 0.000122673, global = 6.80292e-06, cumulative = -0.000365645
smoothSolver:  Solving for omega, Initial residual = 0.000139402, Final residual = 1.01096e-05, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.00176476, Final residual = 0.000169463, No Iterations 3
ExecutionTime = 598.44 s  ClockTime = 598 s

End
Code:
Mesh stats
    points:           2113393
    faces:            5877894
    internal faces:   5691855
    cells:            1893343
    faces per cell:   6.11075
    boundary patches: 72
    point zones:      0
    face zones:       0
    cell zones:       0

Overall number of cells of each type:
    hexahedra:  1704507
    prisms:     30021
    wedges:     4131
    pyramids:   4
    tet wedges: 5828
    tetrahedra: 294
    polyhedra:  148558

Breakdown of polyhedra by number of faces:
    faces   number of cells
        4   15702
        5   24762
        6   22859
        7   14956
        8   6138
        9   44228
       10   256
       11   77
       12   10929
       13   73
       14   54
       15   7331
       16   9
       17   11
       18   1167
       21   6

Checking topology...
    Boundary definition OK.
    Cell to face addressing OK.
    Point usage OK.
    Upper triangular ordering OK.
    Face vertices OK.
    Number of regions: 1 (OK).
Quote:
Quote:
In order to improve these results, I have tried the following changes:
None of these changes made a significant difference; there were only minimal, barely significant variations between the different runs. My question is: is this normal? Is there any other setting we could try to magically improve our scalability curve to more decent values? Any suggestion is welcome, and we are happy to perform other tests or provide more information if needed. Thank you for all your help. Best regards |
|||||
January 11, 2023, 06:40 |
|
#619 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Part of the reason you see worse scaling with your system is the faster single-core time you got. For a more intuitive comparison, I would recommend scaling both results by the same single-core value.
There are reasons for this large difference in single-core performance, but we don't need to get into that. Your result is good, and indicates decent FP optimizations at work. These don't apply at high thread count, where the workload becomes bound by memory bandwidth.
Speaking of memory bandwidth: that's what ultimately limits scaling on your single 64-core CPU. You are comparing against two CPUs, which have twice the amount of shared CPU resources, memory bandwidth and last-level cache being two of them. Your peak performance of 33s doesn't seem too far off.
For best performance, these are the settings I would recommend:
SMT off
NPS=4
clear caches before each run using "echo 3 > /proc/sys/vm/drop_caches" as root
and then run the simulation with mpirun -np 64 --bind-to core --rank-by core --map-by numa
It won't change results drastically though. It's still one CPU against two. |
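Put together as a run recipe, the per-run part of the advice above might look like this (a sketch: SMT off and NPS=4 are BIOS settings, so they do not appear here; the solver invocation and log name are assumptions based on the benchmark case):

```shell
#!/bin/bash
# Run as root because of the cache drop.

# Flush dirty pages, then drop page cache, dentries and inodes,
# so every timed run starts from a cold cache.
sync
echo 3 > /proc/sys/vm/drop_caches

# Pin one MPI rank per physical core and keep ranks close to
# their NUMA node's memory (Open MPI binding options).
mpirun -np 64 --bind-to core --rank-by core --map-by numa \
    simpleFoam -parallel > log.simpleFoam 2>&1
```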
|
January 11, 2023, 07:21 |
|
#620 |
New Member
Yannick
Join Date: May 2018
Posts: 16
Rep Power: 8 |
I fully agree with flotus1. Actually, if you used "number of cores used/total number of cores available" on the horizontal axis, our results would probably look very similar. The curve is almost flat when using more than ~50% of the cores.
Not sure why the single-core results are so different, but I might not have used adequate settings. At least I am sure I did not use core binding (binding ranks to one CPU might be a good idea for around 2-10? cores). |
|