OpenFOAM on Apple M1 Render Farm?

May 20, 2021, 20:03   #1
xuegy (Member)
I came across an idea: run OpenFOAM on a render farm of M1 Mac Minis.
Each 16 GB model with a 10GbE port costs $999, or $899 with the education discount.
In Geekbench 5, the M1 outperforms a 12-core E5-2697 v2, so raw CPU performance shouldn't be a concern.
Pros:
1. For every $1,000 you get 68 GB/s of memory bandwidth from LPDDR4X-4266 (quick arithmetic check below).
2. 192 KB L1i, 128 KB L1d, and 12 MB of L2: that's huge.
3. Super low power consumption: 25 W.
4. Possible to do optimizations based on its unified memory, e.g. solving fvMatrix on the GPU directly.
Cons:
1. Small RAM per machine.
2. High RAM latency, ~100 ns (vs. ~50 ns on Intel/AMD).
3. No InfiniBand. There are two interconnect options: a 10GbE switch, or Thunderbolt 3 (40 Gb/s), though that might get expensive as well.
Or maybe it's too early and I should wait until the M2 brings HBM.
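(Quick arithmetic check on the 68 GB/s figure, assuming the M1's usual 128-bit LPDDR4X interface: 4266 MT/s x 16 bytes per transfer ≈ 68.3 GB/s.)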

May 21, 2021, 00:10   #2
Interesting Idea!
wkernkamp (Will Kernkamp, Senior Member)
For a small cluster, 10 Gb/s Ethernet should be fine; you don't need InfiniBand. Before committing to multiple units, run the OpenFOAM benchmarks on various hardware benchmark on a single node. OpenFOAM scaling is very close to linear with cluster nodes, so you can do your test for just $1,000.


Good luck and post results!


Will

May 21, 2021, 03:37   #3
flotus1 (Alex, Super Moderator)
Yeah, I would not be too optimistic about that.
If you want to go that route, definitely try to get your hands on one of these machines first, and see how that performs.
Here are my thoughts/questions:
Quote:
1. For each $1000 you get 68 GB/s of memory bandwidth from LPDDR4X-4266.
Do you? I can't tell at first glance whether that memory operates in dual-channel mode.
And for $1,000 I can spec out some rather powerful PCs from regular parts; the kind of parts that can be replaced individually if they break, instead of swapping the whole machine, and with the freedom to use whatever you want: more memory? ECC support? InfiniBand? Regular GPUs?...

Quote:
2. 192 KB L1i, 128 KB L1d, 12MB L2, that's huge.
Compared to what? It's an entirely different CPU architecture from what Intel and AMD offer in x86 space, so comparing specs like that is rather pointless. But if we really want to: 12 MB of L2 may sound huge compared to what current-gen mainstream CPUs have, until you realize that this is the last-level cache and its size should rather be compared to the L3 of familiar CPUs. Plus the fact that one level of the cache hierarchy is missing. But again, it's a different architecture, so that doesn't have to mean anything.

Quote:
3. 25W super low power consumption.
Which will lead to TDP throttling, especially if you want to leverage GPU compute in addition to the CPU. And possibly thermal throttling as well, since the cooling solution is designed by Apple.
Quote:
4. Possible to do optimizations based on its unified memory, e.g. solving fvMatrix on GPU directly.
Good luck!

May 21, 2021, 14:35   #4
xuegy (Member)
Quote:
Originally Posted by flotus1
Yeah, I would not be too optimistic about that.
If you want to go that route, definitely try to get your hands on one of these machines first, and see how that performs.
I've ordered an M1 Mac Mini and will test the performance on a single machine (if I can successfully compile OpenFOAM on it).
ANSYS users don't have a choice, but for OpenFOAM users ARM64 is not far away.

May 21, 2021, 14:36   #5
flotus1 (Alex, Super Moderator)
Please let us know how it turns out

May 21, 2021, 14:51   #6
xuegy (Member)
Quote:
Originally Posted by flotus1
Which will lead to TDP throttling, especially if you want to leverage GPU compute in addition to the CPU. And possibly thermal throttling as well, since the cooling solution is designed by Apple.
Good luck!
I've seen a benchmark result using both CPU and GPU: roughly 15 W for the CPU and 10 W for the GPU. So yes, throttling could be an issue.
Not sure if PETSc GPU support is available on the M1. It says OpenCL, but who knows.

May 23, 2021, 03:55   #7
flotus1 (Alex, Super Moderator)
Forgot to mention one thing about GPU compute...
One of the reasons GPU computing is a thing for CFD is the dedicated memory on those GPUs, with pretty high memory bandwidth: on the order of several hundred GB/s, and more on high-end models. The GPU in question here doesn't have that; instead it has to share memory capacity and bandwidth with the CPU. Apple calls it "unified" memory, but it's just good old shared memory without fixed allocation. That's not a pro, it's another potential bottleneck.

May 28, 2021, 01:08   #8
xuegy (Member)
Quote:
Originally Posted by flotus1
Please let us know how it turns out
Here's the result:

Compiled OF-v2012 natively for ARM64 in 36 minutes using all 4+4 cores. This part is blazing fast.

Then I ran the motorBike case. It took 155 s to run 500 iterations on the 4 big cores. Using all 8 cores it takes 168 s, so the little cores are really weak. During the run I couldn't hear the fan spinning at all.

Tried my 2018 MBP (i5-8259U, 4 cores): 232 s, and the fan was almost taking off.

Also tried my dual E5-2667 v2 hackintosh workstation with 8 channels of DDR3-1333:
4 cores: 232 s
8 cores: 158 s
16 cores: 135 s

Overall, the M1's floating-point performance is not proportionally as strong as the rest of the chip. However, I didn't enable SIMD optimizations (no idea how to enable NEON on Apple Clang), so there may be some untapped potential. I also haven't tried GPU acceleration yet.
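
For what it's worth: on AArch64, Advanced SIMD (NEON) is part of the base ISA, so Apple Clang should already emit NEON instructions for arm64 targets without any extra flag. A quick standalone sketch (not OpenFOAM code, and the file name is just an example) to verify:
Code:
// Checks that the arm64 compiler targets NEON and that the double-precision
// intrinsics work. Build with: clang++ -O2 neon_check.cpp -o neon_check
#include <arm_neon.h>
#include <cstdio>

int main()
{
#if defined(__ARM_NEON)
    std::printf("__ARM_NEON is defined: Advanced SIMD is enabled by default\n");
#endif
    // y = a*x + y on four doubles, two lanes (float64x2_t) at a time
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    double y[4] = {0.5, 0.5, 0.5, 0.5};
    float64x2_t a = vdupq_n_f64(2.0);
    for (int i = 0; i < 4; i += 2)
    {
        float64x2_t vx = vld1q_f64(&x[i]);
        float64x2_t vy = vld1q_f64(&y[i]);
        vy = vfmaq_f64(vy, a, vx);   // vy += a*vx (fused multiply-add)
        vst1q_f64(&y[i], vy);
    }
    std::printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);   // 2.5 4.5 6.5 8.5
    return 0;
}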

And I know this test case is too small to scale well on my workstation. Is there any benchmark case that uses ~10 GB of RAM?

May 28, 2021, 02:31   #9
flotus1 (Alex, Super Moderator)
There is always the OpenFOAM benchmarks on various hardware thread.
It also lets you compare your results to tons of other setups, and it has proven to be large enough for good scaling at much higher core counts.

May 29, 2021, 01:03   #10
xuegy (Member)
The results are here:
OpenFOAM benchmarks on various hardware
The build is still buggy, and there is probably room for further improvement.

October 4, 2021, 05:39   #11
Miguel Hernandez (Member)
Quote:
Originally Posted by xuegy
[benchmark results quoted from post #8 above]
I've been trying to install OpenFOAM 9 on my MacBook Pro M1 for a while without any success. I've used the Docker image, but it seems to be emulated (x86), so it is very, very slow...

Can you provide some tips on how you managed to install OpenFOAM natively on the new M1 SoC?

It would be very useful...

Thanks in advance.

October 7, 2021, 10:12   #12
xuegy (Member)
You'll need to remove the sigFpe part (it seems Apple silicon doesn't support this feature?), and then it should compile.
https://github.com/mrklein/openfoam-...ment-850090915
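A minimal sketch of what's going on there (not OpenFOAM's actual sigFpe.C): the existing __APPLE__ code path presumably relies on the x86 SSE exception-mask intrinsics, which don't exist on arm64, so the floating-point-trap setup has to be compiled out on Apple silicon, roughly like this:
Code:
// Rough illustration only. On x86 macOS, FP exceptions can be unmasked via
// the SSE control register (xmmintrin.h); arm64 macOS has neither that
// register nor feenableexcept(), so the trapping code is simply disabled.
#include <cstdio>

int main()
{
#if defined(__APPLE__) && defined(__x86_64__)
    // x86 macOS path, e.g.:
    // _MM_SET_EXCEPTION_MASK(_MM_GET_EXCEPTION_MASK()
    //     & ~(_MM_MASK_DIV_ZERO | _MM_MASK_INVALID | _MM_MASK_OVERFLOW));
    std::printf("x86 macOS: SSE exception-mask path available\n");
#elif defined(__APPLE__) && defined(__arm64__)
    std::printf("arm64 macOS: sigFpe trapping left disabled\n");
#else
    std::printf("other platform\n");
#endif
    return 0;
}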

October 8, 2021, 14:20   #13
Miguel Hernandez (Member)
Quote:
Originally Posted by xuegy
You'll need to remove the sigFpe part (it seems Apple silicon doesn't support this feature?), and then it should compile.
https://github.com/mrklein/openfoam-...ment-850090915


Thanks xuegy... I tried what you suggested, but it doesn't work for me...

October 9, 2021, 06:39   #14
Miguel Hernandez (Member)
Quote:
Originally Posted by xuegy
You'll need to remove the sigFpe part (it seems Apple silicon doesn't support this feature?), and then it should compile.
https://github.com/mrklein/openfoam-...ment-850090915
Could you please provide some more details?

I've edited the sigFpe.C file and removed all the __APPLE__ sections; I found them in some #if blocks...
And what does removing -ftrapping-math mean?

Thanks in advance,
regards
