CFD Online Discussion Forums


cfdhelp November 5, 2019 08:20

OpenFOAM - parallel raspberry pi 4 grid/cluster
 
Hello,


I would like to ask about the possibility of using OpenFOAM for parallel calculations on a cluster of Raspberry Pi 4 computers.
How many Raspberry Pi 4 boards are needed to match the computing power of, for example, a six-core Intel i7?

Or is it possible to run OpenFOAM on Google Coral hardware?


Thank you and regards,
Ivan

flotus1 November 5, 2019 08:46

The first question that comes to mind is: why?

It is definitely possible to build a diy cluster from single-board computers like raspberry pi. There are plenty of examples on the internet, e.g. https://www.researchgate.net/publica...rry_Pi_Cluster and https://blog.mvbakker.nl/posts/openf...raspi-cluster/
But these are not practical in any way. These projects are either for fun, or for learning purposes. If that is what you want, then go ahead ;)
If you want performance per dollar (and much less hassle), get a normal PC instead.
Ultimately, the performance of such a cluster would be limited by the node interconnect. Without low-latency interconnects available, speedup in CFD applications like OpenFOAM will drop off rather quickly. Maximum scaling was observed on only 4 nodes here: https://krex.k-state.edu/dspace/handle/2097/17612
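
To illustrate the interconnect argument, here is a toy strong-scaling model. It is only a sketch and is not taken from any of the linked studies: the function `speedup` and every default value in it (cell count, time per cell, latency, bandwidth, exchanges per iteration) are assumptions chosen for illustration, and each subdomain is assumed to be a cube that exchanges its six faces.

```python
# Toy strong-scaling model for one solver iteration.
# All parameter values are illustrative assumptions, not measurements.

def speedup(nodes, cells=1_000_000, t_cell=1e-6, latency=1e-4,
            exchanges=50, bytes_per_face=8, bandwidth=1e8):
    """Estimated speedup on `nodes` nodes relative to a single node."""
    t_serial = cells * t_cell                      # pure compute time on one node
    if nodes == 1:
        return 1.0                                 # no communication needed
    local_cells = cells / nodes                    # cells owned by each node
    halo_faces = 6 * local_cells ** (2 / 3)        # surface of a cubic subdomain
    # Each exchange pays a fixed latency plus the transfer time of the halo.
    t_comm = exchanges * (latency + halo_faces * bytes_per_face / bandwidth)
    return t_serial / (local_cells * t_cell + t_comm)


for n in (1, 2, 4, 8, 16, 32, 64):
    s = speedup(n)
    print(f"{n:3d} nodes: speedup {s:6.2f}, parallel efficiency {s / n:5.2f}")
```

In this model the compute time per node shrinks like 1/nodes, but the latency paid per exchange does not shrink at all, so parallel efficiency keeps falling as nodes are added. A slower or higher-latency network (larger `latency`, smaller `bandwidth`) makes it fall sooner.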

ahmedeissa December 12, 2019 05:44

Quote:

Originally Posted by flotus1 (Post 748947)
The first question that comes to mind is: why?

It is definitely possible to build a diy cluster from single-board computers like raspberry pi. There are plenty of examples on the internet, e.g. https://www.researchgate.net/publica...rry_Pi_Cluster and https://blog.mvbakker.nl/posts/openf...raspi-cluster/
But these are not practical in any way. These projects are either for fun, or for learning purposes. If that is what you want, then go ahead ;)
If you want performance per dollar (and much less hassle), get a normal PC instead.
Ultimately, the performance of such a cluster would be limited by the node interconnect. Without low-latency interconnects available, speedup in CFD applications like OpenFOAM will drop off rather quickly. Maximum scaling was observed on only 4 nodes here: https://krex.k-state.edu/dspace/handle/2097/17612

The research article you're referring to is very old and used a very old Raspberry Pi model. The Raspberry Pi now comes with a quad-core 64-bit CPU and 4 GB of RAM. So I think the answer is yes, you can do it.

flotus1 December 12, 2019 07:57

I never said that it could not be done.
Instead, I gave reasons why this is not a viable solution for a production system. And these reasons hold true, regardless of CPU type. More theoretical FP throughput would probably not help anyway, due to memory bandwidth limitations.

ahmedeissa December 12, 2019 08:50

Quote:

Originally Posted by flotus1 (Post 752238)
I never said that it could not be done.
Instead, I gave reasons why this is not a viable solution for a production system. And these reasons hold true, regardless of CPU type. More theoretical FP throughput would probably not help anyway, due to memory bandwidth limitations.

I understand your reply, but then why has ORACLE used it for their database centres?

Check this:
Oracle: This 1,060 Raspberry Pi supercomputer is 'world's largest Pi cluster'
URL:
https://www.zdnet.com/article/oracle...st-pi-cluster/

Please advise.

ahmedeissa December 12, 2019 08:52

Quote:

Originally Posted by flotus1 (Post 752238)
I never said that it could not be done.
Instead, I gave reasons why this is not a viable solution for a production system. And these reasons hold true, regardless of CPU type. More theoretical FP throughput would probably not help anyway, due to memory bandwidth limitations.

And this one:

https://www.zdnet.com/article/raspbe...test-software/

flotus1 December 12, 2019 11:40

Do you know which kind of codes they run on these clusters? I don't, but I can make an educated guess: codes that don't rely on low-latency node-interconnects to achieve inter-node scaling. OpenFOAM does not fall into this category.
Since you already need quite a few of these boards to match a basic PC in OpenFOAM (e.g. one with a Ryzen 5 3600), I stand by my assessment: a solution that works, but makes no sense from a performance or financial point of view. These solutions are showpieces or testbeds, not alternatives to conventional PC or server hardware.

ahmedeissa December 12, 2019 12:48

Quote:

Originally Posted by flotus1 (Post 752271)
Do you know which kind of codes they run on these clusters? I don't

I do respect your answer, but:

Have you worked on databases before? Especially ORACLE Database? Have you experienced load balancing for this type of database?

When someone like ORACLE adopts hardware, they do so only after thoroughly testing it, before announcing to the world that they are using it.

ORACLE's database management algorithms are very complicated and take a very long time on an average computer with limited CPU power; they include search algorithms that must return results in a fraction of a second.

To me, this is far more complicated than the OpenFOAM algorithms.

arjun December 12, 2019 13:12

Quote:

Originally Posted by ahmedeissa (Post 752279)
I do respect your answer, but:

Have you worked on databases before? Especially ORACLE Database? Have you experienced load balancing for this type of database?

When someone like ORACLE adopts hardware, they do so only after thoroughly testing it, before announcing to the world that they are using it.

ORACLE's database management algorithms are very complicated and take a very long time on an average computer with limited CPU power; they include search algorithms that must return results in a fraction of a second.

To me, this is far more complicated than the OpenFOAM algorithms.


This is the first time I have heard someone say that a search algorithm is more complicated than a multiphysics solver that solves the Navier-Stokes equations.

Quote:

Originally Posted by ahmedeissa (Post 752279)

When someone like ORACLE adopts hardware, they do so only after thoroughly testing it, before announcing to the world that they are using it.


I agree with this statement, because you are choosing OpenFOAM on a Pi without having done a thorough investigation. So yes: they have done it, but OpenFOAM users do not.

wangmianzhi December 12, 2019 13:24

let's have a $500 x86 vs Pi4 challenge! The losing side pays for both systems.

arjun December 13, 2019 00:46

Quote:

Originally Posted by wangmianzhi (Post 752282)
let's have a $500 x86 vs Pi4 challenge! The losing side pays for both systems.




I can buy an Intel E5-2670 processor (SR0KX, 2.60 GHz, 20 MB cache, 8 cores) for 45 euros from Amazon in Germany. I did.

A dual-processor mainboard costs 199.

I am not sure your Pi can catch up with the 16-core, 32-process machine that could be built for around $500.

Simbelmynė December 16, 2019 10:09

A Raspberry Pi 4 is around 65-70 Euros where I live. Power supply is another 10 Euros. Then I guess some cooling would be needed if we are to run a long CFD simulation. That would add at least 10 more Euros if we can go with a passive solution. An active solution will add even more.


Some type of interconnect is needed, most likely Gigabit Ethernet + Switch perhaps?


I think you will be really hard pressed to land below 100 Euros per unit.


A Raspberry Pi 4 has a memory bandwidth of around 4 GiB/s. Assuming the simulation is bandwidth limited, your guesstimate of performance per Euro at 100 Euros per unit would be 0.04 GiB/(s*Euro).


A Ryzen 3600 will give you around 47 GiB/s. Assuming you build your system for 500 Euros (I can assemble a computer without a case for 450 Euros), you will have a guesstimate of performance per Euro of 0.09 GiB/(s*Euro).


Conclusion: even without the performance degradation of the interconnect, you can see that the Raspberry Pi 4 is more than twice as expensive as the AMD for the most important metric. Considering that the Ryzen 3600 is quite expensive, you can find even better bargains in the x86 camp (not to mention the huge stock of used components available on eBay).
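
The arithmetic behind this guesstimate can be written out as a short sketch. The bandwidth and price figures are the ones quoted in this post. The helper name `bandwidth_per_euro` is made up for illustration, and the board count at the end assumes ideal scaling with no interconnect losses, which a real cluster would not achieve.

```python
import math

def bandwidth_per_euro(bandwidth_gib_s, cost_eur):
    """Memory bandwidth bought per Euro spent, in GiB/(s*Euro)."""
    return bandwidth_gib_s / cost_eur

# Figures quoted in the post above.
PI4_BANDWIDTH, PI4_COST = 4.0, 100.0        # GiB/s, Euro per complete unit
RYZEN_BANDWIDTH, RYZEN_COST = 47.0, 500.0   # GiB/s, Euro per complete system

pi4 = bandwidth_per_euro(PI4_BANDWIDTH, PI4_COST)
ryzen = bandwidth_per_euro(RYZEN_BANDWIDTH, RYZEN_COST)
ratio = ryzen / pi4

# Boards needed to match the Ryzen's aggregate bandwidth,
# assuming (optimistically) perfect scaling across boards.
boards = math.ceil(RYZEN_BANDWIDTH / PI4_BANDWIDTH)

print(f"Pi 4:  {pi4:.3f} GiB/(s*Euro)")
print(f"Ryzen: {ryzen:.3f} GiB/(s*Euro)")
print(f"Ryzen advantage: {ratio:.2f}x")
print(f"Pi 4 boards to match bandwidth: {boards} (about {boards * PI4_COST:.0f} Euro)")
```

With these numbers the Ryzen system comes out at about 2.35 times the bandwidth per Euro, and matching its bandwidth would take 12 boards even before any interconnect penalty.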


It would be fun if you benchmarked a Pi 4, though, and posted the result in the benchmark thread.

flotus1 December 16, 2019 12:03

But the Pi has 4 cores. And Oracle...and complisticated algorithms :rolleyes:

sbaffini December 16, 2019 13:11

Quote:

Originally Posted by ahmedeissa (Post 752279)
ORACLE's database management algorithms are very complicated and take a very long time on an average computer with limited CPU power; they include search algorithms that must return results in a fraction of a second.

To me, this is far more complicated than the OpenFOAM algorithms.

We also respect your answer, but let's look at what happens within a single iteration of a general-purpose CFD solver (I don't know OpenFOAM specifically, but that's not the point).

For each set of equations intended to be solved in a coupled manner (V equations all together):
  1. Do the necessary memory allocations/deallocations, an O(VN) cost for O(N) cells per process.
  2. Exchange the V independent variables in ghost cells for parallel computations. With O(N) cells on each of P processes, this is O(VPN^(2/3)) exchanges globally in the system and O(VN^(2/3)) exchanges for each process.
  3. Compute the 3D gradients of the V independent variables on O(N) cells. This involves, at least, an O(N) loop and O(3VN) memory accesses.
  4. Exchange the 3D gradients of the V independent variables in ghost cells for parallel computations. With O(N) cells on each of P processes, this is O(3VPN^(2/3)) exchanges globally in the system and O(3VN^(2/3)) exchanges for each process.
  5. Compute the fluxes of the V independent variables on O(3N) faces. This has a memory access footprint and cost of O(6VN). Actually, the constant here might be quite high, in some cases proportional to V^2, so you may end up with an O(NV^3) cost.
  6. Add the flux Jacobians to the system matrix. For a matrix in CSR format you might want to use a binary search for the column within the matrix (just to mention a place where some search might be done during iterations), but it is typically not worth the effort, and the column is more commonly found by brute-force searching in the CSR structure. Adding a Jacobian to the matrix for each flux then amounts to the brute-force search cost (which is, to a good approximation, a constant) times an O(6V^2N) cost.
  7. The previous two points should actually be repeated on the O(N^(2/3)) boundary faces for the boundary conditions.
  8. Add the source terms to the rhs (O(VN)) and their Jacobians to the lhs (O(V^2N)).
  9. Actually solve the system of equations. Without going into the details here, a typical AMG does multiple sweeps of an SGS, each one with its own parallel exchange of solution variables, and also builds the parallel structure for exchanging variables at the different levels it employs in that iteration. Multiply this by the number of times it is performed in each iteration.
  10. Update the solution from the system solve, an O(VN) operation.
  11. Do the remaining housekeeping, like monitoring quantities. The cost may vary according to the specific operation, but at least O(VN) operations are quite common here, followed by O(log(P)) operations globally for the reduction over P processes.
EndFor
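
As a minimal sketch of the search mentioned in step 6, the snippet below adds a value to an existing entry of a CSR matrix, locating the column either by brute force or by binary search. It makes several simplifying assumptions: entries are scalars rather than V-by-V Jacobian blocks, the sparsity pattern is already allocated, and column indices are sorted within each row (which the binary search requires). The names `find_linear`, `find_binary` and `add_entry` are hypothetical and not taken from any solver.

```python
from bisect import bisect_left

def find_linear(indptr, indices, row, col):
    """Brute-force scan of one CSR row for the position of `col`."""
    for k in range(indptr[row], indptr[row + 1]):
        if indices[k] == col:
            return k
    raise KeyError((row, col))

def find_binary(indptr, indices, row, col):
    """Binary search in one CSR row; needs sorted column indices in the row."""
    lo, hi = indptr[row], indptr[row + 1]
    k = bisect_left(indices, col, lo, hi)
    if k < hi and indices[k] == col:
        return k
    raise KeyError((row, col))

def add_entry(indptr, indices, data, row, col, value, find=find_linear):
    """Accumulate `value` into the existing entry (row, col)."""
    data[find(indptr, indices, row, col)] += value

# 3x3 tridiagonal sparsity pattern, all values initially zero.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [0.0] * len(indices)

add_entry(indptr, indices, data, 1, 2, 2.5)                    # brute force
add_entry(indptr, indices, data, 1, 2, 0.5, find=find_binary)  # binary search
print(data)
```

On an unstructured mesh each row holds only a handful of entries (the cell and its face neighbours), so the brute-force scan touches a small, roughly constant number of indices. That is consistent with the remark that the binary search is typically not worth the effort.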

This is performed multiple times per time step for unsteady cases. Also, at some point, you want to write your solution to disk (P processes writing O(VPN) entries to file).

Typical industrial applications may have PN = O(10^6) to O(10^9).

Now, I know that databases are an important part of the modern world, but I don't get how retrieving an entry from whatever data structure it is stored in can be more delicate, complicated or compute/memory intensive than the process described above. In the worst case it is a single O(PN) operation, but it is probably an O(log(P)log(N)) one.
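
To put rough numbers on these orders of magnitude, here is a small sketch that evaluates the leading terms for a given total cell count PN and process count P. All constants are dropped, so the results are order-of-magnitude figures only. The function name `per_iteration_counts`, the choice of V = 5 coupled variables and the example process counts are assumptions for illustration.

```python
import math

def per_iteration_counts(total_cells, processes, variables=5):
    """Leading-order counts per process (constants dropped)."""
    n = total_cells / processes                     # N: cells per process
    return {
        "field_values": variables * n,              # O(VN) work per sweep
        "ghost_values": variables * n ** (2 / 3),   # O(VN^(2/3)) per exchange
        "lookup_steps": math.log2(processes) * math.log2(n),  # O(log(P)log(N))
    }

for total_cells, processes in ((10**6, 16), (10**9, 1000)):
    counts = per_iteration_counts(total_cells, processes)
    print(f"PN = {total_cells:.0e}, P = {processes}:")
    for name, value in counts.items():
        print(f"  {name:13s} ~ {value:.3g}")
```

Even with the constants ignored, a single O(VN) sweep over the fields is several orders of magnitude more work than one logarithmic lookup, and a solver iteration performs many such sweeps plus the ghost-cell exchanges.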

Hip2BL7 March 19, 2021 04:26

Have you seen this?
 
Hi guys,

I'm wondering what you think of this:

8-stack picluster outperforming various laptops

I have some experience in using HPCs for OpenFOAM calculations at my university, but purely from an academic point of view, so I'd like to know what your thoughts are on the hardware.

:D

flotus1 March 19, 2021 13:32

As is stated in the first 2 minutes of the video: this cluster of SBCs is first and foremost a learning tool.
And the code that was run could not be further from a FV CFD code. Low optimization, computationally intensive and virtually zero communication between threads/nodes (i.e. embarrassingly parallel).

New-to-CFD May 2, 2021 23:35

not worth it
 
lol, I can confirm: I have done exactly this. I built a 6-node + 1-master RPi 3 cluster and ran OpenFOAM on it. It was fun, but it's a toy, and I doubt the 4s would be much better.


All times are GMT -4.