March 25, 2022, 02:25 |
how do we unify cpu+gpu development?
#1 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
Hello everyone,
Developing separate code bases for CPUs and GPUs might become a limiting factor in the future. OpenHyperFLOW2D, for example, has separate branches set up for its CPU and CUDA code: https://github.com/sergeas67/OpenHyperFLOW2D Maintaining such a codebase becomes difficult in the long run, and if possible it would be good to have the same code run on both CPUs and GPUs. Is this a worthwhile effort?

Another dev asked the same question before: https://stackoverflow.com/questions/9631833/ and the main answer stated that, because of the architectural differences in the hardware, code needs to be optimized for the device it runs on. That's true. The differences will become even more severe as GPU architecture evolves. CPU architecture has been more or less stable for one or two decades, but GPU architecture is changing continuously, and that will push the coding strategies for the two further apart. So I don't know if it's possible to have the same code run on CPUs and GPUs.

Should we just stick with MPI/OpenMP based parallelization for CPUs? OpenMP can now technically be used to program GPUs: https://stackoverflow.com/questions/28962655/ but since this feature is not widely used yet, we still don't know what the limiting factors are.

Currently I can see only one strategy that might be useful: express the equations in standard BLAS or LAPACK form, and use libraries that are CPU or GPU accelerated. For example, we can create separate CPU and GPU accelerated functions to add two arrays of floats. Then we would just need to call the appropriate function in our code. I'm looking for other strategies, but this is the only one that looks feasible to me currently. Since we need parallelization to do large amounts of work, it makes sense to write our code so that these simple but very widely used mathematical methods parallelize well.
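A minimal sketch of the "use the appropriate function" idea, assuming the solver only ever talks to a small backend interface. The names `Backend`, `CpuBackend`, `make_backend` and `add_arrays` are hypothetical and chosen for illustration only. The one operation shown is the BLAS-style `axpy` update (y <- a*x + y). A GPU backend is deliberately left out so the sketch compiles without CUDA; it would implement the same interface, for example by wrapping a vendor BLAS call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical interface: the solver is written against this, not against
// a particular device.
struct Backend {
    virtual ~Backend() = default;
    virtual std::string name() const = 0;
    // BLAS-style axpy: y <- a*x + y
    virtual void axpy(float a, const std::vector<float>& x,
                      std::vector<float>& y) const = 0;
};

// Plain serial CPU implementation.
struct CpuBackend : Backend {
    std::string name() const override { return "cpu"; }
    void axpy(float a, const std::vector<float>& x,
              std::vector<float>& y) const override {
        assert(x.size() == y.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            y[i] += a * x[i];
        }
    }
};

// A GPU backend would derive from Backend in the same way and forward
// axpy to a device library. It is omitted here (assumption: not built).

// Select a backend by name at run time; unknown names are an error.
std::unique_ptr<Backend> make_backend(const std::string& kind) {
    if (kind == "cpu") {
        return std::unique_ptr<Backend>(new CpuBackend());
    }
    throw std::runtime_error("backend not built: " + kind);
}

// Solver-side code: adds two float arrays without knowing where it runs.
// b is taken by value so the caller's array is left untouched.
std::vector<float> add_arrays(const Backend& be,
                              const std::vector<float>& a,
                              std::vector<float> b) {
    be.axpy(1.0f, a, b);
    return b;
}
```

With this layout the solver calls `add_arrays(*make_backend("cpu"), a, b)`, and switching devices means changing only the string passed to `make_backend`. The trade-off is that every operation the solver needs has to be expressible through the interface, and data movement between host and device is hidden inside each backend.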
Some other examples of such widely used methods, not all of them covered by BLAS/LAPACK, would be Gauss-Seidel iteration, LU decomposition, Cholesky factorization, Fast Fourier Transforms, etc. Is this the correct approach? What can we do better? Thanks ~sayan |