OpenCL linear solver for OpenFOAM 1.7 (alpha): clFoam v0.1 released
The OpenCL solver plugin clFoam v0.1 has been released for testing.
So far, clFoam has been tested in single precision on an ATI 5650M GPU and an NVIDIA Tesla C2050. On the Tesla C2050 it is slightly slower than the CPU for the cavity case with 160000 cells over 4 time steps (clPCG); see profilingDatasheet.xls in profiling data/ for details.
The OpenCL solver is still promising: it is a new technology with a lot of room for improvement.
There is still a lot of work to do, and any advice on improving the efficiency is appreciated. There are bound to be errors in the manual as well; please DO send me an email so I can correct them.
Thanks very much
1. Project Layout
# file system structure of the project generated by command:
There are 3 projects (subfolders) in clFoam:
clUtils/ basic vector and csrMatrix operations written by the author
    Tested and profiled on AMD_STREAM_SDK, SP on GPU and DP on CPU
clFoam/ clPCG and clPBICG solvers based on clUtils/
    Tested and profiled on AMD_STREAM_SDK, single precision on GPU
vclFoam/ a wrapper to call the ViennaCL BLAS solvers
    Not finished, there is a bug
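To make concrete what a "csrMatrix operation" in clUtils/ refers to, here is a minimal sketch of a sparse matrix-vector product in compressed sparse row (CSR) form. It is plain Python for readability, not the OpenCL kernel from clUtils; the function name and the array names (row_ptr, col_idx, values) are assumptions chosen for illustration.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the non-zeros of row i;
    col_idx and values hold their column indices and coefficients.
    (Hypothetical names, not taken from clUtils.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small tridiagonal test matrix:
# [[ 4, -1,  0],
#  [-1,  4, -1],
#  [ 0, -1,  4]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]

y = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)  # [2.0, 4.0, 10.0]
```

Each row is independent of the others, which is why this operation is usually a natural candidate for one work-item per row on a GPU.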
# other resources included
doc/ some useful documents, tutorials, install manuals
bin/ some bash scripts
SpeedITOFPlugin1.1/ downloaded from the SpeedIT toolkit website and edited for SP support
(1) clUtils: single precision works on both AMD and NVIDIA GPUs
    double precision passed the test on OpenCL via GPU
    double precision on CUDA 3.1 fails with "OUT_OF_RESOURCE"
    double precision does NOT work properly on Tesla C2050 with CUDA 3.1
(2) clFoam is usable only in single precision on GPU, with clPCG and clPBiCG
    (see profilingDatasheet.xls in profiling data/ for details)
    Double precision should work but is still buggy.
    I did not have hardware at hand for debugging, only ssh access to the remote cluster, which has not been upgraded to CUDA 4.0
(3) vclFoam is not usable at all.
    As vclFoam will probably not be faster than clFoam, I did not spend much time on that plugin.
clFoam requires the following:
* A recent C++ compiler (e.g. gcc 4.x.x); GCC > 4.4 is needed!
* OpenFOAM 1.7.X
* OpenCL: for accessing GPUs (shared library and include files)
For AMD GPUs, install the AMD_STREAM_SDK
SEE installation guide:
For NVIDIA GPUs, install the CUDA_SDK and CUDA_TOOLKIT
SEE installation guide:
* uBLAS : (shipped with the Boost libraries)
#sudo apt-get install libboost-dev
* ViennaCL 1.1: the headers have been put into vclFoam,
the install tutorials are in separate files:
4. Authors and Contact
June 01 2011
======================== old post ==============================
An OpenCL solver is planned for Xmas 2011, inspired by the SpeedIT plugin, which is free for single precision.
At first I wanted to wrap the BLAS solvers from ViennaCL 1.0.5, but there was always some error, so I just wrote my own PCG and BiPCG solvers. I have not fully profiled the solver yet; it is slower on my laptop's ATI card, but I am trying it on the Tesla C2050. The first version of the technote (the first and only test, on my ATI 5650) is on my blog.
The code will be released under the GPL for the solver wrapper and under BSD for clUtils (the BLAS functions).
If anyone is interested in the ViennaCL solver, I will upload my wrapper so that he or she can debug it. I cannot include the *.hpp files of ViennaCL. I am trying it on NVIDIA cards; hopefully it will work.
In my opinion, the GPU solver will not be much faster than the CPU, because the preconditioners of OF cannot be parallelised. Still, it should be promising for the DSMC method; I will try that after submitting my PhD thesis.
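To illustrate where the preconditioner sits in a PCG iteration, here is a minimal sketch in plain Python. It is not the clPCG code. A diagonal (Jacobi) preconditioner is swapped in as an assumption, because it is a purely element-wise operation and therefore easy to parallelise; the post does not say which preconditioner clFoam applies. All function names are hypothetical, and the matrix is dense only to keep the example short.

```python
def matvec(A, x):
    # Dense matrix-vector product, for brevity only.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A.

    Assumption: Jacobi preconditioning, z = r / diag(A).
    Returns (x, number of iterations performed).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioning step
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]  # element-wise, no dependencies
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, iterations = pcg_jacobi(A, b)
print(x, iterations)
```

The only preconditioner-specific lines are the two that compute z. With an incomplete-factorisation preconditioner those lines become triangular solves, in which each entry depends on previously computed ones; that dependency chain is the usual reason such preconditioners are hard to run in parallel.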
Recently, my colleague sent me a link to 'ofgpu' from symscape.com.
I attempted to compile this solver with mine, but it seems to work only with the Windows version. Am I right? If not, please share some tips for compiling it on Linux.
At the least, give me some idea of how fast it is on the GPU.
I am extremely busy these days finishing my PhD thesis, so I do not have time to debug and profile so many GPU solvers. I have spent one week on the Tesla GPU on the remote cluster and will give further profiling results at the OpenFOAM conference this year. I found a bug that prevents me from compiling with double precision support for the GPU on the remote cluster.
Any advice and suggestions on ViennaCL and ofgpu are appreciated.
Email: jasonyale (at) gmail.com
The University of Manchester
May 20, 2011.
ofgpu is cross-platform
I don't currently have any benchmarks.
You can find the original CFD-Online announcement at:
There is a benchmark by a Japanese user of the PCG solver from the speedit class plugin.
It shows that it is 3 times SLOWER than the CPU!
I got a similar result on my laptop, but I am now trying our university's HPC Tesla C2050. I got an error when changing from SP to DP, so I have not yet finished the benchmark. Judging from the visual profiler, the bottleneck seems to be kernel scheduling: only about 1% of the time is spent computing the kernel (ViennaCL vector bench). But I am still new to GPUs and am not sure how to improve the performance.
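The profiler observation above can be turned into a rough upper bound with Amdahl's law. This is only a back-of-the-envelope sketch: it assumes the 1% figure applies to the whole run and that everything else (scheduling, transfers) stays unchanged. The function name is hypothetical.

```python
def overall_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's law: total speedup when only the kernel part gets faster.

    kernel_fraction: share of wall time spent computing in the kernel (0..1)
    kernel_speedup:  factor by which that part is accelerated
    """
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


# Assumed input: about 1% of the time is kernel computation.
# Even an infinitely fast kernel then gains only about 1% overall.
limit = overall_speedup(0.01, float("inf"))
print(round(limit, 4))
```

Under these assumptions, tuning the kernel arithmetic itself cannot help much; the remaining 99% (launch overhead, scheduling, host-device copies) is where any improvement has to come from.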
I know that ofgpu can be built on Linux, but the install tutorial is a little messy. My understanding is that even Linux users need to patch the source developed for Windows and then rebuild it. I think that should not be necessary; am I right?
Dr Jasak said that the interface update in the matrix multiplication (Amul(), Tmul()) should not be overlooked. I am afraid this will make the GPU solver even slower. I have not dug into the SpeedIT plugin, so I am not sure how they make the GPU work with MPI.
I would classify the patch and build procedure as advanced, not messy.
I am trying to use ofgpu too, and I am facing some difficulties with the patching of OpenFOAM...
Is there anybody who can help me?