OpenFOAM goes multi-GPU
Dear All,
We are about to release a new version of our OpenFOAM plugin aimed at accelerating OF simulations on multi-GPU systems (and also on single-GPU ones). I was wondering whether OpenFOAM users have any special feature requests. FYI, the plugin has the following functionality:
- It is a plugin and does not require recompiling OF. You just change one line in the configuration file and GPU-based solvers are used, if available.
- Currently, we also provide CG in single precision at no cost to demonstrate its usefulness, but the plugin works with any GPU-based solver that operates on matrices in CSR format. Note: we charge for CG/BiCGSTAB in double precision and for support.
- Installation of the free OF plugin will be quite simple.
- It will discover how many GPU cards the system has and will submit the more intensive jobs to the better cards.
- The plugin will be available at speedit.vratis.com.
So far we have tested it with our solvers and observed speedups ranging from several times on a standard setup (GTX285 + GTX460) to x38 on a 4x Tesla machine, depending on the problem. At speedit.vratis.com there is a forum where we have opened a discussion on feature requests. I am looking forward to meeting you there.
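For readers unfamiliar with the pieces named above (CG, single precision, CSR storage), here is a rough CPU-side illustration only: a minimal C++ sketch of unpreconditioned conjugate gradient over a CSR matrix in float. It is not the plugin's CUDA code, and the names CsrMatrix, spmv and conjugateGradient are hypothetical. On a GPU, the sparse matrix-vector product (spmv) is the part that would be parallelised over rows.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Compressed Sparse Row storage: the nonzeros of row i are
// values[rowPtr[i] .. rowPtr[i+1]) with column indices in colIdx.
struct CsrMatrix {
    std::size_t n = 0;
    std::vector<std::size_t> rowPtr;  // size n + 1
    std::vector<std::size_t> colIdx;  // size nnz
    std::vector<float> values;        // size nnz, single precision
};

// y = A * x, the sparse matrix-vector product a GPU solver would offload.
void spmv(const CsrMatrix& A, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < A.n; ++i) {
        float sum = 0.0f;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            sum += A.values[k] * x[A.colIdx[k]];
        y[i] = sum;
    }
}

float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned CG for a symmetric positive-definite A.
// Iterates until ||r||_2 <= tol or maxIter is reached; returns iterations used.
int conjugateGradient(const CsrMatrix& A, const std::vector<float>& b,
                      std::vector<float>& x, float tol, int maxIter) {
    std::vector<float> r(A.n), p(A.n), Ap(A.n);
    spmv(A, x, Ap);
    for (std::size_t i = 0; i < A.n; ++i) r[i] = b[i] - Ap[i];  // r0 = b - A x0
    p = r;
    float rr = dot(r, r);
    int it = 0;
    while (it < maxIter && std::sqrt(rr) > tol) {
        spmv(A, p, Ap);
        const float alpha = rr / dot(p, Ap);  // step length along p
        for (std::size_t i = 0; i < A.n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const float rrNew = dot(r, r);
        const float beta = rrNew / rr;        // keeps directions A-conjugate
        for (std::size_t i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
        ++it;
    }
    return it;
}
```

For the 3x3 matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] with b = (1, 0, 1) and a zero initial guess, this sketch reaches x = (1, 1, 1) in two iterations.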
Excellent!

Wonderful!
Is it SpeedIT eXtreme 1.2? When will you release it?
a few questions
1. When you say 38x speedup, are we talking about the linear system solvers or the whole solver (say simpleFoam)? If we are talking about the whole solver (simpleFoam again), then the 38x speedup sounds a little high, since there are lots of other processes besides the linear system solver that take a lot of time to complete. If we are talking about the linear system solver itself, then that is a little more believable. Also... how does the code compare to more CPUs? What CPUs are we comparing to... P4s or quad-core Xeons?
2. When you compare the speedup, are you comparing single precision GPU to single precision OF, or single precision GPU to double precision OF?
3. Have you ensured that you are measuring convergence the same way OF does, with a scaled residual? If not, then your speedup calculations are not entirely accurate.
4. Parallel: are you letting OF do the domain decomposition and then passing the decomposed domains to the GPUs, which communicate with each other, or are you only solving the Ax=b system on several GPUs through a parallel Krylov subspace solver (CG or similar)? If it is the latter, then I would request that the code use the decomposed domains from decomposePar and work that way. This method should be faster on really large problems, since the other processes can be done by the separate CPUs and the linear system solves by the GPUs. I guess I'm asking for parallel CPU and parallel GPU together.
5. What kind of preconditioners are available? Multigrid would be excellent, or even some sort of sparse approximate inverse (without fill-in, to avoid forming a dense matrix).
6. Wouldn't the speedup depend tremendously on the host-to-device memory bandwidth (main memory on the motherboard to the GPU device memory)? If so, what type of motherboard is being used for the comparison, and does anyone know whether the results can be scaled so that they can be compared with other setups?
7. Lastly, since the CUDA code is provided with the plugin, couldn't one just change a float to a double in the proper place?
Again... great work! I like this type of work and just had a few questions.
Dan
SpeedIT eXtreme 1.2 will probably be released this year with:
- support for complex operations
- possibly also new SpMV kernels (faster than those in cuSPARSE).
Dear Dan, let me answer your questions:
> 1. When you say 38x speedup, are we talking about the linear system solvers or the whole solver (say simpleFoam)? If we are talking about the whole solver (simpleFoam again), then the 38x speedup sounds a little high, since there are lots of other processes besides the linear system solver that take a lot of time to complete. If we are talking about the linear system solver itself, then that is a little more believable. Also... how does the code compare to more CPUs? What CPUs are we comparing to... P4s or quad-core Xeons?
It was a large simpleFoam job. 1 GPU vs. 4 GPUs: the speedup is x6. 4 GPUs vs. 12 CPUs (Opteron 2.2 GHz), using diagonal preconditioners in both cases: the speedup is x38.
> 2. When you compare the speedup, are you comparing single precision GPU to single precision OF, or single precision GPU to double precision OF?
Single precision vs. single precision, and double precision vs. double precision.
> 3. Have you ensured that you are measuring convergence the same way OF does, with a scaled residual? If not, then your speedup calculations are not entirely accurate.
Yes. We also checked the residuals. Send me a private message and I can send you the logs from the computations.
> 4. Parallel: are you letting OF do the domain decomposition and then passing the decomposed domains to the GPUs, which communicate with each other, or are you only solving the Ax=b system on several GPUs through a parallel Krylov subspace solver (CG or similar)?
Exactly. We take advantage of the OF domain decomposition.
> If it is the latter, then I would request that the code use the decomposed domains from decomposePar and work that way. This method should be faster on really large problems, since the other processes can be done by the separate CPUs and the linear system solves by the GPUs. I guess I'm asking for parallel CPU and parallel GPU together.
So far we have replaced the solvers with their GPU versions for simpleFoam and pisoFoam cases and observed acceleration. We have not used parallel CPU and parallel GPU together.
> 5. What kind of preconditioners are available? Multigrid would be excellent, or even some sort of sparse approximate inverse (without fill-in, to avoid forming a dense matrix).
We are working on GAMG, but I cannot yet estimate when we will finish.
> 6. Wouldn't the speedup depend tremendously on the host-to-device memory bandwidth (main memory on the motherboard to the GPU device memory)? If so, what type of motherboard is being used for the comparison, and does anyone know whether the results can be scaled so that they can be compared with other setups?
We noticed that Intel i7 processors provide higher memory bandwidth; older types perform worse. Of course, if the number of iterations is low, our library is of no use because of the memory transfers. This is why we are working on porting the whole PISO solver to the GPU.
> 7. Lastly, since the CUDA code is provided with the plugin, couldn't one just change a float to a double in the proper place?
There are double-precision kernels, but only in the eXtreme version of SpeedIT. The Classic version supports only float.
> Again... great work! I like this type of work and just had a few questions.
Thanks :)
Best wishes,
Lukasz
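Question 3 turns on OpenFOAM's scaled residual. As we understand OpenFOAM's lduMatrix solvers (lduMatrix::solver::normFactor), the reported residual is the L1 norm of b - Ax divided by a normalisation factor built from a uniform reference solution xRef = mean(x), plus a tiny constant (assumed here to be 1e-20). A minimal dense C++ sketch of that definition follows; Matrix, multiply and normalisedResidual are hypothetical names for illustration, not OpenFOAM or SpeedIT API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense storage, illustration only

std::vector<double> multiply(const Matrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// Scaled residual in the style of OpenFOAM's normFactor (assumed definition):
//   sum|b - A x| / ( sum(|A x - A xRef| + |b - A xRef|) + small ),
// where xRef is the uniform field equal to the mean of x.
double normalisedResidual(const Matrix& A, const std::vector<double>& x,
                          const std::vector<double>& b) {
    const std::size_t n = x.size();
    double xMean = 0.0;
    for (double v : x) xMean += v;
    xMean /= static_cast<double>(n);

    const std::vector<double> Ax = multiply(A, x);
    double normFactor = 1e-20;  // assumed small constant, avoids division by zero
    double residual = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double rowSum = 0.0;  // (A * ones)_i
        for (double a : A[i]) rowSum += a;
        const double AxRef = rowSum * xMean;  // (A * xRef)_i
        normFactor += std::fabs(Ax[i] - AxRef) + std::fabs(b[i] - AxRef);
        residual += std::fabs(b[i] - Ax[i]);
    }
    return residual / normFactor;
}
```

With a zero initial guess this quantity is essentially 1, and multiplying A and b by the same constant leaves it unchanged. That is why comparing CPU and GPU runs against the same tolerance on this scaled quantity, rather than on an absolute residual, is needed for a fair speedup measurement.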
Great... I'll check it out as soon as it is available. Thanks for answering my questions.
Dan 
Dear Lukasz, this looks too good to be true. Please allow one question about licences. You correctly list the Classic version as GPL, but the academic and commercial versions cannot be used as OpenFOAM plugins. As you know, OpenFOAM is GPL. A plugin interface must be derived from OpenFOAM header files, and for this reason every plugin is also GPL. Please explain your licence model. We would like to purchase at least the evaluation version, but if the legal foundation is unclear, we will have to order CUDA solvers elsewhere.
Thanks, Zhijun!
Dear Zhijun, you are of course correct. All work derived from OF is GPL, meaning that the OF plugin and SpeedIT Classic are GPL-based. SpeedIT eXtreme is a separate development and has no dependencies, bindings or relations to OF, except that it supports the CSR format, which is in the public domain. You can use it in your own code.
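SpeedIT itself only sees CSR, whereas OpenFOAM stores its matrices in LDU form: a diagonal array plus one upper and one lower coefficient per internal face, addressed by owner (lowerAddr) and neighbour (upperAddr) cell indices. Some conversion of this kind presumably happens on the GPL plugin side. The sketch below shows one way to do it; LduMatrix, Csr and lduToCsr are hypothetical names, this is not the plugin's actual code, and coupled-boundary (e.g. processor patch) coefficients are ignored.

```cpp
#include <cstddef>
#include <vector>

// Simplified LDU storage in the spirit of OpenFOAM's lduMatrix.
struct LduMatrix {
    std::vector<double> diag;             // one entry per cell
    std::vector<double> upper, lower;     // one entry per internal face
    std::vector<std::size_t> lowerAddr;   // owner cell of each face
    std::vector<std::size_t> upperAddr;   // neighbour cell of each face
};

struct Csr {
    std::vector<std::size_t> rowPtr, colIdx;
    std::vector<double> values;
};

// upper[f] sits at (lowerAddr[f], upperAddr[f]); lower[f] at (upperAddr[f], lowerAddr[f]).
Csr lduToCsr(const LduMatrix& m) {
    const std::size_t n = m.diag.size();
    const std::size_t nFaces = m.upper.size();
    Csr c;
    // Count nonzeros per row: the diagonal plus one entry per face touching the row.
    c.rowPtr.assign(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i) c.rowPtr[i + 1] = 1;
    for (std::size_t f = 0; f < nFaces; ++f) {
        ++c.rowPtr[m.lowerAddr[f] + 1];
        ++c.rowPtr[m.upperAddr[f] + 1];
    }
    for (std::size_t i = 0; i < n; ++i) c.rowPtr[i + 1] += c.rowPtr[i];  // prefix sum

    c.colIdx.resize(c.rowPtr[n]);
    c.values.resize(c.rowPtr[n]);
    std::vector<std::size_t> next(c.rowPtr.begin(), c.rowPtr.end() - 1);  // fill cursor per row
    auto put = [&](std::size_t row, std::size_t col, double v) {
        c.colIdx[next[row]] = col;
        c.values[next[row]] = v;
        ++next[row];
    };
    for (std::size_t i = 0; i < n; ++i) put(i, i, m.diag[i]);
    for (std::size_t f = 0; f < nFaces; ++f) {
        put(m.lowerAddr[f], m.upperAddr[f], m.upper[f]);
        put(m.upperAddr[f], m.lowerAddr[f], m.lower[f]);
    }
    return c;
}
```

For a three-cell 1D Laplacian (diagonal 2, off-diagonals -1, faces (0,1) and (1,2)) this yields rowPtr = {0, 2, 5, 7}. Column indices within a row come out in insertion order (diagonal first), so a library that requires sorted columns would need an extra per-row sort.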
Let me know if I answered your question. 
So basically, this means that the only product of yours we can use with OpenFOAM is SpeedIT Classic, which is a limited, single-precision, but GPL-licensed package; is that right?
Martin

It is like using Matlab with OF: Matlab supports CSR as well. Anyway, thank you for the fruitful discussion. We have decided to put a special explanation on our web page for OF users that explains the licensing terms in more detail.

Sorry, still cannot find that information on your Web site.
Why don't you put that information plainly on this OpenFOAM Message Board?
Martin

OF Version
Does the OpenFOAM plugin work with a fixed OF version or with most versions? I am using OF 1.5-dev and 1.6.

Dear Josiah Xu. It has been tested on OF 1.6 and 1.7.

Dear All,
We are happy to announce a new release of the OpenFOAM plugin, version 1.1 (GPL license). Here is the list of features:
- Multi-GPU support. Tested on the Fermi architecture (GTX460 and Tesla C2050).
- Automated submission of the domain to the GPU cards (using decomposePar from OpenFOAM).
- Optimized submission of computational tasks to the best GPU card in the system for any number of computational threads.
- The plugin picks the most powerful GPU card for single-thread cases.
You can download it freely at speedit.vratis.com. Enjoy!
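The release notes do not say how "the best GPU card" is chosen. One plausible policy, sketched here purely as an assumption, is a greedy largest-first assignment: each decomposed subdomain goes to the card that would finish its accumulated work earliest, given a hypothetical relative-speed score per card; with a single subdomain this reduces to picking the fastest card. GpuInfo, relativeSpeed and assignSubdomains are illustrative names, not part of the plugin.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

struct GpuInfo {
    std::string name;
    double relativeSpeed;  // hypothetical throughput score, higher is faster
};

// Returns gpuOf[s] = index of the GPU assigned to subdomain s.
std::vector<std::size_t> assignSubdomains(const std::vector<GpuInfo>& gpus,
                                          const std::vector<double>& subdomainCells) {
    std::vector<std::size_t> order(subdomainCells.size());
    std::iota(order.begin(), order.end(), 0);
    // Place the largest subdomains first so the big jobs land on the best cards.
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return subdomainCells[a] > subdomainCells[b];
    });

    std::vector<double> load(gpus.size(), 0.0);  // cells already assigned per GPU
    std::vector<std::size_t> gpuOf(subdomainCells.size(), 0);
    for (std::size_t s : order) {
        std::size_t best = 0;
        double bestFinish = -1.0;
        for (std::size_t g = 0; g < gpus.size(); ++g) {
            // Estimated finish time if subdomain s were added to card g.
            const double finish = (load[g] + subdomainCells[s]) / gpus[g].relativeSpeed;
            if (bestFinish < 0.0 || finish < bestFinish) {
                bestFinish = finish;
                best = g;
            }
        }
        load[best] += subdomainCells[s];
        gpuOf[s] = best;
    }
    return gpuOf;
}
```

With two cards scored 1.0 and 2.0, a single subdomain goes to the second card; with three equal subdomains, the faster card receives two and the slower one receives one. A real implementation would also have to account for host-to-device transfer costs, which the thread identifies as a limiting factor.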
Complex matrices will soon be supported as well, for those who asked me about that.
Best,
Lukasz
BTW, is anybody aware of OpenCL-based solvers?

Hello,
I heard about this plugin recently and was wondering whether you could tell me if this new plugin also works for solvers like interFoam and interDyMFoam.
Thanks,
Mehran

Hello,
To see whether you can use the SpeedIT plugin for interFoam or interDyMFoam, please check the system/fvSolution file and see whether the case uses CG or BiCG solvers. If so, you can substitute them with CG_accel and BiCG_accel respectively. Remember that in the SpeedIT Classic version only CG_accel is available. I hope that helps.
Best regards,
Kuba
Hello Kuba,
Thanks for your answer. I checked my fvSolution. The solver for pcorr and p_rgh is PCG, and for U it is PBiCG. I guess it is not compatible with this plugin. Am I right?
Mehran