CFD Online Discussion Forums

Forum: OpenFOAM (http://www.cfd-online.com/Forums/openfoam/)
Thread: OpenFOAM goes multi-gpu (http://www.cfd-online.com/Forums/openfoam/82023-openfoam-goes-multi-gpu.html)

Lukasz November 13, 2010 15:02

OpenFOAM goes multi-gpu
 
Dear All,

We are about to release a new version of our OpenFOAM plugin, aimed at accelerating OF simulations on multi-GPU systems (it also works with a single GPU).

I was wondering whether OpenFOAM users have any specific feature requests.
FYI, the plugin has the following functionality:

- It is a plugin and does not require recompiling OF. You just change one line in the configuration file and the GPU-based solvers are used, if available.
- Currently, we also provide CG in single precision at no cost, to demonstrate its usefulness, but the plugin works with any GPU-based solver that operates on matrices in CSR format.
- Note: we charge for CG/BiCGSTAB in double precision and for support. Installing the free OF plugin will be quite simple.
- It will detect how many GPU cards the system has and submit the more intensive jobs to the more capable cards.
- The plugin will be available at speedit.vratis.com
- So far we have tested it with our solvers and observed speedups ranging from several times on a standard setup (GTX285 + GTX460) to 38x on a 4x Tesla machine, depending on the problem.
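The two building blocks named in this list, CSR storage and the CG solver, can be sketched in a few lines of standard Python. This is a minimal illustration under our own assumptions (pure Python, double precision, no GPU, no preconditioner); it is not SpeedIT's implementation, and the class and function names are hypothetical.

```python
"""Minimal CSR storage and conjugate gradient (CG) sketch.

Illustrative only: pure Python, double precision, no GPU and no
preconditioner. Not SpeedIT's implementation.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class CSRMatrix:
    """Compressed Sparse Row storage: the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], in columns col_idx[same range]."""
    row_ptr: List[int]
    col_idx: List[int]
    values: List[float]

    @property
    def n(self) -> int:
        return len(self.row_ptr) - 1

    def matvec(self, x: List[float]) -> List[float]:
        """Sparse matrix-vector product y = A x (one dot product per row)."""
        y = [0.0] * self.n
        for i in range(self.n):
            s = 0.0
            for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
                s += self.values[k] * x[self.col_idx[k]]
            y[i] = s
        return y


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(A: CSRMatrix, b: List[float], tol: float = 1e-10,
                       max_iter: int = 1000) -> List[float]:
    """Unpreconditioned CG for a symmetric positive-definite A, starting
    from x = 0 and stopping when ||b - A x||_2 < tol."""
    x = [0.0] * A.n
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = A.matvec(p)     # the only SpMV per iteration
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


if __name__ == "__main__":
    # 3-point 1D Laplacian [[2,-1,0],[-1,2,-1],[0,-1,2]] stored in CSR.
    A = CSRMatrix(row_ptr=[0, 2, 5, 7],
                  col_idx=[0, 1, 0, 1, 2, 1, 2],
                  values=[2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
    print(conjugate_gradient(A, [1.0, 0.0, 1.0]))  # prints [1.0, 1.0, 1.0]
```

With b = [1, 0, 1] the exact solution of this 3x3 system is x = [1, 1, 1], which CG reaches in at most three iterations. The row loop in `matvec` is the part a GPU SpMV kernel parallelises; one thread or one warp per row are common choices for CSR.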

There is a forum at speedit.vratis.com where we have opened a discussion on feature requests. I look forward to meeting you there.

tcarrigan November 13, 2010 21:21

Excellent!

Ohbuchi November 13, 2010 23:45

Wonderful!
Is it SpeedIT eXtreme 1.2?
When will you release it?

chegdan November 14, 2010 23:55

a few questions
 
Quote:

Originally Posted by Lukasz (Post 283346)
Dear All,

We are about to release a new version of our OpenFOAM plugin, aimed at accelerating OF simulations on multi-GPU systems (it also works with a single GPU).


First of all, great work! I have programmed in CUDA and I know it can be challenging. I have several questions about your plugin:

1. When you say 38x speedup, are we talking about the linear system solvers or the whole solver (say simpleFoam)? If we are talking about the whole solver (simpleFoam again), then the 38x speedup sounds a little high, since there are lots of other processes besides the linear system solver that take a lot of time to complete. If we are talking about the linear system solver itself, then that is a little more believable. Also... how does the code compare to more CPUs? What CPUs are we comparing to... P4s or quad-core Xeons?

2. When you compare the speedup are you comparing single precision GPU to single precision OF or single precision GPU to double precision OF?

3. Have you ensured that you are measuring convergence the same way OF does, with a scaled residual? If not, then your speedup calculations are not entirely accurate.

4. Parallel: Are you letting OF do the domain decomposition and then passing the decomposed domains to the GPUs, which communicate with each other, or are you only solving the Ax = b system on several GPUs through a parallel Krylov subspace CG (or similar) solver? If it is the latter, then I would request that the code use the decomposed domains from decomposePar and work that way. This method should be faster on really large problems, since the other processes can run on the separate CPUs and the linear system solvers on the GPUs. I guess I'm asking for parallel CPU and parallel GPU together.

5. What kind of preconditioners are available? Multigrid would be excellent, or even some sort of sparse approximate inverse (without fill-in, to prevent forming a dense matrix).

6. Wouldn't the speedup depend tremendously on the host-to-device memory bandwidth (main memory on the motherboard to the GPU device memory)? If so, what type of motherboard is being used for the comparison, and does anyone know if the results can be scaled so that they can be compared to other setups?

7. Lastly, since the CUDA code is provided with the plugin, couldn't one just change a float to a double in the proper place?
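Dan's reasoning in questions 1 and 6 can be made explicit with two back-of-the-envelope formulas. The first is Amdahl's law: if a fraction f of the original run time is spent in the linear solvers and only that part is accelerated by a factor s, the whole-solver speedup can never exceed 1/(1 - f). The second is a simple cost model, assuming a fixed host-to-device transfer time t_xfer per linear solve, per-iteration times t_CPU and t_GPU with t_CPU > t_GPU, and N iterations per solve. The value f = 0.8 below is a hypothetical fraction, and s = 38 is used purely as an example, not as a measured linear-solver speedup.

```latex
S_{\text{total}} = \frac{1}{(1 - f) + f/s} < \frac{1}{1 - f},
\qquad
f = 0.8,\; s = 38:\quad S_{\text{total}} = \frac{1}{0.2 + 0.8/38} \approx 4.5 \;(< 5)

s_{\text{solve}}(N) = \frac{N\,t_{\text{CPU}}}{t_{\text{xfer}} + N\,t_{\text{GPU}}}
\;\xrightarrow{\;N \to \infty\;}\; \frac{t_{\text{CPU}}}{t_{\text{GPU}}},
\qquad
s_{\text{solve}}(N) < 1 \iff N < \frac{t_{\text{xfer}}}{t_{\text{CPU}} - t_{\text{GPU}}}
```

The second line captures why a solve with only a few iterations can be slower on the GPU than on the CPU: the transfer term dominates until N is large enough to amortise it.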

Again...great work! I like this type of work and just had a few questions.

Dan

Lukasz November 15, 2010 04:48

Quote:

Originally Posted by Ohbuchi (Post 283363)
Wonderful!
Is it SpeedIT eXtreme 1.2?
When will you release it?

That would be OF plugin 1.1 :) We should release it this week.

SpeedIT eXtreme 1.2 will probably be released this year, with:
- support for complex operations
- possibly also new SpMV kernels (faster than those in cuSPARSE).

Lukasz November 15, 2010 05:12

Dear Dan, let me answer your questions:


1. When you say 38x speedup, are we talking about the linear system solvers or the whole solver (say simpleFoam)? If we are talking about the whole solver (simpleFoam again), then the 38x speedup sounds a little high, since there are lots of other processes besides the linear system solver that take a lot of time to complete. If we are talking about the linear system solver itself, then that is a little more believable. Also... how does the code compare to more CPUs? What CPUs are we comparing to... P4s or quad-core Xeons?

It was a large simpleFoam job:
1 GPU vs. 4 GPUs: the speedup is 6x.

4 GPUs vs. 12 CPUs (Opteron 2.2 GHz), using diagonal preconditioners in both cases: the speedup is 38x.

2. When you compare the speedup are you comparing single precision GPU to single precision OF or single precision GPU to double precision OF?
Single precision vs. single precision, and double precision vs. double precision.

3. Have you ensured that you are measuring convergence the same way OF does, with a scaled residual? If not, then your speedup calculations are not entirely accurate.
Yes. We also checked the residuals. Send me a private message and I can send you the logs from the computations.

4. Parallel: Are you letting OF do the domain decomposition and then passing the decomposed domains to the GPUs, which communicate with each other, or are you only solving the Ax = b system on several GPUs through a parallel Krylov subspace CG (or similar) solver?
Exactly. We take advantage of OF's domain decomposition.

If it is the latter, then I would request that the code use the decomposed domains from decomposePar and work that way. This method should be faster on really large problems, since the other processes can run on the separate CPUs and the linear system solvers on the GPUs. I guess I'm asking for parallel CPU and parallel GPU together.
So far we have replaced the solvers with their GPU versions for simpleFoam and pisoFoam cases and observed acceleration. We have not used parallel CPU and parallel GPU together.

5. What kind of preconditioners are available? Multigrid would be excellent, or even some sort of sparse approximate inverse (without fill-in, to prevent forming a dense matrix).
We are working on GAMG, but I cannot yet estimate when we will finish.

6. Wouldn't the speedup depend tremendously on the host-to-device memory bandwidth (main memory on the motherboard to the GPU device memory)? If so, what type of motherboard is being used for the comparison, and does anyone know if the results can be scaled so that they can be compared to other setups?
We noticed that Intel i7 processors provide higher memory bandwidth; older processors perform worse. Of course, if the number of iterations is low, our library is of no use because of the memory transfers. This is why we are working on porting the whole PISO solver to the GPU.

7. Lastly, since the CUDA code is provided with the plugin, couldn't one just change a float to a double in the proper place?
There are double-precision kernels, but only in the eXtreme version of SpeedIT. The Classic version supports only float.
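The float/double distinction in answers 2 and 7 matters because IEEE single precision has a machine epsilon of about 1.2e-7 (roughly 7 significant decimal digits), so long reductions such as the dot products inside CG accumulate far more rounding error than in double. The minimal sketch below emulates float32 in standard Python by round-tripping values through `struct`; the helper names are hypothetical, and this says nothing about how SpeedIT's kernels accumulate.

```python
"""Emulate single-precision rounding in plain Python to compare it with
double precision (illustration only)."""
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value: float, count: int, single: bool) -> float:
    """Add `value` to a running sum `count` times; in single mode the
    addend and the running sum are rounded to float32 after every step."""
    v = to_float32(value) if single else value
    total = 0.0
    for _ in range(count):
        total = total + v
        if single:
            total = to_float32(total)
    return total


if __name__ == "__main__":
    # 1 + 1e-8 is distinguishable from 1 in double precision but not in single.
    print(1.0 + 1e-8 == 1.0, to_float32(1.0 + 1e-8) == 1.0)  # prints False True
    exact = 1.0  # 10 000 * 1e-4
    for single in (False, True):
        err = abs(accumulate(1e-4, 10_000, single) - exact)
        print("float32" if single else "float64", "error:", err)
```

The same effect limits how far a residual can be driven down in single precision, which is one reason to compare like with like, as answer 2 does.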

Again...great work! I like this type of work and just had a few questions.
Thanks :)

Best wishes,
Lukasz
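On question 3: OpenFOAM's linear solvers do not report the raw norm of b - Ax but a normalised residual. As far as we understand the lduMatrix solver code in OF 1.6/1.7, it is sum|b - Ax| divided by sum(|Ax - A x_ref| + |b - A x_ref|) + small, where x_ref is a uniform field equal to the average of x and small is a tiny constant guarding against division by zero. The sketch below reproduces that definition with dense matrices for brevity; treat the formula details and the value 1e-20 as our assumptions, not as a quote of the OpenFOAM source.

```python
"""Sketch of an OpenFOAM-style normalised residual (our reading of the
OF 1.6/1.7 lduMatrix solver normalisation; details are assumptions)."""
from typing import List

Matrix = List[List[float]]  # dense rows, to keep the sketch short


def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def openfoam_style_residual(A: Matrix, x: List[float], b: List[float],
                            small: float = 1e-20) -> float:
    """sum|b - A x| divided by the normalisation factor
    sum(|A x - A x_ref| + |b - A x_ref|) + small, where x_ref is a uniform
    field equal to the average of x."""
    n = len(b)
    Ax = matvec(A, x)
    x_avg = sum(x) / n
    # A applied to a uniform field equals (row sum of A) * x_avg.
    A_xref = [sum(row) * x_avg for row in A]
    norm_factor = sum(abs(ax - axr) + abs(bi - axr)
                      for ax, axr, bi in zip(Ax, A_xref, b)) + small
    return sum(abs(bi - ax) for bi, ax in zip(b, Ax)) / norm_factor


if __name__ == "__main__":
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    b = [1.0, 0.0, 1.0]
    print(openfoam_style_residual(A, [0.0, 0.0, 0.0], b))  # prints 1.0
    print(openfoam_style_residual(A, [1.0, 1.0, 1.0], b))  # prints 0.0
```

Because both the numerator and the normalisation factor scale linearly with A and b, this residual does not change when the whole system is multiplied by a constant. A GPU solver checked against a plain 2-norm tolerance may therefore stop at a different iteration than the OpenFOAM solver it replaces.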

chegdan November 15, 2010 16:48

great...I'll check it out as soon as it is available.
 
great...I'll check it out as soon as it is available. Thanks for answering my questions.

Dan

xizhijun November 15, 2010 17:13

Dear Lukasz, this looks too good to be true. Please allow one question about licensing. You list the Classic version, correctly, as GPL, but the academic and commercial versions cannot be used as OpenFOAM plugins. As you know, OpenFOAM is GPL. A plugin interface must be derived from OpenFOAM header files, and for this reason every plugin is also GPL. Please explain your license model. We would like to purchase at least the evaluation version, but if the legal foundation is unclear, we will have to order CUDA solvers elsewhere.
Thanks,
Zhijun

Lukasz November 15, 2010 23:38

Dear Zhijun, you are of course correct. All work derived from OF is GPL, meaning the OF plugin and SpeedIT Classic are GPL-licensed. SpeedIT eXtreme is a separate development; it has no dependencies on, bindings to, or relations with OF, except that it supports the CSR format, which is in the public domain. You can use it in your own code.

Let me know if I answered your question.
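Since CSR is named here as the only interface between OF and SpeedIT eXtreme, it may help to see how OpenFOAM's own matrix storage maps onto it. OpenFOAM keeps matrices in LDU form (a diagonal plus one lower and one upper coefficient per internal face). The sketch below converts that layout to CSR under our reading of the lduMatrix addressing: face f joins owner cell lower_addr[f] and neighbour cell upper_addr[f], with upper[f] at (owner, neighbour) and lower[f] at (neighbour, owner). The function and parameter names are hypothetical, and boundary and processor interface coefficients are ignored.

```python
"""Sketch: convert OpenFOAM-style LDU storage to CSR (row_ptr, col_idx, values).

Assumed layout (our reading of OpenFOAM's lduMatrix): face f connects cells
lower_addr[f] (owner) and upper_addr[f] (neighbour), with upper[f] stored at
(owner, neighbour) and lower[f] at (neighbour, owner). Boundary and processor
interface coefficients are ignored here.
"""
from typing import Dict, List, Tuple


def ldu_to_csr(diag: List[float], lower: List[float], upper: List[float],
               lower_addr: List[int], upper_addr: List[int]
               ) -> Tuple[List[int], List[int], List[float]]:
    n = len(diag)
    rows: List[Dict[int, float]] = [{i: diag[i]} for i in range(n)]
    for f, (own, nei) in enumerate(zip(lower_addr, upper_addr)):
        rows[own][nei] = upper[f]   # upper triangle: row owner, column neighbour
        rows[nei][own] = lower[f]   # lower triangle: row neighbour, column owner
    row_ptr: List[int] = [0]
    col_idx: List[int] = []
    values: List[float] = []
    for row in rows:
        for col in sorted(row):     # CSR rows with ascending column indices
            col_idx.append(col)
            values.append(row[col])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


if __name__ == "__main__":
    # Three cells in a line, two internal faces: (0, 1) and (1, 2).
    print(ldu_to_csr(diag=[2.0, 2.0, 2.0], lower=[-2.0, -4.0],
                     upper=[-1.0, -3.0], lower_addr=[0, 1], upper_addr=[1, 2]))
    # prints ([0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2], [2.0, -1.0, -2.0, 2.0, -3.0, -4.0, 2.0])
```

Distinct lower and upper values are used in the example so that the placement of each triangle is visible in the output; for a symmetric matrix they would be equal.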

mbeaudoin November 16, 2010 22:37

So basically, this means that the only product we can use with OpenFOAM is your product called SpeedIT Classic, which is a limited, single-precision, but GPL-licensed package; is that right?

Martin

Quote:

Originally Posted by Lukasz (Post 283554)
Dear Zhijun, you are of course correct. All work derived from OF is GPL, meaning the OF plugin and SpeedIT Classic are GPL-licensed. SpeedIT eXtreme is a separate development; it has no dependencies on, bindings to, or relations with OF, except that it supports the CSR format, which is in the public domain. You can use it in your own code.

Let me know if I answered your question.


Lukasz November 17, 2010 15:08

It is like using Matlab with OF; Matlab supports CSR as well. Anyway, thank you for the fruitful discussion. We have decided to put a dedicated explanation on our web page that describes the licensing terms for OF users in more detail.

mbeaudoin November 17, 2010 22:13

Sorry, I still cannot find that information on your website.

Why don't you put that information plainly on this OpenFOAM Message Board?

Martin

Quote:

Originally Posted by Lukasz (Post 283817)
It is like using Matlab with OF; Matlab supports CSR as well. Anyway, thank you for the fruitful discussion. We have decided to put a dedicated explanation on our web page that describes the licensing terms for OF users in more detail.


faithhidy November 18, 2010 12:12

OF Version
 
Does the OpenFOAM plugin work only with a specific OF version, or with most versions? I am using OF 1.5-dev and 1.6.

Lukasz November 18, 2010 15:18

Dear Josiah Xu, it has been tested on OF 1.6 and 1.7.

Lukasz November 19, 2010 14:53

Dear All,

We are happy to announce a new release of the OpenFOAM plugin 1.1 (GPL License).
Here is the list of features:

- Multi-GPU support.
- Tested on the Fermi architecture (GTX460 and Tesla C2050).
- Automated submission of the domain to the GPU cards (using decomposePar from OpenFOAM).
- Optimized submission of computational tasks to the best GPU card in the system, for any number of computational threads.
- The plugin picks the most powerful GPU card for single-thread cases.
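These notes say that tasks go to the best card and that single-thread cases go to the most powerful one, but not how "best" is decided. One plausible policy is the classic greedy longest-processing-time heuristic: take the decomposed subdomains from largest to smallest and give each to the card that would finish it earliest. The sketch assumes a hypothetical per-card throughput score and a cost proportional to the cell count of each subdomain; it is our illustration, not SpeedIT's algorithm.

```python
"""Hypothetical sketch: greedy assignment of decomposed subdomains to GPUs.

Assumptions (not from SpeedIT): each card has a relative throughput score
and a subdomain's cost is proportional to its cell count.
"""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gpu:
    name: str
    throughput: float      # hypothetical relative speed, higher is faster
    load: float = 0.0      # cells assigned so far


def assign_subdomains(cells: List[int], gpus: List[Gpu]) -> Dict[int, str]:
    """Longest-processing-time-first heuristic: take subdomains from largest
    to smallest and give each to the card that would finish it earliest."""
    assignment: Dict[int, str] = {}
    for i in sorted(range(len(cells)), key=lambda k: cells[k], reverse=True):
        best = min(gpus, key=lambda g: (g.load + cells[i]) / g.throughput)
        best.load += cells[i]
        assignment[i] = best.name
    return assignment


if __name__ == "__main__":
    # One subdomain (single-thread case): it goes to the fastest card.
    print(assign_subdomains([1000], [Gpu("slow", 1.0), Gpu("fast", 3.0)]))
    # prints {0: 'fast'}
    # Four subdomains of different sizes on the same two cards.
    print(assign_subdomains([400, 300, 200, 100],
                            [Gpu("slow", 1.0), Gpu("fast", 3.0)]))
    # prints {0: 'fast', 1: 'fast', 2: 'slow', 3: 'fast'}
```

With a single subdomain the rule reduces to "pick the card with the highest score", which matches the single-thread behaviour described in the release notes; with several subdomains it balances the estimated finishing times across cards.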

You can freely download it at speedit.vratis.com. Enjoy!

Lukasz November 26, 2010 22:13

Complex matrices will soon be supported as well, for those who asked me about that.

Best,
Lukasz

Lukasz December 8, 2010 15:15

BTW, is anybody aware of OpenCL-based solvers?

farhagim February 25, 2011 18:26

Hello,

I heard about this plugin recently and was wondering whether it also works for solvers like interFoam and interDyMFoam.

Thanks
Mehran

Quote:

Originally Posted by Lukasz (Post 286582)
BTW, is anybody aware of OpenCL-based solvers?


stainboy February 27, 2011 09:41

Hello,

To see whether you can use the SpeedIT plugin for interFoam or interDyMFoam, check the system/fvSolution file and see whether the case uses CG or BiCG solvers. If it does, you can substitute them with CG_accel and BiCG_accel, respectively. Remember that only CG_accel is available in the SpeedIT Classic version.

I hope that I helped.
Best regards.
Kuba.
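The substitution Kuba describes is a text edit of the `solver` entries in system/fvSolution. The hypothetical helper below automates it. It assumes that the "CG" and "BiCG" solvers he refers to are the ones written PCG and PBiCG in fvSolution (this mapping is our assumption, not something confirmed in the thread), renames only the `solver` keyword, and by default uses only CG_accel, since that is the one available in SpeedIT Classic. Whether the accelerated solvers accept the existing preconditioner entries is not stated here, so those lines are left untouched.

```python
"""Hypothetical helper: switch fvSolution solver entries to the accelerated
names mentioned in this thread (CG_accel, BiCG_accel).

Assumption: the stock "CG" and "BiCG" solvers are written PCG and PBiCG in
fvSolution. Only the 'solver' keyword is touched; preconditioner and
tolerance entries are left as they are.
"""
import re
from typing import Iterable

ACCEL = {"PCG": "CG_accel", "PBiCG": "BiCG_accel"}
SOLVER_ENTRY = re.compile(r"(\bsolver\s+)(\w+)\s*;")


def accelerate_solvers(fv_solution: str,
                       available: Iterable[str] = ("CG_accel",)) -> str:
    """Return fv_solution with every convertible solver renamed. By default
    only CG_accel is used, since the Classic version only provides that."""
    allowed = set(available)

    def rename(m: "re.Match[str]") -> str:
        new = ACCEL.get(m.group(2))
        if new is None or new not in allowed:
            return m.group(0)            # leave unsupported solvers untouched
        return f"{m.group(1)}{new};"

    return SOLVER_ENTRY.sub(rename, fv_solution)


if __name__ == "__main__":
    case = """p_rgh
{
    solver          PCG;
    preconditioner  DIC;
}
U
{
    solver          PBiCG;
    preconditioner  DILU;
}
"""
    print(accelerate_solvers(case))
```

With the default arguments the p_rgh entry becomes `solver CG_accel;` and the U entry stays PBiCG; passing `available=("CG_accel", "BiCG_accel")` converts both.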

farhagim March 12, 2011 12:25

Hello Kuba,

Thanks for your answer. I checked my fvSolution. The solvers for pcorr and p_rgh are PCG, and the solver for U is PBiCG. I guess this means it is not compatible with the plugin. Am I right?

Mehran


Quote:

Originally Posted by stainboy (Post 297175)
Hello,

To see whether you can use the SpeedIT plugin for interFoam or interDyMFoam, check the system/fvSolution file and see whether the case uses CG or BiCG solvers. If it does, you can substitute them with CG_accel and BiCG_accel, respectively. Remember that only CG_accel is available in the SpeedIT Classic version.

I hope that I helped.
Best regards.
Kuba.


All times are GMT -4.