GPGPU

From CFD-Wiki

   This article needs good pictures and diagrams.
==Introduction==
GPGPU is an acronym for General-Purpose Graphics Processing Unit. A GPGPU is any graphics processor used for general computing beyond graphics. [http://en.wikipedia.org/wiki/Graphics_processing_unit GPUs] are widely available and are often targeted at the computer gaming market. Graphics workloads are highly parallel, so GPUs evolved into large-scale parallel computation machines. Originally, GPGPU processing was done by tricking the GPU: computational workloads were disguised as graphics workloads. In recent years, GPU manufacturers have actively encouraged GPGPU computing by releasing specialized languages that expose the GPU for general-purpose computation. GPUs incorporate many more computational cores than comparable CPUs, so the performance of parallel operations can be greatly enhanced. Programming in parallel on a GPU has the same justification as [http://www.cfd-online.com/Wiki/Parallel_computing parallel computing] in general.
==Application to CFD==
GPGPU computing offers a large amount of compute power that can be tapped for the parallel components of CFD algorithms, while the CPU performs the serial portions of the algorithm. GPGPU languages also support data-parallel computation, similar to vector processors. In short, modern GPUs provide raw computational power an order of magnitude or more beyond that of a CPU, and they fit inside a single computer case.
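
As an illustration of this division of labour, here is a minimal CUDA sketch (illustrative only; the grid size, kernel and layout are assumptions, not taken from any particular solver) that offloads a simple data-parallel operation, a Jacobi sweep of the 1D Laplace equation, to the GPU, while the serial control flow, the outer iteration loop and the final output, stays on the CPU:

<pre>
// Minimal CUDA sketch: the GPU does the data-parallel Jacobi update,
// the CPU keeps the serial control flow (outer loop, I/O).
// Illustrative only; error checking and a convergence test are omitted.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void jacobi_sweep(const float* u_old, float* u_new, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)                       // interior points only
        u_new[i] = 0.5f * (u_old[i - 1] + u_old[i + 1]);
}

int main()
{
    const int n = 1 << 20;                        // number of grid points
    const size_t bytes = n * sizeof(float);

    // Serial host-side setup stays on the CPU.
    float* h_u = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_u[i] = 0.0f;
    h_u[0] = 1.0f;                                // boundary conditions
    h_u[n - 1] = 0.0f;

    float *d_old, *d_new;
    cudaMalloc(&d_old, bytes);
    cudaMalloc(&d_new, bytes);
    cudaMemcpy(d_old, h_u, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_new, h_u, bytes, cudaMemcpyHostToDevice);

    dim3 block(256), grid((n + 255) / 256);
    for (int iter = 0; iter < 1000; ++iter) {     // serial outer loop on the CPU
        jacobi_sweep<<<grid, block>>>(d_old, d_new, n);
        float* tmp = d_old; d_old = d_new; d_new = tmp;   // ping-pong the buffers
    }

    cudaMemcpy(h_u, d_old, bytes, cudaMemcpyDeviceToHost);
    printf("u at the mid-point: %f\n", h_u[n / 2]);

    cudaFree(d_old); cudaFree(d_new); free(h_u);
    return 0;
}
</pre>

The same pattern, a serial host loop driving a data-parallel device kernel, carries over to the stencil updates, flux evaluations and linear-algebra kernels found in real CFD codes.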
==Graphics Architecture==
A GPU has its own main memory and a large number of parallel processors organized into pipeline stages. Traditional GPUs have a linear pipeline with several distinct stages: application, command, geometry, rasterization, fragment, and display. Intel's [http://en.wikipedia.org/wiki/Larrabee_%28microarchitecture%29 Larrabee] project promises a reconfigurable graphics pipeline, with many of the traditional stages handled in software. Such a development would expose even more of the GPU's compute power to parallel programmers.
===Traditional Pipeline===
A traditional pipeline has three main computation stages: geometry, rasterization, and fragment processing. Graphics is traditionally built from triangles: the GPU operates on batches of triangle vertices to create fragments, which in turn become the pixels that end up on the monitor.
====Geometry====
Vertex processing is handled in the geometry stage. Geometry from the CPU is transformed by the vertex shaders (programs) loaded onto the GPU. These processors are specialized for matrix transformations; a common operation is projecting 3D coordinates onto 2D screen coordinates. The closest analogue is a vector or quaternion processor, since each vertex operation works on a short vector of components representing a triangle vertex. Lagrangian-frame computations might be well suited to vertex shaders.
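
As a worked example of such a transformation, consider a generic OpenGL-style perspective projection (shown here purely for illustration; the exact matrix depends on the API and conventions in use). A vertex is written in homogeneous coordinates, multiplied by a 4x4 matrix, and the result is divided by its <math>w</math> component to obtain normalized device coordinates, which the viewport transform then maps to pixels:

:<math>
\begin{pmatrix} x' \\ y' \\ z' \\ w' \end{pmatrix}
=
\begin{pmatrix}
f/a & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & \frac{z_f + z_n}{z_n - z_f} & \frac{2 z_f z_n}{z_n - z_f} \\
0 & 0 & -1 & 0
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix},
\qquad
x_{ndc} = \frac{x'}{w'}, \quad y_{ndc} = \frac{y'}{w'},
</math>

where <math>f = \cot(\theta/2)</math> for a vertical field of view <math>\theta</math>, <math>a</math> is the aspect ratio, and <math>z_n</math>, <math>z_f</math> are the near and far clipping planes. Every vertex undergoes the same small matrix-vector product, which is why the hardware provides many identical units working in parallel.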
====Rasterization====
Rasterization takes the transformed vertices from the geometry stage and creates fragments from them. The easiest way to think of rasterization is as "chunking" a large triangle into many fragments. This stage is typically implemented in fixed-function, specialized hardware.
====Fragment====
Fragment processing requires floating-point math, as the fragments are colored and filtered to become pixels. This stage is where much of the interesting computation for CFD can happen, since the parallel floating-point processors can be repurposed, through either fragment shaders or special-purpose languages, to do non-graphics floating-point math.
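
To make the analogy concrete, the sketch below (illustrative only, written in CUDA rather than as a fragment shader, with made-up sizes) assigns one thread to each interior cell of a 2D grid, much as the hardware assigns one fragment to each pixel; every thread performs a small amount of independent floating-point work, here a centred-difference estimate of vorticity from a velocity field:

<pre>
// One CUDA thread per cell of a 2D grid, analogous to one fragment per pixel.
// Each thread computes w = dv/dx - du/dy with centred differences.
// Illustrative only; the velocity field here is just a zero-filled placeholder.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vorticity(const float* u, const float* v, float* w,
                          int nx, int ny, float dx, float dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // column (like pixel x)
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // row    (like pixel y)
    if (i > 0 && i < nx - 1 && j > 0 && j < ny - 1) {
        int c = j * nx + i;                          // flattened cell index
        float dvdx = (v[c + 1]  - v[c - 1])  / (2.0f * dx);
        float dudy = (u[c + nx] - u[c - nx]) / (2.0f * dy);
        w[c] = dvdx - dudy;
    }
}

int main()
{
    const int nx = 512, ny = 512;
    const size_t bytes = nx * ny * sizeof(float);

    float *d_u, *d_v, *d_w;
    cudaMalloc(&d_u, bytes);
    cudaMalloc(&d_v, bytes);
    cudaMalloc(&d_w, bytes);
    cudaMemset(d_u, 0, bytes);                       // placeholder velocity data
    cudaMemset(d_v, 0, bytes);

    dim3 block(16, 16);                              // a 16x16 tile of cells per block
    dim3 grid((nx + 15) / 16, (ny + 15) / 16);
    vorticity<<<grid, block>>>(d_u, d_v, d_w, nx, ny, 1.0f, 1.0f);
    cudaDeviceSynchronize();

    printf("vorticity kernel finished\n");
    cudaFree(d_u); cudaFree(d_v); cudaFree(d_w);
    return 0;
}
</pre>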
==Performance==
NVIDIA has been actively promoting GPGPU computing in recent years. The company introduced CUDA in 2007, giving programmers direct access to the GPU. NVIDIA maintains a [http://www.nvidia.com/object/cuda_showcase_html.html CUDA community showcase] on its CUDA website, showing the performance boost a variety of applications obtain when making use of the GPU.

When programming GPGPU systems, it is important to remember that there is significant overhead involved in transferring data between the CPU and the GPU. Programs whose working set exceeds the memory available on the GPU, and which must therefore move data back and forth repeatedly, will not see the same dramatic performance gains as programs with low memory-transfer overhead.
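
As a minimal sketch of this point (illustrative only; the array size and the trivial kernel are arbitrary assumptions, and no real timings are implied), the CUDA program below uses events to time the host-to-device copy separately from the kernel itself. For small or transfer-bound problems the copy can easily dominate the total runtime, which is why data should be kept resident on the GPU across as many kernel launches as possible:

<pre>
// Minimal CUDA sketch: timing the PCIe transfer separately from the kernel.
// Illustrative only; error checking is omitted.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float* x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;                            // trivial data-parallel work
}

int main()
{
    const int n = 1 << 24;                           // ~16 million floats (~64 MB)
    const size_t bytes = n * sizeof(float);
    float* h_x = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_x[i] = 1.0f;

    float* d_x;
    cudaMalloc(&d_x, bytes);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);   // host -> device copy
    cudaEventRecord(t1);
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);          // GPU kernel
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float copy_ms = 0.0f, kernel_ms = 0.0f;
    cudaEventElapsedTime(&copy_ms, t0, t1);
    cudaEventElapsedTime(&kernel_ms, t1, t2);
    printf("copy: %.2f ms, kernel: %.2f ms\n", copy_ms, kernel_ms);

    cudaFree(d_x); free(h_x);
    return 0;
}
</pre>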
==Languages==
Several languages support direct control of the GPU; OpenCL, Microsoft's DirectCompute, and NVIDIA's CUDA are good examples. For more information about how to program on the GPU, see the [http://en.wikipedia.org/wiki/GPGPU Wikipedia] article on GPGPU computing. Their coding syntax is similar to C/C++ syntax.

Additionally, a programmer may want to use the graphics card both to compute and to display the results of their CFD calculations; OpenGL hooks into C++ to enable this, and Microsoft's DirectX performs a similar role.

All of these languages have active communities participating in their further development. There are many code samples and tools on the internet to help a programmer get started with GPGPU computing. Go for it!
==References==

===Coding Resources===

[http://developer.nvidia.com/object/gpucomputing.html CUDA]

[http://www.khronos.org/opencl/ OpenCL]

[http://www.opengl.org OpenGL]

[http://www.nvidia.com/object/cuda_showcase_html.html CUDA Showcase]

[http://graphics.stanford.edu/projects/brookgpu/ Brook GPU]

[http://download.nvidia.com/developer/SDK/Individual_Samples/DEMOS/OpenGL/src/gpgpu_fluid/docs/GPU_Gems_Fluids_Chapter.pdf GPU Gems Fluid Chapter]

[http://oss.sgi.com/projects/ogl-sample/registry/ARB/GLSLangSpec.Full.1.10.59.pdf GLSL]

===Academic Resources===
Resources marked with a * require an academic subscription or a book purchase; your local university library can often provide access.

{{reference-paper|author=Brandvik, Tobias and Graham Pullan|year=2008|title=Acceleration of a 3D Euler Solver using Commodity Graphics Hardware|rest=Proceedings of the 46th AIAA Aerospace Sciences Meeting ([http://gpgpu.org/2008/01/18/acceleration-of-a-3d-euler-solver-using-commodity-graphics-hardware view])}}
 
{{reference-paper|author=Cohen, J. and Molemaker, M.J.|year=2009|title=A fast double precision CFD code using CUDA|rest=Proceedings of Parallel CFD 2009 ([http://www.jcohen.name/ view])}}
 
{{reference-paper|author=Christen, M., O. Schenk, E. Neufeld, P. Messmer, and H. Burkhart|year=2009|title=Parallel Data-Locality Aware Stencil Computations on Modern Micro-architectures|rest=Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing ([http://doi.ieeecomputersociety.org/10.1109/IPDPS.2009.5161031 view])}}

{{reference-paper|author=Elsen, E., Legresley, P. and Darve, E.|year=2008|title=Large calculation of the flow over a hypersonic vehicle using a GPU|rest=Journal of Computational Physics, vol. 227, no. 24, pp. 10148 - 10161 ([http://portal.acm.org/citation.cfm?id=1454864 view])}}

{{reference-book|author=Hagen, Trond Runar, Knut-Andreas Lie and Jostein R. Natvig|year=2006|title=Solving the Euler Equations on Graphics Processing Units|rest=in Proceedings of the 6th International Conference on Computational Science, pp. 220 - 227 ([http://www.springerlink.com/index/p5476l78431t4573.pdf view*])}}

{{reference-book|author=Harris, Mark|year=2004|title=Fast Fluid Dynamics Simulation on the GPU|rest=Ed. Randima Fernando, in GPU Gems, Ch. 38 ([http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html view*])}}

{{reference-paper|author=Kolb, Andreas and Nicolas Cuntz|year=2005|title=Dynamic Particle Coupling for GPU-based Fluid Simulation|rest=Proceedings of the 18th Symposium on Simulation Technique, pp. 722 - 727 ([http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.89.2285 view])}}

{{reference-paper|author=Phillips, Everett H., Yao Zhang, Roger L. Davis, and John D. Owens|year=2009|title=Rapid Aerodynamic Performance Prediction on a Cluster of Graphics Processing Units|rest=Proceedings of the 47th AIAA Aerospace Sciences Meeting ([http://graphics.cs.ucdavis.edu/publications/print_pub?pub_id=958 view])}}

{{reference-paper|author=Liu, Youquan, Xuehui Liu and Enhua Wu|year=2004|title=Real-Time 3D Fluid Simulation on the GPU with Complex Obstacles|rest=Proceedings of Pacific Graphics 2004, pp. 247 - 256 ([http://ieeexplore.ieee.org/Xplore/arnumber=1348355 view*])}}

===News Sites===

[http://www.gpgpu.org gpgpu.org]
[[Category: Acronyms]]
