This page was written by a non-native English speaker; please excuse any grammatical errors (and correct them!).
GPGPU, also known as GP², is an acronym for General-Purpose computing on Graphics Processing Units.
It means using graphics processors (GPUs) instead of CPUs for general-purpose programming.
Graphics processors have much more raw computing power than PC processors (CPUs). In 2005 a single GPU was equivalent to a small cluster (up to about 10 PC CPUs), and in some applications a GPU can reach up to 50 times the performance of a PC CPU.
A graphics card has a main memory (up to 256 or 512 MB per card today) and a graphics processor made of several stages, each with many parallel processors:
The first stage is usually the vertex processor, which commonly has fewer parallel processors than the fragment stage. The vertex processor performs calculations specialized for matrix transformations. Typically it projects 3D coordinates to 2D screen coordinates, but it is not restricted to matrix products.
Next comes a (still non-programmable) stage where linear interpolation and other work is done. This stage has little flexibility, but it can be useful for specialized tasks.
The last stage, and the most useful one for CFD, is the fragment processor, which processes 2D arrays of four-component values. These map to 2D screen pixels, each holding four numbers (the red, green, blue and alpha components). The fragment stage can be thought of as operating on a 4096x4096x4 array of 32-bit numbers.
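The view of the fragment stage as a large array processor can be emulated on the CPU. The Python below is only an illustrative sketch under stated assumptions, not real GPU code: a "texture" is a list of rows of RGBA 4-tuples, and the hypothetical fragment program `jacobi_fragment` computes one output pixel by reading only the input texture. Because every pixel is computed independently, a GPU can evaluate them all in parallel; results go into a new texture instead of being written in place (the usual "ping-pong" between two textures). As an assumed CFD-flavoured example, channel R stores an unknown p and channel G a right-hand side b, and each pass performs one Jacobi iteration for the Poisson equation ∇²p = b on a unit-spaced grid with clamp-to-edge reads at the borders. The helper `texture_bytes` checks the memory arithmetic: one 4096x4096 RGBA texture of 32-bit floats takes 4096 × 4096 × 4 × 4 bytes = 256 MB, the whole memory of a 256 MB card.

```python
# CPU emulation of one GPGPU fragment pass (illustrative sketch, not GPU code).
# Assumptions: a texture is a list of rows of RGBA 4-tuples; R = unknown p,
# G = right-hand side b; grid spacing is 1; edges use clamp-to-edge reads.


def texture_bytes(width, height, channels=4, bytes_per_value=4):
    """Memory used by one texture of 32-bit values."""
    return width * height * channels * bytes_per_value


def make_texture(width, height, fill=(0.0, 0.0, 0.0, 0.0)):
    """Create a height x width texture with every pixel set to `fill`."""
    return [[fill for _ in range(width)] for _ in range(height)]


def fetch(tex, x, y):
    """Read one RGBA pixel, clamping coordinates to the texture edge."""
    x = min(max(x, 0), len(tex[0]) - 1)
    y = min(max(y, 0), len(tex) - 1)
    return tex[y][x]


def jacobi_fragment(tex, x, y):
    """Hypothetical fragment program: one Jacobi step for lap(p) = b.

    It only reads the input texture and returns the new value of a single
    pixel, so all pixels can be evaluated in any order (or in parallel).
    """
    neighbours = (
        fetch(tex, x - 1, y)[0]
        + fetch(tex, x + 1, y)[0]
        + fetch(tex, x, y - 1)[0]
        + fetch(tex, x, y + 1)[0]
    )
    _, rhs, blue, alpha = tex[y][x]
    return ((neighbours - rhs) / 4.0, rhs, blue, alpha)


def run_pass(program, src):
    """Evaluate `program` at every pixel, writing into a new texture (ping-pong)."""
    return [[program(src, x, y) for x in range(len(src[0]))] for y in range(len(src))]


if __name__ == "__main__":
    # 3x3 grid: p = 0 everywhere, b = -4 at the centre pixel only.
    tex = make_texture(3, 3)
    tex[1][1] = (0.0, -4.0, 0.0, 0.0)
    for _ in range(2):
        tex = run_pass(jacobi_fragment, tex)
    print([[round(px[0], 3) for px in row] for row in tex])
    # [[0.0, 0.25, 0.0], [0.25, 1.0, 0.25], [0.0, 0.25, 0.0]]
    print(texture_bytes(4096, 4096) // 2**20, "MB")  # 256 MB
```

On a real GPU the same per-pixel function would be written as a shader (for example in GLSL or Cg) and the two textures swapped between passes; the emulation only mirrors that data flow.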
Languages: it is possible to program GPUs directly in the OpenGL Shading Language, in its Microsoft DirectX equivalent (HLSL), or in Nvidia Cg, all of which use a syntax very similar to C/C++. OpenGL and Cg are fully portable to non-Microsoft environments. The three languages are nearly equivalent. There are also languages such as Brook GPU, as well as C/C++ libraries and wrappers.
How much power? A GeForce 7800 has been measured at 160 Gigaflops (sustained, not peak, performance), but expect one half to one third of that in a general-purpose or novice program. There are dual-chip video cards, and PC motherboards that support up to four video cards. In principle this gives 160x2x4 = 1280 Gigaflops (about 1.3 Teraflops) on one PC, with 512x4 = 2 GB of video RAM. However, Nvidia drivers transparently support only up to two chips running as one, and without doubling the memory. So for the novice, only about (160/3)x2 ≈ 100 Gigaflops and 512 MB of video RAM were available in 2005, at a cost of about US$1000. In comparison, a motherboard can hold two dual-core x86 processors, giving a peak of (15 to 20)x4 = 60 to 80 Gigaflops per PC, a figure that can be approached only by programs whose data fits in the cache.
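The arithmetic in the paragraph above can be reproduced with a few lines of Python. This is only a sketch of the estimate: `aggregate_gflops` is a hypothetical helper that assumes perfect scaling across chips or cores, and the 1/3 efficiency factor is the rough "novice" derating quoted in the text, not a measurement.

```python
# Back-of-the-envelope throughput figures from the text (2005 hardware).
# Assumption: perfect scaling across chips/cores; the 1/3 factor stands in
# for "novice" code reaching about a third of the measured rate.

GPU_GFLOPS = 160.0      # measured sustained rate of one GeForce 7800 (from the text)
VRAM_MB_PER_CARD = 512  # video memory per card (from the text)


def aggregate_gflops(per_unit, units, efficiency=1.0):
    """Total rate of `units` processors, each delivering `per_unit` Gflops."""
    return per_unit * units * efficiency


# Best case: 2 chips per card x 4 cards, every chip fully used.
best_case = aggregate_gflops(GPU_GFLOPS, 2 * 4)
best_case_vram_mb = VRAM_MB_PER_CARD * 4

# Novice case: the driver exposes 2 chips as one, the code reaches about 1/3
# of 160 Gflops, and the memory is not doubled.
novice = aggregate_gflops(GPU_GFLOPS, 2, efficiency=1 / 3)
novice_vram_mb = VRAM_MB_PER_CARD

# CPU comparison: 2 dual-core x86 chips = 4 cores at 15 to 20 Gflops each.
cpu_low = aggregate_gflops(15.0, 4)
cpu_high = aggregate_gflops(20.0, 4)

if __name__ == "__main__":
    print(f"best case: {best_case:.0f} Gflops, {best_case_vram_mb} MB")  # 1280 Gflops, 2048 MB
    print(f"novice:    {novice:.0f} Gflops, {novice_vram_mb} MB")        # 107 Gflops, 512 MB
    print(f"CPU peak:  {cpu_low:.0f} to {cpu_high:.0f} Gflops")          # 60 to 80 Gflops
```

Real multi-GPU setups scale less than perfectly, so the best-case figure should be read as an upper bound.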
For an article on dual-chip video cards, see Two's Company, Four's a WOW! Sneak Preview of NVIDIA Quad GPU SLI; for a motherboard with four PCI Express graphics slots, see One Gigabyte Motherboard, Four Graphics Cards.
For more information, see www.gpgpu.org.
For a tutorial on 2D fluid simulation using GPGPU, see this PDF: GPU Gems Fluid Chapter