CFD Online Discussion Forums - Any such thing as a CFD accelerator card?? Why not?

I am here hoping to save hours of research; maybe someone here knows the answer or has a direction for me to look. I am a hardware guy: I do computer installs and maintenance, and I also design electronic equipment, including specialized microprocessor designs. I have a client who uses the Ansys CFX 11 suite and is looking to go to 12. He does a lot of work in macrosonics (high-amplitude sound waves) in gases, plus some container/wall interaction and focused-wave work. The problems he sets up take days to run on a 32-bit Vista machine with an Intel Core 2 Quad Q9505.

I know there are better processors out, there is always an improved front-side-bus chipset coming, and I have read about parallel processing as well. I am not looking for that information; it is readily available everywhere I look. I am curious whether there are other or better solutions for the type of work he is doing. Is there anything that would be 1,000 or 10,000 times faster? OK, maybe 30 times is more realistic. Is there any type of specialized hardware device for running the CFX solver or some other CFD solver? I am looking for something comparable to what a graphics accelerator does for 3D graphical transformations, which it handles thousands of times faster than piping it all through the main CPU. I understand graphical transformations and CFD transformations are not the same thing, but it gets the point across.

Pardon my ignorance on the subject, but what is the main limiting factor? How complex is the computation on a single block (element, vertex, node? I am not sure of the common term for one piece of the mesh)? What does the input/output list look like? Is pressure, temperature, velocity, turbulence, vector, direction, entropy getting close, or is it much more complicated? I understand it can vary from problem to problem, but when doing transients with full energy results, for example, what is done to a node on each pass?

Is there something inherent in a CFD solution that restricts the number and order of the nodes being calculated at any given moment? In other words, does the solution have to pass through the mesh something like a wave on every iteration? If you had the hardware power, could you do 500 nodes at once without having the results of adjacent nodes during one iteration? I am not sure I have the iteration terminology right; I mean the innermost or smallest looping part of the solution as the CFX solver runs.
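
To make the "500 nodes at once" question concrete, here is a minimal Python sketch of a Jacobi-style sweep on a toy 1D problem. It is only an illustration under stated assumptions: the thread does not describe how the CFX solver works internally, and the function names, grid size, and tolerance are made up for this example. In this scheme each node's new value depends only on its neighbours' values from the previous sweep, so every node in a sweep could in principle be updated simultaneously.

```python
def jacobi_sweep(u):
    """One Jacobi sweep for the 1D Laplace equation with fixed end values.

    Each interior node reads only the OLD array, so the updates are
    independent of one another and could all run at the same time.
    """
    new = u[:]  # copy; the two boundary values stay fixed
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


def solve(n_nodes=11, left=0.0, right=1.0, tol=1e-10, max_iter=100_000):
    """Repeat sweeps until the largest change per sweep drops below tol."""
    u = [0.0] * n_nodes
    u[0], u[-1] = left, right
    for iteration in range(1, max_iter + 1):
        new = jacobi_sweep(u)
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            return u, iteration
    return u, max_iter


if __name__ == "__main__":
    u, iters = solve()
    print(f"sweeps: {iters}, midpoint value: {u[len(u) // 2]:.6f}")
```

The catch is that boundary information only travels one node per sweep in this scheme, so many sweeps are needed. That is roughly the "wave through the mesh" effect asked about, but it limits how fast the answer converges, not how many nodes can be processed at once.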

Reference the link here

http://www.cfd-online.com/Wiki/Parallel_computing#Shared_memory_multiprocessor

I have seen this link and similar ones. As I read them I have not seen anything about specialized processors; they read as if all processors are the same, something like a Pentium 4 or better core. It seems the computation is done one node per processor core at a time, and if so that has to be a grossly inefficient use of die space. There is no way it takes a die anything close to the size of a P4 to do one node calculation. Has anybody heard of subdividing the CFD solution across specialized processors? In graphics processing you see things like hundreds of parallel pixel streaming processors and texture shaders, and you must realize that the cost, size, and complexity of one pixel processor is 0.01% that of a Pentium 4 core. That one pixel processor is also thousands of times faster at the one job it does. The trade-off is that pixel calculations are all that processor can do. A CFD equivalent would be a drop-in card that had all the memory to hold the problem, a very wide data path to that memory, and maybe something like 200 turbulence eddy processors, 400 vector/rate calculators, and 100 node processors, or whatever the sales force thinks they should be called.
So I would love to hear feedback and get some direction; constructive criticism is always welcome. I would imagine my thoughts about the direction of the technology have been considered before by many others, but maybe not. Regardless, if I can't find any such devices available I will definitely be researching the development of such a device myself, so if I have inspired anyone who can contribute on the computational and/or coding side, I would love to hear from you.

Sorry so long, I got on a roll today.

You might consider a solid-state drive (SSD). Access time is about 0.1 ms rather than 8.9 ms for a 7200 rpm SATA drive. Installing multiple SSDs in a RAID 0 configuration gives a vast improvement.
People are also taking these solid-state drives and using them as system RAM.
You could try a solid-state drive set up as a RAM drive. Since you are most likely in a Windows environment and do not want to waste an Ansys install on experimentation, you could experiment with the Elmer finite element analysis software. It is free and uses Fortran. Pretty good program, very capable.
So the options are multiple SSDs in RAID, SSD as system RAM, and SSD as a RAM drive. A RAM drive would most likely be the fastest. You could try all three at once.
SSDs are still pretty new, so longevity is unknown, but the manufacturers claim they will last. 64-bit will help a lot, but then you lose some compatibility with 32-bit models and other 32-bit data. 32-bit Windows limits your RAM to about 3.4 GB.
As far as setting up a RAM drive, the cheapest way to experiment is with a simple USB memory stick. You could put various parts of the data or program on multiple USB sticks to see if it works, for example the mesh on one stick. You could even try RAID on USB. Some USB sticks are faster than others. You could also put Damn Small Linux (DSL) on the USB stick and use a Linux FEM code for testing, which would save using a Windows install.
There are some YouTube speed comparisons of SSDs against 10,000 rpm SCSI and SATA drives that you can watch to see how much faster an SSD is.

Thank you for your reply Andy

Thank you for your reply. Some of your suggestions are good performance-boost options for anyone running models too large to fit into RAM. In my situation that is not the case: the models do fit into RAM, and there is very little if any hard drive activity during the CFX run other than when it is time for a result file to be written. The result file write time is nothing compared to the calculation time during the iteration loops. From observation and experimentation I can pretty much tell you the 3 major factors that impact calculation time; during any one run, one of these factors is the bottleneck. These apply only if the model fits into RAM.
1. Front-side bus (FSB) speed to the RAM.
2. Processor speed, mostly clock rate. Multi-core helps until you hit the wall of the FSB bandwidth limit; see number 1.
3. Last but far from least, processor cache. Cache can make a large improvement; the more levels and the larger, the better.
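
As a rough way to reason about which of these factors limits a given run, here is a back-of-the-envelope Python sketch. Every number in it (bytes and floating-point operations per node, bus bandwidth, peak compute rate) is a made-up placeholder and not a measurement of CFX or of any particular CPU; the function name is hypothetical as well. It simply compares the time needed to stream the node data from RAM with the time needed to do the arithmetic, and ignores cache effects entirely.

```python
def sweep_time_estimate(n_nodes, bytes_per_node, flops_per_node,
                        mem_bandwidth_bytes_s, peak_flops_s):
    """Return (memory_time, compute_time, limiting_factor) for one sweep.

    memory_time: seconds to move every node's data over the memory bus once.
    compute_time: seconds to do every node's arithmetic at peak rate.
    """
    memory_time = n_nodes * bytes_per_node / mem_bandwidth_bytes_s
    compute_time = n_nodes * flops_per_node / peak_flops_s
    limiting = "memory bandwidth" if memory_time > compute_time else "compute"
    return memory_time, compute_time, limiting


if __name__ == "__main__":
    # Placeholder inputs: 1 million nodes, 400 bytes and 500 flops per node,
    # 8 GB/s memory bus, 20 GFLOP/s peak compute.
    mem_t, cpu_t, limit = sweep_time_estimate(1_000_000, 400, 500, 8e9, 20e9)
    print(f"memory: {mem_t:.3f} s, compute: {cpu_t:.3f} s, limited by {limit}")
```

With these placeholder inputs the memory bus is the slower of the two, which matches the observation that adding cores stops helping once the bus is saturated.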
If the model does not fit into RAM, then yes, the next best thing is a solid-state drive. A very simple and effective way to implement this is to set up the SSD as a second drive and put the Windows page file on it. This in effect makes RAM larger using the SSD. I am not sure of any other effective way to grow your RAM size, but I did see this:
http://www.reviewsaurus.com/tips-tri...-drive-as-ram/
I guess the USB stick would work, but any USB 2.0 device hits the wall at 60 MB/s, and the best sticks I have seen run at 20 to 30 MB/s. I don't think a USB flash drive would last very long under constant read/write operations. RAM drives are cool, fast things, but if system memory is low they can only make things worse. From what I can tell, the Ansys CFX 11 solver running in a Windows XP or Vista 32-bit or 64-bit environment does a pretty good job of making the best use of RAM and the page file when the model is too large to fit in RAM, going to the page file only as a last resort.
Thanks for the reply. It is always good to learn more, but I am looking for something much different. All these tweaks and workarounds are Band-Aids compared to what I envision is possible.

Hello
There is only so much you can do with a personal computer, but that is still a substantial amount. What was being done around 1995-96 on the smaller, less powerful supercomputers is now within reach of one or more personal computers.
Some seemingly mundane simulations are being done with 9,000 Opterons or more.

You might try loading the .NET Framework 4 redistributable and the Visual C++ 2010 runtime. It helps with allowing more multicore use, and it is faster too.

You might consider adding another computer and making a mini grid, or mini parallel setup. You would only need an ethernet cable and another computer, and some open source software.
There may be no solution to your problem, until better hardware comes along. It is possible to design and build a motherboard.
Your model may exceed your hardware capability. It could easily exceed what is within reason to buy for a small corporation or small college, or even a larger institution.

good luck

Hello JRL,

the magic words you are looking for are CUDA and OpenCL, both programming interfaces for running calculations on graphics cards. There are some successful approaches (see e.g. the NVIDIA CUDA conference which recently took place), but AFAIK for general problems you do not get very good acceleration because of the I/O bottleneck.
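
To illustrate why the I/O bottleneck matters, here is a small Python sketch with purely hypothetical numbers. The function name, the 30x kernel speedup, and the transfer figures are assumptions for illustration, not measurements of any real card or solver. It compares the CPU time for one step with the GPU time plus the time spent moving data between host memory and the card.

```python
def offload_speedup(cpu_time_s, kernel_speedup, bytes_moved, link_bytes_s):
    """Overall speedup of one step when the work is offloaded to a GPU.

    cpu_time_s:     time for the step on the CPU alone
    kernel_speedup: how much faster the GPU does the arithmetic itself
    bytes_moved:    data copied to and from the card for this step
    link_bytes_s:   host-to-card transfer rate in bytes per second
    """
    gpu_time = cpu_time_s / kernel_speedup
    transfer_time = bytes_moved / link_bytes_s
    return cpu_time_s / (gpu_time + transfer_time)


if __name__ == "__main__":
    # Hypothetical: a 1 s step, GPU arithmetic 30x faster,
    # 2 GB copied per step over a 4 GB/s link.
    print(f"with transfers:    {offload_speedup(1.0, 30, 2e9, 4e9):.2f}x")
    # Same step if the data could stay on the card the whole time.
    print(f"without transfers: {offload_speedup(1.0, 30, 0, 4e9):.2f}x")
```

In this made-up case a 30x faster kernel yields less than 2x overall, because half a second per step goes to copying data. Keeping the whole problem resident on the card is what recovers the kernel speedup.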

For OpenFOAM (an open source CFD code) you can find a discussion about that topic here:
http://www.cfd-online.com/Forums/ope...oam-gpgpu.html

The whole GPU computing is still in development.

Regards, Markus.

You might want to take a look at
http://www.nics.tennessee.edu/ . This leads to a page leading into info about one of the largest parallel computer systems in the world, housed at the DOE's Oak Ridge National Laboratory.

They use multi-core chips (Intel or AMD?), many many of them. I don't know how these things are hooked up (in fact my knowledge of parallel processing is almost nil). But the environment is relatively open, so you may be able to find someone interested in discussing your problem, especially at the University of Tennessee.

Good luck!

It's not that easy to build a processor specifically for CFD, like the stream processor of a graphics card.
Just imagine how complex a CFD calculation can be: Single-phase or multi-phase? Inviscid, laminar or turbulent? Which turbulence model: k-e, k-w, RST, S-A, LES / DES? Compressible / incompressible flow? If it's compressible, is it with a user-defined density polynomial, ideal gas, real gas? Passive scalars? Boiling / melting / solidification / de-icing / de-fogging? Solid stress analysis? Coupling with FEM codes or 1D codes? Moving meshes or morphed meshes? DFBI calculations? Isothermal calculation, heat transfer, or combustion models? Electric fields, liquid films, chemical reactions....

There's a nearly infinite list of what you can do with a CFD code, and I assume that's the reason why only the usual multi-purpose CPUs are used today.

The most common way to speed up a simulation today is to run it in parallel. But even that has its limitations: memory bandwidth, communication latency, cell count, core count... All of these have an impact on the run time.
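
A simple way to picture those limits is Amdahl's law with an added communication term. The Python sketch below is a toy model only: the serial fraction and the per-core communication cost are invented values, and the assumption that communication cost grows linearly with core count is a simplification chosen for this illustration, not a property of any specific solver or cluster.

```python
def parallel_speedup(n_cores, serial_fraction, comm_cost_per_core=0.0):
    """Speedup over one core, with run time normalised to 1.0 on one core.

    serial_fraction:    share of the work that cannot be parallelised
    comm_cost_per_core: extra time added per additional core
                        (assumed linear model, for illustration only)
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    run_time = (serial_fraction
                + parallel_fraction / n_cores
                + comm_cost_per_core * (n_cores - 1))
    return 1.0 / run_time


if __name__ == "__main__":
    for cores in (1, 4, 32, 128):
        ideal = parallel_speedup(cores, 0.05)
        real = parallel_speedup(cores, 0.05, comm_cost_per_core=0.001)
        print(f"{cores:4d} cores: {ideal:6.2f}x without communication cost, "
              f"{real:6.2f}x with it")
```

Under these assumptions the speedup flattens out and eventually drops as cores are added, which is why a small case does not automatically run faster on a bigger cluster.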

I assume, your client's case is not very big when it runs on a 32bit system, so the physics could be the reason why it takes that much time - not the limited hardware.

And to be honest, needing only a few days for a simulation on a simple Core 2 Quad is not too bad. I've got a case which has been running on our cluster for 2 weeks on 32 cores. CFD is not the same as solving a simple equation on a calculator.

I had put down that ANSYS 13 supported GPUs for performing calculations in CFX and Fluent, but it appears that support for this is only the mechanical/structural analysis part of ANSYS workbench. I guess it won't be too long before they incorporate this into CFX and Fluent given the performance gains that can be had.