DSP Processors
I'm new to CFD, and my question is about DSP processors. Since such processors are optimized for mathematical computation at blinding speed, and are particularly suited to the "multiply and accumulate" operation that is so common in solving matrix or linear differential equations, is it feasible to use a DSP processor for CFD calculations, or am I underestimating the computing power required for CFD?
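
For readers unfamiliar with the term, here is a minimal sketch of the "multiply and accumulate" (MAC) pattern inside a dense matrix-vector product. The 3x3 matrix, the vector, and the function names `mac_dot` and `matvec` are illustrative assumptions, not taken from any particular CFD code.

```python
def mac_dot(a, x):
    """Dot product written as an explicit multiply-accumulate (MAC) loop."""
    acc = 0.0
    for a_k, x_k in zip(a, x):
        acc += a_k * x_k  # one MAC: multiply, then add to the running sum
    return acc


def matvec(A, x):
    """Dense matrix-vector product: one chain of MACs per matrix row."""
    return [mac_dot(row, x) for row in A]


# Hypothetical 3x3 tridiagonal matrix (a 1D diffusion-like stencil) and vector.
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
x = [1.0, 2.0, 3.0]

y = matvec(A, x)
macs = len(A) * len(x)  # a dense n-by-n product costs n*n MACs
print(y, macs)  # [0.0, 0.0, 4.0] 9
```

An iterative linear solver repeats products like this many times, which is why the MAC rate matters for this kind of work.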

Re: DSP Processors
(1). If you are sure of the results of the computation, then the speed of the computer is not an issue. Sooner or later, you will get the result.
(2). But the major issues in CFD are the algorithm development in numerical analysis needed to get an accurate and converged solution, and the turbulence modeling needed to get the correct result.
(3). Without proper handling of these issues, a high-speed processor just gives you high-speed garbage production.
(4). A high-speed processor could be helpful, but only if you know that the solution will converge to the correct answer.
(5). And in most cases, we simply do not know how to solve the problem and obtain the correct solution.

Re: DSP Processors
According to John's response, we might as well apply direct numerical simulation (DNS) to all CFD problems. Since it does not require turbulence modeling (all scales are resolved), we can be pretty sure about the answer. Only problem: at current processor speeds, the 3D flow over an aircraft will take well over 1000 years to compute using DNS. I don't know about you guys, but I won't be around to analyze the results.
I think in CFD a faster processor is always welcome, because you have to get results first, analyze them, and then decide whether or not they are 'garbage'. If we knew the answer a priori, then why bother with the computation? Perhaps someone can enlighten us with some data on DSPs. How fast do they run (GHz), and how many cycles do they use for a particular floating point operation (not including the time it takes to read and write to RAM, and excluding main processor tasks such as indexing the numeric array)? Also, what kind of FPU operations can they handle (+ - * / sqr)? Here is some data for the Pentium processor (any generation):
ADD 3 cycles
SUB 3 cycles
MUL 3 cycles
DIV 39 cycles
SQR 70 cycles
(A '1GHz' Pentium processor runs 10^9 cycles per second. The SQR ('square root') operation is needed in compressible flow to get the local speed of sound.)
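
As a rough illustration of what those cycle counts imply, the sketch below turns them into a time estimate for one speed-of-sound evaluation per cell. The cycle counts and the 1 GHz clock are the figures quoted in the post (not independently verified); the grid size, the ideal-gas formula a = sqrt(gamma * p / rho), and the function names are assumptions for the example. Pipelining and memory access are ignored, as in the post.

```python
import math

# Cycle counts as quoted in the post above (not independently verified).
CYCLES = {"add": 3, "sub": 3, "mul": 3, "div": 39, "sqrt": 70}
CLOCK_HZ = 1e9  # the post's '1GHz' example


def fpu_seconds(op_counts, cycles=CYCLES, clock_hz=CLOCK_HZ):
    """Time for a mix of FPU operations, ignoring pipelining and memory."""
    total_cycles = sum(cycles[op] * n for op, n in op_counts.items())
    return total_cycles / clock_hz


def speed_of_sound(gamma, pressure, density):
    """Ideal-gas speed of sound: one multiply, one divide, one square root."""
    return math.sqrt(gamma * pressure / density)


# Hypothetical grid of one million cells, one evaluation per cell.
cells = 1_000_000
per_cell = {"mul": 1, "div": 1, "sqrt": 1}
t = fpu_seconds({op: n * cells for op, n in per_cell.items()})
print(f"{t:.3f} s per sweep")  # 0.112 s per sweep

a = speed_of_sound(1.4, 101325.0, 1.225)  # sea-level air, roughly 340 m/s
```

With these numbers, the divide and square root account for 109 of the 112 cycles per cell, so the expensive operations dominate the estimate.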

Re: DSP Processors
(1). One thing at a time.
(2). When I started using the PC in 1981, it ran at 1 MHz. A job that takes 1000 years on that machine would be completed in 2981.
(3). In 1991, I was using a PC running at 33 MHz. The actual speed ratio would be 2x33=66, taking into account the 8-bit vs 16-bit processor. A 1000-year job restarted in 1991 would take around 15 years to complete, that is, until 2006.
(4). Last month, I started using a $400 PC running at 1 GHz. With a clock speed ratio of 1000, and going from an 8-bit CPU to a 32-bit CPU, the actual speed ratio will be at least 4000:1. If the 1000-year job is restarted now, it will take 0.25 year, and the job will be completed in 2001.
(5). And if one takes the time to put together a parallel computing network, a 1000-year job will become a one-month job, which is quite acceptable by today's standards.
(6). I am not sure that DNS is the right approach without the right numerical algorithm, because after the equivalent of 1000 years of algebraic operations on the computer, it is quite possible that garbage will make up the main portion of the result. Sorting out the useful portion of the result can be very difficult, unless one is sure of the method used.
(7). In the Stone Age, today's airplane would be viewed as God's flying horse.
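
The arithmetic in this post can be checked with a short sketch. The speed ratios are the ones claimed above; the start year of 2001 for the 1 GHz case and the ideal linear scaling used for the parallel estimate are assumptions made for the example.

```python
import math


def years_to_finish(base_years, speed_ratio):
    """Runtime of a job that takes base_years on the reference machine."""
    return base_years / speed_ratio


BASE_YEARS = 1000  # the '1000 year' job, timed on the 1 MHz PC of 1981

# (start year, speed ratio relative to the 1981 PC), as claimed in the post.
# The 2001 start year is an assumption; the post only says "now".
scenarios = [(1981, 1), (1991, 2 * 33), (2001, 4000)]

for start, ratio in scenarios:
    years = years_to_finish(BASE_YEARS, ratio)
    print(start, ratio, round(years, 2), int(start + years))
# 1981 1 1000.0 2981
# 1991 66 15.15 2006
# 2001 4000 0.25 2001

# Machines needed to cut 0.25 year to one month, assuming ideal linear scaling.
months = years_to_finish(BASE_YEARS, 4000) * 12
machines = math.ceil(months / 1)
print(machines)  # 3
```

Real parallel codes rarely scale perfectly, so the machine count is a lower bound under that assumption.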

Re: DSP Processors
Are there any benchmarks on how fast DSP processors can accomplish standard tasks like matrix-vector multiply, matrix-matrix multiply, FFTs, etc.? Are these processors good in terms of vectorization, especially for long vectors?
Also, how much data do these DSP processors deal with? Do they have cache? My impression (I could be completely wrong) is that the amount of data in DSP is not as large as in CFD.

Re: DSP Processors
Another item in this general area of discussion.
About a year ago there was an item in the newspaper saying that some of the new common video game chips could process video/graphics at the same speed as state-of-the-art US military chips.

Re: DSP Processors
John,
Although your example is illustrative, you are somewhat comparing apples and oranges when you talk about the relative speed of 16-bit versus 32-bit processors and then apply that rationale to floating point operations. You are referring to the main processor, or CPU, whereas I was strictly referring to cycles of the math coprocessor or FPU (floating point unit). The first Intel math coprocessor was the 8087, which came out in 1980. Subsequent FPUs were the 80287 and 80387. The 486 and Pentium processors have the FPU built in. All Intel FPUs, starting with the 8087 in 1980 and continuing to today's Pentium, are 80-bit processors. Each FPU can hold 8 floating point numbers at 80-bit precision. When these numbers are stored in RAM, only 64 bits are generally retained (double precision format), unless you program the FPU in assembly (as I do). The 16-bit or 32-bit registers of the main processor (CPU) only affect the efficiency of memory addressing. A 32-bit CPU can address all the RAM I need (4 GB) at today's processor speeds.
Axel
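
To put numbers on the 80-bit versus 64-bit distinction, here is a small sketch. It relies on Python floats being IEEE 754 doubles; the 64-bit significand of the x87 extended format is stated as a constant because Python cannot query it directly. The variable names are illustrative.

```python
import struct
import sys

# Python floats are IEEE 754 doubles, the 64-bit format kept in RAM.
double_bytes = struct.calcsize("d")        # 8 bytes = 64 bits
double_sig_bits = sys.float_info.mant_dig  # 53 significand bits

# x87 extended precision: 80 bits in total, with a 64-bit significand.
EXTENDED_SIG_BITS = 64

eps_double = 2.0 ** (1 - double_sig_bits)      # 2**-52
eps_extended = 2.0 ** (1 - EXTENDED_SIG_BITS)  # 2**-63

# A 32-bit address reaches 2**32 bytes, the 4 GB mentioned above.
addressable_gib = 2 ** 32 / 2 ** 30

print(double_bytes * 8, double_sig_bits, eps_double / eps_extended, addressable_gib)
# 64 53 2048.0 4.0
```

So a value kept in an FPU register carries 11 more significand bits than the same value after it is stored to RAM as a double.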

Re: DSP Processors
(1). Inside the FPU, yes, it has more bits to do higher precision math operations.
(2). But outside the FPU, it is a different story. That's why there are 16-bit compilers and 32-bit compilers.
(3). And sometimes, when you are doing integer operations on screen pixels, you don't want to slow things down by using floating point operations. In that case, a high-precision FPU actually slows down the operations.
(4). Anyway, memory access is normally slow.
(5). The point I was trying to make is this: last week, when I converted the code I wrote ten years ago from MS Fortran to VC++, the lid-driven cavity flow at Re=1000 on a 51x51 grid took only 6 seconds to converge on a 1 GHz P/III, while it took several minutes on a 286 PC (speed?). The same problem was tested here using commercial codes, and the CPU time was in the range 150~250 seconds. The NPARK code has a similar test case with a size of 128x128(?), and the reported time was several hours(?). It is possible that the code is a compressible flow code (slow for low Mach number flow) and that they simply ran it for a long time to make sure that the solution had converged.
(6). So, given the right formulation, and time (to get a higher speed CPU), it seems to me that we can get a 1000-year job solved in less than 1000 years. This is because, for their own survival, CPU companies will have to keep inventing faster CPUs. And CFD users can just sit back, relax, and take advantage of the faster CPU speeds.
(7). So, more people and companies will use CFD in the future. The key to success is the algorithm development and the turbulence modeling.

Re: DSP Processors
(1). Hardware graphics processing has been used in SGI machines since the 80's.
(2). The same kind of hardware acceleration board has become available for the PC in recent years, to support the OpenGL and DirectX APIs. This includes matrix transformations (translation, rotation, scaling, etc.), texture mapping, and rendering.

Re: DSP Processors
I think I can say something about DSP processors that will put them in perspective with Pentiums. I am definitely not an expert, but here goes...
1. A Pentium is not a processor optimized for math calculations alone; it also has to handle other things such as word processing. A DSP, on the other hand, has been optimized to handle math such as DFT/FFT, convolution, vector algebra, etc.
2. Someone asked about the amount of data handled by a DSP, but I think it is not so much about the amount of data as the nature of the data. For example, DSP processors used in a mobile phone have to process a huge amount of audio in *real time*, by filtering, correlation, noise reduction and so on, in a time short enough that the speaker doesn't notice it. So there is no beating a DSP when it comes to speed.
3. As for operation cycles and clock speed, it is not appropriate to compare DSP and Pentium clock speeds. Inside a Pentium, the 1 GHz is divided several times before reaching the actual core, so that the actual frequency is much lower. A DSP, on the other hand, may run at just, say, 50 MHz, but the frequency inside is actually a multiple of that. Besides, the level of pipelining and parallelism inside a DSP processor is several times that of a Pentium, and this enables *single clock cycle* math calculations.
4. As an example, look at an FFT calculation, for which a DSP is optimized. An FFT consists of a series summation of n imaginary and n real terms, each containing one multiplication. So we need some place to store these values, separate index pointers to access them, and so on. With the vast parallelism on offer, a DSP processor will provide a separate "multiply and accumulate" unit to calculate each multiplication and add the term to the existing series, simultaneously update all the pointers, and fetch the new data, all in a single clock cycle.
5. DSPs are also built as floating point calculators with the same amount of parallelism as fixed point DSPs, if not more.
Well, that's my point of view... comments, disagreements?
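
The series summation described in point 4 can be sketched in software as a chain of complex multiply-accumulates. Note that this is the direct DFT (n*n MACs), used here only because it makes the MAC pattern explicit; a real FFT reorganizes the same sums to need far fewer operations. The function name `dft_mac` and the 8-sample test signal are assumptions for the example.

```python
import cmath


def dft_mac(x):
    """Direct DFT: each output bin is a chain of complex multiply-accumulates."""
    n = len(x)
    out = []
    macs = 0
    for k in range(n):
        acc = 0j
        for j in range(n):
            twiddle = cmath.exp(-2j * cmath.pi * j * k / n)
            acc += x[j] * twiddle  # one complex MAC
            macs += 1
        out.append(acc)
    return out, macs


# Hypothetical 8-sample test signal: a unit impulse, whose DFT is all ones.
signal = [1.0] + [0.0] * 7
spectrum, macs = dft_mac(signal)
print(macs)  # 64
```

On a DSP, the multiply, the accumulate, the pointer updates, and the next data fetch in the inner loop are what the post describes as happening in a single clock cycle; in this Python sketch they are ordinary sequential steps.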