Ram, cache and cpu upgrade help
Hi,
I'm writing my own NS code and simulating 2D unsteady flow using about 200-350 x 80-160 grid points. A simulation takes half a day or more, depending on the number of grid points used. I'm considering upgrading my current AMD Athlon 2400+ @ 2.2 GHz to the new Intel Core 2 Duo. However, I have a couple of questions:
1. How can I estimate the amount of RAM usage? I wonder if 1 GB is enough, or do I need 2 GB?
2. I read in the forum people saying that L2 cache is important in matrix computation. Is that so? I'm thinking of getting a cheaper CPU with 2 MB instead of 4 MB of L2 cache.
3. Will I be able to take advantage of the dual-core CPU? I'm now using MPI to parallelize my code.
4. If I get the E4300 or E6300, should I expect my run time to drop by half or more?
Thanks!
Re: Ram, cache and cpu upgrade help
Estimating the required memory is simple: if we take 200 bytes per grid point (I take an approximate upper bound), you need 350*160*200 = 11 MB. That is low; in fact, for 2D we usually don't need much memory. If your OS takes 200 MB, 500 MB of RAM is sufficient.
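The arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope helper: the function name `estimate_memory_mb` is made up for illustration, the 200 bytes per grid point and the 200 MB for the OS are the rough figures from the post, and 1 MB is taken as 10^6 bytes.

```python
def estimate_memory_mb(nx, ny, bytes_per_point=200):
    """Rough solver memory in MB (10**6 bytes) for an nx-by-ny structured grid.

    bytes_per_point is an assumed upper bound covering all stored
    variables and matrix coefficients per grid point.
    """
    return nx * ny * bytes_per_point / 1e6


grid_mb = estimate_memory_mb(350, 160)  # largest grid mentioned in the thread
os_mb = 200                             # assumed OS footprint, as in the post
print(f"solver: {grid_mb:.1f} MB, solver + OS: {grid_mb + os_mb:.1f} MB")
```

Measuring the real bytes per point of your own code (arrays per cell times 8 bytes for double precision, plus the matrix storage) would give a tighter number than the assumed 200.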
In my opinion, upgrading your system won't have a considerable effect for you (your system is state-of-the-art). AMD vs. Intel: note that L2 is essential, but it comes after L1, and AMDs usually have a larger L1 (128 K). A bigger L2 helps with cache coherency and reduces cache misses, but if you use a cache-aware algorithm you can get better performance. For more on caches, read the special issue of Int. J. High Per. Computing, which fortunately is freely available online: http://hpc.sagepub.com/content/vol18/issue1/ or read this: http://casper.cs.yale.edu/mgnet/www/...u/thesis.ps.gz
>Will I be able to take advantage of the dual core cpu?
>I'm now using MPI to parallelized my code
Yes, by running at least 2 jobs. Also, if you run your code in parallel on your available CPU, I guess you will get better performance, due to multithreading and fewer cache misses (try this).
Finally, I suggest that to reduce CPU time you concentrate on your code (improving the linear solver, or converting from explicit to implicit). Half a day is very long for about 40 K grid points, especially if your grid is structured. Hope this helps.
Re: Ram, cache and cpu upgrade help
Hi,
Thanks for your comments, rt. I don't think my system is state of the art now, and the thing is that I need to run many different cases, hence the need to improve my computing speed. I need to determine how much improvement I can get for the money spent. Btw, I am running a moving airfoil with a deforming grid at around Re~10^4. If CFL=1, my time step is 0.8 to 1e4. Hence I need many time steps. My current linear solver is PETSc, which uses Krylov methods. It should be one of the fastest linear solvers around. An implicit method is used. Lastly, I'm now running on a single processor of a 3.06 GHz Xeon on my school's server. So is it still considered slow? Thanks
Re: Ram, cache and cpu upgrade help
If you've been using PETSc, you could try an inexact Newton-Krylov solver and enlarge the linear tolerance. I'm almost sure that PETSc has inexact Newton solvers. In inexact-Newton-type methods you can use large linear tolerances (up to 0.99) and let the algorithm adapt the tolerance according to the solution convergence.
By the way, is your solver fully coupled (velocity and pressure solved in the same linear system)? In that case, if you're using GMRES, you could increase the number of Krylov vectors (let's say 25, 35, 45). I hope it helps in some way. Regards, Renato.
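To illustrate what "let the algorithm adapt the tolerance" can mean, here is a minimal sketch of one common rule, the second Eisenstat-Walker forcing-term formula. The post does not say which rule is meant, so this choice is an assumption, and the function name `forcing_term` and the default parameters (gamma=0.9, alpha=2, safeguard threshold 0.1) are illustrative rather than taken from any particular library. As far as I know, PETSc exposes a variant of this through the SNES option `-snes_ksp_ew`, and the GMRES restart length through `-ksp_gmres_restart`; check the manual for your version.

```python
def forcing_term(fnorm, fnorm_prev, eta_prev,
                 gamma=0.9, alpha=2.0, eta_max=0.99, threshold=0.1):
    """Relative linear tolerance for the next inexact Newton step.

    fnorm, fnorm_prev: nonlinear residual norms at the current and
    previous Newton iterations; eta_prev: tolerance used last time.
    """
    # Fast nonlinear convergence (small ratio) gives a tight tolerance,
    # slow convergence gives a loose one.
    eta = gamma * (fnorm / fnorm_prev) ** alpha
    # Safeguard: do not let the tolerance drop too abruptly.
    safeguard = gamma * eta_prev ** alpha
    if safeguard > threshold:
        eta = max(eta, safeguard)
    # Never ask for less than a 1 - eta_max reduction of the linear residual.
    return min(eta, eta_max)


# Hypothetical residual history, only to show how the tolerance evolves.
norms = [1.0, 0.6, 0.2, 0.01]
eta = 0.5  # assumed starting tolerance
for prev, cur in zip(norms, norms[1:]):
    eta = forcing_term(cur, prev, eta)
    print(f"residual {cur:g}: next linear tolerance {eta:.3g}")
```

The point of such a rule is to avoid over-solving the linear system while the nonlinear iterate is still far from the solution.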
Re: Ram, cache and cpu upgrade help
What is your type of grid? What is your solver (which Krylov method)? Does it have nonlinear iterations (inner-outer iteration)?
>My current linear solver is PETSc, It should be one of the fastest linear solvers around
That is not correct. PETSc is a general-purpose solver and is suited to large-scale problems on massively parallel clusters, not to your very small problem on a single CPU. Note that your problem is large-scale from the time-scale view, not the spatial one, so parallelism (based on domain decomposition) won't help you very much. Especially if your grid is structured, PETSc is a poor choice; on a structured grid it is possible to implement a faster Krylov or multigrid solver.
Re: Ram, cache and cpu upgrade help
Is that so?
Well, I'm quite a novice with regard to which solver to use. Yes, I'm using a structured grid (C-grid). I've thought of using a multigrid solver but I can't find a freely available and suitable one. I'm now using either biconjugate gradient stabilized or GMRES with a preconditioner. In the past, I've used some other solver packages such as SPARSKIT and NSPCG, but they were even slower than PETSc. In that case, do you know of any faster Krylov or multigrid solver which I can use? I wouldn't want to go into learning and writing a multigrid solver from scratch. Thanks rt!
Re: Ram, cache and cpu upgrade help
The big question to ask when selecting hardware for CFD simulation is how much memory the solver uses for your real CFD problems; ignore the large amount of marketing based on the performance of small in-cache test cases. The reason can be seen in the Linpack plots here:
http://techreport.com/reviews/2003q2...2/index.x?pg=3
where the large effect of overflowing the cache (or two caches) can be seen. Note that most real CFD problems will have typical matrix sizes off the plot to the right. Your 2D case may not. In addition, features on the motherboard can sometimes double or, in some cases, quadruple the performance of accessing main memory. Finding out about this and paying the extra for the motherboard can be wise. Note that clock speed has little effect on performance when operating on large matrices, which is good news because low-clock-speed chips are usually much cheaper. It is possible to reorganise the way solvers work to limit the damage caused by overflowing the cache. This is not usually done in general-purpose Fortran/C/C++ solvers, but it is often done in optimised low-level matrix libraries.
Re: Ram, cache and cpu upgrade help
Note that general-purpose sparse solvers assume a general sparsity pattern (suitable for unstructured grids), but for a structured grid the sparsity is completely known. I guess that your stencil is five-point, so the locations of the neighbours are completely known. Implementing a solver with such a known, regular sparsity pattern is very simple, and it is much more efficient than the others. In my experience, ICCG written this way was 10-20 times faster than using SPARSKIT.
You don't want to implement it, so I'd point you to Peric's codes, which are freely available from the Springer FTP. There is a directory there that contains solvers (in FORTRAN 77) for structured grids (from GS, CIP and BiCGSTAB to MG and MG with CG preconditioning). They are very easy to learn and use. FAS multigrid is also useful, and the implementation is very simple: e.g. solve the problem completely on the coarse grid, then interpolate to the finer grid (as the initial guess), and continue the sequence. But what is your system (pressure Poisson, velocity, or coupled pressure-velocity in block format)? Do you use nonlinear iterations? And what is your solution method?
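The point about known sparsity on a structured grid can be sketched as follows. This is a minimal, unoptimised illustration in plain Python, not Peric's code and not the ICCG solver mentioned above: it runs plain conjugate gradient without the incomplete Cholesky preconditioner, on a constant-coefficient five-point operator with zero Dirichlet boundaries, and the names `apply_stencil` and `conjugate_gradient` are made up here. The neighbours are found by index arithmetic, so no sparse-matrix storage or index arrays are needed.

```python
def apply_stencil(u, nx, ny):
    """Apply the five-point operator (4*u - four neighbours) to a flat list.

    Point (i, j) is stored at u[j * nx + i]; points outside the grid are
    treated as zero (homogeneous Dirichlet boundary).
    """
    out = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - 1]       # west neighbour
            if i < nx - 1:
                s -= u[k + 1]       # east neighbour
            if j > 0:
                s -= u[k - nx]      # south neighbour
            if j < ny - 1:
                s -= u[k + nx]      # north neighbour
            out[k] = s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(b, nx, ny, rtol=1e-10, max_iter=1000):
    """Unpreconditioned CG for the SPD five-point system A x = b."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)
    rr = dot(r, r)
    stop = rtol ** 2 * dot(b, b)     # compare squared norms
    iterations = 0
    while rr > stop and iterations < max_iter:
        ap = apply_stencil(p, nx, ny)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return x, iterations


nx, ny = 16, 12
b = [1.0] * (nx * ny)
x, iterations = conjugate_gradient(b, nx, ny)
residual = max(abs(bi - ai) for bi, ai in zip(b, apply_stencil(x, nx, ny)))
print(f"{iterations} iterations, max residual {residual:.2e}")
```

A nine-point stencil would only add four more diagonal-neighbour terms in `apply_stencil`, and variable coefficients would replace the constants 4 and 1 with per-point coefficient arrays.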
Re: Ram, cache and cpu upgrade help
Thanks rt!
I am using the pressure correction method, which involves solving a momentum equation followed by a Poisson equation to ensure continuity. The equations are linearized, so it's basically a system of linear equations. Btw, my stencil is 9-point, and that's why I can't use most of the available solvers. Moreover, the diagonal arrangement is not valid at the cell locations where the faces meet (the C-grid intersection). In PETSc, the equations are solved using BiCGSTAB or GMRES. Thank you
Re: Ram, cache and cpu upgrade help
Do you have incompressible flow?
Your CFL is 1 and Re=10^4; I suggest decreasing the CFL to 0.5 and treating the momentum equations explicitly, with implicit treatment only for the pressure equation. Also, don't perform any nonlinear iterations. As the pressure equation is usually SPD, conjugate gradient with an incomplete Cholesky preconditioner is the best choice. Also, adapting 5-point stencil solvers to a 9-point stencil is simple.
>>the diagonal arrangement is not valid
What do you mean by "diagonal arrangement"?
Re: Ram, cache and cpu upgrade help
Regarding:
>>the diagonal arrangement is not valid
OK, this imposes a limitation on using a structured solver, but I suggest a cure (I have not tried it): you can treat the row of cells that falls below/above the branch cut as a Dirichlet BC and move those cells to the RHS, but update them every few iterations (e.g. every iteration). In this manner you solve a sequence of linear systems to reach a converged pressure field, but each of them is solved with a loose tolerance (in effect, the solution of a weakly nonlinear equation). I think it is cheaper than using an unstructured solver.
Re: Ram, cache and cpu upgrade help
>In my experience with ICCG it was 10-20 times faster than using SPARSKIT
Can ICCG be used for solving Navier-Stokes problems? Usually the linear systems of equations are unsymmetric because of boundary conditions. I guess ICCG is used for symmetric systems only (am I correct?). I would like to know more details about this implementation. Is it applied to a fully coupled velocity-pressure system or to a segregated approach (like SIMPLE etc.)?
Re: Ram, cache and cpu upgrade help
>>I guess ICCG is used for symmetric systems only (am I correct?)
You are right.
My method was fractional step (sometimes called two-step projection): explicit treatment of momentum, and only the pressure Poisson equation is solved implicitly to enforce incompressibility. As the pressure equation is symmetric, ICCG can be applied.
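For reference, a standard way to write the two-step projection described above is given below. It is a sketch under assumptions, not necessarily the exact discretization used in the post: forward Euler in time is assumed, and the symbols (intermediate velocity \mathbf{u}^{*}, kinematic viscosity \nu, density \rho, time step \Delta t) are introduced here for illustration.

```latex
\begin{align}
\mathbf{u}^{*} &= \mathbf{u}^{n} + \Delta t \left[ -(\mathbf{u}^{n} \cdot \nabla)\,\mathbf{u}^{n} + \nu \nabla^{2} \mathbf{u}^{n} \right] \\
\nabla^{2} p^{n+1} &= \frac{\rho}{\Delta t}\, \nabla \cdot \mathbf{u}^{*} \\
\mathbf{u}^{n+1} &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\, \nabla p^{n+1}
\end{align}
```

Taking the divergence of the last equation and substituting the second gives \nabla \cdot \mathbf{u}^{n+1} = 0. Only the second step needs a linear solve, and the discrete Laplacian there is symmetric, which is why a method such as ICCG is applicable.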
Re: Ram, cache and cpu upgrade help
Hi rt,
Are you using explicit treatment for both convection and diffusion? I'm using implicit treatment for both. Maybe that's also one of the reasons why my solver seems slow to you. I tried explicit treatment for convection, but I had to lower my CFL number. Moreover, it's not as stable. However, I'll look at the recommendations you have given. I believe, as you said, that a structured solver will be faster.
Re: Ram, cache and cpu upgrade help
Both my diffusion and convection terms are treated explicitly (your Reynolds number is high, so the viscous term has a negligible effect and is (probably) stable whenever the convection term is stable).
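A quick way to check the claim above for a given grid is to compare the usual rule-of-thumb limits for explicit convection and explicit diffusion, taken separately. The sketch below assumes a uniform 2D grid and forward-Euler time stepping; the function name `explicit_time_step_limits` and the sample numbers (unit velocity and chord, so nu = 1/Re = 1e-4, and a spacing of 0.01) are hypothetical and not taken from the thread.

```python
def explicit_time_step_limits(dx, dy, u_max, v_max, nu, cfl=0.5):
    """Return (dt_convection, dt_diffusion) for a uniform 2D grid.

    Convection: dt <= cfl / (|u|/dx + |v|/dy); assumes a nonzero velocity.
    Diffusion (forward Euler, central differences):
        dt <= 1 / (2 * nu * (1/dx**2 + 1/dy**2)).
    """
    dt_conv = cfl / (abs(u_max) / dx + abs(v_max) / dy)
    dt_diff = 0.5 / (nu * (1.0 / dx ** 2 + 1.0 / dy ** 2))
    return dt_conv, dt_diff


# Hypothetical values: Re = 1e4 with unit velocity and length scale.
dt_conv, dt_diff = explicit_time_step_limits(
    dx=0.01, dy=0.01, u_max=1.0, v_max=1.0, nu=1e-4)
print(f"convective limit {dt_conv:.3g}, diffusive limit {dt_diff:.3g}")
print("convection is the tighter limit" if dt_conv < dt_diff
      else "diffusion is the tighter limit")
```

Because the diffusive limit scales with the square of the spacing, it can become the tighter one on a grid that is strongly refined near the wall, so it is worth evaluating with the smallest cell of the actual grid.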
