How do people even make use of supercomputers for CFD?
Admittedly, I'm a bit of a novice when it comes to parallel computing, but from what I've seen so far, anything more than 4 cores has essentially no benefit. When I first started, I was really excited about the possibility of using Amazon's EC2, but now that seems completely useless. Is that right?
There is a huge difference in architecture between a cloud system and a supercomputer. When you talk about parallel scalability to many processors, the most important thing I have seen in running CFD in parallel is the speed of the interconnect between nodes, and then, of course, the speed of the cores themselves. If your interconnect is 1 Gb/s (i.e. gigabit Ethernet), you won't see much improvement above some tens of processors. New supercomputers typically have a QDR InfiniBand interconnect with a signalling rate of 40 Gb/s (roughly 4 GB/s), and much lower latency than Ethernet.
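To see why the interconnect matters so much, a simple latency-plus-bandwidth model of one halo exchange is enough. The numbers below are rough order-of-magnitude assumptions for gigabit Ethernet versus QDR InfiniBand, not measurements, and the message size is made up for illustration:

```python
def exchange_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model for one message: latency + size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

# Rough, assumed parameters (not benchmarks):
gige = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}   # ~1 Gb/s, ~50 us
ib_qdr = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 4e9}    # ~32 Gb/s data, ~2 us

halo = 100_000  # assumed bytes exchanged per neighbour per iteration
t_gige = exchange_time(halo, **gige)
t_ib = exchange_time(halo, **ib_qdr)
print(f"GigE:   {t_gige * 1e6:.0f} us per exchange")
print(f"QDR IB: {t_ib * 1e6:.1f} us per exchange")
```

With these assumptions the Ethernet exchange costs tens of times more than the InfiniBand one, which is why per-iteration communication starts to dominate at a few tens of ranks on gigabit networks.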
While I have a bit of experience running OpenFOAM on clusters and supercomputers on up to a few thousand processors, I am not so familiar with trying to do it on a cloud system. Apparently, Amazon does have custom HPC-type cloud instances with a 10 Gigabit Ethernet interconnect, and they claim this can match the performance of more standard HPC systems. Their "cloud" may simply be a normal cluster in itself, and if so, I am not sure what the advantage of EC2 would be other than on-demand access. Again, I know little about these systems, as my original assumption was precisely your final conclusion: that they are relatively useless for large-scale CFD. Perhaps someone who knows more can chime in if I am wrong.
The Amazon product that I was looking at is the HPC EC2 offering. It supposedly has a 10 Gigabit connection, so maybe it actually would be fast enough.
I was a bit pessimistic because parallel computing on multicore processors seemed to hit diminishing returns very quickly (nearly zero benefit going from 2 to 4 processors for the geometries I've tried). I couldn't imagine a supercomputer having faster connections between its processors than a multicore chip does, but I am fairly ignorant about much of this.
Well, but you also have to consider the problem size. You are going to see the maximum speedup at around some number of mesh points per processor. On QDR InfiniBand systems, for the type of problems I do (interFoam-based), this is typically around 5K-10K polyhedral cells per processor. How large are the problems you have tried?
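That rule of thumb turns directly into a cap on how many ranks are worth using for a given mesh. A minimal sketch, using the 10K cells-per-processor figure quoted above as the assumed threshold:

```python
def max_useful_ranks(total_cells, min_cells_per_rank):
    """Rule-of-thumb cap on MPI ranks before communication dominates."""
    return max(1, total_cells // min_cells_per_rank)

# Using the ~10K polyhedral cells/processor figure from above as the threshold:
print(max_useful_ranks(10_000_000, 10_000))  # a 10M-cell mesh -> ~1000 ranks
print(max_useful_ranks(100_000, 10_000))     # a 100K-cell mesh -> ~10 ranks
```

So a case that stops scaling at a handful of cores is not evidence that the hardware is useless; it may simply be too small to feed more processors.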
The cases I've been running are at around 100,000 tetrahedral cells. Going from 1 to 2 processors yields around a 40% increase in performance, and going from 2 to 4 yields an additional 10% at most. I don't suppose polyhedral meshes have better parallel performance, do they? I suppose it's possible since each polyhedral cell has more neighbors than each tet cell, and thus adds CPU calculation without adding more communication.
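For reference, the speedups reported above translate into parallel efficiencies like this (a small sketch; the 40% and 10% figures are the ones quoted in the post):

```python
def efficiency(speedup, nprocs):
    """Parallel efficiency: achieved speedup over ideal speedup."""
    return speedup / nprocs

s2 = 1.40        # 40% faster on 2 cores vs 1
s4 = s2 * 1.10   # a further 10% going from 2 to 4 cores
print(f"2 cores: {efficiency(s2, 2):.2f} efficiency")  # ~0.70
print(f"4 cores: {efficiency(s4, 4):.2f} efficiency")  # ~0.39
```

Efficiency roughly halving from 2 to 4 cores is consistent with a mesh of ~100K cells running out of work per rank, as discussed above.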
I would say the most crucial thing affecting parallel efficiency is the CFD algorithm itself; hardware issues are, to me, secondary. Most current CFD algorithms were designed for serial processing and are not ideal for parallel processing. If one uses algorithms specialized for parallel machines, one can get near-ideal parallel scalability even on thousands of processors. CFD has yet to mature for highly parallel hardware.
As an example, consider a matrix inversion step. If the domain is spread over many processors, most conventional algorithms, such as Gaussian elimination, require the whole matrix on a single processor. That means the components of the matrix must be transferred back and forth between the master and slave nodes all the time, and as we all know, this is the bottleneck for speed. Instead, there are methods that largely eliminate this data transfer and do the matrix solve locally on each processor, with very little dependency on the others, and these give high parallel efficiency. The raw power of thousands of processors isn't enough; one needs to know how to use it.
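One concrete instance of the "solve locally, communicate little" idea is block-Jacobi iteration, where each processor factorizes only its own diagonal block and the coupling to neighbours enters only through a cheap vector exchange, never the whole matrix. Below is a hedged, serial Python sketch of the idea on a tiny 4x4 system with two 2x2 blocks; the matrix and coupling strength are made up for illustration, and each "rank" is just a pair of rows:

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Directly solve the 2x2 system [[a, b], [c, d]] x = [r1, r2]."""
    det = a * d - b * c
    return ((d * r1 - b * r2) / det, (a * r2 - c * r1) / det)

# Assumed test system: two strong 2x2 diagonal blocks, weak coupling eps.
eps = 0.1
A = [
    [4.0, 1.0, eps, 0.0],
    [1.0, 4.0, 0.0, eps],
    [eps, 0.0, 4.0, 1.0],
    [0.0, eps, 1.0, 4.0],
]
b = [1.0, 2.0, 3.0, 4.0]

x = [0.0, 0.0, 0.0, 0.0]
for _ in range(50):
    # "Rank 0" solves its local block; only x[2], x[3] must be received.
    r1 = b[0] - A[0][2] * x[2] - A[0][3] * x[3]
    r2 = b[1] - A[1][2] * x[2] - A[1][3] * x[3]
    # "Rank 1" likewise needs only x[0], x[1] from its neighbour.
    r3 = b[2] - A[2][0] * x[0] - A[2][1] * x[1]
    r4 = b[3] - A[3][0] * x[0] - A[3][1] * x[1]
    x0, x1 = solve_2x2(A[0][0], A[0][1], A[1][0], A[1][1], r1, r2)
    x2, x3 = solve_2x2(A[2][2], A[2][3], A[3][2], A[3][3], r3, r4)
    x = [x0, x1, x2, x3]

# The residual A x - b shrinks toward zero as the iteration converges.
residual = max(abs(sum(A[i][j] * x[j] for j in range(4)) - b[i]) for i in range(4))
print(f"max residual: {residual:.2e}")
```

The per-iteration communication is just the neighbour values of x, independent of the matrix size, which is exactly the property that makes such methods scale; the trade-off, discussed later in this thread, is that convergence depends on the coupling being handled iteratively rather than eliminated exactly.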
You might find this interesting. I did this a few weeks ago, and as you can see there is a lot to gain
when you increase the number of CPUs. Code:
# Scaling test on the KTH/PDC cluster.
Quote:
You cannot invert a matrix locally without communicating with other processors. The only case where you can is when the matrix is block diagonal and each block lies within one processor.
Quote:
I had a very, very long startup time (an hour) when trying to use a thousand CPUs. Any ideas?
On which architecture?
One thing that I've noticed is that on the Cray, if you have CRAY_ROOTFS=DSL you will get that behaviour.
Quote:
I am wondering if other architectures have a similar environment variable to set. Anyway, here is the architecture I am using. Any suggestions?
OK, I see that it's not a Cray, so it's not that.
Are you using the system MPI, or are you compiling OpenMPI yourself? If you are using the ThirdParty option to compile OpenMPI yourself, it is absolutely crucial that you add the --with-openib flag to $configOpts in the Allwmake script in the ThirdParty folder. It is also important that when you compile it, the hardware matches the cluster hardware and the InfiniBand libs are available; sometimes the login/submit node differs in this respect, in which case you need to submit the compilation as a job. And last, if you are using SYSTEMOPENMPI, you need to make sure that the paths to the InfiniBand libs are in LD_LIBRARY_PATH, otherwise it will fall back to using something else. You need to find where these libs are located, then go into config/settings.sh and add them under the SYSTEMOPENMPI option: _foamAddLib /directoryToWhereInfinibandIsLocated and _foamAddLib /directoryToSomethingThatIBMightNeed, and maybe also _foamAddPath /directoryToOPENMPIBIN
Quote:
Thanks a lot, I will talk to the system manager to double-check the openib issue (you know, I am always worried about this, especially that different compute nodes might use different settings, which is a little tricky). Anyway, I will try and keep you posted. In the meanwhile, would you mind testing my cases to see what happens on your cluster? What is your email, so that I can send you the download address? Thanks
sure,
it's niklas dot nordin @ nequam dot se
Quote:
My point was that if one judiciously modifies the algorithm to make it parallel-processor friendly, one can get very good scaling without compromising the quality of the results. Just increasing the number of processors is not a very bright idea. ;)
Quote:
You made some assumptions that seem to work for your special case, but they would not tell me how to use the power of thousands of processors for what I am doing. You still cannot invert a matrix locally without communicating, and you cannot invert a matrix by ignoring a few off-diagonals and doing less communication. If that were true, we would have developed lots of methods around it. What you are assuming is that you are the only smarty-pants and all the others are mindless. There is a reason we do things the way we do: people have found that it is really not possible to just ignore a few things here and there and make it work.
Quote:
Just my last post here. I was talking about an algorithm developed by NASA and used extensively by them for their hypersonic flight designs, extra-terrestrial probes, reactive flows, etc. So I am not considering myself a "smarty-pants", you see, nor did I say that others here are "mindless". I am merely offering my observations and opinions. By the way, something applicable to the whole supersonic and hypersonic regime is not that special a case, now is it? There are algorithms which are more "parallel friendly" than others, e.g. Krylov subspace solvers; the computational physics people have been using them for years, and they do not invert the matrix at all. I have done some literature survey out of interest and drew my conclusions from that. You can choose to ignore my opinions if you feel I am wrong; I did not force anyone to accept my views. I stand by my view, you stand by yours. But do it politely.
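To make the Krylov point concrete: a solver like conjugate gradients touches the matrix only through matrix-vector products, so nothing is ever inverted, and in a parallel code each rank applies its local rows plus a halo exchange. A hedged, pure-Python sketch on a small symmetric positive-definite system (the matrix is an assumed example, not from this thread):

```python
def matvec(A, x):
    """Matrix-vector product; the only way CG touches the matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Standard CG for symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x with x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Small assumed SPD test system.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(A, b)
print(x)
```

Since everything reduces to matvecs and dot products, the parallel version needs only neighbour exchanges for the matvec and a global reduction for each dot product, which is precisely why Krylov methods are the workhorse of parallel CFD linear algebra.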