Unsteady parallel scalability definition
I have a question.
In an unsteady CFD simulation, how do you define, calculate, or measure the speedup?
Since the matrix to be solved is different at each time step, so is the number of iterations. So how do you set the "end time", compare the wall-clock times, and calculate the speedup?
Meanwhile, if I get a bad speedup, isn't it quite possible that the matrix solution scheme is simply not efficient enough? If so, when someone reports a speedup, shouldn't they also state the numerical schemes used? But how could we have a perfect scheme that does not affect the speedup measurement?
For a transient solution, you can compare the computing time needed to reach a fixed "real" (simulated) time in the process being simulated.
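A minimal sketch of that comparison, in Python: run the same case to the same simulated end time on different core counts, record the wall-clock time of each run, and derive speedup and parallel efficiency relative to the smallest run. Because every run covers the same simulated interval, the varying iteration count per time step is already included in the total. The function name `speedup_table` and the timings in the usage lines are made up for illustration; they are not measurements from any real solver.

```python
def speedup_table(wall_times, base_cores=None):
    """Compute speedup and parallel efficiency.

    wall_times: dict mapping core count -> wall-clock seconds needed to
                reach the SAME simulated end time with the SAME case setup.
    base_cores: reference core count (defaults to the smallest one given).
    Returns a list of (cores, seconds, speedup, efficiency) tuples.
    """
    cores_sorted = sorted(wall_times)
    base = base_cores if base_cores is not None else cores_sorted[0]
    t_base = wall_times[base]
    rows = []
    for n in cores_sorted:
        speedup = t_base / wall_times[n]      # how much faster than the reference run
        efficiency = speedup * base / n       # 1.0 would be ideal scaling
        rows.append((n, wall_times[n], speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical timings (seconds), for illustration only.
    timings = {1: 100.0, 2: 50.0, 4: 40.0}
    for cores, seconds, s, e in speedup_table(timings):
        print(f"{cores:3d} cores  {seconds:8.1f} s  speedup {s:5.2f}  efficiency {e:5.2f}")
```

Keeping the mesh, time step, and solver tolerances identical across the runs is what makes the numbers comparable.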
And another question,
Isn't it possible that the bad speedup one sees is actually due to a bad matrix solution algorithm?
For example, if one uses a bad multigrid method and gets a very bad speedup, then the reported speedup is totally misleading.
If I am right, then how could we find an algorithm "perfect" enough that it does not affect the scalability test?
Thanks, btw, love your signature!!
First, we must agree on what type of scalability we are talking about.
One usually considers two types of scalability:
1) The acceleration in solving a problem of fixed size on different numbers of cores
2) The solution of problems of several sizes on a fixed number of cores
Next, we must agree on what kind of scalability counts as "bad" because, as you know, everything is relative.
Usually the cause of poor scalability is a significant piece of the algorithm that cannot be parallelized. As an example, see my blog on the comparison of blocking and nonblocking MPI calls.
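The effect of a non-parallelizable piece is commonly estimated with Amdahl's law: if a fraction f of the run time is serial, the speedup on N cores is at most 1 / (f + (1 - f) / N). The sketch below just evaluates that formula; the function name `amdahl_speedup` and the 10% serial fraction in the usage lines are illustrative assumptions, not values taken from any particular code.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup from Amdahl's law.

    serial_fraction: share of the single-core run time that cannot be
                     parallelized (between 0 and 1).
    cores:           number of cores (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Assumed 10% serial part: the bound can never exceed 1 / 0.1 = 10.
    for n in (1, 8, 64, 512):
        print(n, round(amdahl_speedup(0.1, n), 2))
```

Even a small serial part caps the achievable speedup, no matter how many cores are added.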
If the algorithm does not contain any such places in explicit form, the inefficiency may be associated with a significant number of barriers, synchronizations and/or critical sections.
In the latter case, it can help to collect statistics on which parts of the algorithm are called most frequently (on several data sets).
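One crude way to collect such statistics without a dedicated tool is to wrap each phase of the time loop in a timer that counts calls and accumulates wall-clock time. The following is only a sketch: `PhaseTimer` and the phase names are hypothetical, and in a real MPI code each rank would keep its own timer so the results can be compared across ranks and core counts.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate call counts and wall-clock time per named phase."""

    def __init__(self):
        self.total = defaultdict(float)   # phase name -> seconds
        self.calls = defaultdict(int)     # phase name -> number of calls

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the phase raised an exception.
            self.total[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        """Return (name, calls, seconds) tuples, most expensive first."""
        rows = [(n, self.calls[n], self.total[n]) for n in self.total]
        return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    timer = PhaseTimer()
    for step in range(3):                       # stand-in for the time loop
        with timer.phase("assemble"):           # hypothetical phase names
            sum(i * i for i in range(10_000))
        with timer.phase("pressure_solve"):
            sum(i * i for i in range(100_000))
    for name, calls, seconds in timer.report():
        print(f"{name:15s} calls={calls:3d} time={seconds:.4f} s")
```

Running the same case on several core counts and comparing the per-phase times shows which phases stop getting faster as cores are added.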
Thanks for your help.
So the actual scalability test result is closely related to two major aspects: the hardware side and the software side.
A question concerning the hardware side:
Is there a way, or what is the best way, to find where the bottleneck is?
For example, suppose I have a cluster with 2.5 GHz CPUs, 2 GB of RAM per core, InfiniBand QDR, and so on. Given a certain CFD software package and the case at hand, how could I find out or test where the bottleneck is, so that I could choose another cluster?
A question concerning the software design:
If hardware is never the problem, how could I test which part of the code can be further optimized for parallel efficiency?
I am not referring to which part takes the longest time; I just want to know which part could, ideally, be improved to a better efficiency. E.g., I know that solving the p equation takes most of the time, but we have no better choice, right? It is always so. I am also doing FSI simulations, and I would hope that the part of the code I am writing won't be so bad that it slows down the whole simulation.
PS: Sorry, I am not from computer science or pure programming; I am from another engineering field. I just want to make sure I don't make huge mistakes when choosing a cluster or writing code. :)
Personally, I like valgrind (see the attached screenshot for one of my codes).
If you want to analyze and debug parallel code, there are special tools for that, such as TotalView (for MPI code) and ThreadSpotter.