I wrote a small program in Fortran 90. It just adds two vectors:
A = B + C
A, B, and C each have 1,000,000 elements. I ran the program on an IBM Winterhawk II with 16 CPUs using OpenMP, but it does not scale well:
(speed on 16 CPUs) / (speed on 1 CPU) ~ 6
Any idea why I am not getting good scalability? Ideally, adding two vectors should scale perfectly.
Do you have OMP_SET_DYNAMIC set to true or false? Are you actually running on 16 processors all of the time, or is the number varying as processors become available?
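One way to check this is to switch dynamic adjustment off and print how many threads the parallel region actually gets. The following is only a minimal sketch: the program name and the request for 16 threads (taken from the CPU count in the question) are assumptions, and even with dynamic adjustment off the runtime may still supply fewer threads if it cannot provide that many.

```fortran
program check_threads
  use omp_lib
  implicit none
  integer :: nthreads

  ! Disable dynamic adjustment so the runtime should not silently
  ! reduce the team size below what was requested.
  call omp_set_dynamic(.false.)
  call omp_set_num_threads(16)   ! assumed: 16 CPUs, as in the question

  nthreads = 0
  !$omp parallel
  !$omp master
  ! Only the master thread records the size of the current team.
  nthreads = omp_get_num_threads()
  !$omp end master
  !$omp end parallel

  print *, 'dynamic adjustment enabled: ', omp_get_dynamic()
  print *, 'threads in parallel region: ', nthreads
end program check_threads
```

If the second line reports fewer than 16 threads, the timings in the question were not taken on 16 processors.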
What is the speedup for 2, 4, and 8 processors? Is this linear? Parallel speedup = (time for 1 processor)/(time for n processors)
What is the efficiency of the code? Parallel efficiency = (speedup)/(number of processors)
(Definitions from the Edinburgh Parallel Computing Centre)
Have you varied the size of the problem? If so, how does it scale on different numbers of processors?
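A rough way to collect these numbers is to time the same loop for several vector lengths and thread counts, then print the speedup and efficiency as defined above. This is a sketch under stated assumptions, not the original poster's program: the problem sizes, the thread counts, and the repeat count nrep (used only to make the elapsed time measurable) are all illustrative choices.

```fortran
program vecadd_scaling
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Assumed problem sizes and thread counts; adjust to the machine at hand.
  integer, parameter :: sizes(3)   = (/ 100000, 1000000, 4000000 /)
  integer, parameter :: threads(5) = (/ 1, 2, 4, 8, 16 /)
  integer, parameter :: nrep = 50          ! assumed repeat count
  real(dp), allocatable :: a(:), b(:), c(:)
  real(dp) :: t0, t, t1, speedup
  integer :: is, it, n, nt, i, rep

  ! Keep the team size fixed at the requested value where possible.
  call omp_set_dynamic(.false.)

  do is = 1, size(sizes)
    n = sizes(is)
    allocate(a(n), b(n), c(n))
    a = 0.0_dp
    b = 1.0_dp
    c = 2.0_dp
    t1 = 0.0_dp

    do it = 1, size(threads)
      nt = threads(it)
      t0 = omp_get_wtime()
      do rep = 1, nrep
        !$omp parallel do num_threads(nt)
        do i = 1, n
          a(i) = b(i) + c(i)
        end do
        !$omp end parallel do
      end do
      t = omp_get_wtime() - t0

      ! The first entry (1 thread) is the reference time.
      if (it == 1) t1 = t
      speedup = t1 / t
      print '(a,i9,a,i3,a,es10.3,a,f6.2,a,f5.2)', 'n=', n, ' threads=', nt, &
            ' time=', t, ' speedup=', speedup, ' efficiency=', speedup / real(nt, dp)
    end do

    ! Use the result so the loop cannot be optimised away.
    if (a(n) /= 3.0_dp) print *, 'unexpected result'
    deallocate(a, b, c)
  end do
end program vecadd_scaling
```

For comparison, the figure in the question (a speedup of about 6 on 16 CPUs) corresponds to a parallel efficiency of about 6/16 = 0.375.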