How to partition a Cartesian grid for MPI in Fortran
Hi,
I'm programming in Fortran and trying to use a parallel solver called Hypre, a library written in C. I'm working with a Cartesian grid and wonder how I should partition it to run in parallel. Which is the best option for a Cartesian grid of size_x*size_y, assuming it is divided into 4 grids?
1. 4 grids of (size_x/4)*size_y
2. 4 grids of size_x*(size_y/4)
3. 4 grids of (size_x/2)*(size_y/2)
Is there a clear advantage of one over the others? Thanks
Re: How to partition cartesian grid for MPI in for
If you are of an experimental bent of mind, try all three partitions on your computer system and pick the one with the smallest turnaround time! If instead you are of an analytical bent of mind, read on.
The general answer is: if it is possible to balance the load this way, then a one-dimensional partition in the direction with the greatest cell count is best. To find out why, read on.

A partition suitable for concurrent computing should be a good attempt at all of the following: (a) balance the load (equal division of computational volume); (b) minimize the total inter-subdomain boundary surface (minimize total inter-subdomain communication); (c) minimize the maximum shared boundary surface between any two subdomains; and (d) minimize the maximum (over the set of subdomains) number of neighbors.

How many subdomains should you partition the domain into? (Since you did not ask this question, you could skip this paragraph, but I felt the trends were worth retelling.) The answer represents a tradeoff among the fiscal costs of computation and communication, and the time costs of the same two factors. If the fiscal costs of computation (processors) and communication (links and switches) are the same, then only the relative timings matter. Assume perfect concurrency of communication. In this case, one should partition into a number of subdomains such that the average volume-to-surface ratio of the subdomains is proportional to the average ratio of [communication wall-clock time between two cells on two different processors] to [computational wall-clock time for a cell before information has to be exchanged]. Basically, you want as many subdomains as will make the maximum communication time between two subdomains roughly equal to the maximum computational time in a subdomain before it needs to exchange data again. A finer partition would decrease the volume-to-surface ratio and lead to diminishing returns from the extra processors, which could remain unpurchased or be put to other tasks. I will assume comparable fiscal costs for computation and communication in what follows.
In special cases you may find a single partition that meets all four criteria, but generally these criteria conflict, and then tradeoffs must be made. The tradeoff between criterion (a) and criteria (b), (c), and (d) depends on the time cost of computation in a cell (tn) versus that of communication between two cells on different processors (te). If te .gt. tn, then one should emphasize criteria (b), (c), and (d) over criterion (a), and vice versa. The tradeoffs among criteria (b), (c), and (d) depend on the degree of concurrency of the communication among subdomains, which in turn depends on the topology of the network linking your cores/processors. If there is zero concurrency (all communication must travel through a bottleneck master node), criterion (b) is most important. If there is perfect concurrency at the subdomain level, criteria (c) and (d) matter more than (b). If there is perfect concurrency at the message level, criterion (d) matters less than (c).

All three of your candidate partitions satisfy criterion (a) exactly, so there is no distinction among them there. Let nx and ny be the numbers of cells in the x and y directions of your domain mesh. Let us assume periodic boundary conditions in both x and y directions (the conclusions would be somewhat different if the bcs are local: since you are partitioning into such a small number of subdomains, the domain boundaries matter as much as the interior). Then the maximum number of neighbors among subdomains is the same (two) for all three partitions, so criterion (d) does not distinguish among the three either. For partition 1, the total inter-subdomain boundary is 4*ny, while the maximum boundary of any single subdomain is 2*ny. By symmetry, for partition 2, the total is 4*nx and the max single is 2*nx. For partition 3, the total is 2*nx+2*ny and the max single is nx+ny.
If nx=ny, all three partitions have the same total boundary of 4*nx and the same max single boundary of 2*nx, so there is no distinction among them. If nx .gt. ny, partition 1 clearly has both the least total boundary and the least max single boundary. If nx .lt. ny, it is partition 2 that has the advantage. Thus, your best choices are partition 1 if nx .gt. ny and partition 2 if nx .lt. ny. This can be stated more generally as: if it is possible to balance the load this way, then a one-dimensional partition in the direction with the greatest cell count is best.
Re: How to partition cartesian grid for MPI in for
Thank you Ananda for the detailed explanation. My domain usually has nx > ny, and hence I believe partition 1 is the best.

Re: How to partition cartesian grid for MPI in for
You are welcome. I forgot to mention that satisfying criterion (d) should generally also allow you to minimize the number of stages of communication needed. In your particular case, all three partitions satisfy the criterion. For example, for partition 1, with contiguous subdomains numbered 1 through 4, the first stage of communication may involve subdomains 1 and 2 talking with each other at the same time as subdomains 3 and 4 talk with each other. In the second (and last) stage, subdomains 1 and 4 talk with each other simultaneously with subdomains 2 and 3. For more complex partitions in 3D, with more than two subdomain neighbors, you may use a graph-coloring algorithm to minimize the number of non-interfering communication stages needed.
