CFD Online Discussion Forums - How to partition cartesian grid for MPI in fortran

If you are of an experimental bent of mind, try all three partitions on your computer system and pick the one which has the smallest turn-around time! If rather you are of an analytical bent of mind, read on.

The general answer is: if it is possible to balance the load this way, then a one-dimensional partition in the direction with the greatest cell count is best. To find out why, read on.

A partition suitable for concurrent computing should be a good try at all of the following: (a) balance the load (equal division of computation volume), (b) minimize the total inter-subdomain boundary surface (minimize total inter-subdomain communication), (c) minimize the maximum shared boundary surface between any two subdomains, and (d) minimize the maximum (over the set of subdomains) number of neighbors.

How many subdomains should you partition the domain into? (Since you did not ask this question, you could skip this para, but I felt the trends were worth retelling). The answer represents a trade-off among the fiscal costs of computation and communication, and the time costs of the same two factors. If the fiscal costs of computation (processors) and communication (links and switches) are the same, then only the relative timings matter. Assume perfect concurrency of communication. In this case, one should partition into a number of subdomains such that the average [volume to surface] ratio of the subdomains is proportional to the average ratio of [communication wallclocktime between two cells on two different processors to the computational wallclocktime for a cell before information has to be exchanged]. Basically, you want as many subdomains as will make the maximum communication time between two subdomains roughly equal to the maximum computational time in a subdomain before it needs to exchange data again. A finer partition would decrease the volume to surface ratio and lead to diminishing returns from the extra processors which could remain unpurchased or be put to other tasks. I will assume fiscal costs for computation and communication are comparable in the following.

In special cases, you may find a single partition that meets all four criteria, but generally there is conflict among these criteria, and then trade-offs must be made.

The trade-off between criterion (a) and criteria (b), (c), and (d) depends on the time cost of computation in a cell (tn) versus that of communication between two cells on different processors (te). If te .gt. tn, then one should emphasize criteria (b), (c), (d) over criterion (a), and vice-versa.

The trade-offs among criteria (b), (c) and (d) would depend on the degree of concurrency of the communication among subdomains. This is dependent on the topology of the network linking your cores/processors. If there is zero concurrency (all communication must travel through a bottleneck master node), criterion (b) would be most important. If there is perfect concurrency at the subdomain level, criteria (c) and (d) would be more important than (b). If there is perfect concurrency at the message level, criterion (d) would be less important than (c).

All three of your candidate partitions satisfy criterion (a) exactly, so there is no distinction among them there. Let nx and ny be the numbers of cells in the x and y directions of your domain mesh. Let us assume periodic boundary conditions in both x and y directions (the conclusions would be somewhat different if the bcs are local, since you are partitioning into such a small number of subdomains, the domain boundaries matter as much as the interior). Then the maximum number of neighbors of among subdomains is the same (two) for all three partitions, and hence criterion (d) does not distinguish among the three, either.

For partition 1, the total inter-subdomain boundary is 4*ny, while the maximum boundary of any single subdomain is 2*ny. By symmetry, for partition 2, the total is 4*nx, while the max single is 2*nx. For partition 3, the total is 2*nx+2*ny, while the max single is nx+ny.

If nx=ny, it is clear that all three partitions have the same total boundary of 4*nx, and the same max single boundary of 2*nx, so there is no distinction among them. If nx .gt. ny, clearly partition 1 has both the least total boundary and the least max single boundary. If nx .lt. ny, it is partition 2 that has the advantage. Thus, your best choices are partition 1 if nx .gt. ny and partition 2 if nx .lt. ny. This can be stated more generally as: if it is possible to balance the load this way, then a one-dimensional partition in the direction with the greatest cell count is best.