README.MPI.halo3d

Communication Motif: Halo3D

Description:

Nearest neighbor communications are *very* common in scalable DOE
applications. In this pattern, each MPI rank communications with ranks
that are adjacent to it in each Cartesian dimension. The "halo"
exchanged is the data on each face. The Halo3D pattern included in Ember
is the simplest representation of this communication approach and
represents codes which are typically structured (i.e. have well defined
problem dimensions and that are regular).

In most DOE implementations (although not all) of Halo3D, an
MPI_Allreduce operation is executed every n iterations (in some cases
n=1) which executes either a sum, min or max over the global problem
domain. This is *not* included in the Ember implementation so we have
broad applicability.


Parameters for the Halo3D Motif:

mpirun ./halo3d \
	-nx <Local Domain Size in X-Dimension> \
	-ny <Local Domain Size in Y-Dimension> \
	-nz <Local Domain Size in Z-Dimension> \
	-pex <Processors in X-Dimension> \
	-pey <Processors in Y-Dimension> \
	-pez <Processors in Z-Dimension>
	-iterations <Number of Iterations to Execute, default is 1> \
	-vars <Number of variables in each grid cell> \
	-sleep <Number of nanoseconds to sleep/compute for>


Example: 256 rank run with a local (per rank) data grid of 20x20x20

mpirun -n 256 ./halo3d \
	-nx 20 \
	-ny 20 \
	-nz 20 \
	-pex 8 \
	-pey 8 \
	-pez 4 \
	-iterations 100 \
	-vars 8 \
	-sleep 2000

Output:

Example:

#                 Time KBytesXchng/Rank-Max            MB/S/Rank
              0.013865             150.0000           10818.6126

When run the motif will complete reporting the time taken, the number of
KB send/received by a rank in the middle of the processor grid (ranks
around the edge will have lower communication volume because on some
faces they have no neighbors). A benchmarked bandwidth is reported for
the rank in the middle of the processor grid.