Communication Motif: Halo3D
Description:
Nearest-neighbor communication is *very* common in scalable DOE
applications. In this pattern, each MPI rank communicates with the ranks
that are adjacent to it in each Cartesian dimension. The "halo"
exchanged is the data on each face. The Halo3D pattern included in Ember
is the simplest representation of this communication approach and
represents codes which are typically structured (i.e. regular, with
well-defined problem dimensions).

In most (although not all) DOE implementations of Halo3D, an
MPI_Allreduce operation is executed every n iterations (in some cases
n=1), computing a sum, min or max over the global problem domain. This
is *not* included in the Ember implementation, so that the motif remains
broadly applicable.
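To make the neighbor relationship concrete, the sketch below works out the six face neighbors of a rank in a pex x pey x pez processor grid. It is an illustration only, not code from Ember: the function names are hypothetical, and it assumes that ranks are laid out with the X coordinate varying fastest and that the grid is non-periodic, so ranks on the boundary simply have no neighbor on some faces.

```python
def rank_to_coords(rank, pex, pey, pez):
    """Map a linear rank to (x, y, z). Assumption: X varies fastest."""
    if not 0 <= rank < pex * pey * pez:
        raise ValueError("rank lies outside the processor grid")
    z, remainder = divmod(rank, pex * pey)
    y, x = divmod(remainder, pex)
    return x, y, z


def coords_to_rank(x, y, z, pex, pey, pez):
    """Inverse mapping; returns None for positions outside the grid."""
    if 0 <= x < pex and 0 <= y < pey and 0 <= z < pez:
        return x + y * pex + z * pex * pey
    return None


def face_neighbors(rank, pex, pey, pez):
    """Return the six face neighbors of a rank (None where the grid ends)."""
    x, y, z = rank_to_coords(rank, pex, pey, pez)
    return {
        "x_down": coords_to_rank(x - 1, y, z, pex, pey, pez),
        "x_up": coords_to_rank(x + 1, y, z, pex, pey, pez),
        "y_down": coords_to_rank(x, y - 1, z, pex, pey, pez),
        "y_up": coords_to_rank(x, y + 1, z, pex, pey, pez),
        "z_down": coords_to_rank(x, y, z - 1, pex, pey, pez),
        "z_up": coords_to_rank(x, y, z + 1, pex, pey, pez),
    }


if __name__ == "__main__":
    # Corner rank of an 8 x 8 x 4 grid: only the three "up" neighbors exist.
    print(face_neighbors(0, 8, 8, 4))
    # Interior rank at (4, 4, 2): all six neighbors exist.
    print(face_neighbors(164, 8, 8, 4))
```

Under these assumptions a corner rank exchanges data on three faces and an interior rank on all six.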
Parameters for the Halo3D Motif:
mpirun ./halo3d \
-nx <Local Domain Size in X-Dimension> \
-ny <Local Domain Size in Y-Dimension> \
-nz <Local Domain Size in Z-Dimension> \
-pex <Processors in X-Dimension> \
-pey <Processors in Y-Dimension> \
-pez <Processors in Z-Dimension> \
-iterations <Number of Iterations to Execute, default is 1> \
-vars <Number of variables in each grid cell> \
-sleep <Number of nanoseconds to sleep/compute for>
Example: 256-rank run with a local (per-rank) data grid of 20x20x20
mpirun -n 256 ./halo3d \
-nx 20 \
-ny 20 \
-nz 20 \
-pex 8 \
-pey 8 \
-pez 4 \
-iterations 100 \
-vars 8 \
-sleep 2000
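In the example above the rank count passed to mpirun equals the product of the processor grid dimensions (8 x 8 x 4 = 256). The helper below is a hypothetical convenience script, not part of Ember, that derives the rank count from the grid and assembles the same command line. It assumes the launcher's rank count should always equal pex x pey x pez; the flags themselves are the ones listed in the parameter description.

```python
def halo3d_command(nx, ny, nz, pex, pey, pez, iterations, nvars, sleep_ns):
    """Build an mpirun argument list for halo3d (hypothetical helper).

    Assumption: the number of ranks must equal pex * pey * pez.
    """
    sizes = {"nx": nx, "ny": ny, "nz": nz, "pex": pex, "pey": pey,
             "pez": pez, "iterations": iterations, "vars": nvars}
    for name, value in sizes.items():
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    if sleep_ns < 0:
        raise ValueError("sleep must not be negative")

    ranks = pex * pey * pez
    return [
        "mpirun", "-n", str(ranks), "./halo3d",
        "-nx", str(nx), "-ny", str(ny), "-nz", str(nz),
        "-pex", str(pex), "-pey", str(pey), "-pez", str(pez),
        "-iterations", str(iterations),
        "-vars", str(nvars),
        "-sleep", str(sleep_ns),
    ]


if __name__ == "__main__":
    # Reproduces the 256-rank example as a single line.
    print(" ".join(halo3d_command(20, 20, 20, 8, 8, 4, 100, 8, 2000)))
```

Deriving the rank count from the grid avoids the two values drifting apart when the decomposition is changed.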
Output:
Example:
#       Time  KBytesXchng/Rank-Max      MB/S/Rank
    0.013865              150.0000     10818.6126
When run, the motif reports the time taken and the number of KB
sent/received by a rank in the middle of the processor grid (ranks
around the edge have lower communication volume because they have no
neighbors on some faces). A benchmarked bandwidth is also reported for
the rank in the middle of the processor grid.
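As a rough cross-check of the exchange volume, the sketch below estimates the halo size for one rank from the local grid dimensions. It is a back-of-the-envelope model, not the accounting used inside halo3d: it assumes each variable is an 8-byte double-precision value, that 1 KB is 1024 bytes, and that only one direction of each face exchange is counted. Under those assumptions an interior rank in the 20x20x20, 8-variable example moves 150 KB, which agrees with the sample output, while a corner rank moves half as much.

```python
BYTES_PER_VALUE = 8  # assumption: double-precision values


def halo_kbytes(nx, ny, nz, nvars, x_faces=2, y_faces=2, z_faces=2):
    """Estimate the halo volume in KB for one rank.

    x_faces, y_faces and z_faces give the number of neighbors the rank
    has in each dimension (2 for an interior rank, 1 on a grid boundary).
    """
    cells = (
        x_faces * ny * nz    # faces perpendicular to X
        + y_faces * nx * nz  # faces perpendicular to Y
        + z_faces * nx * ny  # faces perpendicular to Z
    )
    return cells * nvars * BYTES_PER_VALUE / 1024.0


if __name__ == "__main__":
    # Interior rank: neighbors on all six faces.
    print(halo_kbytes(20, 20, 20, 8))
    # Corner rank: one neighbor per dimension.
    print(halo_kbytes(20, 20, 20, 8, x_faces=1, y_faces=1, z_faces=1))
```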