forked from ssrg-vt/popcorn-compiler
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathAPPLICATIONS
324 lines (254 loc) · 15 KB
/
APPLICATIONS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
---------------------
Building applications
---------------------
The toolchain will build multi-ISA binaries suitable from running and migrating
across heterogeneous systems running Popcorn's OS (please see "INSTALL" for
installing the compiler toolchain). The toolchain builds applications using
the following procedure:
1. Compile each source code file (.c) into an LLVM bitcode file (.bc)
2. Run optimizations over the LLVM bitcode, and run several custom passes that
adjust linkage and request stack transformation metadata be added
3. Run each bitcode file through the LLVM backend for every architecture in the
system (e.g., AArch64 & x86) to generate an object file (.o) per ISA
4. Link each set of per-ISA object files together using modified gold linker to
generate an unaligned binary per ISA
5. Run the alignment tool, which iteratively aligns sections and re-links the
binary to produce a per-ISA linker script that enforces a common layout
6. Do a final linking (using the generated linker scripts) and post-process
binaries to generate stack transformation metadata
This process is complicated, so a template makefile is available in
"util/Makefile.template". To use this makefile, create a folder for your
application and copy all of your application's source code files into the
folder. Additionally, copy the template into the folder and rename it to
"Makefile" (without the quotes). There are several fields at the top of the
file that need to be set:
POPCORN : the installation folder for the compiler toolchain, which is
/usr/local/popcorn by default.
ARM64_LIBGCC : directory of libgcc & libgcc_eh for aarch64 compiler (set to
Ubuntu-specific location by default)
X86_64_LIBGCC : directory of libgcc & libgcc_eh for x86-64 compiler (set to
Ubuntu-specific location by default)
BIN : name of the binary
SRC : all source code filenames
After setting these fields, the makefile will follow the process above to
compile the application for each architecture. The makefile will generate
several intermediate directories:
align : alignment tool work directory
build_aarch64 : contains aarch64 object files, vanilla (unaligned) binary
and linker map used for alignment
build_x86-64 : same as build_aarch64 but for x86-64
ir : optimized & modified (ISA-agnostic) LLVM bitcode files
The makefile will build a binary per-architecture with post-fixes of *_aarch64
and *_x86-64, and which are ready to run and be migrated (see "Running
applications" below). There are several standard recipes which can be used to
build the application in varying steps:
make / make all : build the application to completion
make vanilla : build the application, but do not align (vanilla
binaries are in build_<arch>)
make vanilla-<arch> : build vanilla binary for <arch>
make aligned : build & align the application (not ready to run, needs
final post-processing step)
make aligned-<arch> : build & align the application for <arch>
make post_process : post-process aligned binaries to add stack
transformation metadata, binaries are ready to run
make check : run several sanity checks over the multi-ISA binaries
to check for obvious problems*
make clean : torch all generated files
make stack-depth : build application against libstack-depth.a, which can
be used to generate call chain information for the
application
* make check is intended to be very verbose and as of right now will always end
in an error. See "ISSUES" for more information.
The next sections will go into more detail about each of the compilation steps
above, and in particular, how compilation is configured at each step in the
process.
1. Compiling to LLVM bitcode
This step uses standard compiler flags to generate LLVM bitcode from source
files. There is, however, special configuration needed:
-O0 : disable optimizations, which will be run in the
middle-end
-g -gdwarf-aranges : force clang/LLVM to emit ".debug_aranges" section
(needed by the stack transformation runtime)
-nostdinc : don't include glibc headers (musl-libc headers will be
used instead)
-finstrument-functions : add migration points to function entry and exit
-fno-vectorize : disable vectorization (not supported)
2. Optimize & customize LLVM bitcode
In addition to using aggressive optimizations, several passes are run which
modify the linkage of several variables and add stack transformation intrinsics
into the IR:
-disable-loop-vectorization
-disable-slp-vectorization : disable vectorization (not supported)
-associate-literal : generate symbol names for anonymous string
literals so they can be aligned
-section-static : place static global variables into their own
sections (similarly to -fdata-sections)
-stack-info : run live-value analysis & add stackmap intrinsics
into the LLVM IR
3. Compile LLVM bitcode into per-architecture object files
The toolchain compiles each of the LLVM bitcode files that comprise the
application into a per-ISA binary. There are a lot of special configuration
options needed to prepare the application for alignment:
-O0 : disable optimizations -- required because compiling
with optimizations enabled (even after running "opt
-O3") changes the bitcode, screwing up the analysis
and placement of stackmap intrinsics by -stack-info
-ffunction-sections
-fdata-sections : put each function & global data symbol into its own
ELF section in the object file (needed for
alignment)
-fno-common : put uninitialized global variables in data sections
rather than common blocks (needed for alignment)
-mllvm -optimize-regalloc : use an optimized register allocator -- required
because compiling with the unoptimized "fast" LLVM
register allocator prevents generation of some
types of metadata needed for stack transformation
4. Link per-architecture unaligned binaries
Use the modified gold linker to generate per-architecture binaries. These
binaries are unaligned, but will run homogeneously. The linker also prints
out a map file, which contains symbol sizes and alignments. This is used by
the alignment tool to find & place symbols that must be aligned.
5. Align binaries
The alignment tool uses a series of bash scripts to iteratively align and link
the binaries. The tool works out of the align/ directory -- the needed scripts
are copied into the directory. At the end of alignment, the tool will generate
a linker script per-architecture which lays out symbols common across both
binaries at identical virtual memory addresses, and moves architecture-specific
symbols into disjoint memory ranges to prevent collisions.
6. Final linking & post-processing
The gold linker is run one final time using the generated linker scripts to
build aligned binaries. The stack transformation metadata generation tool can
be run once the final function layout has been determined, as it uses offsets
from the beginnings of functions to calculate return addresses for individual
call sites. The metadata generation tool post-processes each generated binary
and adds several ELF sections containing stack transformation information.
At this point, a multi-ISA binary has been generated and is ready to run.
--------------------
Running applications
--------------------
Running multi-ISA binaries and migrating them between architectures requires
special handling. It is assumed that there is *not* a shared filesystem across
architectures in the platform, because the architecture-specific version of the
application is aliased to a single application name for the loader.
1. Putting applications in the correct location in the filesystem
The heterogeneous binary loader needs to be able to find the per-architecture
binary, and the stack transformation runtime needs to be able to access
architecture-specific rewriting metadata. Thus, the binaries need to be
placed in well-known places in the filesystem.
For example, consider running a multi-ISA binary named "app" out of the folder
"~/working_dir".
- Copy all generated binaries, i.e., app_aarch64 and app_x86-64, to
~/working_dir on every machine.
- Make a hardlink from the binary for the given architecture to the name of
the application without the architecture suffix. For example, on aarch64
make a hardlink from app_aarch64 to app, while on x86 make a hardlink from
app_x86-64 to app. This is so that the heterogeneous binary loader is able
to find the architecture-specific version of the binary using the same name
and filesystem location on each architecture. Additionally, the stack
transformation runtime searches for each architecture-specific version of the
metadata by appending the architecture name to the suffix of the executing
binary.
- For example, the filesystems should be set up as follows on each
architecture:
On aarch64:
~/working_dir:
app_aarch64
app_x86-64
app (hardlink from app_aarch64)
On x86-64:
app_aarch64
app_x86-64
app (hardlink from app_x86-64)
NOTE: you *must* run applications from the same working directory as where the
binaries are located, e.g., change to directories to ~/working_dir before
running the application.
2. Running applications in the namespace
Applications must be run from inside the heterogeneous container in order to be
migratable across architectures. Users must enter the heterogeneous container
using Linux's namespaces functionality, which we'll refer to as the "Popcorn
namespace".
NOTE: you *must* be root when entering the namespace!
- Change to the 'tool/namespace' directory
- Run the "launch_ns.sh" script:
$ cd tool/namespace
$ sudo su
$ ./launch_ns.sh
launching a new namespace -cimnpu
- This spawns a child in the namespace, which brings up a rudimentary shell
(note that it doesn't have environment variables, auto-complete, etc.). Run
the "popcorn.sh" script to bring up a fully-functioning shell:
$ ./popcorn.sh
switching to popcorn ns!
- This brings up bash and can be used like any other shell. You can now run
the application as normal.
3. Migrating applications
NOTE: this section assumes that you've entered the Popcorn Linux namespace and
have successfully started running an application.
There are a couple of ways to migrate applications. The first, discussed in the
lib/migration/INSTALL, involves the use of environment variables. See that
file for more information on how to use this method.
The second way of migrating applications involves the use of the proc
filesystem. Within /proc/ are two new files called /proc/popcorn_ps and
/proc/popcorn_ps1. These files are similar to the traditional `ps` program in
that they list all binaries running in the Popcorn namespace. Specifically:
/proc/popcorn_ps -- contains all applications running on ARM
/proc/popcorn_ps1 -- contains all applications running on x86
There are several things to note:
1. Examining /proc/popcorn_ps1 on ARM will show ALL processes running
on x86.
2. Applications do not show up on ARM (in /proc/popcorn_ps) until they have
successfully migrated at least once. At that point, they will appear
until completion on either architecture.
The contents of these files are formatted as follows. Note that the full
details of this formatting can be found by examining the "popcorn_ps_read"
function in the kernel source code (at kernel/popcorn/sched_server.c):
<Binary Name> <Global Info> <Thread 0 Info> <Thread 0 Load>;
- <Binary Name> is the name of the application
- <Global Info> consists of a number of fields, delimited by colons. We
care only about the 2nd field, which contains the "global" PID.
This PID is identical between both x86 and ARM after migration
occurs, allowing you to differentiate identically-named processes.
Before migration, this value is "-1"; after migration, it is the same
as the x86-specific PID.
- <Thread 0 Info> consists of a number of fields, also delimited by colons.
The fields of interest are the 1st, 2nd, and 3rd fields.
The 1st field is the architecture-specific PID. This is used for
migration (see below)!
The 2nd and 3rd fields are used to indicate where the application is
currently running:
- If both fields are 0, then the application has not migrated yet.
- If the 2nd field is 1 and the 3rd field is 0, then the
application is running on the remote architecture/machine.
- If the 2nd field is 0 and the 3rd field is 1, then the
application is running on the current architecture/machine, but
it *has* migrated previously.
- <Thread 0 Load> is not of interest for the purposes of migration.
An example from /proc/popcorn_ps1 *after* migration has already occurred is
shown here:
test 1:2134843:1:144314 2134843:0:1:0:0 46:0;
- "test" is the <Binary Name>
- 2134843 is the global PID and the x86 PID.
- In <Thread 0 Info>, we see that the 2nd field is 0 and the 3rd field
is 1. This means that the process is running locally (on x86), and it
*has* previously migrated.
It's also worth noting that for multi-threaded applications, the
<Thread Info> and <Thread Load> strings will be repeated once per thread.
Only the PID from <Thread 0 Info> is needed for migration, however, because
triggering migration is currently implemented as a per-process mechanism.
Migration is achieved using these two files in conjunction with a third
per-process file: /proc/<Popcorn PID>/mtrig. Note that the mtrig file is not
unique to processes running in the Popcorn namespace, but its use is only
defined for those processes that are running in the namespace.
To migrate an application:
- On the machine that you are migrating *from*, run "cat /proc/popcorn_ps"
or "cat /proc/popcorn_ps1" (without quotes), depending on if you are
trying to migrate an application from ARM or x86, respectively.
- Find the architecture-specific PID of the application that you want to
migrate.
- As a superuser, run "echo X > /proc/<PID>/mtrig" (without quotes), where:
X is either 0 or 1:
- If you want the application to migrate to/remain on x86, use 0
- If you want the application to migrate to/remain on ARM, use 1
<PID> is the architecture-specific PID of the application
Note that you do *not* have to be in the Popcorn namespace to trigger a
migration, you must only have superuser privileges.