Skip to content

Commit

Permalink
Asyncify: Add an "add list", rename old lists (#2910)
Browse files Browse the repository at this point in the history
Asyncify does a whole-program analysis to figure out the list of functions
to instrument. In
emscripten-core/emscripten#10746 (comment)
we realized that we need another type of list there, an "add list" which is a
list of functions to add to the instrumented functions list, that is, that we
should definitely instrument.

The use case in that link is that we disable indirect calls, but there is
one special indirect call that we do need to instrument. Being able to add
just that one can be much more efficient than assuming all indirect calls in
a big codebase need instrumentation. Similar issues can come up if we
add a profile-guided option to asyncify, which we've discussed.

The existing lists were not good enough to allow that, so a new option
is needed. I took the opportunity to rename the old ones to something
better and more consistent, so after this PR we have 3 lists as follows:

* The old "remove list" (previously "blacklist") which removes functions
from the list of functions to be instrumented.

* The new "add list" which adds to that list (note how add/remove are
clearly parallel).

* The old "only list" (previously "whitelist") which simply replaces the
entire list, and so only those functions are instrumented and no other.

This PR temporarily still supports the old names in the commandline
arguments, to avoid immediate breakage for our CI.
  • Loading branch information
kripken authored Jun 13, 2020
1 parent 49f2443 commit d26f90b
Show file tree
Hide file tree
Showing 9 changed files with 291 additions and 64 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,11 @@ full changeset diff at the end of each section.
Current Trunk
-------------

- Add Asyncify "add list" that adds to the list of functions to be instrumented.
Rename old lists to be clearer and more consistent with that, so now there is
"remove list" to remove, "add list" to add, and "only list" which if set means
that only those functions should be instrumented and nothing else.

v94
---

Expand Down
135 changes: 87 additions & 48 deletions src/passes/Asyncify.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -225,33 +225,45 @@
// If that is true for your codebase, then this can be extremely useful
// as otherwise it looks like any indirect call can go to a lot of places.
//
// --pass-arg=asyncify-blacklist@name1,name2,name3
// --pass-arg=asyncify-asserts
//
// If the blacklist is provided, then the functions in it will not be
// instrumented even if it looks like they need to. This can be useful
// if you know things the whole-program analysis doesn't, like if you
// know certain indirect calls are safe and won't unwind. But if you
// get the list wrong things will break (and in a production build user
// input might reach code paths you missed during testing, so it's hard
// to know you got this right), so this is not recommended unless you
// really know what are doing, and need to optimize every bit of speed
// and size. '*' wildcard matching supported.
// This enables extra asserts in the output, like checking if we in
// an unwind/rewind in an invalid place (this can be helpful for manual
// tweaking of the only-list / remove-list, see later).
//
// As with --asyncify-imports, you can use a response file here.
// For manual fine-tuning of the list of instrumented functions, there are lists
// that you can set. These must be used carefully, as misuse can break your
// application - for example, if a function is called that should be
// instrumented but isn't because of these options, then bad things can happen.
// Note that even if you test this locally then users running your code in
// production may reach other code paths. Use these carefully!
//
// --pass-arg=asyncify-whitelist@name1,name2,name3
// Each of the lists can be used with a response file (@filename, which is then
// loaded from the file). You can also use '*' wildcard matching in the lists.
//
// If the whitelist is provided, then only the functions in the list
// will be instrumented. Like the blacklist, getting this wrong will
// break your application. '*' wildcard matching supported.
// --pass-arg=asyncify-removelist@name1,name2,name3
//
// As with --asyncify-imports, you can use a response file here.
// If the "remove-list" is provided, then the functions in it will be
// *removed* from the list of instrumented functions. That is, they won't
// be instrumented even if it looks like they need to be. This can be
// useful if you know things the safe whole-program analysis doesn't, e.g.
// that certain code paths will not be taken, where certain indirect calls
// will go, etc.
//
// --pass-arg=asyncify-asserts
// --pass-arg=asyncify-addlist@name1,name2,name3
//
// This enables extra asserts in the output, like checking if we in
// an unwind/rewind in an invalid place (this can be helpful for manual
// tweaking of the whitelist/blacklist).
// If the "add-list" is provided, then the functions in the list will be
// *added* to the list of instrumented functions, that is, they will be
// instrumented even if otherwise we think they don't need to be. As by
// default everything will be instrumented in the safest way possible,
// this is only useful if you use ignore-indirect and use this to fix up
// some indirect calls that *do* need to be instrumented, or if you will
// do some later transform of the code that adds more call paths, etc.
//
// --pass-arg=asyncify-onlylist@name1,name2,name3
//
// If the "only-list" is provided, then *only* the functions in the list
// will be instrumented, and nothing else.
//
// TODO When wasm has GC, extending the live ranges of locals can keep things
// alive unnecessarily. We may want to set locals to null at the end
Expand Down Expand Up @@ -465,7 +477,7 @@ class ModuleAnalyzer {
// the top-most part is still marked as changing the state; so things
// that call it are instrumented. This is not done for the bottom.
bool isTopMostRuntime = false;
bool inBlacklist = false;
bool inRemoveList = false;
};

typedef std::map<Function*, Info> Map;
Expand All @@ -475,14 +487,16 @@ class ModuleAnalyzer {
ModuleAnalyzer(Module& module,
std::function<bool(Name, Name)> canImportChangeState,
bool canIndirectChangeState,
const String::Split& blacklistInput,
const String::Split& whitelistInput,
const String::Split& removeListInput,
const String::Split& addListInput,
const String::Split& onlyListInput,
bool asserts)
: module(module), canIndirectChangeState(canIndirectChangeState),
fakeGlobals(module), asserts(asserts) {

PatternMatcher blacklist("black", module, blacklistInput);
PatternMatcher whitelist("white", module, whitelistInput);
PatternMatcher removeList("remove", module, removeListInput);
PatternMatcher addList("add", module, addListInput);
PatternMatcher onlyList("only", module, onlyListInput);

// Rename the asyncify imports so their internal name matches the
// convention. This makes replacing them with the implementations
Expand Down Expand Up @@ -574,12 +588,12 @@ class ModuleAnalyzer {
}
});

// Functions in the blacklist are assumed to not change the state.
// Functions in the remove-list are assumed to not change the state.
for (auto& pair : scanner.map) {
auto* func = pair.first;
auto& info = pair.second;
if (blacklist.match(func->name)) {
info.inBlacklist = true;
if (removeList.match(func->name)) {
info.inRemoveList = true;
info.canChangeState = false;
}
}
Expand Down Expand Up @@ -609,24 +623,33 @@ class ModuleAnalyzer {
scanner.propagateBack([](const Info& info) { return info.canChangeState; },
[](const Info& info) {
return !info.isBottomMostRuntime &&
!info.inBlacklist;
!info.inRemoveList;
},
[](Info& info) { info.canChangeState = true; },
scanner.IgnoreIndirectCalls);

map.swap(scanner.map);

if (!whitelistInput.empty()) {
// Only the functions in the whitelist can change the state.
if (!onlyListInput.empty()) {
// Only the functions in the only-list can change the state.
for (auto& func : module.functions) {
if (!func->imported()) {
map[func.get()].canChangeState = whitelist.match(func->name);
map[func.get()].canChangeState = onlyList.match(func->name);
}
}
}

blacklist.checkPatternsMatches();
whitelist.checkPatternsMatches();
if (!addListInput.empty()) {
for (auto& func : module.functions) {
if (!func->imported() && addList.match(func->name)) {
map[func.get()].canChangeState = true;
}
}
}

removeList.checkPatternsMatches();
addList.checkPatternsMatches();
onlyList.checkPatternsMatches();
}

bool needsInstrumentation(Function* func) {
Expand Down Expand Up @@ -977,7 +1000,7 @@ struct AsyncifyFlow : public Pass {
}

// Given a function that is not instrumented - because we proved it doesn't
// need it, or depending on the whitelist/blacklist - add assertions that
// need it, or depending on the only-list / remove-list - add assertions that
// verify that property at runtime.
// Note that it is ok to run code while sleeping (if you are careful not
// to break assumptions in the program!) - so what is actually
Expand Down Expand Up @@ -1266,23 +1289,38 @@ struct Asyncify : public Pass {
String::Split listedImports(stateChangingImports, ",");
auto ignoreIndirect = runner->options.getArgumentOrDefault(
"asyncify-ignore-indirect", "") == "";
String::Split blacklist(
String::trim(read_possible_response_file(
runner->options.getArgumentOrDefault("asyncify-blacklist", ""))),
",");
String::Split whitelist(
std::string removeListInput =
runner->options.getArgumentOrDefault("asyncify-removelist", "");
if (removeListInput.empty()) {
// Support old name for now to avoid immediate breakage TODO remove
removeListInput =
runner->options.getArgumentOrDefault("asyncify-blacklist", "");
}
String::Split removeList(
String::trim(read_possible_response_file(removeListInput)), ",");
String::Split addList(
String::trim(read_possible_response_file(
runner->options.getArgumentOrDefault("asyncify-whitelist", ""))),
runner->options.getArgumentOrDefault("asyncify-addlist", ""))),
",");
std::string onlyListInput =
runner->options.getArgumentOrDefault("asyncify-onlylist", "");
if (onlyListInput.empty()) {
// Support old name for now to avoid immediate breakage TODO remove
onlyListInput =
runner->options.getArgumentOrDefault("asyncify-whitelist", "");
}
String::Split onlyList(
String::trim(read_possible_response_file(onlyListInput)), ",");
auto asserts =
runner->options.getArgumentOrDefault("asyncify-asserts", "") != "";

blacklist = handleBracketingOperators(blacklist);
whitelist = handleBracketingOperators(whitelist);
removeList = handleBracketingOperators(removeList);
addList = handleBracketingOperators(addList);
onlyList = handleBracketingOperators(onlyList);

if (!blacklist.empty() && !whitelist.empty()) {
Fatal() << "It makes no sense to use both a blacklist and a whitelist "
"with asyncify.";
if (!onlyList.empty() && (!removeList.empty() || !addList.empty())) {
Fatal() << "It makes no sense to use both an asyncify only-list together "
"with another list.";
}

auto canImportChangeState = [&](Name module, Name base) {
Expand All @@ -1302,8 +1340,9 @@ struct Asyncify : public Pass {
ModuleAnalyzer analyzer(*module,
canImportChangeState,
ignoreIndirect,
blacklist,
whitelist,
removeList,
addList,
onlyList,
asserts);

// Add necessary globals before we emit code to use them.
Expand Down
162 changes: 162 additions & 0 deletions test/passes/[email protected]
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
(module
(type $none_=>_none (func))
(type $i32_=>_none (func (param i32)))
(type $none_=>_i32 (func (result i32)))
(import "env" "import" (func $import))
(memory $0 1 2)
(global $__asyncify_state (mut i32) (i32.const 0))
(global $__asyncify_data (mut i32) (i32.const 0))
(export "asyncify_start_unwind" (func $asyncify_start_unwind))
(export "asyncify_stop_unwind" (func $asyncify_stop_unwind))
(export "asyncify_start_rewind" (func $asyncify_start_rewind))
(export "asyncify_stop_rewind" (func $asyncify_stop_rewind))
(export "asyncify_get_state" (func $asyncify_get_state))
(func $foo
(local $0 i32)
(local $1 i32)
(if
(i32.eq
(global.get $__asyncify_state)
(i32.const 2)
)
(nop)
)
(local.tee $0
(block $__asyncify_unwind
(block
(block
(if
(i32.eq
(global.get $__asyncify_state)
(i32.const 2)
)
(block
(i32.store
(global.get $__asyncify_data)
(i32.add
(i32.load
(global.get $__asyncify_data)
)
(i32.const -4)
)
)
(local.set $1
(i32.load
(i32.load
(global.get $__asyncify_data)
)
)
)
)
)
(if
(i32.eq
(global.get $__asyncify_state)
(i32.const 0)
)
(call $nothing)
)
)
(return)
)
)
)
(block
(i32.store
(i32.load
(global.get $__asyncify_data)
)
(local.get $0)
)
(i32.store
(global.get $__asyncify_data)
(i32.add
(i32.load
(global.get $__asyncify_data)
)
(i32.const 4)
)
)
)
(nop)
)
(func $bar
(call $nothing)
)
(func $nothing
(nop)
)
(func $asyncify_start_unwind (param $0 i32)
(global.set $__asyncify_state
(i32.const 1)
)
(global.set $__asyncify_data
(local.get $0)
)
(if
(i32.gt_u
(i32.load
(global.get $__asyncify_data)
)
(i32.load offset=4
(global.get $__asyncify_data)
)
)
(unreachable)
)
)
(func $asyncify_stop_unwind
(global.set $__asyncify_state
(i32.const 0)
)
(if
(i32.gt_u
(i32.load
(global.get $__asyncify_data)
)
(i32.load offset=4
(global.get $__asyncify_data)
)
)
(unreachable)
)
)
(func $asyncify_start_rewind (param $0 i32)
(global.set $__asyncify_state
(i32.const 2)
)
(global.set $__asyncify_data
(local.get $0)
)
(if
(i32.gt_u
(i32.load
(global.get $__asyncify_data)
)
(i32.load offset=4
(global.get $__asyncify_data)
)
)
(unreachable)
)
)
(func $asyncify_stop_rewind
(global.set $__asyncify_state
(i32.const 0)
)
(if
(i32.gt_u
(i32.load
(global.get $__asyncify_data)
)
(i32.load offset=4
(global.get $__asyncify_data)
)
)
(unreachable)
)
)
(func $asyncify_get_state (result i32)
(global.get $__asyncify_state)
)
)
Loading

0 comments on commit d26f90b

Please sign in to comment.