add aggressive-memory-offset-merging pass to merge offset and previous const #6972
base: main
Conversation
// load offset=0
// i32.add
// i32.const A + B
// other expr
I believe the existing OptimizeAddedConstants pass can do something like this, using the assumption of --low-memory-unused (it will also fold A + B into the offset, saving the add). Does that achieve what you want in this pass?
You can test it with wasm-opt --optimize-added-constants --low-memory-unused, or wasm-opt -O3 --low-memory-unused (that pass is enabled when that flag is set).
The directions of these two passes are different. This pass wants to "merge the offset into an add/sub constant". It will not reduce code size in most cases; it just simplifies the execution of load/store instructions for interpreters and baseline compilers. --optimize-added-constants --low-memory-unused can actually provide better optimization because it "merges the add/sub constant into the offset".
For this pass, low-memory-unused is not enough; it needs a looser assumption: that memory accesses never go out of bounds.
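To make the two directions concrete, here is a hypothetical Python sketch of the wasm32 address arithmetic (not Binaryen code; the function and constant names are illustrative only). The index expression wraps mod 2**32, then the unsigned `offset` immediate is added on top without wrapping:

```python
MASK = 0xFFFFFFFF  # the i32 index wraps mod 2**32

def effective_address(x, const, offset):
    # wasm32 addressing: index computed mod 2**32, then the unsigned
    # offset immediate is added as a wider (33-bit) sum.
    return ((x + const) & MASK) + offset

# Direction 1 (OptimizeAddedConstants): fold the added constant into
# the offset immediate, dropping the i32.add entirely.
# Direction 2 (this pass): fold the offset immediate into the added
# constant, leaving offset=0.
x = 4096  # with low-memory-unused, x >= 1024, so no wrap occurs
assert effective_address(x, 8, 16) == effective_address(x, 0, 24)  # direction 1
assert effective_address(x, 8, 16) == effective_address(x, 24, 0)  # direction 2
```

Both rewrites agree with the original whenever the index arithmetic does not wrap; they diverge only in the wrapping cases discussed below.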
Thanks for the example. Ok, so IIUC you want to optimize
(i32.load offset=16
(i32.sub
(local.get $x)
(i32.const 20)
)
)
into
(i32.load
(i32.sub
(local.get $x)
(i32.const 4)
)
)
Unless I'm missing something, I think that could be optimized under the assumption of low memory unused. We don't do it atm; it looks like we only handle adds, but subtracts could be added. The reason it is valid to optimize is that the first expression is really ((x - 20) & 0xffffffff) + 16 (i32.sub can overflow/underflow, which the & handles; offset=16 does not overflow so no fixup is needed there). The question is then whether that is equal to (x - 4) & 0xffffffff. By the assumption of low memory unused, x >= 1024. That means the & operations in both cases never do any work. So x - 20 + 16 === x - 4, and optimizing is valid.
Maybe I got something wrong though, what do you think @HerrCai0907 ?
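This equality can be checked directly with a small Python sketch of the address arithmetic (illustrative names; `original`/`optimized` just model the two loads above):

```python
MASK = 0xFFFFFFFF  # the i32 index wraps mod 2**32

def original(x):
    # (i32.load offset=16 (i32.sub (local.get $x) (i32.const 20)))
    # index wraps mod 2**32, then offset=16 is added without wrapping
    return ((x - 20) & MASK) + 16

def optimized(x):
    # (i32.load (i32.sub (local.get $x) (i32.const 4)))
    return (x - 4) & MASK

# Under low-memory-unused (x >= 1024) the subtraction never
# underflows, the mask does no work, and the addresses agree.
for x in (1024, 4096, 0xFFFFFFFF):
    assert original(x) == optimized(x)
```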
I thought about it again. If Binaryen's approach to traps in optimization is to treat them as UB, then the optimization does not even require this assumption, because it only changes the result if overflow happens while computing the address (a trap becomes an access to an incorrect address).
Binaryen doesn't treat traps as UB, or at least not in the sense that LLVM might. By default we do not remove traps at all, though we do not care about the identity of traps (so we can reorder two traps). We do have options to do more, and ignore/remove certain traps:
Lines 122 to 173 in bc36c02
// Optimize assuming things like div by 0, bad load/store, will not trap.
// This is deprecated in favor of trapsNeverHappen.
bool ignoreImplicitTraps = false;
// Optimize assuming a trap will never happen at runtime. This is similar to
// ignoreImplicitTraps, but different:
//
// * ignoreImplicitTraps simply ignores the side effect of trapping when it
//   computes side effects, and then passes work with that data.
// * trapsNeverHappen assumes that if an instruction with a possible trap is
//   reached, then it does not trap, and an (unreachable) - that always
//   traps - is never reached.
//
// The main difference is that in trapsNeverHappen mode we will not move
// around code that might trap, like this:
//
//   (if (condition) (code))
//
// If (code) might trap, ignoreImplicitTraps ignores that trap, and it might
// end up moving (code) to happen before the (condition), that is,
// unconditionally. trapsNeverHappen, on the other hand, does not ignore the
// side effect of the trap; instead, it will potentially remove the trapping
// instruction, if it can - it is always safe to remove a trap in this mode,
// as the traps are assumed to not happen. Where it cannot remove the side
// effect, it will at least not move code around.
//
// A consequence of this difference is that code that puts a possible trap
// behind a condition is unsafe in ignoreImplicitTraps, but safe in
// trapsNeverHappen. In general, trapsNeverHappen is safe on production code
// where traps are either fatal errors or assertions, and it is assumed
// neither of those can happen (and it is undefined behavior if they do).
//
// TODO: deprecate and remove ignoreImplicitTraps.
//
// Since trapsNeverHappen assumes a trap is never reached, it can in principle
// remove code like this:
//
//   (i32.store ..)
//   (unreachable)
//
// The trap isn't reached, by assumption, and if we reach the store then we'd
// reach the trap, so we can assume that isn't reached either, and TNH can
// remove both. We do have a specific limitation here, however, which is that
// trapsNeverHappen cannot remove calls to *imports*. We assume that an import
// might do things we cannot understand, so we never eliminate it. For
// example, in LLVM output we might see this:
//
//   (call $abort) ;; a noreturn import - the process is halted with an error
//   (unreachable)
//
// That trap is never actually reached since the abort halts execution. In TNH
// we can remove the trap but not the call right before it.
bool trapsNeverHappen = false;
Then I think we should put this optimization under the trapsNeverHappen option.
Back to the example
(i32.load offset=16 (i32.sub (local.get $x) (i32.const 20)))
and
(i32.load offset=0 (i32.sub (local.get $x) (i32.const 4)))
if x is 16, the original code will access 0x1'0000'000C, but the optimized code will access 0xC.
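The divergence at x = 16 can be reproduced with the same kind of Python sketch of the address arithmetic (names are illustrative):

```python
MASK = 0xFFFFFFFF  # the i32 index wraps mod 2**32

def original(x):
    # offset=16 over (x - 20): the index wraps mod 2**32, then the
    # unsigned offset is added on top as a wider sum.
    return ((x - 20) & MASK) + 16

def optimized(x):
    # offset folded into the constant: (x - 4) with offset=0.
    return (x - 4) & MASK

x = 16
assert original(x) == 0x1_0000_000C   # beyond the 32-bit space: the load traps
assert optimized(x) == 0xC            # the rewritten load reads address 12
```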
I would agree if we didn't have low-memory-unused, which I thought we agreed above is correct for this? We don't need both flags here, as low-memory-unused defines exactly this situation.
But in the previous case, when x is 16, it will access 0x1'0000'000C. That address is definitely out of the low memory range. I think this would confuse me, and maybe other people, about the exact meaning of low-memory-unused, since at the theoretical level there are some differences.
This pass is a more aggressive optimization which assumes memory access instructions will not overflow during arithmetic operations.
Out-of-bounds memory accesses are either UB (e.g. in C++) or protected/checked by the high-level language design (AssemblyScript, Go, etc.).
Introducing this pass can reduce the workload of the Wasm runtime and may help code size in some corner cases.