add aggressive-memory-offset-merging pass to merge offset and previous const #6972
base: main
Conversation
// load offset=0
// i32.add
// i32.const A + B
// other expr
I believe the existing OptimizeAddedConstants pass can do something like this, using the assumption of --low-memory-unused (it will also fold A + B into the offset, saving the add). Does that achieve what you want in this pass?
You can test it with wasm-opt --optimize-added-constants --low-memory-unused, or wasm-opt -O3 --low-memory-unused (that pass is enabled when that flag is set).
The directions of these two passes are different. This pass wants to "merge the offset into an add/sub constant". It will not reduce code size in most cases; it just simplifies the execution of load/store instructions for interpreters and baseline compilers. --optimize-added-constants --low-memory-unused can actually provide better optimization because it "merges the add/sub constant into the offset".
For this pass, low-memory-unused is not enough; it needs a looser assumption: that memory accesses never go out of bounds.
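To make the two directions concrete, here is a hypothetical Python sketch of the wasm32 address arithmetic (not Binaryen code; the function and constant names are illustrative only). The index expression wraps mod 2**32, then the unsigned `offset` immediate is added on top without wrapping:

```python
MASK = 0xFFFFFFFF  # the i32 index wraps mod 2**32

def effective_address(x, const, offset):
    # wasm32 addressing: index computed mod 2**32, then the unsigned
    # offset immediate is added as a wider (33-bit) sum.
    return ((x + const) & MASK) + offset

# Direction 1 (OptimizeAddedConstants): fold the added constant into
# the offset immediate, dropping the i32.add entirely.
# Direction 2 (this pass): fold the offset immediate into the added
# constant, leaving offset=0.
x = 4096  # with low-memory-unused, x >= 1024, so no wrap occurs
assert effective_address(x, 8, 16) == effective_address(x, 0, 24)  # direction 1
assert effective_address(x, 8, 16) == effective_address(x, 24, 0)  # direction 2
```

Both rewrites agree with the original whenever the index arithmetic does not wrap; they diverge only in the wrapping cases discussed below.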
Thanks for the example. Ok, so IIUC you want to optimize
(i32.load offset=16
(i32.sub
(local.get $x)
(i32.const 20)
)
)
into
(i32.load
(i32.sub
(local.get $x)
(i32.const 4)
)
)
Unless I'm missing something, I think that could be optimized under the assumption of low memory unused. We don't do it atm; it looks like we only handle adds, but subtracts could be added. The reason it is valid to optimize is that the first expression is really ((x - 20) & 0xffffffff) + 16 (i32.sub can overflow/underflow, which the & handles; offset=16 does not overflow so no fixup is needed there). The question is then whether that is equal to (x - 4) & 0xffffffff. By the assumption of low memory unused, x >= 1024. That means the & operations in both cases never do any work. So x - 20 + 16 === x - 4, and optimizing is valid.
Maybe I got something wrong though, what do you think @HerrCai0907 ?
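This equality can be checked directly with a small Python sketch of the address arithmetic (illustrative names; `original`/`optimized` just model the two loads above):

```python
MASK = 0xFFFFFFFF  # the i32 index wraps mod 2**32

def original(x):
    # (i32.load offset=16 (i32.sub (local.get $x) (i32.const 20)))
    # index wraps mod 2**32, then offset=16 is added without wrapping
    return ((x - 20) & MASK) + 16

def optimized(x):
    # (i32.load (i32.sub (local.get $x) (i32.const 4)))
    return (x - 4) & MASK

# Under low-memory-unused (x >= 1024) the subtraction never
# underflows, the mask does no work, and the addresses agree.
for x in (1024, 4096, 0xFFFFFFFF):
    assert original(x) == optimized(x)
```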
I thought about it again. If Binaryen's approach to traps in optimization is to treat them as UB, then the optimization does not even require this assumption, because it only changes the result if overflow happens while computing the address (a trap becomes an access to an incorrect address).
Binaryen doesn't treat traps as UB, or at least not in the sense that LLVM might. By default we do not remove traps at all, though we do not care about the identity of traps (so we can reorder two traps). We do have options to do more, and ignore/remove certain traps:
Lines 122 to 173 in bc36c02
// Optimize assuming things like div by 0, bad load/store, will not trap.
// This is deprecated in favor of trapsNeverHappen.
bool ignoreImplicitTraps = false;
// Optimize assuming a trap will never happen at runtime. This is similar to
// ignoreImplicitTraps, but different:
//
// * ignoreImplicitTraps simply ignores the side effect of trapping when it
//   computes side effects, and then passes work with that data.
// * trapsNeverHappen assumes that if an instruction with a possible trap is
//   reached, then it does not trap, and an (unreachable) - that always
//   traps - is never reached.
//
// The main difference is that in trapsNeverHappen mode we will not move
// around code that might trap, like this:
//
//   (if (condition) (code))
//
// If (code) might trap, ignoreImplicitTraps ignores that trap, and it might
// end up moving (code) to happen before the (condition), that is,
// unconditionally. trapsNeverHappen, on the other hand, does not ignore the
// side effect of the trap; instead, it will potentially remove the trapping
// instruction, if it can - it is always safe to remove a trap in this mode,
// as the traps are assumed to not happen. Where it cannot remove the side
// effect, it will at least not move code around.
//
// A consequence of this difference is that code that puts a possible trap
// behind a condition is unsafe in ignoreImplicitTraps, but safe in
// trapsNeverHappen. In general, trapsNeverHappen is safe on production code
// where traps are either fatal errors or assertions, and it is assumed
// neither of those can happen (and it is undefined behavior if they do).
//
// TODO: deprecate and remove ignoreImplicitTraps.
//
// Since trapsNeverHappen assumes a trap is never reached, it can in principle
// remove code like this:
//
//   (i32.store ..)
//   (unreachable)
//
// The trap isn't reached, by assumption, and if we reach the store then we'd
// reach the trap, so we can assume that isn't reached either, and TNH can
// remove both. We do have a specific limitation here, however, which is that
// trapsNeverHappen cannot remove calls to *imports*. We assume that an import
// might do things we cannot understand, so we never eliminate it. For
// example, in LLVM output we might see this:
//
//   (call $abort) ;; a noreturn import - the process is halted with an error
//   (unreachable)
//
// That trap is never actually reached since the abort halts execution. In TNH
// we can remove the trap but not the call right before it.
bool trapsNeverHappen = false;
Then I think we should put this optimization under the trapsNeverHappen option.
Back to the example
(i32.load offset=16 (i32.sub (local.get $x) (i32.const 20)))
and
(i32.load offset=0 (i32.sub (local.get $x) (i32.const 4)))
if x is 16, the original code will access 0x1'0000'000C, but the optimized code will access 0xC.
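The divergence at x = 16 can be reproduced with the same kind of Python sketch of the address arithmetic (names are illustrative):

```python
MASK = 0xFFFFFFFF  # the i32 index wraps mod 2**32

def original(x):
    # offset=16 over (x - 20): the index wraps mod 2**32, then the
    # unsigned offset is added on top as a wider sum.
    return ((x - 20) & MASK) + 16

def optimized(x):
    # offset folded into the constant: (x - 4) with offset=0.
    return (x - 4) & MASK

x = 16
assert original(x) == 0x1_0000_000C   # beyond the 32-bit space: the load traps
assert optimized(x) == 0xC            # the rewritten load reads address 12
```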
I would agree if we didn't have low-memory-unused, which I thought we agreed above is correct for this? We don't need both flags here, as low-memory-unused defines exactly this situation.
But in the previous case, when x is 16, it will access 0x1'0000'000C. That address is definitely out of the low memory range. I think this would confuse me, and maybe other people, about the exact meaning of low-memory-unused, since at the theoretical level there are some differences.
This pass is a more aggressive optimization which assumes memory access instructions will not overflow during arithmetic operations.
Out-of-bounds memory accesses are either UB (e.g. in C++) or protected/checked by the high-level language design (AssemblyScript, Go, etc.).
Introducing this pass can reduce the workload of the Wasm runtime and may help code size in some corner cases.