Merge #2012
2012: Refactor singlepass init stack assembly r=syrusakbary a=slave5vw

<!-- 
Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test:
https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests

-->

# Description
<!-- 
Provide details regarding the change including motivation,
links to related issues, and the context of the PR.
-->

The old method generated one `mov` instruction per stack slot, which costs about 15 bytes per slot, so the more local stack variables a function uses, the larger the generated code. The suggested method works like C's `memset`: it initializes the whole range with a fixed instruction sequence (24 bytes) built around the `rep stosq` instruction.
https://www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq
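To make the size trade-off concrete, here is a rough sketch of the code-size model, using the 15-byte and 24-byte figures from this description (the helper names are illustrative, not from the codebase):

```rust
// Size model from the PR description: one `mov [mem], imm` per slot costs
// about 15 bytes, while the `rep stosq` setup (load RCX with the count,
// zero RAX, `lea` the start address into RDI, then `rep stosq`) is a
// fixed ~24 bytes regardless of how many slots are zeroed.
fn per_slot_bytes(slots: usize) -> usize {
    15 * slots
}

fn rep_stosq_bytes(_slots: usize) -> usize {
    24 // fixed cost, independent of slot count
}

fn main() {
    for slots in 1..=4 {
        println!(
            "{} slots: mov-per-slot = {} bytes, rep stosq = {} bytes",
            slots,
            per_slot_bytes(slots),
            rep_stosq_bytes(slots)
        );
    }
}
```

Under this model the fixed sequence already wins at two slots (30 bytes vs 24), and the savings grow linearly with the number of locals.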

However, with this approach the `skip_stack_guard_page` test fails:
https://github.com/wasmerio/wasmer/blob/1b49fe8475e335c6cdbb702a7100ba044201afd0/lib/vm/src/trap/traphandlers.rs#L124
The reason is that with the DF flag set to 0, the slots are initialized in reverse order (from the lowest address upward). The VM trap handler checks only the faulting address, with no information about the range of the access, and so classifies the fault as a heap OOB. A stack OOB would be the correct classification, and the test condition fails because what occurs really is a stack OOB.

I think there are two ways to fix this:
1. Improve the stack OOB check in the VM trap handler.
-> Too abstract: the handler has no information about the range of the access, so this may not be possible.
2. Use `pushf`/`popf`/`std` (set DF=1) and write the slots in sequential order.
-> Not recommended: it unnecessarily increases the assembly size.

So for now, I added the test to ignores.txt, as is already done for cranelift/llvm.
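The diff below also adds an explicit stack probe before the bulk zeroing, touching one slot per page so the guard page is still hit in order. A minimal sketch of the probe stride (`probe_indices` is a hypothetical helper mirroring the loop in machine.rs; `NATIVE_PAGE_SIZE` and the 8-byte slot size come from that file):

```rust
const NATIVE_PAGE_SIZE: usize = 4096;

// Indices of the stack slots the probe loop writes to: one slot per page's
// worth of 8-byte slots (4096 / 8 = 512), skipping the first index since
// `rep stosq` reaches the first page anyway.
fn probe_indices(n_params: usize, n: usize) -> Vec<usize> {
    (n_params..n).step_by(NATIVE_PAGE_SIZE / 8).collect::<Vec<_>>()
        .into_iter().skip(1).collect()
}

fn main() {
    // With locals at indices 2..1502, the probe touches slots 514 and 1026,
    // i.e. one write every 4 KiB page of the locals area.
    println!("{:?}", probe_indices(2, 1502));
}
```

This keeps the guard-page fault ordered even though `rep stosq` itself sweeps upward through the whole range.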


# Review

- [ ] Add a short description of the change to the CHANGELOG.md file


Co-authored-by: Jiyong Ha <[email protected]>
Co-authored-by: losfair <[email protected]>
Co-authored-by: Syrus Akbary <[email protected]>
5 people authored Jan 14, 2021
2 parents 945d950 + a217bad commit 3806321
Showing 2 changed files with 42 additions and 5 deletions.
4 changes: 4 additions & 0 deletions lib/compiler-singlepass/src/emitter_x64.rs
@@ -117,6 +117,7 @@ pub trait Emitter {
fn emit_xchg(&mut self, sz: Size, src: Location, dst: Location);
fn emit_lock_xadd(&mut self, sz: Size, src: Location, dst: Location);
fn emit_lock_cmpxchg(&mut self, sz: Size, src: Location, dst: Location);
fn emit_rep_stosq(&mut self);

fn emit_btc_gpr_imm8_32(&mut self, src: u8, dst: GPR);
fn emit_btc_gpr_imm8_64(&mut self, src: u8, dst: GPR);
@@ -1176,6 +1177,9 @@ impl Emitter for Assembler {
}
}

fn emit_rep_stosq(&mut self) {
dynasm!(self ; rep stosq);
}
fn emit_btc_gpr_imm8_32(&mut self, src: u8, dst: GPR) {
dynasm!(self ; btc Rd(dst as u8), BYTE src as i8);
}
43 changes: 38 additions & 5 deletions lib/compiler-singlepass/src/machine.rs
@@ -3,9 +3,12 @@ use crate::emitter_x64::*;
use crate::x64_decl::{new_machine_state, X64Register};
use smallvec::smallvec;
use smallvec::SmallVec;
use std::cmp;
use std::collections::HashSet;
use wasmer_compiler::wasmparser::Type as WpType;

const NATIVE_PAGE_SIZE: usize = 4096;

struct MachineStackOffset(usize);

pub struct Machine {
@@ -447,18 +450,48 @@ impl Machine {
}
}

// Initialize all normal locals to zero.
for i in n_params..n {
a.emit_mov(Size::S64, Location::Imm32(0), locations[i]);
}

// Load vmctx into R15.
a.emit_mov(
Size::S64,
Self::get_param_location(0),
Location::GPR(GPR::R15),
);

// Stack probe.
//
// `rep stosq` writes data from low address to high address and may skip the stack guard page.
// so here we probe it explicitly when needed.
for i in (n_params..n).step_by(NATIVE_PAGE_SIZE / 8).skip(1) {
a.emit_mov(Size::S64, Location::Imm32(0), locations[i]);
}

// Initialize all normal locals to zero.
let mut init_stack_loc_cnt = 0;
let mut last_stack_loc = Location::Memory(GPR::RBP, i32::MAX);
for i in n_params..n {
match locations[i] {
Location::Memory(_, _) => {
init_stack_loc_cnt += 1;
last_stack_loc = cmp::min(last_stack_loc, locations[i]);
}
Location::GPR(_) => {
a.emit_mov(Size::S64, Location::Imm32(0), locations[i]);
}
_ => unreachable!(),
}
}
if init_stack_loc_cnt > 0 {
// Since these assemblies take up to 24 bytes, if more than 2 slots are initialized, then they are smaller.
a.emit_mov(
Size::S64,
Location::Imm64(init_stack_loc_cnt as u64),
Location::GPR(GPR::RCX),
);
a.emit_xor(Size::S64, Location::GPR(GPR::RAX), Location::GPR(GPR::RAX));
a.emit_lea(Size::S64, last_stack_loc, Location::GPR(GPR::RDI));
a.emit_rep_stosq();
}

// Add the size of all locals allocated to stack.
self.stack_offset.0 += static_area_size - callee_saved_regs_size;

