Conversation
Native uses of alloc_array are duplicated in the native test files. Those will be removed when the tests are ported to Rust.
This also changes how we store the next free index a little bit. The constant `FULL` is gone: the table is now known to be full when the next free location equals the array length. Also, `FREE_SLOT` no longer holds the next free location shifted left; it holds the next free location directly (unshifted).
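For illustration, here is a minimal self-contained sketch of that bookkeeping, assuming a `FREE_SLOT` index and a fixed table length (this is not the actual RTS code; in particular, the real table may take the next free location from a free list rather than a plain bump):

```rust
// Sketch only: `FREE_SLOT` holds the next free index directly (no shifting),
// and "table is full" is simply `FREE_SLOT == table length`; there is no
// separate `FULL` constant any more.
static mut FREE_SLOT: usize = 0;

unsafe fn table_is_full(table_len: usize) -> bool {
    FREE_SLOT == table_len
}

unsafe fn claim_slot(table_len: usize) -> Option<usize> {
    if table_is_full(table_len) {
        None
    } else {
        let slot = FREE_SLOT;
        // A plain bump keeps this sketch self-contained; the real code may
        // instead follow a free list stored in the slots themselves.
        FREE_SLOT = slot + 1;
        Some(slot)
    }
}
```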
- Introduced `as_blah` methods to convert a SkewedPtr or *Obj to other object types. These methods check the type in debug mode (a sketch of this pattern follows below).
- Improved error reporting; we now show details in assertion failures.
- Temporarily link with the debug runtime to enable sanity checking by default for now.
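As a rough, hypothetical illustration of the `as_blah` pattern (the tag value and type layouts below are made up, not the real RTS definitions):

```rust
// Toy model of a debug-checked cast: the tag is verified only in debug builds,
// so release builds pay nothing for the check.
type Tag = u32;
const TAG_ARRAY: Tag = 3; // illustrative value, not the real tag

#[repr(C)]
struct Obj {
    tag: Tag,
}

#[repr(C)]
struct Array {
    header: Obj,
    len: u32,
}

unsafe fn as_array(obj: *mut Obj) -> *mut Array {
    debug_assert_eq!((*obj).tag, TAG_ARRAY);
    obj as *mut Array
}
```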
nomeata left a comment
Very nice! Only minor comments
// NOTE: For this to work heap addresses need to be greater than the largest value for object
// headers. Currently this holds. TODO: Document this better.
let mut header = (*obj).tag;
while header > TAG_NULL {
    // TODO: is `header > TAG_NULL` the best way to distinguish a tag from a pointer?
This is a bit hairy. Maybe leave a note in the code where we define header tags that these tags must be smaller than any possible dynamic pointer?
Also, this would limit our wiggle room if we want to put variant tags or array lengths etc into the header… (#916)
I think the fact that you can distinguish headers from pointers is unfortunately fundamental to the threading algorithm, unless there's some other technique that can be used.
Right. It’ll take away roughly ¼ of the possible header values. Still plenty of room for other things.
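A hedged sketch of the invariant being discussed (the constants are placeholders; the point is only that every tag must stay below the lowest possible heap address for the `header > TAG_NULL` test to be sound):

```rust
type Tag = u32;

// Placeholder values: TAG_NULL is assumed to be the largest tag, and HEAP_BASE
// the lowest address a dynamic object can live at.
const TAG_NULL: Tag = 12;
const HEAP_BASE: u32 = 0x1_0000;

// During pointer threading, a header slot holds either the original tag or a
// pointer into the heap. The test below distinguishes the two, and it is only
// correct because all tags are smaller than any heap address.
fn is_threaded_pointer(header_word: u32) -> bool {
    debug_assert!(TAG_NULL < HEAP_BASE);
    header_word > TAG_NULL
}
```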
@@ -0,0 +1,242 @@
use crate::alloc;
I assume the code in here has just moved and isn't substantially changed, right?
Yes, no need to review this file.
    STACK_LEN += Words(1);
}

pub unsafe fn pop_mark_stack() -> Option<usize> {
I should try to get rid of this Option as well; I think it could also improve perf. (#2522 (comment))
I don't understand why, but if I apply the same technique here the benchmark starts using more instructions, even though the new pop_mark_stack is actually smaller in the generated Wasm. This is the new version I tried:
pub unsafe fn pop_mark_stack() -> usize {
if STACK_LEN.0 == 0 {
0
} else {
STACK_LEN -= Words(1);
*(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
}
}
Here's the patch for this change:
diff --git a/rts/motoko-rts-tests/src/mark_stack.rs b/rts/motoko-rts-tests/src/mark_stack.rs
index 0499a31bf..4938f252a 100644
--- a/rts/motoko-rts-tests/src/mark_stack.rs
+++ b/rts/motoko-rts-tests/src/mark_stack.rs
@@ -24,7 +24,7 @@ fn test_(objs: Vec<usize>) -> TestResult {
for obj in objs.iter().rev() {
let popped = pop_mark_stack();
- if popped != Some(*obj) {
+ if popped != *obj {
free_mark_stack(); // TODO: Does not really free
return TestResult::error(format!(
"Unexpected object popped, expected={:?}, popped={:?}",
diff --git a/rts/motoko-rts/src/gc/mark_compact.rs b/rts/motoko-rts/src/gc/mark_compact.rs
index ef8a814ee..5890d67db 100644
--- a/rts/motoko-rts/src/gc/mark_compact.rs
+++ b/rts/motoko-rts/src/gc/mark_compact.rs
@@ -80,7 +80,11 @@ unsafe fn push_mark_stack(obj: SkewedPtr, heap_base: u32) {
}
unsafe fn mark_stack(heap_base: u32) {
- while let Some(obj) = pop_mark_stack() {
+ loop {
+ let obj = pop_mark_stack();
+ if obj == 0 {
+ break;
+ }
mark_fields(obj as *mut Obj, heap_base);
}
}
diff --git a/rts/motoko-rts/src/mark_stack.rs b/rts/motoko-rts/src/mark_stack.rs
index be87acc6f..1bf8145b9 100644
--- a/rts/motoko-rts/src/mark_stack.rs
+++ b/rts/motoko-rts/src/mark_stack.rs
@@ -50,12 +50,11 @@ pub unsafe fn push_mark_stack(obj: usize) {
STACK_LEN += Words(1);
}
-pub unsafe fn pop_mark_stack() -> Option<usize> {
+pub unsafe fn pop_mark_stack() -> usize {
if STACK_LEN.0 == 0 {
- None
+ 0
} else {
STACK_LEN -= Words(1);
- let p = *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize);
- Some(p)
+ *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
}
}
Any idea why the cost went up?
Here's the diff of before vs. after this patch (mo-rts.wasm):
--- mo-rts.wat 2021-06-14 15:44:34.892004845 +0300
+++ ../../motoko/rts/mo-rts.wat 2021-06-14 15:44:34.180013747 +0300
@@ -19343,7 +19343,7 @@
local.get 5
i32.const 1
i32.add
- local.set 10
+ local.set 8
block ;; label = @1
local.get 5
i32.const 5
@@ -19352,7 +19352,7 @@
local.tee 2
i32.eqz
br_if 0 (;@1;)
- local.get 10
+ local.get 8
i32.const 8
i32.add
local.set 1
@@ -19402,7 +19402,6 @@
local.tee 2
i32.const 15796
i32.add
- local.tee 8
local.get 1
i32.const -1
i32.add
@@ -19419,21 +19418,27 @@
i32.const 8
i32.add
i32.load
+ local.tee 1
+ i32.eqz
+ br_if 1 (;@1;)
+ local.get 1
local.get 4
call $motoko_rts::gc::mark_compact::mark_fields::hc1bca295efbdeb41
- local.get 8
+ global.get 1
+ i32.const 15796
+ i32.add
i32.load
local.tee 1
br_if 0 (;@2;)
end
end
block ;; label = @1
- local.get 10
+ local.get 8
i32.load offset=4 align=1
local.tee 2
i32.eqz
br_if 0 (;@1;)
- local.get 10
+ local.get 8
i32.const 8
i32.add
local.set 1

So the new version, after a lot of inlining, ends up being larger. No idea why though. This code is for compacting_gc, so mark_stack is inlined in mark_compact, which is inlined in compacting_gc. Somehow the version with this change has more stack traffic.
Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?
And maybe just maintain pointer to top of stack rather than computing from length on push and pop
> And maybe just maintain pointer to top of stack rather than computing from length on push and pop
This made a benchmark use 2.3% fewer instructions. Pushed as 836992c.
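For reference, a rough sketch of that idea, assuming `STACK_BASE`/`STACK_TOP` globals (names and details are illustrative, not necessarily what 836992c actually does):

```rust
// Keep a pointer to the top of the stack instead of recomputing the address
// from STACK_LEN on every push and pop.
static mut STACK_BASE: *mut usize = core::ptr::null_mut();
static mut STACK_TOP: *mut usize = core::ptr::null_mut();

unsafe fn push_mark_stack(obj: usize) {
    // Growing the stack when it runs out of space is omitted in this sketch.
    *STACK_TOP = obj;
    STACK_TOP = STACK_TOP.add(1);
}

unsafe fn pop_mark_stack() -> Option<usize> {
    if STACK_TOP == STACK_BASE {
        None
    } else {
        STACK_TOP = STACK_TOP.sub(1);
        Some(*STACK_TOP)
    }
}
```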
> Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?
We would need a conditional to avoid popping the sentinel value, right? Otherwise a push after popping the sentinel value would break the stack. So I think we can't remove the conditional in pop_mark_stack.
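To spell that out with a small sketch (again with illustrative `STACK_BASE`/`STACK_TOP` globals): a sentinel 0 at the bottom lets an empty pop return 0, but pop still has to check for the sentinel so that it never moves past it.

```rust
static mut STACK_BASE: *mut usize = core::ptr::null_mut();
static mut STACK_TOP: *mut usize = core::ptr::null_mut();

// Assume the slot at STACK_BASE permanently holds the sentinel 0 and STACK_TOP
// points one past the topmost element, so an "empty" stack has
// STACK_TOP == STACK_BASE.add(1).
unsafe fn pop_mark_stack() -> usize {
    if *STACK_TOP.sub(1) == 0 {
        // Don't step past the sentinel: a later push would overwrite it and a
        // subsequent empty pop would underflow the stack.
        0
    } else {
        STACK_TOP = STACK_TOP.sub(1);
        *STACK_TOP
    }
}
```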
I guess one thing we could consider is using stable memory for the mark stack and bits, but maybe that's too slow. Certainly not for this PR.
Perf changes in CanCan backend:
- Copying GC: -0.81%
- Compacting GC: +3.06%
LGTM.
Co-authored-by: Claudio Russo <claudio@dfinity.org>
) {
    // The array and the objects pointed by the array are all static so we don't evacuate them. We
    // only evacuate fields of objects in the array.
    for i in 0..roots.len() {
Do we actually know that all the values in roots are pointers to heap objects and never unboxed values or even static objects? Otherwise this code might crash, right?
My reasoning is that env_static_roots eventually calls visit_pointer_fields, but there's no check that the obj is a proper object.
Yeah, so the static heap (other than static data/constants) has an array of MutBoxes (roots in this function). These MutBoxes also live in the static heap, but they can point to the dynamic heap. This code evacuates those MutBox fields.
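A self-contained toy model of what this loop does (the real RTS types like `Array`/`MutBox` and the skewed-pointer details are elided):

```rust
// The root array and the MutBoxes are static and never move; only the values
// their fields point to (which may live in the dynamic heap) are evacuated,
// and the fields are updated in place.
struct MutBox {
    field: usize,
}

fn evacuate(ptr: usize) -> usize {
    // Stand-in for the copying GC's real copy/forward step.
    ptr
}

fn evac_static_roots(roots: &mut [MutBox]) {
    for mutbox in roots.iter_mut() {
        mutbox.field = evacuate(mutbox.field);
    }
}
```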
Right, I just remembered that after (re)seeing the explanation to a previous comment. Sorry about the gaslighting.
Co-authored-by: Claudio Russo <claudio@dfinity.org>
Makes a CanCan backend benchmark use 2.3% fewer instructions
This merges #2223 and master and enables the new collector with a new moc flag, `--compacting-gc`. The goal is to (1) avoid bitrot in #2223 and (2) allow easy testing and benchmarking of different collectors.
We include both collectors in the Wasms. Binary sizes grow between 0.8% (CanCan backend) and 1.8% (in simple tests in run-drun). Some benchmark results here:
#2033 (comment)
An improvement to the compacting GC is currently being implemented in branch
osa1/compacting_gc_3.