Test the soundness of vm::Value
#40
Regarding the first checkbox ( This suggests a way to drop the On its own, this just replaces all that refcount traffic with a bunch of extra references instead. It may be better to introduce something like a
This moves us closer to a sound and safe API as discussed in #40. Because a `vm::Value` may point to an array, users must check its type and perhaps increment its refcount anywhere they duplicate the value.

As part of overhauling `vm::Value`, this commit also slightly tweaks the NaN-boxing scheme. The quiet bit is now always set, moving tagged values up to the top of the available bit-pattern range. This simplifies decoding such that all real values are contiguous. (See #43)

To avoid introducing new, redundant refcount traffic, there are now also `vm::ValueRef` and `vm::ArrayRef` types that share a representation with their owned counterparts, but can only exist as a borrow.

The interpreter relies entirely on codegen to provide it with sound bytecode that does not violate host language rules. This means that each opcode handler must carefully choose whether to borrow, move, or clone its operands, so that codegen can rely on that behavior.
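To make the NaN-boxing change concrete, here is a minimal sketch of a layout in which every tagged value has the sign, exponent, and quiet bits set, so a single comparison separates reals from everything else. The names `Boxed`, `Decoded`, `TAG_BASE`, and the 3-bit tag / 48-bit payload split are assumptions for illustration, not the project's actual encoding.

```rust
/// Hypothetical layout: any bit pattern at or above TAG_BASE is a tagged,
/// non-real value (sign bit, all exponent bits, and the quiet bit are set).
const TAG_BASE: u64 = 0xFFF8_0000_0000_0000;
/// Single NaN pattern stored for real-number NaNs so they never look tagged.
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;
const TAG_SHIFT: u32 = 48;
const PAYLOAD_MASK: u64 = (1 << TAG_SHIFT) - 1;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Boxed(u64);

#[derive(Debug, PartialEq)]
enum Decoded {
    Real(f64),
    Tagged { tag: u8, payload: u64 },
}

impl Boxed {
    fn from_real(x: f64) -> Boxed {
        // Hardware NaNs may have the sign bit set; fold them into one pattern
        // that sits below TAG_BASE.
        if x.is_nan() {
            Boxed(CANONICAL_NAN)
        } else {
            Boxed(x.to_bits())
        }
    }

    /// `tag` must fit in 3 bits and `payload` in 48 bits.
    fn from_tagged(tag: u8, payload: u64) -> Boxed {
        assert!(tag < 8 && payload <= PAYLOAD_MASK);
        Boxed(TAG_BASE | (u64::from(tag) << TAG_SHIFT) | payload)
    }

    fn decode(self) -> Decoded {
        // Every real, including negative infinity, compares below TAG_BASE.
        if self.0 < TAG_BASE {
            Decoded::Real(f64::from_bits(self.0))
        } else {
            let tag = ((self.0 >> TAG_SHIFT) & 0x7) as u8;
            Decoded::Tagged { tag, payload: self.0 & PAYLOAD_MASK }
        }
    }
}
```

In this sketch the largest real bit pattern is negative infinity (`0xFFF0_0000_0000_0000`), which is why NaNs must be canonicalized on the way in: a sign-set quiet NaN produced by arithmetic would otherwise be indistinguishable from tag 0.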
Entity lists are now copy-on-write. This lets the interpreter keep iterators pointing into them while (potentially) mutating them, and preserves GML `with` semantics in the presence of such mutation. In particular, this makes `instance_create` and `instance_destroy` memory safe when called from GML, as described by #40.
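The copy-on-write idea can be sketched with `Rc::make_mut` from the standard library: an iterator keeps its own handle to the buffer, and any mutation made while that handle is alive copies the buffer instead of changing it in place. `EntityList`, `EntityIter`, the `u32` stand-in for `vm::Entity`, and `with_body_mutates` are hypothetical names for illustration; the real entity lists may be structured quite differently.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for `vm::Entity`.
type Entity = u32;

/// Copy-on-write list: cloning shares the buffer; mutation copies it only
/// when another handle (such as a live iterator) still refers to it.
#[derive(Clone, Default)]
struct EntityList(Rc<Vec<Entity>>);

impl EntityList {
    fn push(&mut self, e: Entity) {
        // `Rc::make_mut` clones the Vec first if the buffer is shared.
        Rc::make_mut(&mut self.0).push(e);
    }

    fn remove(&mut self, e: Entity) {
        Rc::make_mut(&mut self.0).retain(|&x| x != e);
    }

    /// Snapshot iterator: holds the buffer as it was when iteration began.
    fn iter(&self) -> EntityIter {
        EntityIter { list: Rc::clone(&self.0), index: 0 }
    }
}

struct EntityIter {
    list: Rc<Vec<Entity>>,
    index: usize,
}

impl Iterator for EntityIter {
    type Item = Entity;

    fn next(&mut self) -> Option<Entity> {
        let e = self.list.get(self.index).copied()?;
        self.index += 1;
        Some(e)
    }
}

/// Simulates a `with` loop whose body creates and destroys instances.
/// Returns the entities visited and the final contents of the list.
fn with_body_mutates() -> (Vec<Entity>, Vec<Entity>) {
    let mut world = EntityList::default();
    world.push(1);
    world.push(2);

    let mut visited = Vec::new();
    for e in world.iter() {
        visited.push(e);
        world.push(e + 10); // stands in for instance_create
        world.remove(2); // stands in for instance_destroy
    }
    (visited, world.0.to_vec())
}
```

Here the loop still visits entity `2` after it was removed, because the iterator walks its snapshot. The snapshot keeps iteration memory safe, but a real interpreter would presumably also need some liveness check on each visited entity, which this sketch does not model.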
I've addressed the known issues here, so the remaining work is to make sure we have good test coverage. This might include running the test suite under Miri and other sanitizers, as well as fuzzing.
Edit: This has been addressed; this issue now tracks testing to ensure this doesn't get broken.
The VM stack contains three kinds of raw pointers, with relatively unclear rules around when and how they can be used soundly. This makes it hard to determine whether (modifications to) the interpreter are correct. Worse, it exposes the entirely-safe engine APIs to unsoundness: for example, this is the main blocker for implementing `instance_create` and `instance_destroy`.

These are the three kinds of pointers:
- `vm::Value` can encode a `vm::Array`, which is itself an `Rc<UnsafeCell>`. However, because `vm::Value` is `Copy`, the `Rc` is not helpful at this level - it's there to implement copy-on-write arrays at the GML level.
- `vm::Row` is an intermediate `ptr::NonNull` into a `vm::Array`. It is intended to be "short-lived" (i.e. not stored in instances) but we may still want Dejavu to be allowed to produce optimized bytecode such that it lives across calls to other scripts or even engine APIs.
- The `with` iterator is a `ptr::NonNull` to some array of `vm::Entity`s owned by the `vm::Thread` or `vm::World`. It is "short-lived" in the same way as `vm::Row`, but unlike `vm::Row` is required by GML semantics to live across calls in the body of the `with` loop.

There are two separate-but-related problems we need to avoid:
- […] `vm::Row` and `with` iterators because they are confined to the VM stack, but it's still there.

What I'd like to do to build confidence that this is sound:
- `Value` contains an `Rc` but is also `Copy` - maybe this is okay if we can enforce that there is always a corresponding live `Rc` on the VM stack, or maybe we need to drop the `Copy` impl. (Make `vm::Value` non-`Copy` #45)
- […] `with` and instances work so it doesn't immediately explode if you e.g. `instance_create` in a `with` body. @Zekka suggested making the arrays copy-on-write, which solves the use-after-free issue; we just need to work out the specifics. (Implement `instance_destroy` based on copy-on-write entity lists #46)
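
As a rough illustration of the "drop the `Copy` impl" option from the first item, the sketch below makes the owned value `Clone`-only, so every duplication visibly bumps the refcount, and pairs it with a `Copy` borrowed view that cannot outlive the owner. The enum layout, the `Rc<Vec<f64>>` array stand-in, and the names `as_value_ref`, `to_value`, and `refcount_demo` are assumptions for illustration. Unlike a NaN-boxed design, this safe version does not share one representation between the owned and borrowed types.

```rust
use std::rc::Rc;

/// Owned value: duplicating it requires an explicit `clone`, which
/// increments the array's refcount. It is deliberately not `Copy`.
#[derive(Clone, Debug)]
enum Value {
    Real(f64),
    Array(Rc<Vec<f64>>),
}

/// Borrowed view: freely copyable, but tied to the lifetime of the owner,
/// so it never needs to touch the refcount.
#[derive(Clone, Copy, Debug)]
enum ValueRef<'a> {
    Real(f64),
    Array(&'a Rc<Vec<f64>>),
}

impl Value {
    fn as_value_ref(&self) -> ValueRef<'_> {
        match self {
            Value::Real(x) => ValueRef::Real(*x),
            Value::Array(a) => ValueRef::Array(a),
        }
    }
}

impl<'a> ValueRef<'a> {
    /// Promote a borrow to an owned value, incrementing the refcount.
    fn to_value(self) -> Value {
        match self {
            ValueRef::Real(x) => Value::Real(x),
            ValueRef::Array(a) => Value::Array(Rc::clone(a)),
        }
    }
}

/// Returns the array's strong count before borrowing, while two borrowed
/// views exist, and after one view is promoted to an owned value.
fn refcount_demo() -> (usize, usize, usize) {
    let count = |v: &Value| match v {
        Value::Array(a) => Rc::strong_count(a),
        Value::Real(_) => 0,
    };

    let owned = Value::Array(Rc::new(vec![1.0, 2.0]));
    let before = count(&owned);

    let view = owned.as_value_ref(); // no refcount traffic
    let view2 = view; // `ValueRef` is `Copy`; still no traffic
    let during_borrow = count(&owned);

    let dup = view2.to_value(); // the only place the count changes
    let after = count(&owned);
    drop(dup);

    (before, during_borrow, after)
}
```

The borrow checker then enforces the "there is always a corresponding live `Rc`" condition: a `ValueRef` cannot outlive the `Value` it was taken from.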