Merge #54
54: Fix the concurrency semantics for `Gc<T>`. r=ltratt a=jacob-hughes

This makes two major changes to the API:

1. It removes the requirement that `T: Sync` for `Gc<T>`.
2. It makes `Gc<T> : Send + Sync` if `T: Send + Sync`, fixing the ergonomic problems raised in #49.

`Sync`'s purpose is to ensure that two threads can access the data in `T` in a thread-safe way. In other words, it implies that `T` has synchronisation guarantees. Originally, this was added as a constraint on `T` because any finalizer for `T` would run on a separate thread. However, it's now safe to remove this as a constraint because softdevteam/alloy#30 guarantees that a finalizer won't run early. This means that even without synchronized access, a race can't happen, because it's impossible for the finalizer to access `T`'s data while it's still in use by the mutator.

However, even though `Gc<T>` can now implement `Send` -- [thanks to multi-threaded collection support](softdevteam/alloy#31) -- `Gc<T>` still requires that `T: Send`, because `T` could be a type with shared ownership which aliases. This is necessary because off-thread finalization could mutate shared memory without synchronisation. An example using `Rc` makes this clear: 
```rust

use std::rc::Rc;

struct Inner(Rc<usize>);

fn foo() {
    let rc = Rc::new(123);
    {
        let _gc = Gc::new(Inner(Rc::clone(&rc)));
    }
    // Assume `_gc` is dead here, so it will be finalized in parallel on a
    // separate thread. This means `Rc::drop` gets called, which has the
    // potential to race with any `Rc` increment / decrement on the main
    // thread.
    force_gc();

    // Might race with the finalizer's `Rc` decrement.
    bar(Rc::clone(&rc));
}
}

```

Since finalizing any non-`Send` value can cause UB, we still disallow the construction of `Gc<T: !Send>` completely. This is certainly the most conservative approach. There are others:

- Not invoking finalizers for non-`Send` values. This is valid, since finalizers are not guaranteed to run. However, it's not exactly practical: it would mean that any `Gc<Rc<...>>`-style structure would _always_ leak.
- Finalize `!Send` values on their mutator thread. This is really dangerous in the general case, because if any locks are held on the shared data by the mutator, this will deadlock (it's basically a variant of the async-signal-safe problem). However, if `T` is `!Send`, deadlocking is unlikely [although not impossible!], and could be an acceptable compromise. It's out of the scope of this PR, but it's something I've been playing around with.

Co-authored-by: Jake Hughes <[email protected]>
bors[bot] and jacob-hughes authored Apr 9, 2021
2 parents 5de2689 + 17cf277 commit e6f7b58
Showing 5 changed files with 186 additions and 33 deletions.
10 changes: 10 additions & 0 deletions Cargo.toml
@@ -17,6 +17,16 @@ gc_stats = []
libc = "*"
allocator = { path = "allocator", optional = true }

[dev-dependencies]
lang_tester = "0.3"
tempfile = "3.2"


[[test]]
name = "gc_tests"
path = "gc_tests/run_tests.rs"
harness = false

[build-dependencies]
rerun_except = "0.1"
num_cpus = "1.12"
67 changes: 67 additions & 0 deletions gc_tests/run_tests.rs
@@ -0,0 +1,67 @@
use std::{env, path::PathBuf, process::Command};

use lang_tester::LangTester;
use tempfile::TempDir;

fn deps_path() -> String {
let mut path = PathBuf::new();
path.push(env::var("CARGO_MANIFEST_DIR").unwrap());
path.push("target");
#[cfg(debug_assertions)]
path.push("debug");
#[cfg(not(debug_assertions))]
path.push("release");
path.push("deps");

path.to_str().unwrap().to_owned()
}

fn main() {
let rustc = env::var("RUSTGC").expect("RUSTGC environment var not specified");
// We grab the rlibs from `target/<debug | release>/` but in order
// for them to exist here, they must have been moved from `deps/`.
// Simply running `cargo test` will not do this, instead, we must
// ensure that `cargo build` has been run before running the tests.

#[cfg(debug_assertions)]
let build_mode = "--debug";
#[cfg(not(debug_assertions))]
let build_mode = "--release";

Command::new("cargo")
.args(&["build", build_mode])
.env("RUSTC", &rustc.as_str())
.output()
.expect("Failed to build libs");

let tempdir = TempDir::new().unwrap();
LangTester::new()
.test_dir("gc_tests/tests")
.test_file_filter(|p| p.extension().unwrap().to_str().unwrap() == "rs")
.test_extract(|s| {
Some(
s.lines()
.take_while(|l| l.starts_with("//"))
.map(|l| &l[2..])
.collect::<Vec<_>>()
.join("\n"),
)
})
.test_cmds(move |p| {
let mut exe = PathBuf::new();
exe.push(&tempdir);
exe.push(p.file_stem().unwrap());

let mut compiler = Command::new(&rustc);
compiler.args(&[
"-L",
deps_path().as_str(),
p.to_str().unwrap(),
"-o",
exe.to_str().unwrap(),
]);

vec![("Compiler", compiler), ("Run-time", Command::new(exe))]
})
.run();
}
50 changes: 50 additions & 0 deletions gc_tests/tests/multithreaded.rs
@@ -0,0 +1,50 @@
// Run-time:
// status: success
#![feature(rustc_private)]

extern crate libgc;

use std::alloc::GcAllocator;
use std::{thread, time};
use std::sync::atomic::{AtomicBool, Ordering};
use libgc::Gc;

#[global_allocator]
static ALLOCATOR: GcAllocator = GcAllocator;

struct PanicOnDrop(String);

impl Drop for PanicOnDrop {
fn drop(&mut self) {
eprintln!("Finalizer called. Object erroneously collected");
}

}

static mut NO_CHILD_EXISTS: AtomicBool = AtomicBool::new(true);

fn main() {
for _ in 1..10 {
thread::spawn(child);
}

while(unsafe { NO_CHILD_EXISTS.load(Ordering::SeqCst) }) {};

// This should collect no garbage, because the call stacks of each child
// thread should be scanned for roots.
GcAllocator::force_gc();

// If there's a problem, a finalizer will print to stderr. Lets wait an
// appropriate amount of time for this to happen.
thread::sleep(time::Duration::from_millis(10));
}

fn child() {
unsafe { NO_CHILD_EXISTS.store(false, Ordering::SeqCst)};
let gc = Gc::new(String::from("Hello world!"));

// Wait a bit before dying, ensuring that the thread stays alive long enough
// to cross the force_gc call.
thread::sleep(time::Duration::from_millis(10));
}

85 changes: 54 additions & 31 deletions src/gc.rs
@@ -38,23 +38,39 @@ pub fn gc_init() {
///
/// `Gc<T>` automatically dereferences to `T` (via the `Deref` trait), so
/// you can call `T`'s methods on a value of type `Gc<T>`.
///
/// `Gc<T>` implements `Sync` as long as `T` implements `Sync`, and it
/// always implements `Send`, because `T` is already required to be `Send`.
/// `T: Send` is needed because if `T` has a finalizer, it will be run on a
/// separate thread.
#[derive(PartialEq, Eq)]
pub struct Gc<T: ?Sized + Send + Sync> {
ptr: NonNull<GcBox<T>>,
pub struct Gc<T: ?Sized + Send> {
ptr: GcPointer<T>,
_phantom: PhantomData<T>,
}

impl<T: ?Sized + Unsize<U> + Send + Sync, U: ?Sized + Send + Sync> CoerceUnsized<Gc<U>> for Gc<T> {}
impl<T: ?Sized + Unsize<U> + Send + Sync, U: ?Sized + Send + Sync> DispatchFromDyn<Gc<U>>
for Gc<T>
/// This zero-sized wrapper struct is needed to allow `Gc<T>` to have the same
/// `Send` + `Sync` semantics as `T`. Without it, the inner `NonNull` type would
/// mean that a `Gc` never implements `Send` or `Sync`.
#[derive(PartialEq, Eq)]
struct GcPointer<T: ?Sized>(NonNull<GcBox<T>>);

unsafe impl<T> Send for GcPointer<T> {}
unsafe impl<T> Sync for GcPointer<T> {}

impl<T: ?Sized + Unsize<U> + Send, U: ?Sized + Send> CoerceUnsized<Gc<U>> for Gc<T> {}
impl<T: ?Sized + Unsize<U> + Send, U: ?Sized + Send> DispatchFromDyn<Gc<U>> for Gc<T> {}

impl<T: ?Sized + Unsize<U> + Send, U: ?Sized + Send> CoerceUnsized<GcPointer<U>> for GcPointer<T> {}
impl<T: ?Sized + Unsize<U> + Send, U: ?Sized + Send> DispatchFromDyn<GcPointer<U>>
for GcPointer<T>
{
}

impl<T: Send + Sync> Gc<T> {
impl<T: Send> Gc<T> {
/// Constructs a new `Gc<T>`.
pub fn new(v: T) -> Self {
Gc {
ptr: unsafe { NonNull::new_unchecked(GcBox::new(v)) },
ptr: unsafe { GcPointer(NonNull::new_unchecked(GcBox::new(v))) },
_phantom: PhantomData,
}
}
@@ -97,20 +113,20 @@ impl<T: Send + Sync> Gc<T> {
}

pub fn unregister_finalizer(&mut self) {
let ptr = self.ptr.as_ptr() as *mut GcBox<T>;
let ptr = self.ptr.0.as_ptr() as *mut GcBox<T>;
unsafe {
GcBox::unregister_finalizer(&mut *ptr);
}
}
}

impl Gc<dyn Any + Send + Sync> {
pub fn downcast<T: Any + Send + Sync>(&self) -> Result<Gc<T>, Gc<dyn Any + Send + Sync>> {
impl Gc<dyn Any + Send> {
pub fn downcast<T: Any + Send>(&self) -> Result<Gc<T>, Gc<dyn Any + Send>> {
if (*self).is::<T>() {
let ptr = self.ptr.cast::<GcBox<T>>();
let ptr = self.ptr.0.cast::<GcBox<T>>();
Ok(Gc::from_inner(ptr))
} else {
Err(Gc::from_inner(self.ptr))
Err(Gc::from_inner(self.ptr.0))
}
}
}
@@ -125,14 +141,14 @@ pub fn needs_finalizer<T>() -> bool {
std::mem::needs_finalizer::<T>()
}

impl<T: ?Sized + Send + Sync> Gc<T> {
impl<T: ?Sized + Send> Gc<T> {
/// Get a raw pointer to the underlying value `T`.
pub fn into_raw(this: Self) -> *const T {
this.ptr.as_ptr() as *const T
this.ptr.0.as_ptr() as *const T
}

pub fn ptr_eq(this: &Self, other: &Self) -> bool {
this.ptr.as_ptr() == other.ptr.as_ptr()
this.ptr.0.as_ptr() == other.ptr.0.as_ptr()
}

/// Get a `Gc<T>` from a raw pointer.
@@ -146,43 +162,43 @@ impl<T: ?Sized + Send + Sync> Gc<T> {
/// size and alignment of the originally allocated block.
pub fn from_raw(raw: *const T) -> Gc<T> {
Gc {
ptr: unsafe { NonNull::new_unchecked(raw as *mut GcBox<T>) },
ptr: unsafe { GcPointer(NonNull::new_unchecked(raw as *mut GcBox<T>)) },
_phantom: PhantomData,
}
}

fn from_inner(ptr: NonNull<GcBox<T>>) -> Self {
Self {
ptr,
ptr: GcPointer(ptr),
_phantom: PhantomData,
}
}
}

impl<T: Send + Sync> Gc<MaybeUninit<T>> {
impl<T: Send> Gc<MaybeUninit<T>> {
/// As with `MaybeUninit::assume_init`, it is up to the caller to guarantee
/// that the inner value really is in an initialized state. Calling this
/// when the content is not yet fully initialized causes immediate undefined
/// behaviour.
pub unsafe fn assume_init(self) -> Gc<T> {
let ptr = self.ptr.as_ptr() as *mut GcBox<MaybeUninit<T>>;
let ptr = self.ptr.0.as_ptr() as *mut GcBox<MaybeUninit<T>>;
Gc::from_inner((&mut *ptr).assume_init())
}
}

impl<T: ?Sized + fmt::Display + Send + Sync> fmt::Display for Gc<T> {
impl<T: ?Sized + fmt::Display + Send> fmt::Display for Gc<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fmt::Display::fmt(&**self, f)
}
}

impl<T: ?Sized + fmt::Debug + Send + Sync> fmt::Debug for Gc<T> {
impl<T: ?Sized + fmt::Debug + Send> fmt::Debug for Gc<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fmt::Debug::fmt(&**self, f)
}
}

impl<T: ?Sized + Send + Sync> fmt::Pointer for Gc<T> {
impl<T: ?Sized + Send> fmt::Pointer for Gc<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fmt::Pointer::fmt(&(&**self as *const T), f)
}
@@ -254,26 +270,33 @@ impl<T> GcBox<MaybeUninit<T>> {
}
}

impl<T: ?Sized + Send + Sync> Deref for Gc<T> {
impl<T: ?Sized + Send> Deref for Gc<T> {
type Target = T;

fn deref(&self) -> &Self::Target {
unsafe { &*(self.ptr.as_ptr() as *const T) }
unsafe { &*(self.ptr.0.as_ptr() as *const T) }
}
}

/// `Copy` and `Clone` are implemented manually because a reference to `Gc<T>`
/// should be copyable regardless of `T`. It differs subtly from `#[derive(Copy,
/// Clone)]` in that the latter only makes `Gc<T>` copyable if `T` is.
impl<T: ?Sized + Send + Sync> Copy for Gc<T> {}
impl<T: ?Sized + Send> Copy for Gc<T> {}

impl<T: ?Sized + Send + Sync> Clone for Gc<T> {
impl<T: ?Sized + Send> Clone for Gc<T> {
fn clone(&self) -> Self {
*self
}
}

impl<T: ?Sized + Hash + Send + Sync> Hash for Gc<T> {
impl<T: ?Sized> Copy for GcPointer<T> {}

impl<T: ?Sized> Clone for GcPointer<T> {
fn clone(&self) -> Self {
*self
}
}
impl<T: ?Sized + Hash + Send> Hash for Gc<T> {
fn hash<H: Hasher>(&self, state: &mut H) {
(**self).hash(state);
}
@@ -308,23 +331,23 @@ mod test {
struct S2 {
y: u64,
}
trait T: Send + Sync {
trait T: Send {
fn f(self: Gc<Self>) -> u64
where
Self: Send + Sync;
Self: Send;
}
impl T for S1 {
fn f(self: Gc<Self>) -> u64
where
Self: Send + Sync,
Self: Send,
{
self.x
}
}
impl T for S2 {
fn f(self: Gc<Self>) -> u64
where
Self: Send + Sync,
Self: Send,
{
self.y
}
7 changes: 5 additions & 2 deletions src/lib.rs
@@ -1,4 +1,5 @@
#![cfg_attr(feature = "rustgc", feature(gc))]
#![cfg_attr(not(feature = "standalone"), feature(gc))]
#![cfg_attr(not(feature = "standalone"), feature(rustc_private))]
#![feature(core_intrinsics)]
#![feature(allocator_api)]
#![feature(alloc_layout_extra)]
@@ -23,7 +24,9 @@ pub mod stats;
#[cfg(feature = "standalone")]
pub use allocator::GcAllocator;

#[cfg(not(feature = "standalone"))]
pub use std::alloc::GcAllocator;

pub use gc::Gc;

#[cfg(feature = "standalone")]
pub static ALLOCATOR: GcAllocator = GcAllocator;
