From 7a79fe0ea59237f9ad127b1be212612110b3c46d Mon Sep 17 00:00:00 2001
From: Niko Matsakis
Date: Mon, 24 Mar 2014 09:44:46 -0400
Subject: [PATCH 1/3] Add first draft of opt-in builtin traits RFC

---
 active/0000-opt-in-builtin-traits.md | 429 +++++++++++++++++++++++++++
 1 file changed, 429 insertions(+)
 create mode 100644 active/0000-opt-in-builtin-traits.md

diff --git a/active/0000-opt-in-builtin-traits.md b/active/0000-opt-in-builtin-traits.md
new file mode 100644
index 00000000000..74a95c1c7e2
--- /dev/null
+++ b/active/0000-opt-in-builtin-traits.md
@@ -0,0 +1,429 @@
+- Start Date: 2014-03-24
+- RFC PR #: (leave this empty)
+- Rust Issue #: (leave this empty)
+
+# Summary
+
+- Rather than determining membership in the builtin traits
+  automatically, use `impl` (and `#[deriving]`) declarations as with
+  other traits.
+- The compiler will check that for each such `impl` declaration the
+  type meets certain criteria (e.g., to implement `Send` for a struct
+  `S`, all fields of `S` must have types which are `Send`).
+- To check for membership in a builtin trait, we employ a slightly
+  modified version of the standard trait matching algorithm.
+  Modifications are needed because the language cannot express (yet)
+  the full set of impls we would require.
+- Rename `Pod` trait to `Copy`.
+
+# Motivation
+
+In today's Rust, there are a number of builtin traits (sometimes
+called "kinds"): `Send`, `Share`, and `Pod` (in the future, perhaps
+`Sized`, but the details of that differ and will be addressed in the DST
+RFC). These are expressed as traits, but they are quite unlike other
+traits in certain ways. One way is that they do not have any methods;
+instead, implementing a trait like `Send` indicates that the type has
+certain properties (defined below). The biggest difference, though, is
+that these traits are not implemented manually by users. Instead, the
+compiler decides automatically whether or not a type implements them
+based on the contents of the type.
+
+This RFC argues to change this system and instead have users manually
+implement the builtin traits for new types that they define.
+Naturally there would be `#[deriving]` options as well for
+convenience. The compiler's rules (e.g., that a sendable value cannot
+reach a non-sendable value) would still be enforced, but at the point
+where a builtin trait is explicitly implemented, rather than being
+automatically deduced.
+
+There are a couple of reasons to make this change:
+
+1. **Consistency.** All other traits are opt-in, including very common
+   traits like `Eq` and `Clone`. It is somewhat surprising that the
+   builtin traits act differently.
+2. **API Stability.** The builtin traits that are implemented by a
+   type are really part of its public API, but unlike other similar
+   things they are not declared. This means that seemingly innocent
+   changes to the definition of a type can easily break downstream
+   users. For example, imagine a type that changes from POD to non-POD
+   -- suddenly, all references to instances of that type go from
+   copies to moves. Similarly, a type that goes from sendable to
+   non-sendable can no longer be used as a message. By opting in to
+   being POD (or sendable, etc), library authors make explicit what
+   properties they expect to maintain, and which they do not.
+3. **Pedagogy.** Many users find the distinction between pod types
+   (which copy) and linear types (which move) to be surprising. Making
+   pod-ness opt-in would help to ease this confusion.
+4. **Safety and correctness.** In the presence of unsafe code,
+   compiler inference is unsound, and it is unfortunate that users
+   must remember to "opt out" from inapplicable kinds. There are also
+   concerns about future compatibility. Even in safe code, it can also
+   be useful to impose additional usage constraints beyond those
+   strictly required for type soundness.
+
+More details about these points are provided after the
+`Detailed design` section.
+
+# Detailed design
+
+I will first cover the existing builtin traits and define what they
+are used for. I will then explain each of the above reasons in more
+detail. Finally, I'll give some syntax examples.
+
+## The builtin traits
+
+We currently define the following builtin traits:
+
+- `Send` -- a type that deeply owns all its contents.
+  (Examples: `int`, `~int`, `Cell`, not `&int` or `Rc`)
+- `Pod` -- "plain old data" which can be safely copied via memcpy.
+  (Examples: `int`, `&int`, not `~int` or `&mut int`)
+- `Share` -- a type which is threadsafe when accessed via an `&T`
+  reference. (Examples: `int`, `~int`, `&int`, `&mut int`,
+  `Atomic`, not `Cell` or `Rc`)
+
+Note that `Pod` is a proper subset of `Send`, but `Send` and `Share`
+are unrelated:
+
+- `Cell` is `Send` but not `Share`.
+- `&uint` is `Share` but not `Send`.
+
+## Proposed syntax
+
+Under this proposal, for a struct or enum to be considered send,
+share, or pod, those traits must be explicitly implemented:
+
+    struct Foo { ... }
+    impl Send for Foo { }
+    impl Pod for Foo { }
+    impl Share for Foo { }
+
+As usual, deriving forms would be available.
+
+Builtin traits can only be implemented for struct or enum types and
+only within the crate in which that struct or enum is defined (see the
+section on *Matching and Coherence* below). Whenever a builtin trait is
+implemented, the compiler will enforce that all fields or that
+struct/enum are of a typed which implements the trait.
+
+    struct Foo<'a> { x: &'a int }
+
+    // ERROR: Cannot implement `Send` because the field `x` has type
+    // `&'a int` which is not sendable.
+    impl<'a> Send for Foo<'a> { }
+
+For generic types, conditional impls are often required to avoid
+errors. In the case of `Option<T>`, for example, we must know that the
+type `T` implements (e.g.)
`Send` before we can implement `Send` for
+`Option<T>`:
+
+    enum Option<T> { Some(T), None }
+    impl<T> Send for Option<T> { } // ERROR: T may not implement `Send`
+
+Rewriting that code using a conditional impl would be fine:
+
+    enum Option<T> { Some(T), None }
+    impl<T:Send> Send for Option<T> { } // OK: T is known to implement `Send`
+
+(This is of course precisely what `#[deriving(Send)]` would generate.)
+
+## Naming of Pod
+
+Part of the proposal is to rename `Pod` to `Copy` so as to better
+align the names of the builtin traits (they would now all be verbs).
+
+## Copy and linearity
+
+One of the most important aspects of this proposal is that the `Copy`
+trait would be something that one "opts in" to. This means that
+structs and enums would *move by default* unless their type is
+explicitly declared to be `Copy`. So, for example, the following code
+would be in error:
+
+    struct Point { x: int, y: int }
+    ...
+    let p = Point { x: 1, y: 2 };
+    let q = p;  // moves p
+    print(p.x); // ERROR
+
+To allow that example, one would have to impl `Copy` for `Point`:
+
+    struct Point { x: int, y: int }
+    impl Copy for Point { }
+    ...
+    let p = Point { x: 1, y: 2 };
+    let q = p;  // copies p, because Point is Copy
+    print(p.x); // OK
+
+Effectively this change introduces a three step ladder for types:
+
+1. If you do nothing, your type is *linear*, meaning that it moves
+   from place to place and can never be copied in any way. (We need a
+   better name for that.)
+2. If you implement `Clone`, your type is *cloneable*, meaning that it
+   moves from place to place, but it can be explicitly cloned. This is
+   suitable for cases where copying is expensive.
+3. If you implement `Copy`, your type is *copyable*, meaning that
+   it is just copied by default without the need for an explicit
+   clone. This is suitable for small bits of data like ints or
+   points.
+
+What is nice about this change is that when a type is defined, the
+user makes an *explicit choice* between these three options.
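The three-step ladder above can be sketched in present-day Rust, where opt-in `Copy` works much as proposed here. This is only an illustration, not part of the patch: it uses modern syntax that postdates the RFC (`i32` instead of `int`, `#[derive]` instead of `#[deriving]`), and the type and function names (`Token`, `Label`, `Point`, `demo_*`) are hypothetical.

```rust
// Step 1: no impls. Values of this type move and can never be duplicated.
struct Token {
    id: u32,
}

// Step 2: `Clone` only. Values still move, but can be cloned explicitly.
#[derive(Clone)]
struct Label {
    text: String,
}

// Step 3: `Copy` (modern Rust also requires `Clone`). Values are copied implicitly.
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

fn demo_move() -> u32 {
    let t = Token { id: 7 };
    let u = t; // moves `t`; reading `t.id` after this line is a compile error
    u.id
}

fn demo_clone() -> (String, String) {
    let a = Label { text: String::from("north") };
    let b = a.clone(); // explicit duplicate
    let c = a; // moves `a`
    (b.text, c.text)
}

fn demo_copy() -> (i32, i32) {
    let p = Point { x: 1, y: 2 };
    let q = p; // copies `p`
    (p.x, q.y) // `p` is still usable after the copy
}

fn main() {
    println!("{}", demo_move());
    println!("{:?}", demo_clone());
    println!("{:?}", demo_copy());
}
```

Removing `Copy` from the derive list on `Point` makes `demo_copy` fail to compile, which is the "error at the use site" behaviour the move-by-default rule is meant to produce.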
+
+## Matching and coherence
+
+In general, determining whether a type implements a builtin trait can
+follow the existing trait matching algorithm, but it will have to be
+somewhat specialized. The problem is that we are somewhat limited in
+the kinds of impls that we can write, so some of the implementations
+we would want must be "hard-coded".
+
+Specifically we are limited around tuples, fixed-length array types,
+proc types, closure types, and trait types:
+
+- *Fixed-length arrays:* A fixed-length array `[T, ..n]` is `Send/Copy/Share`
+  if `T` is `Send/Copy/Share`, regardless of `n`. (Conceivably, we could
+  also say that if `n` is `0`, then `[T, ..n]` is `Send/Copy/Share` regardless
+  of `T`).
+- *Tuples*: A tuple `(T_0, ..., T_n)` is `Send/Copy/Share` if,
+  for all `i`, `T_i` is `Send/Copy/Share`.
+- *Closures*: A closure type `|T_0, ..., T_n|:K -> T_n+1` is never
+  `Send` or `Copy`. It is `Share` iff `K` is `Share`.
+- *Procs*: A proc type `proc(T_0, ..., T_n):K -> T_n+1` is
+  never `Copy`. It is `Send/Share` iff `K` is `Send/Share`.
+- *Trait objects*: A trait object type `Trait:K` (assuming DST here ;) is
+  never `Copy`. It may be `Send/Share` iff `K` is `Send/Share`.
+
+We cannot currently express the above conditions using impls. We may
+at some point in the future grow the ability to express some of them.
+For now, though, these "impls" will be hardcoded into the algorithm.
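To make the hard-coded rules above concrete, here is a minimal sketch of them as a recursive check over a toy model of types. It is written in present-day Rust and is not the compiler's implementation: `Kind`, `Ty`, and `implements` are hypothetical names, `Ty::Scalar` stands in for types like `uint` that implement all three traits, and `Ty::Rc` stands in for a type that implements none of them.

```rust
// The three builtin traits discussed in the RFC.
#[derive(PartialEq)]
enum Kind {
    Send,
    Copy,
    Share,
}

// Toy model of the type forms covered by the hard-coded rules (hypothetical).
enum Ty {
    Scalar,                 // e.g. uint, u8: implements every builtin trait
    Rc,                     // stand-in for a type implementing none of them
    Array(Box<Ty>, usize),  // [T, ..n]
    Tuple(Vec<Ty>),         // (T_0, ..., T_n)
    Closure(Vec<Kind>),     // |..|:K -> R, carrying its declared bounds K
    Proc(Vec<Kind>),        // proc(..):K -> R
    TraitObject(Vec<Kind>), // Trait:K
}

fn implements(ty: &Ty, kind: &Kind) -> bool {
    match ty {
        Ty::Scalar => true,
        Ty::Rc => false,
        // Arrays follow their element type, regardless of length.
        Ty::Array(elem, _len) => implements(elem, kind),
        // Tuples require every component to implement the trait.
        Ty::Tuple(elems) => elems.iter().all(|t| implements(t, kind)),
        // Closures are never Send or Copy; Share only if declared in K.
        Ty::Closure(bounds) => *kind == Kind::Share && bounds.contains(kind),
        // Procs and trait objects are never Copy; otherwise they follow K.
        Ty::Proc(bounds) | Ty::TraitObject(bounds) => {
            *kind != Kind::Copy && bounds.contains(kind)
        }
    }
}

fn main() {
    let pair = Ty::Tuple(vec![Ty::Scalar, Ty::Array(Box::new(Ty::Scalar), 4)]);
    let mixed = Ty::Tuple(vec![Ty::Scalar, Ty::Rc]);
    println!("pair is Copy: {}", implements(&pair, &Kind::Copy));
    println!("mixed is Send: {}", implements(&mixed, &Kind::Send));
    println!(
        "closure is Share: {}",
        implements(&Ty::Closure(vec![Kind::Share]), &Kind::Share)
    );
    println!(
        "trait object is Send: {}",
        implements(&Ty::TraitObject(vec![Kind::Send]), &Kind::Send)
    );
}
```

The sketch omits the optional zero-length-array special case mentioned in the list, and it treats the bound set `K` as a plain list rather than resolving it through trait matching.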
+
+Otherwise, the complete list of builtin impls is roughly like this
+(undoubtedly I am missing a few things):
+
+    trait Send;
+    trait Share;
+    trait Copy; // aka Pod
+
+    impl Copy for "scalars like uint, u8, etc" { }
+    impl<T> Copy for *T { }
+    impl<'a,T> Copy for &'a T { }
+
+    impl Send for "scalars like uint, u8, etc" { }
+    impl<T:Send> Send for *T { }
+    impl<T:Send> Send for ~T { }
+
+    impl Share for "scalars like uint, u8, etc" { }
+    impl<T:Share> Share for *T { }
+    impl<T:Share> Share for ~T { }
+    impl<'a,T:Share> Share for &'a T { }
+    impl<'a,T:Share> Share for &'a mut T { } // (if this surprises you, see * below)
+
+Per the usual coherence rules, since we will have the above impls in
+`libstd`, and we will have impls for types like tuples and
+fixed-length arrays baked in, the only impls that end users are
+permitted to write are impls for struct and enum types that they
+define themselves. This is simply an extra coherence rule, hard-coded
+because some of the impls (e.g., for tuples) are hard-coded.
+
+(\*) Wait, `&mut T` is `Share`? How is that threadsafe?
+
+Somewhat surprisingly, `&mut T` is share. Remember, a type `U` is
+share if all possible operations on `&U` are threadsafe. In this case,
+`U` is `&mut T`, which means we have to consider what operations are
+possible on a `& &mut T`. In that case ,the `&mut T` is found in an
+aliasable location and hence is immutable (if you can find a counter
+example, that's definitely a bug).
+
+# Implementation plan
+
+Here is a loose implementation plan that @flaper87 and I worked
+out. No doubt things will change along the way.
+
+1. Create a nicely encapsulated subroutine S to check whether type T
+   meets bound B. For example, to test that some type T is Pod. @eddyb
+   did something recently you can use as an example, where he added
+   some code to do vtable matching for the Drop trait from trans. One
+   catch is that we will definitely want some sort of cache.
+
+2.
Modify the vtable code to handle builtin bounds and add builtin
+   impls (see below)
+   - We'll need special code to accommodate the types detailed above
+
+3. Use the subroutine S in moves.rs to do the "is pod" check.
+
+4. Same for rustc::middle::kind, except that we should move the "check
+   bounds on type parameters" into type check.
+   - Why do this? Because these checks will now be so close to vtable
+     matching it no longer makes sense to do them in `kind.rs`
+
+5. Check to make sure that the impls the user provides are safe:
+   - User-defined impls can only apply to enums or structs
+   - If implementing a builtin trait T for a struct type S, each
+     field of S must impl T
+   - same for enums, but "for each variant, for each argument" essentially
+
+
+# Expanded motivation
+
+Now that the detailed design is presented, I wanted to expand more on
+the motivation.
+
+## Consistency
+
+This change would bring the builtin traits more in line with other
+common traits, such as `Eq` and `Clone`. On a historical note, this
+proposal continues a trend, in that both of those operations used to
+be natively implemented by the compiler as well.
+
+## API Stability
+
+The set of builtin traits implemented by a type must be considered
+part of its public interface. At present, though, it's quite invisible
+and not under user control. If a type is changed from `Pod` to
+non-pod, or `Send` to non-send, no error message will result until
+client code attempts to use an instance of that type. In general we
+have tried to avoid this sort of situation, and instead have each
+declaration contain enough information to check it independently of its
+uses. Issue #12202 describes this same concern, specifically with
+respect to stability attributes.
+
+Making opt-in explicit effectively solves this problem.
It is clearly
+written out which traits a type is expected to fulfill, and if the
+type is changed in such a way as to violate one of these traits, an
+error will be reported at the `impl` site (or `#[deriving]`
+declaration).
+
+## Pedagogy
+
+When users first start with Rust, ownership and ownership transfer are
+among the first things that they must learn. This is made more
+confusing by the fact that types are automatically divided into pod
+and non-pod without any sort of declaration. It is not necessarily
+obvious why a `T` and `~T` value, which are *semantically equivalent*,
+behave so differently by default. Making the pod category something you
+opt into means that types will all be linear by default, which can
+make teaching and learning easier.
+
+## Safety and correctness: unsafe code
+
+For safe code, the compiler's rules for deciding whether or not a type
+is sendable (and so forth) are perfectly sound. However, when unsafe
+code is involved, the compiler may draw the wrong conclusion. For such
+cases, types must *opt out* of the builtin traits.
+
+In general, the *opt out* approach seems to be hard to reason about:
+many people (including myself) find it easier to think about what
+properties a type *has* than what properties it *does not* have,
+though clearly the two are logically equivalent in this binary world
+we programmers inhabit.
+
+More concretely, opt out is dangerous because it means that types with
+unsafe methods are generally *wrong by default*. As an example,
+consider the definition of the `Cell` type:
+
+    struct Cell<T> {
+        priv value: T
+    }
+
+This is a perfectly ordinary struct, and hence the compiler would
+conclude that cells are freezable (if `T` is freezable) and so forth.
+However, the *methods* attached to `Cell` use unsafe magic to mutate
+`value`, even when the `Cell` is aliased:
+
+    impl<T> Cell<T> {
+        pub fn set(&self, value: T) {
+            unsafe {
+                *cast::transmute_mut(&self.value) = value
+            }
+        }
+    }
+
+To accommodate this, we currently use *marker types* -- special types
+known to the compiler which are considered nonpod and so forth. Therefore,
+the full definition of `Cell` is in fact:
+
+    pub struct Cell<T> {
+        priv value: T,
+        priv marker1: marker::InvariantType<T>,
+        priv marker2: marker::NoFreeze,
+    }
+
+Note the two markers. The first, `marker1`, is a hint to the variance
+engine indicating that the type `Cell` must be invariant with respect
+to its type argument. The second, `marker2`, indicates that `Cell` is
+non-freeze. This then informs the compiler that the referent of a
+`&Cell` can't be considered immutable. The problem here is that, if
+you don't know to opt-out, you'll wind up with a type definition that
+is unsafe.
+
+This argument is rather weakened by the continued necessity of a
+`marker::InvariantType` marker. This could be read as an argument
+towards explicit variance. However, I think that in this particular
+case, the better solution is to introduce the `Mut` type described
+in #12577 -- the `Mut` type would give us the invariance.
+
+Using `Mut` brings us back to a world where any type that uses
+`Mut` to obtain interior mutability is correct by default, at least
+with respect to the builtin kinds. Types like `Atomic` and
+`Volatile`, which guarantee data race freedom, would therefore have
+to *opt in* to the `Share` kind, and types like `Cell` would simply
+do nothing.
+
+## Safety and correctness: future compatibility
+
+Another concern about having the compiler automatically infer
+membership into builtin bounds is that we may find cause to add new
+bounds in the future.
In that case, existing Rust code which uses
+unsafe methods might be inferred incorrectly, because it would not
+know to opt out of those future bounds. Therefore, any future bounds
+will *have* to be opt in anyway, so perhaps it is best to be
+consistent from the start.
+
+## Safety and correctness: semantic constraints
+
+Even if type safety is maintained, some types ought not to be copied
+for semantic reasons. An example from the compiler is the
+`Datum` type, which is used in code generation to represent
+the computed result of an rvalue expression. At present, the type
+`Rvalue` implements an (empty) destructor -- the sole purpose of this
+destructor is to ensure that datums are not consumed more than once,
+because this would likely correspond to a code gen bug, as it would
+mean that the result of the expression evaluation is consumed more
+than once. Another example might be a newtype'd integer used for
+indexing into a thread-local array: such a value ought not to be
+sendable. And so forth. Using marker types for these kinds of
+situations, or empty destructors, is very awkward. Under this
+proposal, users need merely refrain from implementing the relevant
+traits.
+
+# Alternatives and counterarguments
+
+The downsides of this proposal are:
+
+- There is some annotation burden. I had intended to gather statistics
+  to try and measure this but have not had the time.
+
+- If a library forgets to implement all the relevant traits for a
+  type, there is little recourse for users of that library beyond pull
+  requests to the original repository. This is already true with
+  traits like `Eq` and `Ord`. However, as SiegeLord noted on IRC,
+  you can often work around the absence of `Eq` with a newtype
+  wrapper, but this is not true if a type fails to implement `Send` or
+  `Copy`.
This danger (forgetting to implement traits) is essentially
+  the counterbalance to the "forward compatibility" case made above:
+  where implementing traits by default means types may implement too
+  much, forcing explicit opt in means types may implement too little.
+  One way to mitigate this problem would be to have a lint for when an
+  impl of some kind (etc) would be legal, but isn't implemented, at
+  least for publicly exported types in library crates.
+
+What other designs have been considered? What is the impact of not doing this?
+
+# Unresolved questions
+
+Do we want some kind of shorthand for common trait combinations? I
+originally proposed `Data` but we couldn't settle on what a useful set
+of trait combinations would be. This can easily be added later.

From 1950e7529d6640f865c32872502c96bcc14c06cc Mon Sep 17 00:00:00 2001
From: Niko Matsakis
Date: Tue, 25 Mar 2014 13:03:48 -0400
Subject: [PATCH 2/3] Amend RFC to discuss Unsafe

---
 active/0000-opt-in-builtin-traits.md | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/active/0000-opt-in-builtin-traits.md b/active/0000-opt-in-builtin-traits.md
index 74a95c1c7e2..3da41d1f0de 100644
--- a/active/0000-opt-in-builtin-traits.md
+++ b/active/0000-opt-in-builtin-traits.md
@@ -12,8 +12,8 @@
   `S`, all fields of `S` must have types which are `Send`).
 - To check for membership in a builtin trait, we employ a slightly
   modified version of the standard trait matching algorithm.
-  Modifications are needed because the language cannot express (yet)
-  the full set of impls we would require.
+  Modifications are needed because the language cannot express the
+  full set of impls we would require.
 - Rename `Pod` trait to `Copy`.
 
 # Motivation
@@ -103,9 +103,11 @@ As usual, deriving forms would be available.
 
 Builtin traits can only be implemented for struct or enum types and
 only within the crate in which that struct or enum is defined (see the
-section on *Matching and Coherence* below).
Whenever a builtin trait is
-implemented, the compiler will enforce that all fields or that
-struct/enum are of a typed which implements the trait.
+section on *Matching and Coherence* below). Whenever a builtin trait
+is implemented, the compiler will enforce that all fields of that
+struct/enum are of a type which implements the trait (or else of
+`Unsafe<T>` type, which matches all traits, see *Matching and
+Coherence*).
 
     struct Foo<'a> { x: &'a int }
 
@@ -237,6 +239,12 @@ possible on a `& &mut T`. In that case ,the `&mut T` is found in an
 aliasable location and hence is immutable (if you can find a counter
 example, that's definitely a bug).
 
+Moreover, there is one further exception to the rules. The
+`Unsafe<T>` type is *always* considered to implement all builtin
+traits, no matter the type `T`. The motivation here is that we want to
+be able to permit a type like `Mutex<T>` to be `Share` even if it closes
+over data that is not `Share`.
+
 # Implementation plan
 
 Here is a loose implementation plan that @flaper87 and I worked
@@ -262,10 +270,9 @@ out. No doubt things will change along the way.
 5. Check to make sure that the impls the user provides are safe:
    - User-defined impls can only apply to enums or structs
    - If implementing a builtin trait T for a struct type S, each
-     field of S must impl T
+     field of S must have a type that implements T.
    - same for enums, but "for each variant, for each argument" essentially
 
-
 # Expanded motivation
 
 Now that the detailed design is presented, I wanted to expand more on

From 70964e61cf02a4c155191865886d2acfa78aeb1b Mon Sep 17 00:00:00 2001
From: Niko Matsakis
Date: Tue, 25 Mar 2014 17:13:11 -0400
Subject: [PATCH 3/3] Tweak language to be more restrictive for now

---
 active/0000-opt-in-builtin-traits.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/active/0000-opt-in-builtin-traits.md b/active/0000-opt-in-builtin-traits.md
index 3da41d1f0de..446b7381cb6 100644
--- a/active/0000-opt-in-builtin-traits.md
+++ b/active/0000-opt-in-builtin-traits.md
@@ -240,10 +240,11 @@ aliasable location and hence is immutable (if you can find a counter
 example, that's definitely a bug).
 
 Moreover, there is one further exception to the rules. The
-`Unsafe<T>` type is *always* considered to implement all builtin
-traits, no matter the type `T`. The motivation here is that we want to
-be able to permit a type like `Mutex<T>` to be `Share` even if it closes
-over data that is not `Share`.
+`Unsafe<T>` type is *always* considered to implement `Share`, no
+matter the type `T`. `Send` and `Copy` are implemented if `T` is
+`Send` and `Copy`. The motivation here is that we want to be able to
+permit a type like `Mutex<T>` to be `Share` even if it closes over data
+that is not `Share`.
 
 # Implementation plan
 