From 62fe495d35f279a24eaef9b61dbe491145c2fb82 Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Tue, 3 Jun 2014 15:10:44 -0700 Subject: [PATCH] RFC: Add unboxed, abstract return types --- 0000-abstract-return-types.md | 403 ++++++++++++++++++++++++++++++++++ 1 file changed, 403 insertions(+) create mode 100644 0000-abstract-return-types.md diff --git a/0000-abstract-return-types.md b/0000-abstract-return-types.md new file mode 100644 index 00000000000..fd7fc5dd06a --- /dev/null +++ b/0000-abstract-return-types.md @@ -0,0 +1,403 @@ +- Start Date: 2014-06-02 +- RFC PR #: (leave this empty) +- Rust Issue #: (leave this empty) + +# Summary + +Allow functions to return _unboxed abstract types_, written `impl Trait` for +"some hidden type `T` such that `T: Trait`". + +These types specify a trait bound, but hide the concrete type implementing the +bound. Unlike trait objects, but like generics, the actual type of an unboxed +abstract type is statically known and sized, and will thus generate +statically-dispatched code. + +Optionally, allow unboxed abstract type syntax to be used elsewhere (function +arguments, structs) as a shorthand for generics. + +# Motivation + +In today's Rust, you can write a function signature like +````rust +fn consume_iter_static>(iter: I) +fn consume_iter_dynamic(iter: Box>) +```` +In both cases, the function does not depend on the exact type of the argument. +The type is held "abstract", and is assumed only to satisfy a trait bound. +* In the `_static` version using generics, +each use of the function is specialized to a concrete, statically-known type, +giving static dispatch, inline layout, and other performance wins. +* In the `_dynamic` version using trait objects, the concrete argument type is + only known at runtime using a vtable. + +On the other hand, while you can write +````rust +fn produce_iter_dynamic() -> Box> +```` +you _cannot_ write something like +````rust +fn produce_iter_static() -> Iterator +```` +That is, in today's Rust, abstract return types can only be written using trait objects, which +can be a significant performance penalty. This RFC proposes "unboxed abstract +types" as a way of achieving signatures like `produce_iter_static`. Like +generics, unboxed abstract types guarantee static dispatch and inline data +layout. + +Here are some problems that unboxed abstract types solve or mitigate: + +* _Returning unboxed closures_. The ongoing work on unboxed closures expresses + closures using traits. Sugar for closures generates an anonymous type + implementing a closure trait. Without unboxed abstract types, there is no way + to use this sugar while returning the resulting closure unboxed, because there + is no way to write the name of the generated type. + +* _Leaky APIs_. Functions can easily leak implementation details in their return + type, when the API should really only promise a trait bound. For example, a + function returning `Rev>` is revealing exactly how the iterator + is constructed, when the function should only promise that it returns _some_ + type implementing `Iterator`. Using newtypes/structs with private fields + helps, but is extra work. Unboxed abstract types make it as easy to promise only + a trait bound as it is to return a concrete type. + +* _Complex types_. Use of iterators in particular can lead to huge types: + ````rust + Chain>>>, SkipWhile<'a, u16, Map<'a, &u16, u16, slice::Items>>> + ```` + Even when using newtypes to hide the details, the type still has to be written + out, which can be very painful. Unboxed abstract types only require writing the + trait bound. + +* _Documentation_. In today's Rust, reading the documentation for the `Iterator` + trait is needlessly difficult. Many of the methods return new iterators, but + currently each one returns a different type (`Chain`, `Zip`, `Map`, `Filter`, + etc), and it requires drilling down into each of these types to determine what + kind of iterator they produce. + +In short, unboxed abstract types make it easy for a function signature to +promise nothing more than a trait bound, and do not generally require the +function's author to write down the concrete type implementing the bound. + +# Detailed design + +> *Note*: this design borrows from +> +> * https://github.com/mozilla/rust/issues/11455, +> * https://github.com/mozilla/rust/issues/10448 and +> * https://github.com/mozilla/rust/issues/11196. + +The basic idea is to allow code like the following: +````rust +pub fn produce_iter_static() -> impl Iterator { + range(0, 10).rev().map(|x| x * 2).skip(2) +} +```` +where `impl Iterator` should be understood as "some type `T` such that `T: +Iterator`. Notice that the function author does not have to write down any +concrete iterator type, nor does the function's signature reveal those details +to its clients. But the type promises that _there exists_ some concrete type. + +This code is roughly equivalent to +````rust +pub struct Result_produce_iter_static( + iter::Skip>>> +); + +impl Iterator for Result_produce_iter_static { + fn next(&mut self) -> Option { + match *self { + Result_produce_iter_static(ref mut r) => r.next() + } + } +} + +pub fn produce_iter_static() -> Result_produce_iter_static { + Result_produce_iter_static( + range(0, 10).rev().map(|x| x * 2).skip(2) + ) +} +```` + +That is, using an unboxed abstract type is _semantically_ equivalent to +introducing an _anonymous_ newtype where: +* the representation is private +* the newtype implements the given trait bound by delegating to the representation + +An unboxed abstract return type `impl Trait` has several implications for the compiler: + +* _Typechecking the function_: The compiler should check that the inferred actual + return type `T` of the function satisfies the trait bound `Trait`. This is a local check. + +* _Typechecking the clients_: The compiler should treat each use of `impl Trait` + as a fresh unknown type that satisfies the bound `Trait`. Thus, function + clients cannot depend on the concrete return type, only the bound, which + retains local typechecking. + +* _Code generation_: The compiler should record the concrete return type `T` for + the function for the purposes of specialization. Clients that use the return + type should generate the same code as if they has used the concrete type `T`, + despite that typechecking prevents them from relying on anything about `T` + except the trait bound `Trait`. (This can be viewed as a form of + monomorphization for abstract types, which stand for a single type, in contrast + to generics (universal types), which stand for a family of types.) + +So, for example, given the function signatures +````rust +fn produce_iter_static() -> impl Iterator +fn consume_iter_static>(iter: I); +```` +the expression +````rust +consume_iter_static(produce_iter_static()) +```` +should typecheck, and should compile using static dispatch. On the other hand, +````rust +let iter: iter::Skip<_> = produce_iter_static(); +```` +should fail to typecheck, since it relies on the concrete return type of +`produce_iter_static`, which is hidden behind the unboxed abstract return +type. + +The subsections below fill in several details necessary for implementing this +high-level design. + +## Generics and lifetime parameters + +A function returning an unboxed abstract type might also use generic types or +lifetime parameters: + +````rust +impl SomeCollectionType { + fn iter<'a>(&'a self) -> impl Iterator<&'a T> { ... } +} +```` + +In general, the concrete return type may be parameterized by any type or +lifetime variable in scope. These parameters may also be mentioned in the trait +bound for the unboxed abstract type. + +Just as is currently done for trait objects, the typechecker must ensure that +lifetime parameters are not stripped when using an unboxed abstract type. +For example (adapted from @glaebhoerl): +````rust +fn evil>(iter: Iter) -> Box> { + box iter as Box> +} +```` +generates the error "value may contain references; add `'static` bound to +`Iter`". Since unboxed abstract types are the statically-dispatched equivalent +to returning trait objects, the same logic should apply: +````rust +fn evil>(iter: Iter) -> Box> { + box iter as Box> +} +```` +should not be allowed for the same reason. + +## Possible add-on: unboxed abstract types in non-return positions + +_**Note**: this is an optional feature._ + +So far, the RFC has assumed that the new use of `impl` for unboxed abstract types +is only allowed for the return type of a function. In principle, it makes sense +to allow unboxed abstract types anywhere that a type is allowed. + +**Note**: the extensions below are somewhat orthogonal. For example, the + extension to include function arguments does not depend on including `struct` + types. + +### Function arguments + +Instead of writing +````rust +fn extend>(&mut self, iterator: I) +```` +we could write +````rust +fn extend(&mut self, iterator: impl Iterator) +```` +These two signatures are almost equivalent, but the first has an _explicit_ type +argument (which can be provided using `::` syntax) while the second has an +_implicit_ type argument. + +Using unboxed abstract types in arguments makes (simple) static and dynamic +dispatch syntactically closer: +````rust +fn extend_static(&mut self, iterator: impl Iterator) +fn extend_dynamic(&mut self, iterator: &Iterator) +```` + +It may be especially useful for passing unboxed closures as arguments in a +pleasant way: +````rust +fn use_fn>(f: T) +```` +versus +````rust +fn use_fn(f: impl Fn) +```` + +### Nested results + +Rather than only supporting + +````rust +pub fn produce_iter_static() -> impl Iterator +```` + +unboxed abstract types could also be permitted in nested form: + +````rust +pub fn produce_iter_box() -> Box> +pub fn produce_iters() -> (impl Iterator, impl Iterator) +```` + +Each use of `impl` would mark a distinct abstract type, so `produce_iters` +could provide different concrete iterator types for the first and second +components of the tuple. + +### Structs and other compound types + +**Note**: the motivation for this extension is purely consistency — + allowing `impl Trait` wherever a type is allowed. There is no known concrete + use case, which is perhaps a strong argument against including the + feature. (Personally, I think this is probably a bad idea.) + +If we truly want to allow `impl Trait` everywhere, that would include `struct` fields as well. +````rust +struct Foo { + s: impl Set; +} +```` +would be equivalent to +````rust +struct Foo> { + s: T; +} +```` +except that the type argument would be treated as _implicit_, meaning that +you would write +````rust +fn use_foo(f: &Foo) { ... } +```` +and the function `use_foo` would itself implicitly parameterize over the +`Set` implementation. Similarly, +````rust +fn make_foo() -> Foo { ... } +```` +would implicitly treat the `Set` field as an abstract type, which can be +implemented by an arbitrary (but hidden) concrete type. + +## Possible add-on: type equality and `Self` + +_**Note**: this is an optional feature._ + +Although the clients of an unboxed abstract types do not know its concrete type, +they can rely on there being _some_ consistent concrete type. For example, given +a function +````rust +fn collect_to_set>(iter: I) -> impl Set +```` +the code +````rust +let s1 = collect_to_set(iter1); +let s2 = collect_to_set(iter2); +s1.is_subset(s2) +```` +should typecheck as long as `iter1` and `iter2` are iterators over the same +type, since the underlying concrete type implementing `Set` is guaranteed to +be the same. + +More generally, the typechecker could treat each unboxed abstract type as +implicitly parameterized over all of the in-scope type and lifetime variables. +Two uses of the same existential type with the same parameters should be treated +as the same type. In terms of the newtype expansion suggested earlier, this is the +equivalent of having the newtype take all of the in-scope type and lifetime +parameters. + +## Staging + +This RFC can be accepted and/or implemented in stages, in the following order of priority: + +* Unboxed abstract return types, by far the most important use-case +* Unboxed abstract argument types +* Unboxed abstract types in structs/enums/tuples +* Equality/Self for unboxed abstract types + +The first two bullets would affect library API stabilization; the others are nice-to-haves. + +# Drawbacks + +The main downside is complexity. It can already be difficult to understand the +distinction between generics and trait objects, and this RFC proposes another +generics-like mechanism for unboxed abstract return types. This drawback is +discussed in more detail in the following section on Alternatives. + +Other drawbacks are specific to the optional add-ons. In particular, adding +`impl Trait` fields to `struct`s adds a somewhat scary form of implicitness, +since code using the `struct` will be silently monomorphized without any +explicit use of generics. + +# Alternatives + +The proposal here, with all the add-ons, ultimately allows `impl Trait` to be +used wherever a type can be used. This is partly for consistency and partly as +a lighter weight way to write generics. The downside is that we would be adding +yet another distinction to the type system, and quite a bit of implicit +parameterization/monomorphization. + +## A very conservative alternative + +A more targeted approach would be to _just_ tackle return types, and try to make +them look more akin to generics: +````rust +fn produce_iter_static() -> _ : Iterator +```` +This more conservative proposal would be easier to implement, and would add less +complexity to the language, while still addressing the primary use case for the +RFC. + +## A somewhat conservative alternative + +Finally, a midpoint would be to use `impl Trait` but restrict it to function +signatures, not allowing it in `struct` fields. That design: + +1. retains the benefit of syntactically lightweight static dispatch +2. keeps the nice property that, by looking at a function signature alone, it is + possible to tell exactly where monomorphization will happen. + +This design might also make it possible for the implicit type parameters for +`impl Trait` to be provided using `::` syntax. + +# Unresolved questions + +## Multiple trait bounds + +The design should _definitely_ be generalized to support multiple trait bounds +(`impl Trait1 + Trait2`), since it is common for a return type to implement +several traits of interest. (Iterator return types are a good example). This +should not be any harder to implement than the basic design, but the syntax may +need some thought. + +## Naming an abstract return type + +The above proposal for type equality allows multiple uses of the same +existential to be treated as equal types. However, there is no way to name +that _concrete_ type, so it cannot be stored in a `struct` or passed as an +argument. + +**Note**: this is different from using `impl Trait` in a `struct` field, which + allows _any_ type implementing `Trait`. What we cannot do is have a `struct` + field whose type is "whatever concrete type function `foo` returns". + +Thinking again of the signature +````rust +fn collect_to_set>(iter: I) -> impl Set +```` +we could allow naming the concrete result type by a path like +`collect_to_set::::impl`. The only way to get a value of this type is by +calling `collect_to_set`. Thinking in terms of the newtype encoding of unboxed +abstract types, this is just a way to _name_ the concrete newtype used. + +This should be considered as part of any design for associated types, which also +involve paths to types.