- Feature Name:
infer_singleton
- Start Date: 2018-10-13
- RFC PR: _
- Rust Issue: _
Allow value inference of singleton types with _
in expression positions.
While the cases in which this RFC will apply are in the grand scheme
of things not that major, it does provide a reduction in some tedious
and boring boilerplate where trivial values such as PhantomData
need
to be constructed.
While you may think that writing out Empty { _marker: PhantomData }
and similar is more readable because it is more explicit,
we argue that explicitly writing out trivially and unambiguously
inferable expressions instead contributes to line noise that obscures
the important code that carries the intent the author has.
By writing out _
, we can communicate to the reader that the value
is trivial, so don't bother thinking about it.
Pattern syntax work best when the construction syntax is "the same".
For example, we can construct a struct literal with Foo(1, 2)
and then we can match on that exact same case with Foo(1, 2)
.
In pattern matching, we can use the pattern _
to signify "don't care".
However, there doesn't exist an expression equivalent to the _
pattern.
For singleton types, this RFC proposes the equivalent.
While this won't work for non-singleton types, we do gain a modicum of
unification. We argue that such gains in consistency improves learning.
Also note that _
is used in type contexts to mean "please infer" just like
proposed for expressions in this RFC. Furthermore, with const
generics,
situations will occur where const
parameters can be inferred because they are
uniquely determined by some other value or a type. In such situations,
you should be able to write _
to for example infer the size of an array.
This RFC would be consistent with such inference.
-
A Zero-Sized Type (ZST) is a type for which
std::mem::size_of::<TheType>()
evaluates to0
. -
A singleton type or unit type is a type which can only be constructed in one way (modulo side effects) wherefore it carries no information. This is not always the same as a ZST but it often is. One property of a singleton type
S
is that for any typeT
, it is always possible to define a non-diverging functionfn(T) -> S
. This is because all singleton types are terminal objects.
What is proposed in this RFC is to allow all singleton types, that is,
all zero sized types for which all fields are visible in the current scope,
and for which the type is not #[non_exhaustive]
,
to be inferred with _
in expression contexts.
A trivial but uninteresting example is:
fn foo() {
_ // The value () is inferred for the type ().
}
This also works:
fn foo() -> Result<(), u8> {
Ok(_) // We still infer () for _ here, so we get Ok(()).
}
Aggregate types also work:
struct Empty<T> {
_marker: PhantomData<T>,
}
impl<T> Empty<T> {
pub fn new() -> Self { _ }
}
fn pair_empties() -> (Empty<String>, Empty<u8>) { _ }
fn singleton_array() -> [(); 2] { _ }
fn empty_array() -> [String; 0] { _ }
But visibility is taken into account:
mod foo {
pub struct Empty<T> {
_marker: PhantomData<T>,
}
impl<T> Empty<T> {
pub fn new() -> Self { _ } // Ok!
}
}
fn bar() -> Empty<T> {
_
// ^ Error! Cannot infer `Empty { _marker: PhantomData }` because
// the field `_marker` is not visible in this context.
}
And so is #[non_exhaustive]
:
#[non_exhaustive]
struct Foo;
fn bar() -> Foo {
_
// ^ Error! The type `Foo` is `#[non_exhaustive]`.
}
Enums which only have a single inhabited variant with singleton types can also
be constructed with _
:
enum Result<T, E> {
Ok(T),
Err(E),
}
fn always_ok() -> Result<(), !> {
// We have:
enum Result { Ok(()), Err(!) }
// and `Result::Err` is an uninhabited variant.
_ // Becomes: Ok(_) which becomes Ok(())
}
To the expression grammar of the language, we add:
expr : ... | "_" ;
Let:
-
x
,v
, andf
range over valid identifiers. -
Γ
be a well-formed typing environment. -
τ
denote a well-formed expression in contextΓ
. -
σ
denote a well-formed type in contextΓ
. -
const(τ)
be a predicate onτ
, in contextΓ
, holding iffτ
is evaluable at compile time. -
uninhabited(σ)
be a predicate onσ
, in contextΓ
, holding iffσ
is uninhabited. -
non_exhaustive(σ)
be a predicate onσ
, in contextΓ
, holding iffσ
has#[non_exhaustive]
applied to it. -
visible(ρ)
denote if the projectionρ
is visible inΓ
.
Let singleton(σ)
be a predicate on σ
in context Γ
which holds iff
σ
is a singleton type.
It is defined inductively like so:
-
Whatever type
σ
is,PhantomData<σ>
is alwayssingleton
. For example,PhantomData<String>
issingleton
even thoughString
is not.----------------------------- SingPhantomData Γ ⊢ singleton(PhantomData<σ>)
-
A tuple is
singleton
iff all the types it is made up of are alsosingleton
. It follows from this rule that the zero-tuple()
issingleton
.Γ ⊢ singleton(σ0), ..., Γ ⊢ singleton(σn) ----------------------------------------- SingTuple Γ ⊢ singleton( (σ0, ..., σn) )
-
An array is
singleton
if the element type issingleton
.Γ ⊢ singleton(σ) Γ ⊢ τ : usize Γ ⊢ const(τ) --------------------- SingArray Γ ⊢ singleton([σ; τ])
-
An array of size
0
is alwayssingleton
even if the element type isn't.Γ ⊢ τ : usize Γ ⊢ const(τ) Γ ⊢ τ ⟶ 0usize --------------------- SingArrayZero Γ ⊢ singleton([σ; τ])
-
A struct of form
struct x { f0: σ0, ..., fn: σn }
issingleton
iff#[non_exhaustive]
does not apply and all fields are visible andsingleton
.Tuple structs are treated as named structs but with integer fields instead of normal identifiers.
Structs of form
struct x;
are treated the same asstruct x {}
.Γ ⊢ σ = struct x<...> { f0 : σ0, ..., fn : σn } Γ ⊢ non_exhaustive(σ) = ⊥ Γ ⊢ visible(x.f0), ..., Γ ⊢ visible(x.fn) Γ ⊢ singleton(σ0), ..., Γ ⊢ singleton(σn) ----------------------------------------------- SingStruct Γ ⊢ singleton(σ)
-
An enum is
singleton
iff:#[non_exhaustive]
does not apply, it has a variant where all field types aresingleton
, and all other variants are uninhabited.Tuple and unit variants are, as with structs, treated the same way as if they had named fields.
Γ ⊢ σ = enum x<...> { v0 { f00: σ00, ..., f0m: σ0m }, ..., vn { fn0: σn0, ..., fnm: σnm }, } Γ ⊢ non_exhaustive(σ) = ⊥ Γ ⊢ ∃ i. [ singleton(σi0), ..., singleton(σin) ∧ ∀ j ≠ i. uninhabited( (σj0, ..., σjn) ) ] -------------------------------------------------- SingEnum Γ ⊢ singleton(σ)
Visibility does not apply because
pub
is forced for all fields and variants.Note that the rules wrt. inhabitedness entail that
Result<(), !>
issingleton
because it has one inhabited variantOk(..)
and all the field types of that aresingleton
(because()
is). -
All other types are not
singleton
.
We introduce the term typing rule:
Γ ⊢ _ ⇝ σ
Γ ⊢ singleton(σ)
---------------- TmInferSingleton
Γ ⊢ _ : σ
This means that if _
infers to the type σ
, which is a singleton
type,
then _
is a well-formed expression typed at σ
in the context Γ
.
If the type σ
is not singleton
in Γ
then a type error will be raised.
This entails that if we don't know enough about the type, e.g. if we have
(T,)
where singleton(T)
cannot be derived because it is either a
type unification variable or an explicitly quantified type variable,
then we cannot write (_,)
or _
.
We define an elaboration of _
into a Rust with explicitly given values with
the relation ⇝iv
. The elaboration is given in small-step style.
To fully elaborate into a Rust without _
at the expression level,
apply the transformation below everywhere _
recursively until a fixed point.
-
When
_
is inferred to be of typePhantomData<T>
for some typeT
then it elaborates toPhantomData
.Γ ⊢ σ1 = PhantomData<σ0> Γ ⊢ _ : σ1 ------------------------ InferValPhantomData Γ ⊢ _ ⇝iv PhantomData
-
When
_
is inferred to be of a tuple type which issingleton
, then it elaborates to(_, ..., _)
.Γ ⊢ _ : (σ0, ... σn) Γ ⊢ singleton(σ0), ..., Γ ⊢ singleton(σn) ----------------------------------------- InferValTuple Γ ⊢ _ ⇝iv (_, ..., _)
-
When
_
is inferred to be of an array type which issingleton
, then it elaborates to[_, array_size]
.Γ ⊢ τ : usize Γ ⊢ const(τ) Γ ⊢ _ : [σ; τ] Γ ⊢ singleton(σ) ---------------- InferValArray Γ ⊢ _ ⇝iv [_; τ]
-
When
_
is inferred to be of an array of size0
it elaborates to[]
.Γ ⊢ τ : usize Γ ⊢ const(τ) Γ ⊢ τ ⟶ 0usize Γ ⊢ _ : [σ; τ] ---------------- InferValArrayZero Γ ⊢ _ ⇝iv []
Rules 3 and 4 are confluent because
[]
and[_; 0]
are equivalent. -
When
_
is inferred to a struct, it elaborates tox { f0: _, ..., fn: _ }
. As in the section onsingleton
s, we treat tuple and unit structs as if they had named fields.Γ ⊢ σ = struct x<...> { f0 : σ0, ..., fn : σn } Γ ⊢ non_exhaustive(σ) = ⊥ Γ ⊢ visible(x.f0), ..., visible(x.fn) Γ ⊢ singleton(σ0), ..., singleton(σn) Γ ⊢ _ : σ ----------------------------------------------- InferValStruct Γ ⊢ _ ⇝iv x { f0: _, ..., fn: _ }
-
When
_
is inferred to an enum with asingleton
variant and where all other variants are uninhabited, then_
elaborates tox::vi { fi0: _, ..., fin: _ }
. Here,i
denotes the solesingleton
variant and wherefi0, ... fin
denote the fields of said variant.As before, we treat tuple and unit variants as if they had named fields.
Γ ⊢ σ = enum x<...> { v0 { f00: σ00, ..., f0m: σ0m }, ..., vn { fn0: σn0, ..., fnm: σnm }, } Γ ⊢ non_exhaustive(σ) = ⊥ Γ ⊢ ∃ i. [ singleton(σi0), ..., singleton(σin) ∧ ∀ j ≠ i. uninhabited( (σj0, ..., σjn) ) ] Γ ⊢ _ : σ ----------------------------------------------- InferValEnum Γ ⊢ _ ⇝iv x::vi { fi0: _, ..., fin: _ }
-
Otherwise there's nothing to elaborate.
Elaboration is guaranteed to terminate. We can see this by considering the size
of types that additional _
expressions left by elaboration are inferred to.
Eventually, all _
left will infer to a base-case type which are not aggregates.
This proposal has no effect on the dynamic semantics of Rust since elaboration
eliminates all uses of _
to explicitly given values.
Depending on your point of view, this might:
-
Increase language complexity (though a rather simple mechanism is used that is similar to how
_
is used in pattern matching and type contexts and which will also need to be allowed for value inference with const generics). -
Reduce readability. We give rationale for why we think this isn't the case in the motivation.
Claims around reduced readability go along the same lines as they do for inference in general (including inferring types). Annotating explicitly that a binding or expression is of a certain type can help readability, and yet we do have type inference. Arguably, writing out types is more helpful than explicitly writing out expressions for singleton values is. By replacing such expressions with
_
you know syntactically that it is singleton; which you wouldn't know fromFoo::Bar
.
We already use _
in type and pattern matching contexts to denote
"please infer" or its dual "I don't care". It therefore makes sense to use
the same concrete syntax for inferring values as used for types to make the
language minimally additionally complex. By doing so, we improve or don't
make the learning experience notably worse.
The semantics and where _
can be used are fairly conservative.
Singleton types are those where the user explicitly hasn't declared that they
might add additional fields or variants and where there is unambiguously one
value that inhabits the given type. We infer only those types because we don't
want to be opinionated and make arbitrary choices for the user.
In the dependently typed language Agda, which has typed equality, there are eta rules which allow you to for example write:
record A : Set where
constructor B
foo : A
foo = _
This would be the semantic equivalent of writing:
enum A { B }
fn main() { // Not necessary in Agda but it is in Rust.
let foo: A = _;
}
in Rust which this RFC would allow.
Meanwhile, Coq and Idris, which are also dependently typed languages with untyped equality you cannot get away with writing that. However, in Idris, you can still employ proof search to achieve the same thing:
infer : {a : Type} -> {auto x : a} -> a
infer {x} = x;
data A = B
foo : A
foo = infer
The proposed mechanism in this RFC is similar to a proof search.
-
Should
&'a T
wheresingleton(T)
be inferable with_
?Technically, these are not singleton types because the address is observable. However, the observability of the address is often uninteresting for most use cases. This also applies to
&'a mut T
.
In cases where you would write Foo(42, _, _, _)
you could gain in
ergonomics by allowing all the _
s to be inferred with Foo(42, ...)
.
As we don't know how frequently this would occur, we punt on this for the
time being. One conjecture is that having many singleton typed fields is
somewhat unusual and that one or two would be more common.
The expression form _
could be made more useful, depending on your viewpoint,
by allowing it to desugar to T::default()
in cases where T
is not a
singleton
type. A drawback to this would be that we can no longer be sure
that the meaning of _
is always trivial and so less information is given
to the reader by the expression form _
. This possibly interacts with ...
per the notes in the previous section.
If we would add some form of GADTs, either in full or parts of it,
there would likely be more cases in which types should be singleton
.
For example, if we have:
enum Foo<T> {
Bar,
Baz(T) where T: Copy
}
Then singleton(Foo<String>)
should presumably hold because
Foo::Baz(_)
cannot be constructed due to String: Copy
not holding.
Thus, the only inhabited variant would be Foo::Bar
and that one is singleton
.