Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an expression for direct access to an enum's discriminant #3607

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Apr 7, 2024

Third-party derive macros want to be able to use the same direct discriminant access to the enum for which they're implementing a trait that the core derive macro does, which seems entirely reasonable.

To allow that without imposing a new category of major breaking changes on library authors, this proposes a safe built-in way for getting the discriminant as a primitive integer -- not an opaque mem::Discriminant -- for anything on which you can see private fields.

Rendered

@VitWW
Copy link

VitWW commented Apr 7, 2024

As alternative we could have not magical field b.enum#discriminant, but magical "type"/"trait" enum: enum::discriminant(&b)

let b = Enum::Tuple(true);
assert_eq!(enum::discriminant(&b), 13);

```rust
let d = foo().bar.enum#discriminant;
```
already gets highlighting on the `enum` in my editor without needing any updates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mis-highligting of enum#discriminant on GitHub treating it as if 3 tokens makes the code pretty hard to parse on human eye though 🤣

As this expression is an *r-value*, not a *place*, `&foo.enum#discriminant` returns
a reference to a temporary, aka is the same as `&{foo.enum#discriminant}`.
It does *not* return a reference to the memory in which the discriminant is
stored -- not even for types that do store the discriminant directly.
Copy link
Member

@RalfJung RalfJung Apr 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making this an rvalue, I think we should make it a "read-only move-only place" -- a place you can read from but not write to or create a pointer to. That would be more forward-compatible with actually making this a move-only place once these exist. The difference is that with a move-only place I would think &foo.enum#discriminant is an error rather than creating a temporary.

let m = &mut foo.enum#discriminant; *m = 5; would be accepted under your proposal, but probably not behave the way the user expects. If they want the copy they can always write let m = &mut {foo.enum#discriminant}; (and the error message can indicate that).

There's also subtle differences between place and value expressions wrt whether let _ = (*ptr).enum#discriminant performs a load or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of a "read-only move-only place" over a (r-)value expression?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spelled this out, I think?

  1. let m = &mut foo.enum#discriminant; *m = 5; would be accepted under your proposal, but probably not behave the way the user expects.
  2. That would be more forward-compatible with actually making this a move-only place once these exist.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a benefit of special method .enum#discriminant() over a special field .enum#discriminant is that the reference footgun noted above is avoided without introducing any new features. Users already know that method calls produce temporary values.

@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2024

As alternative we could have not magical field b.enum#discriminant, but magical "type"/"trait" enum: enum::discriminant(&b)

let b = Enum::Tuple(true);
assert_eq!(enum::discriminant(&b), 13);

Trait impls are global, so this fails to achieve the goal of this RFC to restrict access to the discriminant to the crate that declared the enum.

Maybe one day it could be allowed, but for now this RFC sticks only things that
can be allowed in safe code without worries.

## Couldn't this be a magic macro?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your mention of a discriminant! macro reminded me about the possibility of allowing obtaining the discriminant of an enum variant “type” instead of a value, similar to how offset_of! also only requires the type and not a specific value (so that you don’t need to worry about initialising a value first).

However, giving access to variant discriminants based on their type is a different use case than when you already have a value. Perhaps whatever syntax is chosen in the end could support both (with the initial self visibility restriction)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that getting the numeric discriminant, or a mem::Discriminant, without needing an instance would be a good feature. But I think it's separable from this RFC, because the motivation listed here doesn't need it, and the goal of this RFC to be particularly focused to hopefully avoid the "well nothing else is happening so let's just stabilize what exists on nightly" trap.

@kennytm
Copy link
Member

kennytm commented Apr 7, 2024

It seems the primary reason of this RFC is that .enum#discriminant is pub(in self) so that, as mentioned in rust-lang/rust#106418, it allows author of the enum to reorder any variants without breaking change.

But according to #1105, having a > b in 1.1.0 and a < b in 1.2.0 is not considered a necessary condition of breakage, unless the variant order is explicitly documented:

In general, APIs are expected to provide explicit contracts for their behavior via documentation, and behavior that is not part of this contract is permitted to change in minor revisions.

So I don't see much advantage of this RFC over e.g. allow users to use std::intrinsics::discrimant_value() (e.g. via std::mem::discriminant(&a).value()).


Sure, it could, like `offset_of!`.

I don't think `enum_discriminant!(foo)` is really better than `foo.enum#discriminant`, though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do see the argument, I do think there is a point in favour of discriminant_of! - the spending of weirdness budget.
Magic macros are nothing new in rust at this point (addr_of!, offset_of!), and in theory, both of those could have been implemented with magic fields (or even bespoke syntax, eg. &raw).
I think the RFC needs a strong argument on why this weirdness budget should be spent on this feature. Implementing custom derives is presumably a lot more rare than working with addr_of! in pointer handling code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another benefit of magic macros: you can easily get the discriminant of a variant without having to have/create a value:

enum E {
    A(Arc<Mutex<String>>) = 3,
    B,
}

const _: () = assert!(discriminant_of!(E::A) == 3);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@programmerjake such magic macro cannot exist in this form, it will not able to disambiguate the follow case

#[repr(u8)]
enum F {
    A { a: u8 } = 1,
    B = 2,
}

impl F {
    const A: Self = Self::B;
}

const _: () = assert!(discriminant_of!(F::A) == 2);

since the primary motivation of .enum#discriminant is to support #[derive(PartialOrd)], the argument to that macro is usually an expression, meaning F::A will be referring to that associated const

Yes you could make a separate macro to only handle discriminant_of_variant!(F::A) == 1, but it will not be able to solve the problems this RFC is handling.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer custom syntax for new fundamental concepts, rather than magic macros. Any kind of macros makes the code harder to analyze, both automatically and for humans. One needs to ensure that the magic macro wasn't shadowed by a non-magic user macro (which can be quite nontrivial, given the textual scoping possible with macro_rules!), and to remember that some macros actually act as built-in syntax. Conversely, special syntax cannot be overridden, and anyone who can parse it likely can remember that it has special semantics.

On the other hand, magic syntax doesn't have a natural place to attach documentation, so it may be less discoverable and harder to learn.

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Apr 7, 2024
Should the enum's declared `repr` not be the type you actually want, you can always
use `.enum#discriminant` and *then* `as` cast it -- or hopefully `.into()` or
something else with clearer intent -- into the type you need.

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum Apr 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a proposed alternative, something like this (some formatting etc would need to be corrected):

Suggested change
## Private method on Discriminant<T>
We could instead have the compiler generate something like this for any enum declaration:
```rust
enum Bar {}
// generated:
impl Discriminant<Bar> {
fn discriminant(&self) -> <Bar as DiscriminantKind>::Discriminant { ... }
}
\```
This would function equivalently to a `pub(in self)` function defined by the user, callable only within the module. If we want to avoid a coherence-break of the impl, this could be implemented as synthesizing an unnameable trait that's only visible in the module:
```rust
trait BarDiscriminantAccessor { fn discriminant(&self) -> <Bar as DiscriminantKind>::Discriminant; }
impl BarDiscriminantAccessor for Discriminant<Bar> {
fn discriminant(&self) -> <Bar as DiscriminantKind>::Discriminant { ... }
}
\```
This solution provides the same visibility/privacy as the solution proposed in the main RFC, but avoids introducing any new syntax or even particularly new concepts to the language (just auto-generated trait impls, but that's nothing a derive macro couldn't do on stable today).

Copy link
Member Author

@scottmcm scottmcm Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: I misread the suggestion, and this is basically all irrelevant.

The problem with generating an inherent impl is that it's a hard break for any enum that's defined an inherent method called discriminant already.

A voldemort trait is an interesting idea, but I do kinda worry about the impact of that many traits existing in terms of how it'd impact things like coherence checks. Also if it's unnamable then there'd be no way to UFCS it, so you couldn't wrap it in a public inherent method called discriminant since trying to call it would just get the inherent one again.

Basically the same kinds of reasons that led me to enum#ing it for the field too. I guess it could be an auto-generated function named enum#discriminant, rather than the field? It's just not obvious to me if that's materially easier or "fewer concepts".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the impl is not on the enum. It's on Discriminant. So I don't think there's any compatibility concern or naming problem - you'd implement the public method on your enum.

It feels pretty unlikely that people are implementing traits on Discriminant today since you can't actually do much useful in it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks Mark! I'd missed that detail.

@scottmcm
Copy link
Member Author

scottmcm commented Apr 9, 2024

So I don't see much advantage of this RFC over e.g. allow users to use std::intrinsics::discrimant_value() (e.g. via std::mem::discriminant(&a).value()).

Well, changing types is still a major breaking change, right? So if this returns isize for enums without reprs, then changing to a better repr type would still be a major breaking change, which it isn't today since that's not exposed. (Other than in layout, but layout is explicitly minor.)

And TBH I don't think that "the actual ordering isn't really a semver property unless written out in docs" is realistic, no matter what it says in that RFC. APIs like https://doc.rust-lang.org/std/collections/struct.BTreeMap.html#method.range mean that entirely reasonable things like "I want to get first/everything/last of a particular variant from this map" are forced to take a dependency on the order of things.

(For example, if you have #[derive(...)] enum Foo { A(i32), B(String), C(i32) }, getting the last B from the BTreeMap means .range(..C(0)).next_back(), and there's no efficient way to write that without depending on the Ord staying the same.)

The RFC also says

This policy will likely require some revision over time, to become more explicit and perhaps lay out some best practices.

and I think this is a care where it's de facto different now.

For example, we don't even document that Less < Equal for cmp::Ordering -- sure, it has explicit variants, but if changing those has been a "minor" change this whole time, then you can't rely on those in the rustdoc either.

And we didn't document None < Some(_) until https://doc.rust-lang.org/1.56.0/std/option/index.html#comparison-operators -- that's roughly 6 years after 1.0.

So if even the standard library hasn't been documenting the orders for its enums, even for things where people have clearly been relying on them, I don't think it's fair to say people are wrong for expecting it not to change.

To me that note in 1105 is more about things like how the exact bytes from a compression algorithm, for example, shouldn't be considered a semver guarantee, since the actual postcondition is that the bytes are decompressible back to the original data. But with an output of bool, it's really hard for me to say that the exact result isn't the semver-relevant post-condition for PartialOrd::lt, say.

And of course there's things like

let foo: [u8; char::UNICODE_VERSION.0 as _] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14];

where obviously you deserve the breakage you'll get (though for that one it's particularly true since it's documented as being able to change).


As a specific example where being at least crate-private is important to leave options open:

Today, None is 0 but Ok is 0, and that means that ControlFlow can't match both, which means that the ? desugaring can't be a NOP for both, only for one of them.

It would be nice to change the discriminants on one of them (probably Result) to be the other way around to simplify that codegen, but I think that's just far more practical without introducing a way for people to depend on the currently-invisible values.

@kennytm
Copy link
Member

kennytm commented Apr 9, 2024

@scottmcm

Well, changing types is still a major breaking change, right?

I'm pretty sure changing #[repr] is a breaking change, with or without discriminant_value. At the very least it makes the following code break if you change from #[repr(u8)] to #[repr(u16)].

#[repr(u8)]
enum Enum {
    A = 1,
}
let _: u8 = unsafe { std::mem::transmute(Enum::A) };

Now, #1105 did say "not all breaking changes are major"1, and as you have said "layout is explicitly minor"2. But if we accept breaking transmute I don't see why not accepting breaking PartialOrd of the discriminant values (not even the enum values themselves).

(For example, if you have #[derive(...)] enum Foo { A(i32), B(String), C(i32) }

If the enum explicitly derived PartialOrd, it would be the enum author's responsibility to not change the variant order because they are going to change Foo::A(1) < Foo::C(2), let alone discriminant_value(&Foo::A(1)) < discriminant_value(&Foo::C(2)).

Let's focus on the case where PartialOrd is not derived on the enum. I think everyone in rust-lang/rust#106418 agreed that if E derived PartialOrd there is no problem to impl Ord for Discriminant<E>.

Today, None is 0 but Ok is 0, and that means that ControlFlow can't match both, which means that the ? desugaring can't be a NOP for both, only for one of them.

It would be nice to change the discriminants on one of them (probably Result) to be the other way around to simplify that codegen, but I think that's just far more practical without introducing a way for people to depend on the currently-invisible values.

Result already derived PartialOrd, Ok(3) < Err(3), as mentioned in #3058. Yes you could change the implementation to not use derive to "stabilize" the order without relying on the discriminant value.

enum Result2<T, E> {
    Err(E),
    Ok(T),
}

impl<T: PartialOrd, E: PartialOrd> PartialOrd for Result2<T, E> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            (Self::Ok(a), Self::Ok(b)) => a.partial_cmp(b),
            (Self::Err(a), Self::Err(b)) => a.partial_cmp(b),
            _ => discriminant_value(other).partial_cmp(&discriminant_value(self)), // reverse order
        }
    }
}

Then reordering Ok / Err is just going to break those people relying on discriminant_value(&Ok(3)) == 0 && discriminant_value(&Err(3)) == 1 which I'd say let them be broken — basically the stance in rust-lang/rust#106418 (comment).


Anyway I'd expect

  1. If the enum is #[repr(Rust)]3, then third-party crates must not rely on specific value of the discriminants (other than being distinct), because the enum author is free to reorder the variants
  2. If the enum is #[repr(inttype)]4, then the discriminant values are guaranteed stable, because the Rust spec guaranteed the pointer-casting method extract that inttype tag. The enum author should not change the discriminant values nor representation type between minor versions. If they reorder the variants, explicit discriminants should be assigned to maintain stability.
  3. Not sure about #[repr(C)], treat it as the same as #[repr(C, c_int)] here?

Footnotes

  1. heck changing the result of any const fn-evaluable expression is breaking

  2. though nowhere in RFC: Policy on semver and API evolution #1105 I could find the word "layout"

  3. note that Result, Option, ControlFlow are all plain #[repr(Rust)] enums.

  4. note that cmp::Ordering is #[repr(i8)]

@scottmcm
Copy link
Member Author

scottmcm commented Apr 9, 2024

2. though nowhere in 1105 I could find the word "layout"

Well, it's written in other places, like https://doc.rust-lang.org/std/mem/fn.size_of.html's

In general, the size of a type is not stable across compilations, but specific types such as primitives are.

for repr(Rust) types, or how if you ask rustdoc to include layout information it says

Note: Most layout information is completely unstable and may even differ between compilations.

But I guess you're talking about semver on it, but it's not surprising that 1105 doesn't talk about things like repr(transparent) that stabilized 3 years later.

But -- less authoritatively -- rustdoc has been doing things like rust-lang/rust#115439 that also suggest some level of consensus that non-Rust reprs aren't necessarily semver guarantees either.


I suppose mostly I'd rather have things checkable by tools where possible, and everything that's "minor" is a thing that the tool would like to check but we decided to make it so it shouldn't.

And it annoys me when things that are otherwise entirely nominal in Rust (at the language level) today start to care about order, as this would do to enum variants with fields. Things being breaking because the author of the library used the library in a particular way is fine, but language features forcing breaking changes -- even if de jure minor -- feels bad to me when we don't need to.

To me, "minor" is for "well, we wrote ourself into a corner and the only way to allow really any changes at all is to make this de jure allowed even though it's definitely technically breaking", especially when there's an option for the consumer to write code in a way that's resilient to minor breakage. I can't justify deeming this minor in the way that I easily can for the ones that are "well otherwise any addition at all is breaking", like exists for * imports or method resolution.

But those are larger conversations than just this RFC, I suppose. Really the core problem is that there's no way to say which things you're doing are implementation details and which aren't. After all, there's no code marker for the difference between "my type is using repr(C) because I needed it for some internal pointer tricks" and "my type is using repr(C) because it's fixed for this OS FFI and cannot change ever" :/

@kennytm
Copy link
Member

kennytm commented Apr 10, 2024

Well, it's written in other places, like https://doc.rust-lang.org/std/mem/fn.size_of.html's

Thanks. I hope there is a centralized place to look these up 😄 (maybe obi1kenobi/cargo-semver-checks#5?)

But -- less authoritatively -- rustdoc has been doing things like rust-lang/rust#115439 that also suggest some level of consensus that non-Rust reprs aren't necessarily semver guarantees either.

Well I wouldn't count #[repr(transparent)] as non-Rust 😅; besides 115439 only hides the attribute when the inner type is private.

IMO if the #[repr] appears in the rustdoc (like https://doc.rust-lang.org/core/cmp/enum.Ordering.html), it is part of the API contract and changing it should be breaking. That said, std::cmp::Ordering only gained #[repr(i8)] at 1.57 by rust-lang/rust#895071, and the values -1, 0, 1 are shown in the doc only since 1.75 due to a rustdoc improvement rust-lang/rust#116142.

That means at least until 1.57 for non-ABI purpose rust-lang/rust did treat #[repr] change as a minor change2, and until 1.75 breakage due to changed discriminant value was also not obvious.

(If rust-lang/rust#86772 ever got implemented then <E as AsRepr>::Repr would be another source of breakage)

Footnotes

  1. technically not breaking since -1..=1 is within the i8 range it is layout-compatible with the previous #[repr(Rust)]` form

  2. cargo-semver-checks v0.6+ do consider them major breaking change though: Check for #[repr(i*)]/#[repr(u*)] semver violations. obi1kenobi/cargo-semver-checks#29

@RalfJung
Copy link
Member

There are extensive docs on what is and is not semver breaking at https://doc.rust-lang.org/cargo/reference/semver.html.

@kennytm
Copy link
Member

kennytm commented Apr 10, 2024

I'd also like to note that by exploiting the implementation details of Hash it is already possible to extract the discriminant value in safe and stable Rust (without the repr type at compile-time though). Of course we could argue that Discriminant::hash() is not guaranteed to simply call hasher.write_isize().

… Additionally the data passed by most standard library types should not be considered stable between compiler versions.

This means tests shouldn’t probe hard-coded hash values or data fed to a Hasher

use std::mem::Discriminant;

#[derive(PartialEq, Eq, Debug)]
#[non_exhaustive]
pub enum DiscriminantValue {
    Usize(usize),
    Isize(isize),
    U8(u8),
    I8(i8),
    U16(u16),
    I16(i16),
    U32(u32),
    I32(i32),
    U64(u64),
    I64(i64),
    U128(u128),
    I128(i128),
}

impl<E> From<Discriminant<E>> for DiscriminantValue {
    fn from(v: Discriminant<E>) -> Self {
        use std::hash::{Hash, Hasher};
    
        struct DiscriminantValueExtractor(DiscriminantValue);
        
        impl Hasher for DiscriminantValueExtractor {
            fn write(&mut self, _: &[u8]) {
                unreachable!("don't use this as an actual Hasher ;)");
            }
            fn finish(&self) -> u64 {
                unreachable!("don't use this as an actual Hasher ;)")
            }
            
            fn write_u8(&mut self, i: u8) { self.0 = DiscriminantValue::U8(i); }
            fn write_u16(&mut self, i: u16) { self.0 = DiscriminantValue::U16(i); }
            fn write_u32(&mut self, i: u32) { self.0 = DiscriminantValue::U32(i); }
            fn write_u64(&mut self, i: u64) { self.0 = DiscriminantValue::U64(i); }
            fn write_u128(&mut self, i: u128) { self.0 = DiscriminantValue::U128(i); }
            fn write_usize(&mut self, i: usize) { self.0 = DiscriminantValue::Usize(i); }
            fn write_i8(&mut self, i: i8) { self.0 = DiscriminantValue::I8(i); }
            fn write_i16(&mut self, i: i16) { self.0 = DiscriminantValue::I16(i); }
            fn write_i32(&mut self, i: i32) { self.0 = DiscriminantValue::I32(i); }
            fn write_i64(&mut self, i: i64) { self.0 = DiscriminantValue::I64(i); }
            fn write_i128(&mut self, i: i128) { self.0 = DiscriminantValue::I128(i); }
            fn write_isize(&mut self, i: isize) { self.0 = DiscriminantValue::Isize(i); }
        }
        
        let mut extractor = DiscriminantValueExtractor(DiscriminantValue::Isize(0));
        v.hash(&mut extractor);
        extractor.0
    }
}

//------------------------------

enum Option1<T> {
    None,
    Some(T),
}

enum Option2<T> {
    Some(T),
    None,
}

fn main() {
    use std::mem::discriminant;

    let v1 = discriminant(&Option1::Some(&4));
    let v2 = discriminant(&Option2::Some(&5));
    
    let dv1 = DiscriminantValue::from(v1);
    let dv2 = DiscriminantValue::from(v2);
    
    assert_eq!(dv1, DiscriminantValue::Isize(1));
    assert_eq!(dv2, DiscriminantValue::Isize(0));
}

@RalfJung
Copy link
Member

RalfJung commented Apr 10, 2024

IMO if the #[repr] appears in the rustdoc (like https://doc.rust-lang.org/core/cmp/enum.Ordering.html), it is part of the API contract and changing it should be breaking. That said, std::cmp::Ordering only gained #[repr(i8)] at 1.57 by https://github.com/rust-lang/rust/pull/89507[1](https://github.com/rust-lang/rfcs/pull/3607#user-content-fn-1-202b98187017b7952a7d8561312a555c), and the values -1, 0, 1 are shown in the doc only since 1.75 due to a rustdoc improvement rust-lang/rust#116142.

I don't think rustdoc changes like that are subject to FCP by the team(s) that governs semver (t-lang would be involved I assume?), so I don't think rustdoc can be used as a guideline here. In fact rustdoc printing #[repr] even though they were not considered stable guarantees was an open issue for many years that finally got fixed recently. I would say using rustdoc as the arbiter here is a common misconception but not backed by an actual decision process.

We should make it so that rustdoc only prints things that are semver-stable, but that is not where we are. Discrepancies are bugs, but they are not automatically bugs of the form "rustdoc is right and the semver guarantees need to be changed" (or "the unwritten semver rules should be whatever rustdoc does") -- it is just as likely that rustdoc is wrong.

@kennytm
Copy link
Member

kennytm commented Apr 10, 2024

@RalfJung Thanks. Perhaps #1105 should contain to pointer to https://doc.rust-lang.org/cargo/reference/semver.html.

@mattheww
Copy link

I suggest the Lexing section should say how enum#discriminant would be represented as a proc_macro::TokenTree.

@kennytm

This comment was marked as outdated.

@mattheww
Copy link

rfc3101 says that enum#discriminant is, at present, a lexing-time error, and not any kind of token at all.

The statement that enum#discriminant is treated as an Ident for the purposes of proc-macro input is what I would like to see written in the RFC.

(Clearly it's not an identifier for the purposes of the grammar in general, as that would defeat the point.)

There's also a question of how proc-macros can create such a token. Presumably there would be something parallel to Ident::new_raw().

@kennytm
Copy link
Member

kennytm commented Apr 10, 2024

@mattheww I'd suppose Ident::new("enum#discriminant", span) like all other keywords.

Raw identifier is special because r#something is the same as something but losing keyword magic. Yet enum#discriminant acts just as a weirdly spelled keyword, and regular keywords are instantiated using Ident::new().

(Clearly it's not an identifier for the purposes of the grammar in general, as that would defeat the point.)

In #3101 the token enum#discriminant is a RESERVED_IDENTIFIER (in reality an UnknownPrefix error token). It can easily be changed to produce a regular IDENTIFIER (a keyword is still an IDENTIFIER) if this RFC is accepted.

@mattheww
Copy link

Then let's write that in the RFC rather than leaving it for people to suppose.

(To be clear, I'm not suggesting any of this will cause trouble, I just want it to be written down.)

@nikomatsakis
Copy link
Contributor

I'll note in passing that this is the second language feature that "could've been a library" but for privacy -- the other that I'm thinking of being Jack Wrenn's safe transmute proposal. It seems like a fairly common thing that we want to have access to some information scoped to "those who have privileged visibility".

I'm not against this syntax in particular but I do think this should be a clue for future prioritization -- something like scoped impls, or some way to reify privacy, would be helpful to ensure Rust lives up to its value of being Extensible.

@fmease
Copy link
Member

fmease commented Apr 11, 2024

something like scoped impls, or some way to reify privacy

We do have restrictions (approved RFC 3323, tracking issue) which we should be able to leverage in some way or another once they're implemented (rust-lang/rust#106074).

@Ciel-MC
Copy link

Ciel-MC commented Apr 11, 2024

Saw this on TWIR and was wondering if this should be done together with having a “discriminant type” as a “secret enum” declared for all enums that is just the tag and no body, then perhaps a special trait can be used to retrieve the type or the value(FooEnum::Discriminant?).

@kennytm
Copy link
Member

kennytm commented Apr 11, 2024

that special trait already exists https://doc.rust-lang.org/std/marker/trait.DiscriminantKind.html but I think it is perma-unstable for now

Comment on lines +348 to +354
Privacy is the problem.

If we wanted to just expose everything's discriminant to everyone, it'd be easy
to have a trait in core that's auto-implemented for every `enum`.

But to do things in a way that doesn't add a new category of major breaking change,
that gets harder.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to gently push back on the notion that SemVer stability is a major challenge here. As a rule in Rust, the layout characteristics of a type are outside of its SemVer obligations. To my knowledge, none of our layout reflection mechanisms in the mem module carry any SemVer guarantees: The ability to use offset_of! does not imply that the author is guaranteeing field ordering, and neither size_of nor align_of guarantee any stability of size or alignment. PointeeMetadata will almost certainly lack stability guarantees too.

In my work on Project Safe Transmute, we found that different transmutation abstractions in the ecosystem have different takes on layout stability, and so the best thing we could do (both for these consumers, and for consistency with other layout reflection mechanisms) was to not introduce a notion of layout stability.

Given this precedent, I argue that it would actually be surprising for an Enum trait in the mem module to connote SemVer stability.

[summary]: #summary

Enable using **`.enum#discriminant`** on values of enum type from safe code in the same module
to get the numeric value of the variant's discriminant in the numeric type of its `repr`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we looking for a mechanism that provides a field's discriminant, or its tag? The discriminant is the logical value associated with each variant of an enum. The tag is the actual value, in memory, used to distinguish a variant. For unsafe code operating on enums, the tag is the more relevant item.

The tag and discriminant do not necessarily have the same size, alignment, or value. The tag might not even exist. For instance, the Variant below has a logical discriminant of -42isize, but no tag at all:

enum Example {
    Variant = -42
}

It's possible, in principle, to restrict the utility of this RFC to cases where the tag and discriminant are guaranteed to be equal, but doing so is going to significantly increase the complexity of when your proposed accessor is emitted, and it will diminish its usefulness to safe transmute. For example, the zerocopy crate, whenever possible, supports zero-copy operations on types without explicit reprs. This is genuinely useful for folks who need to do zero-copy operations and want to take full advantage of Rust's layout optimizations, and who don't need to worry about layout stability.

All that said: I'd like to suggest that a #tag accessor would probably be more useful. At the time of writing, the tag of an enum is always an Option<intty> (or Option<&intty>?) Is there a foreseeable future in which that isn't the case? If so, perhaps we could provide a more abstracted operation instead (e.g., a &[u8] to variant conversion).

Copy link
Member

@RalfJung RalfJung Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tag is an unstable implementation detail of the Rust layout algorithm. I don't think it should be exposed in the language at all. Exposing the discriminant is the intention of this RFC and it is IMO the right choice.

For instance, we want to keep the door open to a future where there is no single tag, but the information about the discriminant of the active variant is distributed over several "fields".

There is currently no operation on any level of abstraction in Rust that would expose the tag, and I think we should keep it like that. The action "load the tag" doesn't even exist in the Abstract Machine. Everything is defined in terms of the discriminant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confident that:

  1. there are compelling use-cases for reflecting on an enum's tag (just as there have been compelling use-cases for reflecting on all other aspects of the Rust layout algorithm that we already permit), and that
  2. there are some misguided users will reach for this feature believing that it reflects on the tag (without understanding the distinction).

This misconception will be reinforced by the fact that the discriminant and tag are equal in certain well-specified situations. The primary use-case of RFC 2363, per its own stated Motivation, is to control the tag in such cases. (N.B.: The text of that RFC uses the term "discriminant", but its motivating examples entirely concern controlling the tag.)

Now, this RFC (#3607) leads off by citing the stabilization of RFC 2363 as its motivator. Why? Because RFC 2363 only provides a mechanism for set the tag of an enum, not to access it.

The relationship between these two complementary PRs are reflective of the common conflation of tag and discriminant. I'd be reassured if this RFC, which provides a method for accessing the discriminant, provided a motivation that did not heavily cite an RFC concerned almost entirely with controlling the tag.

What are the use-cases for this PR besides reflecting on the memory representation? How well would those use-cases be satisfied by alternatives that don't have any relationship to the tag (e.g., accessing the ordinal of the variant)?

Copy link
Member

@RalfJung RalfJung Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main use-case is PartialEq/PartialOrd-like things on enum variants, isn't it? All use-cases I have seen so far want the discriminant, not the tag.

Nothing in any layer of the language currently has access to the tag, not even the MIR itself. I don't see evidence that it would be useful for anything. OTOH it is clear that by keeping the tag completely unspecified we can do a lot more with layout optimizations in the future. You are trying to turn this RFC into a much more fundamental change than what it wants to be. Maybe you should write your own RFC, as I don't see how it serves the stated motivation for this RFC.

(N.B.: The text of that RFC uses the term "discriminant", but its motivating examples entirely concern controlling the tag.)

I disagree. Where is the tag relevant? They all want the discriminant, and as far as I know they are all perfectly happy with our mem::discriminant, which demonstrates conclusively that they do not need the tag.

There are cases where you care about memory representation, but those are cases where you will then use a repr that makes the tag and the discriminant the same. For other cases we explicitly want to say absolutely nothing about the memory representation and in particular the tag, and this RFC has no interest in changing that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main use-case is PartialEq/PartialOrd-like things on enum variants, isn't it?

What are examples of "PartialEq/PartialOrd-like things" in the crate ecosystem?


You are trying to turn this RFC into a much more fundamental change than what it wants to be. Maybe you should write your own RFC, as I don't see how it serves the stated motivation for this RFC.

I'm not "trying to turn this RFC into a much more fundamental change than what it wants to be" — I'm trying to give @scottmcm feedback on how to better situate the proposal in the broader design space.

A feature doesn't need to address every use case, but its RFC should! An RFC should be as detailed as possible about which use cases it addresses, which use cases it expressly doesn't address, how it interacts with other language features, and how it might be misused. RFCs are the documents in which we record the design process of the Rust programming language. We'll thank ourselves later for doing this work.


OTOH it is clear that by keeping the tag completely unspecified we can do a lot more with layout optimizations in the future.

Yeah, completely agreed.


(N.B.: The text of that RFC uses the term "discriminant", but its motivating examples entirely concern controlling the tag.)

I disagree. Where is the tag relevant?

The Motivation section of RFC 2363 ends with this pattern made generally possible by the RFC:

fn id(&self) -> LonghandId 
    // We can use the unsafe trick again.
    unsafe { *(self as *const Self as *const LonghandId) }
}

That's a transmute! Perhaps this RFC (3607) could illustrate how this particular example can be made safer by reflecting on the discriminant instead of the tag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jswrenn

(N.B.: The text of that RFC uses the term "discriminant", but its motivating examples entirely concern controlling the tag.)

I disagree. Where is the tag relevant?

The Motivation section of RFC 2363 ends with this pattern made generally possible by the RFC:

I'd disagree that #2363 is "controlling the tag". Rather, it exploited the guarantee that the "tag" and "discriminant" are equivalent for a primitive representation (#[repr(u16)] enum AnimationValue) to extract the "discriminant" through the "tag". Also nowhere in #2363 suggested it is possible to mutate the tag/discriminant.

And I don't know how getting the .enum#tag is useful for #[repr(Rust)] enums either. Consider

enum G {
    A,       // discriminant = 0. tag = 1114112
    B(char), // discriminant = 1. tag = 0? 1114113? the char itself (in the range 0..=1114111)?
    C,       // discriminant = 2. tag = 1114114
}

type H = Option<G>;
// H::None    - discriminant = 0. tag = 1114115
// H::Some(x) - discriminant = 1. tag = 0? 1114116 (would coincide with J::None's tag)? x.tag?

type J = Option<H>;
// J::None    - discriminant = 0. tag = 1114116
// J::Some(x) - discriminant = 1. tag = 0? 1114117? x.tag?

What is G::B('?').enum#tag? If it is 0 or 1114113 then the .enum#tag is not a transmute i.e. not actually reflecting what is stored in memory, while if it is 63 you can't use the .enum#tag to group all G::B(_) together. So I don't see what's the point compared with .enum#discriminant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are examples of "PartialEq/PartialOrd-like things" in the crate ecosystem?

Derives for PartialEq/PartialOrd are the main stated motivation for these RFCs as far as I can see, but the RFC also says "accessing the discriminant is quite useful in various places" without giving a concrete example. I suspect (de)serialization is another case where this could come up.

A feature doesn't need to address every use case, but its RFC should! An RFC should be as detailed as possible about which use cases it addresses, which use cases it expressly doesn't address, how it interacts with other language features, and how it might be misused. RFCs are the documents in which we record the design process of the Rust programming language. We'll thank ourselves later for doing this work.

An RFC also still has a clear scope, and the scope here is working with the discriminant, not the tag.

I guess the RFC could gain a section in "alternatives" that explains why it does not pursue exposing the tag (for the reasons stated above); would that resolve your concern?

That's a transmute!

Indeed. And we only want to allow such transmutes in cases where the repr guarantees that tag and discriminant are the same. That's exactly what I referred to above when I said

"There are cases where you care about memory representation, but those are cases where you will then use a repr that makes the tag and the discriminant the same."

(So, I agree with what @kennytm said.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this RFC's Reference Level explanation draw the distinction between discriminant and tag, and clarify that this will produce the discriminant (which is not always the tag). If this feature is ever stabilized, that distinction should be vigorously highlighted in its documentation.

(@kennytm, I don't want to derail this conversation with a debate over whether reflecting on default-repr enum layout is ever useful, but I'd invite you to send a message to Project Safe Transmute if you'd like to continue this thread of discussion.)

I think this RFC's Alternatives discussion should raise the issue of exposing the tag, and then cite the potential of multi-field tags as a reason why we could not possible define a general purpose API for doing so without restricting the possibility of future optimizations.

Comment on lines +38 to +40
For example, `#[derive(PartialOrd)]` on an `enum` today uses internal compiler magic to look at discriminants.
It would be nice for other derives in the ecosystem -- there's a whole bunch of things on `enum`s --
to be able to look at the discriminants directly too.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find this particular motivation to be particularly strong, as written. Variant discriminants (but not tags) can always be determined syntactically: A variant's discriminant is always one greater than the discriminant of the preceding variant, unless it is explicitly provided, in which case the explicit value is used. If the variant is the first variant and does not have an explicit discriminant, its discriminant is implicitly 0. Given this, it's reasonably simple for derives to reflect on the discriminants of an enum's variants — no compiler magic needed.

to write a clever enough safe `match` that compiles down to a no-op in order to get at the discriminant,
but doing so is annoying and fragile.

And accessing the discriminant is quite useful in various places, so it'd be nice for it to be easy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you enumerate some more of these cases? There's a lot of subtly in this design space; e.g., do users mostly need discriminants or tags? Why do they need them? Are they operating on valid data, initialized (but invalid) bytes, or partially initialized data? There's not quite enough in the Motivation section to convince me that .enum#discriminant is the ideal solution here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For serde, tagging1 an enum variant by the discriminant value rather than the variant name (i.e. generalizing serde_repr) would be very useful.

Footnotes

  1. https://serde.rs/enum-representations.html, nothing to do with rustc's "tag"

Comment on lines +210 to +211
- For cases without `repr(int)`, custom discriminants aren't even allowed,
so those discriminants much not be all that important.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, that's not at all what happened here. Rather, we started by implementing the most minimal version of explicit discriminants on fieldful enums that we could get away with. Expanding the scope of explicit discriminants is totally plausible, and is only blocked on us doing a better job at articulating the difference between discriminant and tag.

- For the FFI cases the layout guarantees mean it's already possible to write a
sound and reliable function that reads the discriminant.
- For cases without `repr(int)`, custom discriminants aren't even allowed,
so those discriminants much not be all that important.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

Suggested change
so those discriminants much not be all that important.
so those discriminants must not be all that important.

That also helps resolve any concerns about it *looking* like field access -- as
existed for `.await` -- since it's visibly lexically different.

And the lexical space is already reserved,
Copy link
Contributor

@teor2345 teor2345 Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing half a sentence - I’m not sure exactly what was intended here.

Suggested change
And the lexical space is already reserved,
And the lexical space is already reserved, so we can start using it without a broader syntax RFC.

If this turns out to work well, there's a variety of related properties of things
which could be added in ways similar to this.

For example, you could imagine `MyEnum::enum#VARIANT_COUNT` saying how many variants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something worth referencing earlier, because it helps justify the unusual syntax. If you’re creating a namespace that could be populated later, then enum#… makes a lot more sense.

Otherwise every time you added something you’d have to be very careful about causing name conflicts.

- A pseudo-field with `#` in the name looks kinda weird.
- There might be a nicer way to do this in the future.

# Rationale and alternatives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottmcm there's another outstanding RFC right now that targets a similar use-case: #3604

I don't think it's likely that RFC will be accepted without modifications, but the general gist of it is instead to allow deriving Into and TryFrom on enums. That proposal has the advantages of handling both directions of the conversion and it does not introduce any novel syntax, but it has the disadvantage that it does not provide private mechanism for accessing the discriminant.

Perhaps you could add a discussion here contrasting these two approaches?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another related alternative is creating a new derivable trait that exposes the discrimenant. Which allows the user to opt in to exposing the discrimenant, but doesn't have a way to make it accessible only within the crate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3604 is irrelevant to this RFC besides field-less (C-style) enum.

If extending #3604 to enum with fields, the Into implementation would be often useless (e.g. Option<T>::into() will return if self.is_none() { 0 } else { 1 }), and the TryFrom implementation will not work at all.

OTOH, accepting this RFC won't accomplish what is proposed in #3604 either, e.g. those enums can't be passed into arguments expecting impl Into<u32>.


Another related alternative is creating a new derivable trait that exposes the discrimenant. Which allows the user to opt in to exposing the discrimenant,

Given that std::mem::Discriminant<E> is already applicable to all types I don't think you need a derivative. Perhaps a marker trait is sufficient.

/* module std::marker */

/// An assertion that the enum has a stable `#[repr]` 
/// and every variant's discriminant value will never change.
///
/// In particular, enums with implicit discriminant values must not reorder their variants.
// (not sure if the trait needs to be unsafe or not)
pub trait StableDiscriminant {}

/* module std::mem */

impl<E: StableDiscriminant> Discriminant<E> {
    pub fn value(&self) -> <E as DiscriminantKind>::Discriminant {
        self.0
    }
}

/* user code */

enum Enum { ... }

impl std::marker::StableDiscriminant for Enum {}

However this will not solve what the RFC's motivation which is #[derive(PartialOrd)], where implementing PartialOrd does not mean the freedom to reorder the enum variants should be taken away (one could change from derive to manual implementation to keep the PartialOrd stable while actually changing the discriminant value).

So the whole issue is still above whether the discriminant value of a regular enum should be considered part of the stable public interface. If not then the whole issue of

but doesn't have a way to make it accessible only within the crate.

can just goes away.

Comment on lines +15 to +16
Today in Rust you can use `as` casts on *field-less* `enum`s to get their discriminants,
but as soon as any variant has fields, that's no longer available.
Copy link
Member

@jswrenn jswrenn Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worse: As soon as an enum has any tuple-like variants, as casting the tuple-like variant will still work, but instead of producing the discriminant, it will instead cast the function pointer of that variant's constructor. This, I think, is the real reason why we shouldn't "just" expand the functionality of as casting to fieldful enums; doing so further entrenches an exceptionally dangerous pattern.


## Why not do *\<more complex feature\>*?

Privacy is the problem.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this proposal fit into the broader SemVer considerations on enum discriminants? Do we currently consider enum discriminants to be under the scope of SemVer? Note, it's already the case that for for fieldless enums without tuple-like variants, as casting provides a public mechanism for reflecting the discriminant.

I think the space of options is something like:

  • We document that discriminant never has SemVer obligations.
  • We document that the discriminants defined in edition 202X and up do not have SemVer obligations, and disallow as casting of enums defined in greater editions.
  • We document that discriminants currently visible only by as-casting have SemVer obligations, and discriminants not visible by as casting do not have SemVer obligations.
  • We document that all discriminants have SemVer obligations (in which case the privacy concern here is moot).

For example, in a macro expansion even the internal magic today ends up being
```rust
let __self_tag = ::core::intrinsics::discriminant_value(self);
let __arg1_tag = ::core::intrinsics::discriminant_value(other);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably not say "tag" when it means "discriminant". The two are not the same in general.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a copy of what is being expanded from today's #[derive(PartialOrd)] 🤷 it was renamed from __self_vi (meaning Index Value) to __self_tag by rust-lang/rust#99046.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. rust-lang/rust#123919 should fix that.

The RFC should still be updated. :)

- A pseudo-field with `#` in the name looks kinda weird.
- There might be a nicer way to do this in the future.

# Rationale and alternatives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicitly-assigned enum discriminants are a rare case in Rust where the lexical ordering of definitions has an implicit, observable and (potentially) SemVer-relevant effect. It's arguably a footgun. This RFC will increase the range of situations in which implicitly-assigned discriminants are observable.

In the interest of starting to nudge folks away from this footgun, perhaps this proposed accessor should be restricted to enums with strictly explicit discriminants?

This restriction might be sensible to adopt as a first step. It will be easier to relax this restriction later, than it will be to add it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of starting to nudge folks away from this footgun, perhaps this proposed accessor should be restricted to enums with strictly explicit discriminants?

It's a no-go until Rust allow assigning an explicit discriminant to variants of a non-primitive representation enum (i.e. plain #[repr(Rust)] without any #[repr(uNN)]).

Forcing #[repr(uNN)] means any user of this RFC must forego layout optimization, but this is not an intrinsic restriction since std::intrinsic::discriminant_value() does work for every kind of enums.

Copy link
Member

@jswrenn jswrenn Apr 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a no-go until Rust allow assigning an explicit discriminant to variants of a non-primitive representation enum (i.e. plain #[repr(Rust)] without any #[repr(uNN)]).

That's totally on the table! I only did not implement that because there was no way to reflect on such discriminants at the time. With this RFC, we almost certainly should remove that restriction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to hear, if there is a formal proposal or issue somewhere it could be added as a prerequisite 😃

But given the motivation, do we expect enum authors in e.g. Edition 2027 must explicitly assign a variant in order to use #[derive(PartialOrd)]? (or, considering 3rd-party proc macros, #[derive_ex(PartialOrd)] / #[derive_where(PartialOrd)])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is a formal proposal or issue somewhere it could be added as a prerequisite

Yeah, it's RFC2363! That RFC proposes that explicit discriminants be applicable to all enums, even default repr ones. We just didn't implement that yet, because it wasn't possible to soundly reflect on those discriminants.

But given the motivation, do we expect enum authors in e.g. Edition 2027 must explicitly assign a variant in order to use #[derive(PartialOrd)]?

Perhaps! Requiring that would also solve the common footgun that enum variants cannot be automatically reordered and alphabetized without semantic implications. But I could also see it continue being the case that we put off such an edition gated reqiremen on our first party derives indefinitely. A decision here would be outside the scope of this RFC, but should perhaps be mentioned as a future possibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's RFC2363! That RFC proposes that explicit discriminants be applicable to all enums, even default repr ones. We just didn't implement that yet, because it wasn't possible to soundly reflect on those discriminants.

seems not in 2363-proper, it is listed as an "unresolved question".

Copy link
Member

@jswrenn jswrenn Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but since the text of the RFC itself doesn't include that restriction, we already have an accepted design doc (and relevant discussions) for this point in the design space.

Given that this question was only raised for lack of a reflection mechanism at the time (which this RFC will resolve), @scottmcm, perhaps let's formally resolve that question in this RFC? A sentence in the summary should suffice: "The implementation of RFC2363 shall also be extended to allow declaring the discriminant of default-repr enums."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC restricts visibility to the current module. So it still allows local reasoning. E.g. anything that tries to sort enum variants can look at the module and check if this syntax is used. It's worse than only having to look at the type but at least not spooky action at the distance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps this proposed accessor should be restricted to enums with strictly explicit discriminants?

Various questions like this came up in the design meeting today, and the core answer is that derives want to work even on implicit discriminants, particularly for repr(Rust) enums, so I wouldn't want that restriction in this RFC.

A restriction of some sort in a future RFC that expands this to non-pub(self) use would absolutely make sense, though.

Comment on lines +412 to +413
Those are *much* easier to generate with a proc macro, however, so are not included
in this RFC. They would need separate motivation from what's done here.
Copy link
Member

@jswrenn jswrenn Apr 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N.B.: The discriminant type is actually outright impossible to reflect upon syntactically in a proc macro, because the discriminant type of repr(C) enums is platform-specific.

This both warrants a correction in the RFC (it's not "much easier to generate"), and should be cited as a motivating reason why this RFC's proposal is quite useful! Even though it's not too difficult to syntactically reflect on the discriminant values, it's impossible to syntactically determine the return type of the derived discriminant accessor should be if the enum is repr(C).


Sure, it could, like `offset_of!`.

I don't think `enum_discriminant!(foo)` is really better than `foo.enum#discriminant`, though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, making this a magic macro would mean it would be easily found when someone sees it in code. It has a scope, rustdocs in core, etc. The IDE will point to the docs automatically. Conversely a novel syntax fails to reuse all that existing machinery and conventions.

Copy link
Member

@kennytm kennytm Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rustdoc of std can already contain documentation of keywords (like await). Discoverability through IDE is perhaps one of the issues of least concern for this RFC.

Comment on lines +79 to +88
that are defined in the current module. If you want to expose it to others,
feel free to define a method like

```rust
impl Enum {
pub fn discriminant(&self) -> u8 {
self.enum#discriminant
}
}
```
Copy link
Member

@jswrenn jswrenn Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution is not generally tenable for repr(C) enums, whose discriminant type is platform-dependent. We would also need some way to reflect on the type of the discriminant. This might be an advantage of an Enum trait (pending the resolution of the SemVer issues raised here).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case wouldn't the type just be std::ffi::c_int?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps? Let's dig.

RFC2195 says:

#[repr(C)] currently just tells Rust to try to pick whatever integer type that a C compiler for the target platform would use for an enum.

And the Rust Unsafe Code Guidelines say:

When a #[repr(Int)]-style annotation is attached to a fieldless enum (one without any data for its variants), it will cause the enum to be represented as a simple integer of the specified size Int. [...] The #[repr(C)] annotation is equivalent, but it selects the same size as the C compiler would use for the given target for an equivalent C-enum declaration.

I'm not an expert in C. From a brief search, the relevant text in the C99 standard appears to be here:

Constraints

The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

I think this guarantees that c_int is a suitable logical type of the discriminant, but it appears to leave open the possibility that the type used for the tag in C might be different. This is problematic for Rust, because it's commonly assumed that repr(C) enums have the same tag and discriminant types and values.

Copy link
Member

@kennytm kennytm Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't care about privacy / stability / nightly etc there is already the std::marker::DiscriminantKind trait to extract the Discriminant type.

However, currently Rust always resolve this to isize for a #[repr(C)] enum, which is probably wrong?

#![feature(discriminant_kind)]

use std::{marker::DiscriminantKind, mem::size_of};

#[repr(C)]
pub enum E {
    A,
    B,
}

const _: [(); 4] = [(); size_of::<E>()]; // size of E is same as c_int
const _: [(); 8] = [(); size_of::<<E as DiscriminantKind>::Discriminant>()]; // size of discriminant is same as isize

Should the enum's declared `repr` not be the type you actually want, you can always
use `.enum#discriminant` and *then* `as` cast it -- or hopefully `.into()` or
something else with clearer intent -- into the type you need.

Copy link
Member

@joshtriplett joshtriplett May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Syntax variations
In place of the `.enum#discriminant` syntax, we could use something like `::enum::discriminant`, or similar. This would similarly avoid any possible conflict with an existing name, and would evoke existing concepts (associated constants) that people understand.
This would also look more natural when used with names that do not normally permit `.`, such as enum variant names.
## Supporting variant names without construction
We could permit obtaining the discriminant from a variant name, without constructing a value of that variant. For instance, `SomeEnum::Variant2::enum::discriminant` would allow obtaining the discriminant of a `Variant2` variant without manufacturing one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

such as:

This looks like the sentence ends mid-way?
I guess you mean it to refer to the next subsection, but that's the same header level as the current subsection so I found it quite confusing.

more natural when used with names that do not normally permit .

OTOH it looks less natural when used with names that do not normally permit ::, such as all the examples in the RFC:

let a = Enum::Unit;
assert_eq!(a::enum::discriminant, 7);
let b = Enum::Tuple(true);
assert_eq!(b::enum::discriminant, 13);
let c = Enum::Struct { a: 1 };
assert_eq!(c::enum::discriminant, 42);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you mean it to refer to the next subsection, but that's the same header level as the current subsection so I found it quite confusing.

Fair enough; I changed it to be more standalone.

OTOH it looks less natural when used with names that do not normally permit ::, such as all the examples in the RFC:

I understand that. Perhaps we should support both, then? :: for types, and . for values?

@traviscross
Copy link
Contributor

We discussed this in the lang design meeting on 2024-07-10. Some takeaways:

One is that we decided to go with tag rather than discriminant for this and other language work. We were persuaded by its conciseness. While the word "discriminant" has been used in documentation and in the standard library, this is the first time we'd be adding a language feature that has to name the concept. Given that the existing mem::discriminant API is kind of unfortunate, and unloved in enough other ways, we didn't feel we needed to put much weight on that as precedent. We also noted that much of the language of RFC 2195 had also preferred tag.

Two is that we all seemed happy with some kind of postfix syntax here as opposed to e.g. a special macro (like offset_of!). One reason for this is our expectation of using this kind of syntax for more things and pushing further in this direction. (Some of us would probably be unhappy with this choice if this were the only thing for which we ever used this kind of syntax.)

Three is that we didn't feel strongly that namespacing this with enum was necessary. We discussed how that may be more namespacing than needed, and that seemed the general mood. It seems unlikely we're going to collide identifiers here, and we already know due to type checking that we're dealing with an enum after all; we don't need to risk entering the vicinity of Hungarian notation.

Four, we had a good discussion about whether to go with .#tag or .tag. There are good reasons for both. The former can be used more widely, and that seemed to be where we landed as the place to start with this proposal. We discussed how #ident is not a single token (as it wasn't reserved under RFC 3101), but that's OK, as we can just have the compiler treat this sequence of tokens the right way after lexing.

Five, we talked about how to provide this for variant paths in addition to just values. We noted that fieldless variants could be treated as values, so one could say E::Variant.#tag, but that for variants with fields, that wouldn't work, and we may want to provide something like E::Variant::#tag. We talked about whether these should be lowercase, for consistency and using our kind of special privilege as language designers (as with e.g. u8), or whether these should be capitalized to correspond to how they might be viewed as associated constants or associated types (depending on the feature).

Six, we talked about the various ways we might want rustdoc to visualize this, and how that would help to rapidly raise awareness of and familiarity with this feature.

@RalfJung
Copy link
Member

RalfJung commented Jul 11, 2024

The use of "tag" as apparently a synonym for "discriminant" is unfortunate insofar as "tag" exists as a term in the compiler and it is not equivalent to "discriminant" there. It refers to how the discriminant is encoded in memory. For instance, for an Option<&T>, the "tag" is just the entire value, and the "discriminant" is either 0 or 1 depending on whether the tag is null or not. Also see the compiler glossary. #2195 was written before we established consistent terminology here; terminology around variant index / discriminant / tag used to be quite messy in the compiler until they got cleaned up (around 2019 if I recall correctly).

Granted, with this largely being internal compiler terminology, it can be changed. But it will certainly be confusing to people that have worked with enums in the compiler in the past. We have also occasionally used this terminology in opsem and other language discussions, to my knowledge.

@programmerjake
Copy link
Member

programmerjake commented Jul 11, 2024

Four, we had a good discussion about whether to go with .#tag or .tag.

one big problem with the .#tag syntax is that it isn't usable in the very popular quote! macro since it would try to interpolate a tag variable at that point. (.enum#tag, .tag, or .anything#tag don't have that problem since they don't have a separate # token)
e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=41c8d00dd76a2a296cacaa125334bc92

@kennytm
Copy link
Member

kennytm commented Jul 11, 2024

@traviscross I think we have concluded from #3607 (comment) that this RFC is producing the discriminant, not the tag.

#2195 is irrelevant here, it controls the maximum size of the tag which for now is also determine the maximum of discriminant value. So they can be interchangeable in #[repr].

But for this RFC the ._#thing must be a concrete value, which the "tag" (in-memory value) and "discriminant" (0, 1, 2, ...) very often have distinct values after layout optimization.

@programmerjake
Copy link
Member

programmerjake commented Jul 11, 2024

though, just .tag by itself may also conflict with a reasonable extension of enum pattern types, where you can directly access fields of single-variant pattern types without needing to match, e.g.:

pub enum MyEnum {
    A {
        tag: u8,
        b: i32,
    },
    B,
}

pub fn f(v: MyEnum is MyEnum::A { .. }) {
    // direct field access, since we statically know which variant we have
    println!("{}, {}", v.tag, v.b);
}

@ijackson
Copy link

though, just .tag by itself may also conflict with a reasonable extension of enum pattern types, where you can directly access fields of single-variant pattern types without needing to match, e.g.:

IMO this is another piece of evidence showing that inventing new syntax for this functionality is a bad idea. This is a fairly niche thing to want to do, and does not deserve spending any of our "weirdness budget" on.

@tmccombs
Copy link

tmccombs commented Jul 11, 2024

Two is that we all seemed happy with some kind of postfix syntax here as opposed to e.g. a special macro (like offset_of!).

What about a postfix macro? So it would look something like:

Enum::Variant::discriminent!() or maybe Enum.discriminant!(Variant).

See also #2442

@workingjubilee
Copy link
Member

workingjubilee commented Jul 12, 2024

Four, we had a good discussion about whether to go with .#tag or .tag.

one big problem with the .#tag syntax is that it isn't usable in the very popular quote! macro since it would try to interpolate a tag variable at that point. (.enum#tag, .tag, or .anything#tag don't have that problem since they don't have a separate # token) e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=41c8d00dd76a2a296cacaa125334bc92

this seems like largely not-a-problem if we finally decide to ship rust-lang/rust#54722 since it uses a different unquoting.

@joshtriplett
Copy link
Member

@programmerjake I don't think we should have dotted access to enum fields unless those fields are explicitly declared as a common field at the top level of the enum. In which case, we could say that you can't declare a common field named "tag". (Whether we should is another question.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.