Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primitive enum conversion reform #3040

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

illicitonion
Copy link

@illicitonion illicitonion commented Dec 18, 2020

This RFC proposes making it easier to use From, Into, and
potentially TryFrom for conversions between enums with primitive
representations, and their associated primitives. It optionally also
proposes removing or restricting as conversion between these types.

Rendered

illicitonion added a commit to illicitonion/rust that referenced this pull request Dec 18, 2020
Currently this tends to be done using `as`, which has two flaws:
1. It will silently truncate. e.g. if you have a `repr(u64)` enum, and
   you write `value as u8` (perhaps because the repr changed), you get
   no warning or error that truncation will occur.
2. You cannot use enums in code which is generic over `From` or `Into`.

Instead, by allowing a `From` (and accordingly `Into`) implementation to
be easily derived, we can avoid these issues.

There are a number of crates which provide macros to provide this
implementation, as well as `TryFrom` implementations. I believe this is
enough of a foot-gun that it should be rolled into `std`.

See rust-lang/rfcs#3040 for more information.
This RFC proposes making it easier to use `From`, `Into`, and
potentially `TryFrom` for conversions between enums with primitive
representations, and their associated primitives. It optionally also
proposes removing or restricting `as` conversion between these types.

[Rendered](https://github.com/illicitonion/rfcs/blob/enum-as-to-into/text/3040-enum-as-to-into.md)
@scottmcm scottmcm added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Dec 18, 2020
@scottmcm
Copy link
Member

It optionally also proposes removing or restricting as conversion between these types.

My core question for this one is why it should happen for potentially-truncating conversions from enums but not from integer types. It's not obvious to me that enum conversions specifically are particularly error prone than the other ones.


Another potential alternative: provide direct access to the correctly-typed discriminant value with a new derivable trait, and encourage use of that instead of as casts. That would avoid the need to name the type entirely in consumption code. (But of course it has the downside of a new trait, which raises the bar significantly, so may or may not be better.)

@illicitonion
Copy link
Author

It optionally also proposes removing or restricting as conversion between these types.

My core question for this one is why it should happen for potentially-truncating conversions from enums but not from integer types. It's not obvious to me that enum conversions specifically are particularly error prone than the other ones.

I agree that many of these ideas could, and probably should, apply equally to integer types. I think the core difference is that with enums as can mean two different things: "Guarantee not to truncate, convert to the natural repr" and "Convert to some arbitrary type". While you can today write 1_u8 as u8, there isn't really a good reason to do so, whereas there is a reason to write NumberEnum::One as u8 for a NumberEnum with repr(u8).

There are three "severities" of solution I think we can aim at:

  1. Providing a convenient way to do the guaranteed conversion which isn't the same the possibly truncating one. This means that you can get the type safety of knowing that if e.g. a repr changes, the compiler will tell you that your code may now violate assumptions you previously made. This is enum-specific, and doesn't necessarily require deprecating/removing as support for these cases, though a clippy lint would definitely make sense.
  2. Mandate that as casts should only be used for potentially data-altering casts, and remove support for them where they cannot alter data. i.e. if you're casting to a type which is guaranteed not to lose data, you can no longer use as, and should instead u64::from - this would apply equally to integer types.
  3. Remove as casts entirely from integer types and enums, and introduce an explicit method like u8::truncating_from(u64), in the vein of {wrapping,checked,saturating,overflowing}_ methods.

Personally, I would aim for at least 2, ideally also 3, and that these should apply to enum and integer types equally.

Another potential alternative: provide direct access to the correctly-typed discriminant value with a new derivable trait, and encourage use of that instead of as casts. That would avoid the need to name the type entirely in consumption code. (But of course it has the downside of a new trait, which raises the bar significantly, so may or may not be better.)

Thanks for the suggestion - I've added it as an alternative. I think I lean more towards the consistency of From/Into, but I think it's a good suggestion to consider. I've also noted that this is only an alternative to deriving From/Into - whether (and how strongly) we discourage use of as is an orthogonal question to what its replacement should be.

@semicoleon
Copy link

Moving away from the as casts would be great, and From/Into seem like a sensible way to do it. I'm kind of surprised the truncating casts don't already have a lint honestly.

My only concern is the #[derive(Into)] syntax is fairly opaque if you don't know what it's doing already. Most of the other standard library derives are pretty clear.

Is there precedent for naming a derive something other than the relevant type? #[derive(IntoRepr)] would be more obvious. It might confuse people into thinking that was a separate trait but would at least be more searchable than "Into" which is literally all over the place

@illicitonion
Copy link
Author

Is there precedent for naming a derive something other than the relevant type? #[derive(IntoRepr)] would be more obvious. It might confuse people into thinking that was a separate trait but would at least be more searchable than "Into" which is literally all over the place

As far as I can tell, not in std/core.

Of the crates linked from #2783 (comment) the most common solution is to use a Primitive suffix for their derives: e.g. FromPrimitive, ToPrimitive, TryFromPrimitive: https://lib.rs/crates/derive-try-from-primitive, https://lib.rs/crates/enum-primitive-derive, https://lib.rs/crates/enum-tryfrom, https://lib.rs/crates/num_enum

An alternative could be something like #[derive(Into<repr>)] (potentially opening up a future where other types could #[derive(Into<u8>)] or similar).

@semicoleon
Copy link

I was thinking #[derive(Into<repr>)] would require changing how derive macros are declared (I don't think you can have a type parameter like that currently? But maybe there's a way to do it) otherwise I think that syntax would be ideal

@pubfnbar
Copy link

pubfnbar commented Dec 24, 2020

I don't think a lint like enum_truncating_cast should warn about a conversion like e as Int when all the variants of the enum type of e can be represented by the target integer type Int.

Logically speaking, why shouldn't derive(Into) implement also the conversions to all the integer types that the enum representation can be converted to without truncation? I understand that doing this would mean that in some situations type inference wouldn't know what to do with e.into() but to me that only suggests that leaning on From and Into isn't the right solution to this problem.

I think a new trait for converting a value to the underlying/wrapped type/representation of its type would be more useful than this RFC because it would cover not only enums but also struct newtypes (that may wrap a single integer or whatever type). This new trait wouldn't suffer from the type inference issue that Into has because it wouldn't have any parameters - the target type would be specified by an associated type of the trait.

Update:
I don't think anymore that type inference is an issue with Into. Earlier I thought it would be an issue when passing e.into() into a function template that took it as a generic parameter. Therefore I don't think a new trait is necessary. I do think that if it makes sense to allow an enum to be converted to its integer representation via Into, then the enum type should also allow (for user convenience) conversion via Into to all the other integer types that can represent each of its variants.

Update 2:
To clarify why I don't consider type inference a big issue: given a value e of an enum type that has #[repr(u16)] attribute and that implements Into<u16> and Into<u32> and Into<u64>, the following would fail to compile with error[E0282]: type annotations needed:

let value: u16 = e.into() + 1;
//               --^^^^--
//               | |
//               | cannot infer type for type parameter `T` declared on the trait `Into`
//               this method call resolves to `T`

...but with the unstable type_ascription feature, you can write the following:

let value = e.into(): u16 + 1;

@kpreid
Copy link
Contributor

kpreid commented Dec 24, 2020

There is some value in removing <enum> as <integer> from the language that is not about truncation, which I think might be worth mentioning in this RFC: a library may currently have an enum which is convertible to integer, but not intentionally, which means that dependent crates could be broken when the library

  • reorders the variants (which might be seen as a purely syntactic cleanup),
  • adds explicit numbering different from the implicit default (intending to newly support stable numbering), or
  • adds variants with fields to a #[non_exhaustive] enum which previously had no such variants.

Making enum-to-integer conversion opt-in via #[derive(Into)] (or explicit impl, or any other choice of trait) avoids these risks, in the same way that explicit #[derive(Copy)] avoids a type silently stopping being copiable. (A solution of narrower scope than removing enum as entirely would be to allow as only on enums which use the = syntax at least once to explicitly number their variants, but that would be unrelated to this RFC's plan.)

(I'm not any team member; I got here from This Week in Rust 370.)

@kornelski
Copy link
Contributor

Having conversion traits would be great.

The changes to as are not required. Discussion about its potential deprecation could wait until after Into is available and has some adoption.

@illicitonion
Copy link
Author

Thanks for your input, all! I put together #3046 to propose a new IntoUnderlying trait - it has a draft implementation attached. Let's move the Into vs IntoUnderlying discussion over there, and come back to changes to as once that's settled.

@pubfnbar
Copy link

pubfnbar commented Jan 2, 2021

I think the current RFC text uses the term "truncate" incorrectly perhaps throughout but at least in a couple of places under the "Likely incorrect casts" heading. I believe that whenever an integer is cast to a narrower integer type by using the as operator, the integer is said to be "truncated" irrespective of whether or not the mathematical value of the result of the operation is different from the mathematical value of the operand - whereas the current RFC text seems to use the term "truncating cast" to mean only those casts where the resulting mathematical value is different from the original. For example -1_i16 as i8 does in my mind "truncate" the i16 operand to one byte even though the mathematical value of the operand (-1) is the same as the mathematical value of the result of the operation (because i8 can represent the mathematical value -1).

@illicitonion
Under the heading "Correctness issue" you write that: "This silent truncation causes potential correctness issues. For example, if an enum changes its repr from a smaller type to a larger one, casts which used to be non-truncating may silently become truncating, with not even a warning raised". Could you clarify that statement a bit - do you mean that a truncating cast that results in the same mathematical value as the mathematical value of the operand (like my code example above) could also cause correctness issues?

Due to the fact that I don't see how a truncating cast where the resulting mathematical value is the same as the original could cause any issues, I think that we should keep the status quo otherwise except that we should add two new lints:

  1. Something like non_preserving_integer_cast lint that would have the level warn by default and it would apply to all those casts to integer type the compiler knows result in a different mathematical value from the original, such as casts from a unitary (fieldless) enum variant/constant or an integer constant to an integer type that cannot represent the mathematical value of the operand.
  2. Something like potentially_non_preserving_enum_cast lint that would have the level warn by default and it would apply to casts from a unitary enum variable to an integer type that cannot represent all the mathematical values of the enum's variants.

For example:

#[repr(u64)]
enum Number {
    Zero = 0,
    Thousand = 1000,
}

const NUMBER_ZERO: Number = Number::Zero;
const ZERO: u64 = 0;
const THOUSAND: u64 = 1000;

// `non_preserving_integer_cast` would apply to these two lines and warn by default:
let _ = Number::Thousand as u8;
let _ = THOUSAND as u8;

// But `non_preserving_integer_cast` would NOT apply to these three lines:
let _ = Number::Zero as u8;
let _ = NUMBER_ZERO as u8;
let _ = ZERO as u8;

let number = if it_is_tuesday {
    Number::Zero
} else {
    Number::Thousand
};

// `potentially_non_preserving_enum_cast` would apply to this line and warn by default:
let _ = number as u8;

// But `potentially_non_preserving_enum_cast` would NOT apply to this line:
let _ = number as u16;

// Also, `potentially_non_preserving_enum_cast` would NOT apply to these three lines:
let _ = Number::Zero as u8;
let _ = NUMBER_ZERO as u8;
let _ = ZERO as u8;

Update:
Here's one use case where the From/Into conversions or the new IntoUnderlying trait would be unable to make the situation any safer - whereas the suggested potentially_non_preserving_enum_cast lint could warn/forbid against a potential bug:

#[repr(u32)]
enum Number {
    Zero = 0,
    Million = 1_000_000,
}

fn foo(items: &[i8], number: Number) {
    // `potentially_non_preserving_enum_cast` would apply to the following line if and only
    // if `usize` cannot represent `1_000_000` (on platforms with 16-bit pointers), whereas
    // `From`/`Into` is no help because `usize` doesn't implement `From<u32>`, so this
    // wouldn't work: `items[usize::from(number.into_underlying())]`.
    let _ = items[number as usize];
}

@illicitonion
Copy link
Author

I think the current RFC text uses the term "truncate" incorrectly perhaps throughout but at least in a couple of places under the "Likely incorrect casts" heading. I believe that whenever an integer is cast to a narrower integer type by using the as operator, the integer is said to be "truncated" irrespective of whether or not the mathematical value of the result of the operation is different from the mathematical value of the operand - whereas the current RFC text seems to use the term "truncating cast" to mean only those casts where the resulting mathematical value is different from the original. For example -1_i16 as i8 does in my mind "truncate" the i16 operand to one byte even though the mathematical value of the operand (-1) is the same as the mathematical value of the result of the operation (because i8 can represent the mathematical value -1).

I'm generally familiar with the terms "widening" and "narrowing" for the size-changing aspects of casts, and "truncating" for the discarding data aspect. This is in line with Mitre's definition of truncation: https://cwe.mitre.org/data/definitions/197.html

Is there a term you think is more generally known/applicable for describing this kind of cast?

Under the heading "Correctness issue" you write that: "This silent truncation causes potential correctness issues. For example, if an enum changes its repr from a smaller type to a larger one, casts which used to be non-truncating may silently become truncating, with not even a warning raised". Could you clarify that statement a bit - do you mean that a truncating cast that results in the same mathematical value as the mathematical value of the operand (like my code example above) could also cause correctness issues?

No, I'm specifically referring to cases where the value is changed. I agree that where only narrowing is performed, and values don't change, the operation is likely to be correct.

  1. Something like non_preserving_integer_cast lint that would have the level warn by default and it would apply to all those casts to integer type the compiler knows result in a different mathematical value from the original, such as casts from a unitary (fieldless) enum variant/constant or an integer constant to an integer type that cannot represent the mathematical value of the operand.

I think this would be a great lint to add, where it can be identified. I suspect that in practice it would trigger very rarely compared to the "potential" version, but I would definitely support it!

  1. Something like potentially_non_preserving_enum_cast lint that would have the level warn by default and it would apply to casts from a unitary enum variable to an integer type that cannot represent all the mathematical values of the enum's variants.

Am I reading this correctly that this is identical to the enum_truncating_cast cast proposed in the RFC, with a name which is more explicit about the fact that the cast is only potentially incorrect? I think adding the word "potentially" to the lint name is a good idea, particularly if we also end up with a lint for cases where the numerical change is statically guaranteed.

Here's one use case where the From/Into conversions or the new IntoUnderlying trait would be unable to make the situation any safer - whereas the suggested potentially_non_preserving_enum_cast lint could warn/forbid against a potential bug:

#[repr(u32)]
enum Number {
    Zero = 0,
    Million = 1_000_000,
}

fn foo(items: &[i8], number: Number) {
    // `potentially_non_preserving_enum_cast` would apply to the following line if and only
    // if `usize` cannot represent `1_000_000` (on platforms with 16-bit pointers), whereas
    // `From`/`Into` is no help because `usize` doesn't implement `From<u32>`, so this
    // wouldn't work: `items[usize::from(number.into_underlying())]`.
    let _ = items[number as usize];
}

If all of the measures in this RFC are implemented, including removing as casts from enums, number as usize would no longer be valid, so the code would actually look more like:

let _ = items[number.into_underlying() as usize];

or

let _ = items[u32::from(number) as usize];

But if we stopped short of that, I believe this case would be covered by the proposed enum_truncating_cast lint?

@pubfnbar
Copy link

pubfnbar commented Jan 4, 2021

I'm generally familiar with the terms "widening" and "narrowing" for the size-changing aspects of casts, and "truncating" for the discarding data aspect. This is in line with Mitre's definition of truncation: https://cwe.mitre.org/data/definitions/197.html

Is there a term you think is more generally known/applicable for describing this kind of cast?

I stand corrected. You were using the term correctly. Thank you for the definition - I was unable to find a definition for "truncate" in programming context, so I then read too much into a line in The Rustonomicon that says "casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate".

  1. Something like potentially_non_preserving_enum_cast lint that would have the level warn by default and it would apply to casts from a unitary enum variable to an integer type that cannot represent all the mathematical values of the enum's variants.

Am I reading this correctly that this is identical to the enum_truncating_cast cast proposed in the RFC, with a name which is more explicit about the fact that the cast is only potentially incorrect? I think adding the word "potentially" to the lint name is a good idea, particularly if we also end up with a lint for cases where the numerical change is statically guaranteed.

No, it's not identical because my suggested lint takes advantage of the fact that the compiler knows the value of each enum variant at compile time and is also allowed to assume that an enum variable may only have a bit pattern that is identical to the bit pattern of one of the enum type's variants. In the RFC text you give the following example:

#[repr(u16)]
enum Number {
    Zero,
    One,
}

fn main() {
    let bad = Number::Zero as u8;
    //  ^^^ This cast may truncate the value of `bad`, as it is represented by a `u16` which may not fit in a `u8`.
}

...whereas my suggested potentially_non_preserving_enum_cast would never warn about that particular cast, and in fact, it wouldn't even warn about a cast like n as u8, where n is a Number variable whose value is not known at compile-time, because the compiler knows that each legal value a Number variable can take (namely 0 and 1) can be represented by u8 and therefore the compiler knows that the cast cannot truncate.

A better word for the lint would be potentially_enum_truncating_cast though.

If all of the measures in this RFC are implemented, including removing as casts from enums, number as usize would no longer be valid, so the code would actually look more like:

let _ = items[number.into_underlying() as usize];

But if we stopped short of that, I believe this case would be covered by the proposed enum_truncating_cast lint?

No, it wouldn't be covered by it, or at least I'm not proposing a lint that would warn about casting an integer variable to a narrower integer type because I think those warnings would be way too ubiquitous. My other proposed non_preserving_integer_cast (or with a better name integer_truncating_cast) lint would apply only when the operand is either a constant or a variant (and therefore known at compile-time). Thus your code example number.into_underlying() as usize being a cast from a u32 variable to usize would not cause a warning when compiled for a 16-bit platform, and it would probably be a silent bug - whereas number as usize would be warned about by the potentially_enum_truncating_cast lint. So, the intermediate cast from the enum to its underlying type is actually a problem in this case (which is the point I was trying to make with my code example).

And if there's a warn-by-default lint like potentially_enum_truncating_cast, I don't think that the other measures in this RFC or a trait like IntoUnderlying are even needed (with the exception that I do think that it's a good idea to make casting of unitary enums to integers an opt-in functionality).

Update:
I just realized that the term "truncate" sadly doesn't extend to every meaning I was using it for earlier in this post. The problem is that a cast operation like 200u8 as i8 clearly doesn't truncate anything (it doesn't even change the bit pattern) yet it evaluates to a value that is different from the operand value mathematically speaking. So, I still don't know what to call it, but whenever I earlier in this post said that an operation "truncates", what I meant was that the operation "evaluates to a different mathematical value". Perhaps the lints I proposed could be named integer_value_changing_cast and potentially_enum_value_changing_cast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants