Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discriminant bits #2684

Closed
wants to merge 12 commits into from
120 changes: 120 additions & 0 deletions text/0000-discriminant-bits.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
- Feature Name: discriminant_bits
- Start Date: 2019-04-01
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add methods to `std::mem::Discriminant` which inform of the space necessary for a bitwise representation of an enum's discriminant and the bits itself in an opaque fashion.

# Motivation
[motivation]: #motivation

Rust encourages using enums to encode data with multiple variants.
An example of this can be found in the [game of life tutorial][game-of-life-tutorial].

```rust
enum Cell {
Dead = 0,
Alive = 1
}
```

Using these enums in collections is wasteful, as each instance reserves at least 1 byte of space.
Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the Wasm book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.

If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
If it were possible to read the exact necessary size and the bit representation of the discriminant, we could define a type `PackedBits<T>` that uses exactly as much space as necessary.


This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.
This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for creating an index of all discriminant values present in collection.


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Disciminant data
Copy link

@sftim sftim Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Disciminant data
## Discriminant data


`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent an enum's discriminant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significance of the word "minimal" here? Can you elaborate?


```rust
const fn bit_size() -> usize;
```

This number is not subject to optimisation, so e.g. `Option<&str>` reports a bitsize of `1`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While being called "enum layout optimization", it's not really an optimization, but a clear set of rules of how the discriminant is represented.

What is the use-case for knowing the bitsize for enums whose variants have fields? I'm wondering if it would make more sense to have bit_size return Option<usize> in order to only return a bit_size where an actual tag field exists.


For example:

```rust
enum Cell {
Dead = 0,
Alive = 1,
}

enum RGB {
Red,
Green,
Blue
}

Discriminant<Cell>::bit_size() == 1
Discriminant<Option<&str>>::bit_size() == 1
Discriminant<RGB>::bit_size() == 2
```

This information can be used to pack multiple discriminants easily for efficient storage and easy indexing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide a sketch in an appendix?


`Discriminant<T>` gains the methods `into_bits` and `from_bits`:

```rust
fn into_bits(&self) -> u128
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first saw this, I had to wonder what crazy kind of code would have an enum with more than 2^64 variants. Maybe #[repr(u128)] should be mentioned as justification?

Copy link
Contributor

@Centril Centril Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind it's not just about having > 2^64 variants. You can have much fewer and use enum Foo { ..., VariantN = DiscrimExprN, ... } to use up a whole lot of bits.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my point. I had to think for a while before I remembered that explicit discriminants can be assigned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyb had actually recommended using that size, as that would be the final width of the internal Discriminant value.

```

Returns a bit representation of the discriminant.
This data can be used to construct an efficient storage or index.

```rust
fn from_bits(data: u128) -> Self
```
Copy link
Contributor

@Centril Centril Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing these functions does 2 things:

  1. It allows users to extract the discriminant_value and rely on those values potentially compromising ABI instability. RFC 639 noted that it was bad practice to rely on such specific values. While I understand that you wish to rely on values held abstract, rather than specific values, exposing this still allows other users to rely on this. You need to discuss what guarantees these functions give at minimum.
  2. It changes discriminant_value to return a u128 instead of u64 and so Discriminant<T> must also store a u128. This is probably for the best but this should be noted somewhere in this RFC. It seems to me it could have adverse performance results for some use cases.
  3. It means that we cannot use u_more_than_128 as the discriminant. No one would include so many variants, but #[repr(u128)] is a thing so why couldn't #[repr(u256)] be? RFC 639 even spoke about negative discriminant values; maybe @nikomatsakis can elaborate on that since they added the history section to that RFC.

The type Discriminant<T> was intentionally made opaque and this RFC is making it transparent, this should be discussed in the rationale and noted in the drawbacks along with other things mentioned I've noted above.

RFC 1696, which specified the current definition and behavior of Discriminant<T> goes on to note that it is unorderable, and indeed, PartialOrd is not implemented for Discriminant<T> for all T. It seems to me that by exposing from_bits you make the type effectively orderable in some fashion?

In RFC 1696, @Diggsey writes about Discriminant<T>::Repr -- it seems to me that accepting and returning Repr would be better from a stability POV.


Creates a `Discriminant` from emitted bits usable for comparison.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that I can use this function to do from_bits(discriminant_invalid_for_MyEnum) and get a Discriminant<MyEnum>, why is from_bits not returning Result<Self, SomeZST> ?


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The feature may interact with non-exhaustive enums.
In this case, still, the currently used discriminant size should be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this allow people to couple themselves to the number of variants in an enum which was explicitly requested to be non-exhaustive (to avoid that coupling)?


Adding the proposed functions probably entails adding a new compiler intrinsic `discriminant_size`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe the semantics of said intrinsic.


Empty enums are of size 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about enums with a single variant?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously size zero.

As I see it, empty enums are specified in the proposal simply as an acknowledgement that they were taken into account. From a mathematical standpoint, they take NEG_INFINITY bytes. This proposal has chosen not to do this, and to treat them as ZSTs (which indeed aligns with some other parts of the language).


# Drawbacks
[drawbacks]: #drawbacks

The added methods increase API surface in stdlib.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section feels really anemic and contains bits copied over from the template. For a proper review, the section should be substantially elaborated upon and the questions raised in the template should be answered.


- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- `from_data` and `into_data` could instead be straight `From/Into` implementations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why isn't it? As aforementioned there may be specific guarantees and non-guarantees we want to make which makes From and Into less apt in terms of communication of those guarantees to/with users.

- Alternatively, `from/into_bits` could return a `Bits<T>` type with a richer interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would that richer interface be like?


# Prior art
[prior-art]: #prior-art

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Naming of the functions could be improved.
- A basic implementation of a bitfield should be created during the implementation phase

# Future possibilities
[future-possibilities]: #future-possibilities

The feature is self-contained and I don't see direct extensions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The feature is self-contained and I don't see direct extensions.
The feature is self-contained and there are no direct extensions.


[game-of-life-tutorial]: https://rustwasm.github.io/docs/book/game-of-life/implementing.html
[game-of-life-exercise]: https://rustwasm.github.io/docs/book/game-of-life/implementing.html#exercises