Niches #3334
- Feature Name: `niche`
- Start Date: 2022-10-16
- RFC PR: [rust-lang/rfcs#3334](https://github.com/rust-lang/rfcs/pull/3334)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Provide a stable attribute to define "niche" values of a type. The type cannot
store these values, allowing the compiler to use them to optimize the
representation of containing types such as `Option<Type>`.

# Motivation
[motivation]: #motivation

Rust makes extensive use of types like `Option`, and many programs benefit from
the efficient storage of such types. Many programs also interface with other
code via FFI, through interfaces that pack data and a sentinel value (such as
one signaling an error or missing data) into the same bits.

The Rust compiler already supports this via "niche" optimizations, and various
types come with guarantees about such optimizations, including references,
`bool`, `char`, and the `NonZero` family of types. However, Rust does not
provide any stable means of defining new types with niches, reserving this
mechanism for the standard library. This puts pressure on the standard library
to provide additional families of types with niches, while preventing the
broader crate ecosystem from experimenting with such types.

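The existing niches mentioned above can be observed on stable Rust by comparing sizes. In this small sketch, the first two assertions rely on layout guarantees documented by the standard library; the `bool` and `char` results reflect what current rustc produces and are included only as illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // References are never null, so `None` can use the null bit pattern.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    // `NonZeroU32` reserves 0, which `Option` uses for `None`.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    // `bool` only uses 0 and 1; `char` only uses valid Unicode scalar values.
    assert_eq!(size_of::<Option<bool>>(), 1);
    assert_eq!(size_of::<Option<char>>(), 4);
}
```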
Past efforts to define a stable niche mechanism stalled out due to scope creep:
alignment niches, null-page niches, multiple niches, structures with multiple
fields, and many other valid but challenging ideas (documented in the "Future
possibilities" section). This RFC defines a *simple* mechanism for defining one
common type of niche, while leaving room for future extension.

Defining a niche mechanism allows libraries to build arbitrary types containing
niches, and simplifies handling of space-efficient data structures in Rust
without manual bit-twiddling.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

When defining a structure containing exactly one field of a non-zero-sized type
(non-ZST), you can attach a `niche` attribute to it to declare a specific value
or range of values for that field as invalid. This promises the compiler that
you will never store those values in that field, which allows the compiler to
use those in-memory representations for other purposes, such as the
representation of `None` in a containing `Option`.

```rust
use std::mem::size_of;

// 42 is never stored in the field, so `Option` can use that bit pattern for `None`.
#[niche(value = 42)]
struct MeaninglessNumber(u64);

assert_eq!(size_of::<MeaninglessNumber>(), 8);
assert_eq!(size_of::<Option<MeaninglessNumber>>(), 8);

// Only 0 and 1 are valid; 2..=255 are available to encode the nested `None`s.
#[niche(range = 2..)]
struct Bit(u8);

assert_eq!(size_of::<Bit>(), 1);
assert_eq!(size_of::<Option<Option<Option<Bit>>>>(), 1);
```

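Until such an attribute exists, similar layouts can be approximated on stable Rust. The sketch below is an assumption-laden stand-in, not the proposed feature: it emulates `#[niche(value = 42)]` by storing `value ^ 42` in a `NonZeroU64` (so the forbidden 42 becomes the reserved 0), and emulates `Bit` with a fieldless `#[repr(u8)]` enum rather than a struct wrapping `u8`.

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

/// Stand-in for `#[niche(value = 42)] struct MeaninglessNumber(u64);`.
/// Stores `value ^ 42`, so the forbidden value 42 maps to 0, the bit
/// pattern `NonZeroU64` already reserves as its niche.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MeaninglessNumber(NonZeroU64);

impl MeaninglessNumber {
    const NICHE: u64 = 42;

    /// Returns `None` instead of storing the niche value.
    fn new(value: u64) -> Option<Self> {
        NonZeroU64::new(value ^ Self::NICHE).map(MeaninglessNumber)
    }

    fn get(self) -> u64 {
        self.0.get() ^ Self::NICHE
    }
}

/// Stand-in for `#[niche(range = 2..)] struct Bit(u8);`: a fieldless enum
/// whose only valid discriminants are 0 and 1.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Bit {
    Zero = 0,
    One = 1,
}

fn main() {
    assert_eq!(size_of::<MeaninglessNumber>(), 8);
    assert_eq!(size_of::<Option<MeaninglessNumber>>(), 8);
    assert_eq!(MeaninglessNumber::new(10).map(MeaninglessNumber::get), Some(10));
    assert!(MeaninglessNumber::new(42).is_none());

    assert_eq!(size_of::<Bit>(), 1);
    // Observed on current rustc: all three `None`s fit in unused byte values.
    assert_eq!(size_of::<Option<Option<Option<Bit>>>>(), 1);
    let _ = (Bit::Zero, Bit::One);
}
```

The XOR trick costs an extra instruction on every access; a language-level niche would avoid that.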
Constructing a structure with a niche value, writing to the non-ZST field of
such a structure, or obtaining a mutable reference to such a field requires
`unsafe` code. Causing a type with a niche to contain an invalid value (whether
by construction, writing, or transmuting) results in undefined behavior.

> I'm highly skeptical of this, because it feels like it's repeating the same mistakes that rust ended up having to spend multiple years fixing for Namely, you now have something supposedly "typed" as This seems awkward for MIRI to check, too. Because if you have a For example, is this UB?
>
> ```rust
> #[niche(value = 42)]
> #[derive(Copy, Clone)]
> struct MeaninglessNumber(u64);
> unsafe {
>     let mut x = Some(MeaninglessNumber(10));
>     let r: &mut u64 = &mut x.as_mut().unwrap_unchecked().0;
>     *r = 42;
> }
> ```
>
> It feels to me like it ought to be UB -- conceptually it's setting the value to its niche, which is definitely not okay -- but operationally I don't see where we can diagnose the problem. I'm very much in favour of letting people control niches, but I think u32-with-a-particular-niche and normal u32 need to be different types.
>
> Edit: This seems to be the root of other comments on the RFC too, such as

> There's a lot of discussion of this in rust-lang/unsafe-code-guidelines#84. It's been a while but I think the consensus is that it's very hard to enforce, we don't gain much from making it UB (we can have validity enforced on typed copy and load), and there are valid use cases for it.

> @scottmcm thank you for bringing this up, it's important that this is discussed somewhere. However, there is a nice way to represent this in the opsem. The interesting minirust operation here is a "load" - the operation that turns a list of bytes into a value at the given type. For tuples (and structs), that is currently defined like this:
>
> ```rust
> fn decode(Type::Tuple { fields, size }: Self, bytes: List<AbstractByte>) -> Option<Value> {
>     if bytes.len() != size { throw!(); }
>     Value::Tuple(
>         fields.iter().map(|(offset, ty)| {
>             ty.decode(bytes[offset..][..ty.size()])
>         }).collect()?,
>     )
> }
> ```
>
> Essentially, this is saying that a value consists of all the values of the fields, and that the load is DB if the loads of all the fields are DB. Under this proposal, I believe we would adjust it to be the following:
>
> ```rust
> fn decode(Type::Tuple { fields, size }: Self, bytes: List<AbstractByte>) -> Option<Value> {
>     if bytes.len() != size { throw!(); }
>     Value::Tuple(
>         fields.iter().map(|(offset, ty, niche)| {
>             let field_val = ty.decode(bytes[offset..][..ty.size()])?;
>             if let Some(niche) = niche {
>                 if !niche.contains(field_val.unwrap_primitive_int()) {
>                     throw_ub!();
>                 }
>             }
>             Some(field_val)
>         }).collect()?,
>     )
> }
> ```
>
> This additionally checks that if the field has a niche annotation, then the value of that field is inside the niche range (this should also help explain why I feel so strongly about the fields getting niche annotations, not the type). Of course this only matters when there is a memory operation at type
>
> ```rust
> struct Foo(
>     #[niche(range = 4..8)]
>     u8,
> );
> let mut x: Foo = Foo(5);
> x.0 = 0;
> dbg!(x.0);
> ```
>
> The memory operations are at type As far as I know, these semantics are completely reasonable from an opsem perspective. It of course remains to be answered if they are the semantics we want, but I can't think of any concrete reason we should disfavor them.

> Thanks, @thomcc, that's a good read. I feel like there's a difference between references and pointers, here, though. For example, if I had to write that as
>
> ```rust
> unsafe {
>     let mut x = Some(MeaninglessNumber(10));
>     let p = ptr::addr_of_mut!(x.as_mut().unwrap_unchecked().0);
>     p.cast::<u64>().write(42);
> }
> ```
>
> then I agree that it's logical that there's no UB, same as if I'd done just Perhaps it's illogical, but I expect more restrictions from references, and expect to have to fall down to pointers to do squicky things like changing the variant of an option by writing through reference to the field.

> I don't think this is really about this RFC anymore, so moved to Zulip.

> Just rejecting giving out these references is also an option. But if we allow giving them out, we have to make it FWIW this is not quite as bad as

> To the extent this is related to the RFC, a few points:

> I've added a more detailed discussion of mutable references. I'm going to mark this "resolved" for now, but feel free to un-resolve if there's anything missing.

> I'm unresolving because to me this isn't so much about the UCG rules -- I agree it's not against the validity invariants to give Completely by happenstance, I just stumbled on an interesting example of how the " For the 2 years that git will conveniently trace history (https://github.com/rust-lang/rust/blame/e2267046859c9ceb932abc983561d53a117089f6/library/core/src/num/nonzero.rs#L47) and probably forever, we've been deriving And voila, the type lie immediately impacts things. Rather than telling LLVM we're comparing non-zero types by emitting the So then this very much is liked Is that really worth it to be able to write
>
> ```rust
> #[niche = 4]
> struct MyType(u32);
> ```
>
> instead of a normal newtype?
>
> ```rust
> #[repr(transparent)]
> struct MyType(NicheU32<4>);
> ```
>
> Personally, I don't think so. See also, for example,
>
> EDIT: Thanks, @RalfJung, I've corrected my phrasing in the first paragraph.

> FWIW it is definitely not sound to give out But it is possible to write UB-free code that takes an

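The rule that only `unsafe` code may directly build a niche-bearing value has a familiar stable-Rust analogue: a private field plus a checked constructor and an `unsafe` unchecked one, the pattern `NonZeroU32::new` / `new_unchecked` already follows. The `Percent` type below (valid values `0..=100`) is a hypothetical illustration; because the compiler knows nothing about its invariant, `Option<Percent>` still needs an extra byte, which is the gap the attribute is meant to close.

```rust
mod percent {
    /// Hypothetical type whose valid values are 0..=100, the same invariant a
    /// `#[niche(range = 101..)] struct Percent(u8);` declaration would express.
    /// The field is private, so code outside this module cannot write arbitrary bits.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct Percent(u8);

    impl Percent {
        /// Checked, safe construction.
        pub fn new(value: u8) -> Option<Self> {
            if value <= 100 {
                Some(Percent(value))
            } else {
                None
            }
        }

        /// Unchecked construction, mirroring the RFC's requirement that building
        /// a niche-bearing value directly needs `unsafe`.
        ///
        /// # Safety
        /// `value` must be at most 100.
        pub unsafe fn new_unchecked(value: u8) -> Self {
            debug_assert!(value <= 100);
            Percent(value)
        }

        pub fn get(self) -> u8 {
            self.0
        }
    }
}

use percent::Percent;

fn main() {
    assert_eq!(Percent::new(42).map(Percent::get), Some(42));
    assert!(Percent::new(101).is_none());
    // SAFETY: 100 is within the documented 0..=100 range.
    let full = unsafe { Percent::new_unchecked(100) };
    assert_eq!(full.get(), 100);
    // No niche is known to the compiler, so `Option` adds a tag byte.
    assert_eq!(std::mem::size_of::<Option<Percent>>(), 2);
}
```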
If a type `T` contains only a single niche value, `Option<T>` (and other enums
isomorphic to it, with one variant containing `T` and one nullary variant) will
use that value to represent `None` (the nullary variant). If such a `T` is
additionally `repr(transparent)` or `repr(C)` or otherwise permitted in FFI,
`Option<T>` will likewise be permitted in FFI, with the niche value mapping
bidirectionally to `None` across the FFI boundary.

If a type contains multiple niche values, Rust does not guarantee any
particular mapping at this time, but may do so in the future.

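The standard library's `NonZeroU32` already behaves the way this section describes for a single-value niche, so it can serve as a preview. In this sketch, `find_id` is a hypothetical C-ABI function (defined in Rust so the example runs on its own); C callers would see a plain `uint32_t` with 0 meaning `None`. The `transmute` calls lean on the layout guarantee the standard library documents for `Option<NonZeroU32>`.

```rust
use std::mem::transmute;
use std::num::NonZeroU32;

/// Hypothetical lookup exported with the C ABI. `Option<NonZeroU32>` is
/// FFI-safe: across the boundary it is a `u32`, and 0 means "not found".
extern "C" fn find_id(key: u32) -> Option<NonZeroU32> {
    if key % 2 == 0 {
        NonZeroU32::new(key / 2 + 1)
    } else {
        None
    }
}

fn main() {
    assert_eq!(find_id(4), NonZeroU32::new(3));
    assert_eq!(find_id(3), None);

    // The niche value (0) maps to `None` in both directions.
    let raw: u32 = unsafe { transmute::<Option<NonZeroU32>, u32>(None) };
    assert_eq!(raw, 0);
    let back: Option<NonZeroU32> = unsafe { transmute::<u32, Option<NonZeroU32>>(0) };
    assert_eq!(back, None);
}
```

Under this RFC, a user-defined `#[repr(transparent)]` type with a single niche value would get the same treatment, with its chosen value in place of 0.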
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The `niche` attribute may contain either `value = N`, where `N` is an unsigned
integer, or `range = R`, where `R` is a range expression whose endpoints are
both unsigned integers. The unsigned integers may use any integer base
representation (decimal, hex, binary, octal), but must not have a type suffix.
The unsigned integers are interpreted as the bit patterns in memory
corresponding to the representation of the non-ZST field. For instance, a
struct with a float field could specify one or more NaN values as a niche using
the integer representation of those values.

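To make the float case concrete, here is a sketch of how one might pick such bit patterns. It assumes a hypothetical `#[niche(range = 0x7F80_0001..=0x7FFF_FFFF)] struct Sample(f32);`, which would reserve every positive-sign NaN (all-ones exponent, nonzero mantissa). One design caveat: NaNs produced by arithmetic can land in that range, so such a type would have to canonicalize or reject them before storing.

```rust
/// Bounds a hypothetical `#[niche(range = 0x7F80_0001..=0x7FFF_FFFF)]` on an
/// `f32` field could use: every bit pattern with a clear sign bit, all-ones
/// exponent, and nonzero mantissa, i.e. every positive-sign NaN.
const NAN_START: u32 = 0x7F80_0001;
const NAN_END: u32 = 0x7FFF_FFFF;

fn main() {
    // Both endpoints are NaNs when reinterpreted as `f32`.
    assert!(f32::from_bits(NAN_START).is_nan());
    assert!(f32::from_bits(NAN_END).is_nan());
    // Just below the range is +infinity, which the type must still be able to hold.
    assert_eq!(f32::from_bits(NAN_START - 1), f32::INFINITY);
    // Ordinary finite values never fall in the niche.
    assert!(!(NAN_START..=NAN_END).contains(&1.5f32.to_bits()));
}
```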
The attribute `#[niche]` may only appear on a struct declaration. The struct
must contain exactly one field of a non-zero-sized type (non-ZST). The struct
may contain zero or more ZST fields, such as `PhantomData`. (Note that
`#[non_exhaustive]` types do not count as ZSTs for this purpose, even if they
*currently* contain no fields with non-zero sizes.)

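The permitted shape (one non-ZST field plus any number of ZST fields) is one the compiler already handles well for existing niches. In this sketch, `Id<T>` is a hypothetical typed-index wrapper: its generic parameter only affects the `PhantomData` marker, not the non-ZST field, and the `NonZeroU32` niche still reaches `Option`.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical typed index: one non-ZST field plus a ZST marker.
struct Id<T> {
    raw: NonZeroU32,
    _marker: PhantomData<T>,
}

fn main() {
    assert_eq!(size_of::<PhantomData<String>>(), 0);
    assert_eq!(size_of::<Id<String>>(), 4);
    // The niche of the single non-ZST field is carried through the ZST field.
    assert_eq!(size_of::<Option<Id<String>>>(), 4);

    let id: Id<String> = Id { raw: NonZeroU32::new(7).unwrap(), _marker: PhantomData };
    assert_eq!(id.raw.get(), 7);
}
```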
Declaring a niche on any item other than a struct declaration results in an
error.

Declaring a niche on a struct containing more or fewer than one non-zero-sized
field results in an error.

Declaring multiple `niche` attributes on a single item, or multiple key-value
pairs within a single `niche` attribute, results in an error.

Declaring a niche on a struct that has any generic parameters affecting the
non-zero-sized field results in an error.

Declaring a range niche with an empty range (e.g. `0..0`) results in a
warn-by-default lint. As with many lints, this lint should be automatically
suppressed for code expanded from a macro.

Declaring a range niche with an invalid range (e.g. `5..0`) results in an
error.

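Standard-library ranges show why `0..0` deserves a lint: it is empty, so as a niche it would forbid nothing, while `0..=0` (probably what was meant) covers exactly one value. Note that `5..0` is merely an empty range for `std`; the hard error for it is specific to this RFC.

```rust
fn main() {
    // `0..0` is empty: as a niche it would forbid nothing.
    assert!((0u8..0).is_empty());
    // `0..=0` contains exactly one value.
    assert_eq!((0u8..=0).count(), 1);
    // `5..0` is just empty for std ranges, but the RFC rejects it outright.
    assert!((5u8..0).is_empty());
}
```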
Declaring a niche using a negative value or a negative range endpoint results
in an error. The representation of negative values depends on the size of the
type, and the compiler may not have that information at the time it handles
attributes such as `niche`. The text of the error should suggest the
appropriate two's-complement unsigned equivalent to use. The compiler may
support negative values in the future.

Declaring a range niche with an open start (`..3`) results in an error, for
forward compatibility with support for negative values.

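The width dependence is easy to see: the same `-1` corresponds to a different unsigned literal for each field size. The helper below is a hypothetical sketch of the conversion the error message could suggest; it assumes the value already fits in the given width and simply masks off higher bits.

```rust
/// Hypothetical helper: the unsigned bit pattern of `value` stored in a
/// `bits`-wide two's-complement integer. Assumes `value` fits in `bits` bits.
fn unsigned_equivalent(value: i64, bits: u32) -> u64 {
    assert!((1..=64).contains(&bits), "unsupported width");
    let mask = if bits == 64 { u64::MAX } else { (1u64 << bits) - 1 };
    (value as u64) & mask
}

fn main() {
    // The same `-1` needs a different literal depending on the field's width.
    assert_eq!(unsigned_equivalent(-1, 8), 0xFF);
    assert_eq!(unsigned_equivalent(-1, 32), 0xFFFF_FFFF);
    // Forbidding every negative `i8` would be written `range = 0x80..`.
    assert_eq!(unsigned_equivalent(-128, 8), 0x80);
    assert_eq!(-128i8 as u8, 0x80);
}
```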
Declaring a niche using a non-literal value (e.g. `usize::MAX`) results in an
error. Using a constant requires compile-time evaluation, and compile-time
evaluation does not occur early enough for attributes such as niche
declarations.

> As others have mentioned, I think it's a mistake to mandate this. Perhaps this could be reworded to initially guarantee only literals, but explicitly permit const eval if implemented in the future? I understand this is mentioned under future possibilities, but it doesn't imply that it's acceptable to implement it.

If a type `T` contains multiple niche values (e.g. `#[niche(range = 8..16)]`),
the compiler does not guarantee any particular usage of those niche values in
the representation of types containing `T`. In particular, the compiler does
not commit to making use of all the invalid values of the niche, even if it
otherwise could have.

However, multiple occurrences of the same type (e.g. `Option<T>` and
`Option<T>`) will use an identical representation (whether the type contains a
niche or not). This permits a round-trip between such a value and a byte
representation.

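The round-trip property can be illustrated with an existing multi-niche type. In this sketch, which spare byte values encode the two `None`s of `Option<Option<bool>>` is left to the compiler (and its one-byte size is observed behavior, not a guarantee), but every value of that one type uses the same encoding, so converting to a byte and back recovers the original.

```rust
use std::mem::{size_of, transmute};

fn main() {
    // Observed on current rustc: both `None`s fit in spare values of one byte.
    assert_eq!(size_of::<Option<Option<bool>>>(), 1);

    let values = [None, Some(None), Some(Some(false)), Some(Some(true))];
    let mut seen = Vec::new();
    for v in values {
        // SAFETY: the type is a single fully initialized byte.
        let byte: u8 = unsafe { transmute::<Option<Option<bool>>, u8>(v) };
        // SAFETY: `byte` came from a valid value of this same type.
        let back: Option<Option<bool>> = unsafe { transmute::<u8, Option<Option<bool>>>(byte) };
        assert_eq!(back, v);
        // The specific byte is unspecified, but each value gets a distinct one.
        assert!(!seen.contains(&byte));
        seen.push(byte);
    }
}
```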
If a type `T` contains niches and uses `repr(C)` or `repr(transparent)`, the
compiler guarantees to use the same storage size for the type as it would
without the niche, even if the niche might allow storing fewer bytes. If a type
`T` contains niches and uses the default (`Rust`) `repr`, the compiler may
choose to represent the type using fewer bytes if the niche would allow doing
so. For instance:

```rust
#[niche(range = 4..)]
struct S {
    field: u16,
}

// Only 0..=3 are valid, so `size_of::<S>()` may return less than 2.
```

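Fieldless enums on stable Rust already show the same split between a pinned and a compiler-chosen size, and serve here as an analogy (not the niche attribute itself): four valid values fit in one byte, the default representation takes advantage of that on current rustc, and an explicit `repr(u16)` keeps two bytes regardless.

```rust
use std::mem::size_of;

// Default representation: the compiler is free to pick a small size.
#[allow(dead_code)]
enum Compact {
    A,
    B,
    C,
    D,
}

// Explicit representation: the storage size is pinned, even though the
// same four values would fit in fewer bytes.
#[allow(dead_code)]
#[repr(u16)]
enum Pinned {
    A,
    B,
    C,
    D,
}

fn main() {
    assert_eq!(size_of::<Pinned>(), 2);
    // Observed on current rustc; the default layout is not a language guarantee.
    assert_eq!(size_of::<Compact>(), 1);
}
```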
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

We could allow defining *either* valid or invalid ranges. For instance,
`niche(invalid_range(0..=3))` or `niche(valid_range(4..))`. Different types
could use whichever of the two proved simpler for a given use case. However, in
addition to adding gratuitous complexity and requiring longer names
(`invalid_range` vs `range`), this would double the number of cases when
defining other kinds of niches in the future. For instance, a future syntax for
bit-pattern niches would need to provide both `valid` and `invalid` variants as
well. We could introduce another level of nesting to make this orthogonal, such
as `niche(valid(range(...)))` and `niche(invalid(range(...)))`, but that
further increases complexity.

Rather than defining the range of *invalid* values, the attribute could define
the range of *valid* values. Different types may find one or the other case
simpler. This RFC chooses to define the range of *invalid* values for three
reasons:
- As an arbitrary choice, because we need to pick one or the other (see above).
- The most common case will be a single invalid value, for which defining
  invalid values results in simpler code.
- This mechanism commonly goes by the name `niche`, and `niche` also refers to
  the invalid value. So, an attribute defining the niche of a type most
  naturally refers to the invalid value.

> I have to say I find the RFC's choice here very confusing. The relevant point for users of this type is that they are refining the validity invariant of their type: they must ensure that all values of this type always are in a certain range. I think it is much more natural to actually state the range you are promising your values to be in, rather than stating the range you are promising your value not to be in. That's an unnecessary negation IMO, and bound to lead to confusion. Niches are just a consequence of the fact that the type has a validity invariant. In discussions about this mechanism, the fact that 'niche' is (a subset of) the negated validity invariant does lead to confusion, and we should not codify this confusion into the language syntax. Honestly I wouldn't even want

> Also note that in the compiler internally, this is represented as a range of valid values.

> @RalfJung I feel like that may be a case of viewing the code through formal-methods glasses. :) In practice, I'm expecting that it's much easier to say I can imagine cases in which it'd be equally easy, such as the current niche on a "nanoseconds" field, where

> Yes that is terrible syntax, but it's not really a fair comparison -- we can easily come up with better syntax than that.
>
> The RFC proposes forcing the user to always write their invariant in negated form, so of course if the invariant is negated then that fits nicely. But when the invariant is not negated, it becomes awkward. In contrast, stating the invariant directly of course also gives the user the option to state them in terms of a negation. I was imagining something like Furthermore, 'niche' is not really a standard term people will have encountered before. The chances are, I think, much higher that people will have encountered the concept of a value always satisfying a certain property (an invariant). Niche is also a technical term from the guts of the rustc layout algorithm, and natural extensions of this concept to invariants like In contrast, the concept of a value being in a certain range of validity is a much higher-level concept that exists not just in Rust but also in other programming languages. Consider for example the ranged integer types from Ada and Pascal. Of course there you state the range the value must be in, not the range the value must not be in. (The RFC mentions Ada but not Pascal, and also does not mention that this RFC does the exact opposite of prior art in terms of how the invariant is stated.) I agree that I view things through formal-methods glasses, but that doesn't automatically mean that the formal methods approach is wrong. ;) I expect it will be common to find some unsafe code around types with a custom validity range, and then the programmer has to argue why the validity invariant of the type is maintained. And again, if we phrase things in terms of niches, there will always be an extra negation there. Sometimes a negation is inherent in the invariant ("the value is not Between these two, I feel very strongly that the first example is a lot more readable than the second.
>
> ```rust
> #[valid_range(0..=100)]
> struct Percent(u8);
>
> #[niche(range(101..))]
> struct Percent(u8);
> ```

> I should also add that thinking in terms of 'niche' might be a case of viewing the code through rustc-implementor glasses. :) I think 'valid range'/'invariant' is a much more common concept, and has much more precedent also outside of academia, than 'niche'.

> Niche fits well for the "single invalid value" case, and range fits rather poorly for that case. It seems reasonable to have both range and individual declaration forms.

> I'm reopening this because I'm not really convinced it's resolved. The text now says "The most common case will be a single invalid value", but I couldn't find any justification for that statement. Near as I can tell, the vast majority of types using the attribute in the compiler today use multiple invalid values, via the default range of Or looking at the library, both as something with lots of invalid values and where saying "the valid range is This feels pretty broadly applicable to me, in fact. I would describe the ASCII codepoints as those in So if single holes are the only place where the negative phrasing is more natural (and are holes other than zero, which doesn't need this RFC, and

> I should probably have said 'set', not 'range'. The "single invalid value" case should, in my opinion, conceptually be declared as "all values are valid except for this one" -- hence my proposal

> The first use case I could think of for a user-defined niche would be risc-v registers, which I implemented with the valid range 1..=31 (so, nonzero and less than 32). Just as a datapoint for "easy to specify valid values, hard to specify invalid"

Note that the compiler already supports having a niche in the middle of a
type's possible values; internally, the compiler represents this by defining a
valid range that wraps around the type's possible values. For instance,
`#[niche(value = 42)]` gets represented internally in the compiler as a valid
range starting at 43 and ending at 41.

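The wrap-around representation can be modeled in a few lines. The `WrappingRange` type below is a hypothetical sketch in the spirit of the internal representation described above (using `u8` for brevity); it is not rustc's actual type or API. Forbidding the single value 42 yields the valid range 43..=41, which wraps past 255 back to 0.

```rust
/// Hypothetical valid range that may wrap past the maximum value.
/// `end` is inclusive.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct WrappingRange {
    start: u8,
    end: u8,
}

impl WrappingRange {
    /// Valid range equivalent to forbidding exactly one value, like `#[niche(value = N)]`.
    fn excluding(niche: u8) -> Self {
        WrappingRange { start: niche.wrapping_add(1), end: niche.wrapping_sub(1) }
    }

    fn contains(self, v: u8) -> bool {
        if self.start <= self.end {
            self.start <= v && v <= self.end
        } else {
            // Wrapped: valid values are `start..=u8::MAX` together with `0..=end`.
            v >= self.start || v <= self.end
        }
    }
}

fn main() {
    let r = WrappingRange::excluding(42);
    assert_eq!(r, WrappingRange { start: 43, end: 41 });
    assert!(!r.contains(42));
    assert!(r.contains(41) && r.contains(43) && r.contains(0) && r.contains(255));
    // Exactly one of the 256 byte values is invalid.
    assert_eq!((0..=255u8).filter(|&v| !r.contains(v)).count(), 1);
}
```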
We could define *only* single-value niches, not ranges. However, the compiler
already supports ranges internally, and the standard library already makes use
of multi-value ranges, so this seems like an artificial limitation.

We could define only ranges, not single-value niches, and users could express
single-value niches via ranges, such as `0..=0`. However, that makes
single-value niches more verbose to define, and makes mistakes such as `0..0`
more likely. (This RFC suggests a lint to catch such cases, but the syntax
should still attempt to guide users away from that mistake.)

We could guarantee more usage of niches than just a single value; however, this
would constrain the compiler in areas that still see active development.

We could avoid guaranteeing the use of a single-value niche for `Option`;
however, this would eliminate one of the primary user goals for such niches.

We could require types to opt into the guaranteed use of a niche, separately
from declaring a niche. This seems unnecessarily verbose, as well as limiting:
we can't yet provide a full guarantee of all *future* uses we may want to
guarantee, only of the limited single-value uses.

We could implement niches using a lang-item type that uses const generics (e.g.
`Niche<T, const RANGE: std::ops::Range<T>>`). This type would be useful
regardless, and we should likely provide it if we can. However, this RFC
advocates (eventually) building such a type on an underlying language-level
building block like `niche`, and providing the underlying building blocks to
the ecosystem as well.

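Part of such a type can already be sketched with stable const generics, though only for a single forbidden value, since stable const parameters cannot be a `Range`. `NicheU32<N>` below is hypothetical (the name echoes one used in the review discussion); it stores `value ^ N` in a `NonZeroU32` so that `Option` reuses the zero niche. A language-level `niche` building block would remove both the single-value limitation and the XOR on every access.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical library type: a `u32` that can never hold `NICHE`.
/// Stores `value ^ NICHE`, so the forbidden value becomes the 0 that
/// `NonZeroU32` reserves.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
struct NicheU32<const NICHE: u32>(NonZeroU32);

impl<const NICHE: u32> NicheU32<NICHE> {
    const fn new(value: u32) -> Option<Self> {
        match NonZeroU32::new(value ^ NICHE) {
            Some(bits) => Some(NicheU32(bits)),
            None => None,
        }
    }

    const fn get(self) -> u32 {
        self.0.get() ^ NICHE
    }
}

fn main() {
    assert_eq!(size_of::<Option<NicheU32<4>>>(), 4);
    assert!(NicheU32::<4>::new(4).is_none());
    assert_eq!(NicheU32::<4>::new(0).map(|n| n.get()), Some(0));
    // A different parameter forbids a different value.
    assert!(NicheU32::<{ u32::MAX }>::new(u32::MAX).is_none());
}
```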
We could implement niches using a trait `Niche` implemented for a type, with
associated consts for invalid values. If we chose to do this in the future, the
`#[niche(...)]` attribute could remain forward-compatible with it by
generating the trait impl.

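For a sense of that alternative's shape, here is a hypothetical trait with associated consts. Nothing like it exists in `std`, and implementing it here has no effect on layout. It only shows the kind of impl a `#[niche(value = 42)]` attribute might generate under that design.

```rust
/// Hypothetical trait from the alternative above; not a real std API.
trait Niche {
    /// Inclusive bounds of the invalid bit patterns.
    const INVALID_START: u64;
    const INVALID_END: u64;

    /// Default helper: may a value of the type hold these bits?
    fn is_valid_bits(bits: u64) -> bool {
        !(Self::INVALID_START..=Self::INVALID_END).contains(&bits)
    }
}

/// The impl a `#[niche(value = 42)]` attribute might generate under that design.
#[allow(dead_code)]
struct MeaninglessNumber(u64);

impl Niche for MeaninglessNumber {
    const INVALID_START: u64 = 42;
    const INVALID_END: u64 = 42;
}

fn main() {
    assert!(!MeaninglessNumber::is_valid_bits(42));
    assert!(MeaninglessNumber::is_valid_bits(41));
}
```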
We could use a syntax based on patterns, such as `struct S(u8 is 0..=32);` or
`struct S(MyEnum is MyEnum::A | MyEnum::B)`.

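The meaning of such pattern-based declarations can be previewed with today's `matches!` macro, which accepts the same patterns. In this sketch the check runs at runtime in a constructor, whereas the proposed syntax would let the compiler rely on it for layout. `S`, `T`, and `MyEnum` are illustrative names; the second struct is renamed `T` so both can coexist.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MyEnum {
    A,
    B,
    C,
}

/// What `struct S(u8 is 0..=32);` would promise: only values matching the pattern.
#[allow(dead_code)]
#[derive(Debug)]
struct S(u8);

impl S {
    fn new(v: u8) -> Option<Self> {
        if matches!(v, 0..=32) { Some(S(v)) } else { None }
    }
}

/// Likewise for `struct T(MyEnum is MyEnum::A | MyEnum::B)`.
#[allow(dead_code)]
#[derive(Debug)]
struct T(MyEnum);

impl T {
    fn new(v: MyEnum) -> Option<Self> {
        if matches!(v, MyEnum::A | MyEnum::B) { Some(T(v)) } else { None }
    }
}

fn main() {
    assert!(S::new(32).is_some());
    assert!(S::new(33).is_none());
    assert!(T::new(MyEnum::B).is_some());
    assert!(T::new(MyEnum::C).is_none());
}
```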
# Prior art
[prior-art]: #prior-art

The Rust compiler has supported niches for types like `Option` in various forms
since before Rust 1.0. In particular, Rust 1.0 already guaranteed that
`Option<&T>` has the same size as `&T`. Rust has gained many additional
niche-related optimizations since then.

The Rust compiler already supports user-defined niches via the unstable
attributes `rustc_layout_scalar_valid_range_start` and
`rustc_layout_scalar_valid_range_end`.

Bit-twiddling tricks to store information compactly have seen widespread use
and innovation since computing antiquity.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

Does the compiler support niches on structs containing ZST fields such as
`PhantomData`? If it doesn't, then initially limiting the attribute to structs
containing a single field would be fine, and would not substantially reduce the
usefulness of stabilizing this feature.

Could we support niches on generic types? For instance, could we support
declaring a niche of `0` on a generic structure with a single field?

Could we support negative numbers in a niche attribute, at least for fields of
concrete primitive type? That would provide a much more friendly interface, but
would require the compiler to better understand the type and its size.

Will something go wrong if applying a niche to a struct whose non-ZST field is
itself a struct containing multiple fields? Do we need to restrict niches to
structs containing primitive types, or similar?

Do we need to make `niche` mutually exclusive with `packed`? What about other
attributes?

# Future possibilities
[future-possibilities]: #future-possibilities

Niches offer possibilities as vast, rich, clever, and depraved as the
collective ingenuity of bit-twiddlers everywhere. This section includes many
possibilities that have come up in the past. This RFC deliberately excludes all
of these possibilities from the scope of the initial version, choosing to
specify only behavior that the Rust compiler already implements.

New types of niches can use the same `niche` attribute, adding new key-values
within the attribute.

- **Signed values**: This RFC requires the use of unsigned values when defining | ||
niches. A future version could permit the use of signed values, to avoid | ||
  having to manually perform the two's-complement conversion. This may
require either making the compiler's implementation smarter, or using a | ||
syntax that defines the size of the integer type (e.g. `-1isize`). | ||
- **Limited constant evaluation**: This RFC excludes the possibility of using | ||
constants in the range expression, because doing so simplifies the | ||
implementation. Ideally, a future version would allow ranges to use at least | ||
*simple* numeric constants, such as `usize::MAX`. Full constant evaluation | ||
may be much harder to support. | ||
- **Alignment niches**: If a pointer requires a certain alignment, any bit pattern | ||
corresponding to an unaligned pointer could serve as a niche. This provides | ||
an automatic mechanism for handling "tagged pointers" using the low bits. | ||
- **Null-page niches**: If a target treats the entire null page as invalid, | ||
pointers on that target could have a niche corresponding to that entire page, | ||
rather than just the null value. This would allow defining niches spanning a | ||
large swath of the value space. However, this would either require extensive | ||
use of `cfg_attr` for various targets, or a new mechanism for obtaining the | ||
valid range from the compiler. In addition, for some targets the valid range | ||
may vary based on environment, even for the same target; in such cases, the | ||
compiler would need to provide a mechanism for the user to supply the valid | ||
range *to* the compiler. | ||
- **Invalid-pointer niches**: On targets where certain pointer values cannot | ||
represent a valid pointer in a given context (such as on x86-64 where the | ||
  upper half of the address space represents kernel-space addresses and the lower
half represents userspace addresses), types containing such pointers could use | ||
a large swathe of values as a niche. | ||
- **Pointer high-bit niches**: On targets that don't permit addresses with some of
the high bits set (such as implicitly on historical x86 or ARM platforms, or
explicitly via ARM's "Top Byte Ignore", AMD's "Upper Address Ignore", or
Intel's "Linear Address Masking"), types containing pointers could
potentially use values with those high bits set as a niche. This would likely
require compile-time configuration.
- **Multiple niches**: A type could define multiple niches, rather than just a
single range.
- **Other bit-pattern niches**: A type could define niches via a bit pattern,
rather than a range.
- **Per-field niches**: A structure containing multiple fields could have a
niche on a specific field, rather than the whole structure.
- **Whole-structure niches**: A structure containing multiple non-zero-sized
fields could have a niche of invalid values for the whole structure.
- **Union niches**: A union could have a niche.
- **Enum niches**: An enum or an enum variant could have a niche.
- **Specified mappings into niches**: Users may want to rely on mappings of
multiple values into a multi-value niche. For instance, users could define a
type `T` whose niche contains a range of integer values, and an error type `E`
covering a range of integer error codes, and then rely on `Result<T, E>`
assigning specific niche values to specific error codes, in order to match a
specific ABI (such as the Linux kernel's `ERR_PTR`).
- **Safety**: The attribute specified in this RFC requires an unsafe block to
set the field. Future extensions could allow safely setting the field, after
verifying in a compiler-visible manner that the value works. For instance:
  - **`derive(TryInto)`**: Rust could support deriving `TryInto` from the
    contained type to the structure. The implementation could explicitly check
    the range, and return an error if the value is out of range. This would avoid
    the need to write explicit `unsafe` code, and many uses may be able to elide
    or coalesce the check if the compiler can prove the range of a value at
    compile time.
- **Lints**: Multiple lints may help users define niches, or detect usages of
niches that may be better expressed via other means. For instance, a lint
could detect a newtype whose constructor maintains a range invariant, and
suggest adding a niche.
- **Range types**: Rust (or libraries built atop Rust) could provide integer
types with associated valid ranges, along with operations that
expand/contract/propagate those ranges as appropriate. | ||
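The **Signed values** item refers to a conversion users must currently do by hand: a niche that is naturally a signed value such as `-1` has to be written as its unsigned bit pattern. A minimal sketch in plain stable Rust (it does not use the proposed attribute, and the function names are hypothetical) showing that conversion; a same-width `as` cast reinterprets the bits, which is exactly the two's-complement representation:

```rust
/// Bit pattern to write for a signed niche value in a 32-bit field.
pub const fn niche_bits_i32(value: i32) -> u32 {
    // A same-width `as` cast keeps the bits unchanged, i.e. it yields the
    // two's-complement representation of `value`.
    value as u32
}

/// Same conversion for a 64-bit field.
pub const fn niche_bits_i64(value: i64) -> u64 {
    value as u64
}

/// Same conversion for an 8-bit field: here -1 becomes 255, not u32::MAX,
/// which is why the width of the integer type matters.
pub const fn niche_bits_i8(value: i8) -> u8 {
    value as u8
}
```

The result depends on the field's width, which is why the item suggests a syntax such as `-1isize` that carries the integer type along with the value.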
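For **Alignment niches**, the low bits that an aligned pointer always leaves zero are the same bits that hand-written "tagged pointer" code uses today. A minimal sketch, assuming the pointee has an alignment of at least 4 so that the two low bits are free; `TaggedPtr` is a hypothetical name, and it stores the pointer as a plain `usize` rather than relying on any compiler-provided niche:

```rust
use std::marker::PhantomData;

/// Hypothetical tagged pointer: packs a 2-bit tag into the low bits of a
/// pointer to a type whose alignment is at least 4.
pub struct TaggedPtr<T> {
    bits: usize,
    _marker: PhantomData<*const T>,
}

impl<T> TaggedPtr<T> {
    /// The two low bits, which are always zero in a 4-aligned address.
    const TAG_MASK: usize = 0b11;

    pub fn new(ptr: *const T, tag: usize) -> Self {
        assert!(std::mem::align_of::<T>() >= 4, "need at least 4-byte alignment");
        assert_eq!((ptr as usize) & Self::TAG_MASK, 0, "pointer is misaligned");
        assert!(tag <= Self::TAG_MASK, "tag must fit in two bits");
        TaggedPtr { bits: (ptr as usize) | tag, _marker: PhantomData }
    }

    /// Clear the tag bits to recover the original pointer.
    pub fn ptr(&self) -> *const T {
        (self.bits & !Self::TAG_MASK) as *const T
    }

    /// Read back the tag stored in the low bits.
    pub fn tag(&self) -> usize {
        self.bits & Self::TAG_MASK
    }
}
```

An alignment niche would let the compiler perform this packing itself, for example when laying out an enum whose variants all hold such a pointer.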
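The **Specified mappings into niches** item cites the Linux kernel's `ERR_PTR`. In the kernel's `include/linux/err.h`, a negative errno is stored directly in the pointer value, so error pointers occupy the top `MAX_ERRNO` (4095) values of the address space, and `IS_ERR_VALUE` tests for that range. A minimal Rust sketch of the same encoding, written by hand; the function names mirror the kernel macros but are otherwise hypothetical, and nothing here depends on how `Result` would actually assign niche values:

```rust
/// Largest error number the encoding can represent (the kernel's MAX_ERRNO).
pub const MAX_ERRNO: usize = 4095;

/// Encode a negative errno, such as -12 (ENOMEM), as an address-sized value.
pub fn err_ptr(errno: isize) -> usize {
    debug_assert!(errno < 0 && errno >= -(MAX_ERRNO as isize));
    // Reinterpret the bits, like the C cast `(void *) error`.
    errno as usize
}

/// True if `value` lies in the top MAX_ERRNO values of the address space,
/// mirroring `IS_ERR_VALUE`.
pub fn is_err_value(value: usize) -> bool {
    // Wrapping 0 - MAX_ERRNO gives the same bits as `(unsigned long)-MAX_ERRNO`.
    value >= 0usize.wrapping_sub(MAX_ERRNO)
}

/// Recover the negative errno from an encoded error value, like `PTR_ERR`.
pub fn ptr_err(value: usize) -> isize {
    value as isize
}
```

A specified mapping would let a pointer type with a niche covering that top range, combined with an error type covering `-4095..=-1`, reproduce this ABI through `Result<T, E>` instead of manual encoding.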
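The **`derive(TryInto)`** item describes a checked conversion from the contained type into the niche-bearing structure. Until such a derive exists, the same check can be written by hand. A minimal sketch, assuming a hypothetical `Percent` newtype whose only valid values are `0..=100` (so `101..=255` would be the values a niche declaration could reserve); implementing `TryFrom` also provides `TryInto` through the standard library's blanket implementation:

```rust
use std::convert::TryFrom;

/// Hypothetical newtype whose only valid values are 0..=100.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Percent(u8);

/// Error returned when the input lies outside 0..=100.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfRange(pub u8);

impl TryFrom<u8> for Percent {
    type Error = OutOfRange;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        // Explicit range check; the optimizer may be able to remove it when
        // the range of `value` is known at compile time.
        if value <= 100 {
            Ok(Percent(value))
        } else {
            Err(OutOfRange(value))
        }
    }
}

impl Percent {
    pub fn get(self) -> u8 {
        self.0
    }
}
```

Because construction only goes through the checked path, no `unsafe` block is needed at the call site.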
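The **Range types** item can be approximated in library code today with const generics, though without any layout benefit: the compiler knows nothing about the range, so `Option` of such a type stays larger than a bare `u32`. A minimal sketch; `RangedU32` and its methods are hypothetical names, and only construction and addition are shown. Propagating ranges into result types (for example, adding two `1..=10` values to get a `2..=20` result type) would need arithmetic on const generic parameters, which stable Rust does not yet support, so this sketch keeps the operands' range:

```rust
/// Hypothetical integer restricted to MIN..=MAX, checked at runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RangedU32<const MIN: u32, const MAX: u32>(u32);

impl<const MIN: u32, const MAX: u32> RangedU32<MIN, MAX> {
    /// Returns `None` if `value` is outside MIN..=MAX.
    pub fn new(value: u32) -> Option<Self> {
        if (MIN..=MAX).contains(&value) {
            Some(Self(value))
        } else {
            None
        }
    }

    pub fn get(self) -> u32 {
        self.0
    }

    /// Adds two values of the same range; returns `None` if the sum
    /// overflows u32 or falls outside MIN..=MAX.
    pub fn checked_add(self, other: Self) -> Option<Self> {
        self.0.checked_add(other.0).and_then(Self::new)
    }
}
```

With a niche covering the values outside `MIN..=MAX`, `Option<RangedU32<1, 10>>` could instead fit in four bytes.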
Comment on lines +463 to +465:

Personally, I view this as a major point. I developed the […]. Further, it is my belief that ranged integers would subsume a vast majority of use cases for niche values. By no means all: pointer niches are still needed, for example, but I don't think most people need more than "just" ranged integers.

- **`unsafe` fields**: If in the future Rust introduces `unsafe` fields,
declaring a niche could internally mark the field as unsafe, taking advantage
of the same machinery.
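Without `unsafe` fields, the usual way to protect such an invariant today is a private field plus an `unsafe` constructor whose safety contract states the invariant, alongside a checked constructor. A minimal sketch of that pattern, reusing the `MeaninglessNumber` example with its reserved value 42; the module, constant, and method names are hypothetical, and the `#[niche]` attribute itself is omitted because it is not available in stable Rust:

```rust
mod meaningless {
    /// Hypothetical newtype that must never hold 42, the value a
    /// `#[niche(value = 42)]` attribute would reserve.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct MeaninglessNumber(u64); // field is private outside this module

    impl MeaninglessNumber {
        pub const NICHE: u64 = 42;

        /// Checked constructor: rejects the reserved value.
        pub fn new(value: u64) -> Option<Self> {
            if value == Self::NICHE {
                None
            } else {
                Some(Self(value))
            }
        }

        /// Unchecked constructor.
        ///
        /// # Safety
        /// `value` must not equal `Self::NICHE`.
        pub unsafe fn new_unchecked(value: u64) -> Self {
            debug_assert!(value != Self::NICHE);
            Self(value)
        }

        pub fn get(self) -> u64 {
            self.0
        }
    }
}

pub use meaningless::MeaninglessNumber;
```

An `unsafe` field would move this obligation onto the field itself, so every write (not only those going through a constructor) would require an `unsafe` block.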
Comment on lines +466 to +468:

I haven't said anything publicly on this front, but I am actually writing an RFC for […]

Project Safe Transmute is also very interested in unsafe fields! I think the RFC should perhaps make a stronger statement here, as we shouldn't have two mechanisms for doing the same thing for any extended amount of time. If this RFC stabilizes prior to unsafe fields, then we should require that, on an edition boundary, uses of `#[niche(value = 42)] struct MeaninglessNumber(u64);` are migrated to `#[niche(value = 42)] struct MeaninglessNumber(unsafe u64);` (or whatever the syntax happens to be).

- **Move types, or types that don't support references**: Rust currently
requires that all values of a given type have the same representation no
matter where they get stored, to allow taking references to such types and
passing them to contexts that don't know about any relevant storage quirks
such as niches. Given a mechanism for disallowing references to a type and
requiring users to copy or move it rather than referencing it in-place, Rust
could more aggressively optimize storage layout, such as by renumbering enum
values and translating them back when read.
You mentioned in one of the other comment threads that […]

Can you elaborate on that more?

To me, the goal of "allow[ing] libraries to build arbitrary types containing niches" is perfectly well satisfied by letting them create newtypes over `core` types with customizable niches. And in any situation in which the value-with-niche isn't the whole representation of the public type (like if a couple of different enum variants have private fields with various niches), the library type is substantially more convenient than needing to `#[niche] #[derive(Copy, Clone, Eq, PartialEq, Hash)]` a couple of single-use internal types. (And this is even more true now that this RFC doesn't even propose having the `repr(transparent)` rules, since I can't even put a `PhantomData` in the type, and thus `#[repr(transparent)] pub struct IndexForArrayOf<T>(pub RefinedUsize<0, {isize::MAX as usize}>, PhantomData<fn(T)>);` is more ergonomic than needing both that type *and* my own custom `#[niche]` type.)

Much as I dislike `repr(packed)`, that one at least has the ergonomic justification that one attribute is much easier than something on every field, like if it needed to be […]. But defining one combined mega-niche for a multi-field struct seems way more confusing than putting niches on each field, and if they're on fields then specifying them via const generics on a type seems completely fine.