-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Niches #3334
Niches #3334
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
While we may need to narrow it, right now there isn't a specific restriction.
The two's-complement representation; added some text to that effect.
Unsigned integers. Added text to that effect.
They correspond to the in-memory bit-pattern representation. For instance, a float value maps via the bits of the float representation. |
@BoxyUwU wrote:
Added as an alternative. Let me know if that covers what you would like to see covered. |
I think I've addressed all comments up to this point. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
text/3334-niche.md
Outdated
|
||
Constructing a structure with a niche value, or writing to the non-ZST field of | ||
such a structure, or obtaining a mutable reference to such a field, requires | ||
`unsafe` code. Causing a type with a niche to contain an invalid value (whether |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm highly skeptical of this, because it feels like it's repeating the same mistakes that rust ended up having to spend multiple years fixing for repr(packed)
(culminating finally in rust-lang/rust#102513).
Namely, you now have something supposedly "typed" as u32
, but it's not really a u32
at all, because it has extra constraints. Sure, you can say that you need to be an unsafe
block to get a &mut u32
to it, but that doesn't fix the "this is the wrong place to put the proof obligation" problem. Obviously this isn't as bad as mem::uninitialized
, but it has the same structural problems: asking someone to say "I promise that nothing after here will ever misuse this" is awkward, compared to only asking "when you construct a value of the type, you have to prove that it's in-range".
This seems awkward for MIRI to check, too. Because if you have a &mut u32
, how is it going to know that it's illegal to write a 42_u32
through it? Where does the UB happen? Since we don't have typed memory, it seems like it would be unable to diagnose the write as UB, and would have to wait until something later read it, which seems suboptimal -- especially for types that are Copy
and thus might not ever be read!
For example, is this UB?
#[niche(value = 42)]
#[derive(Copy, Clone)]
struct MeaninglessNumber(u64);
unsafe {
let mut x = Some(MeaninglessNumber(10));
let r: &mut u64 = &mut x.as_mut().unwrap_unchecked().0;
*r = 42;
}
It feels to me like it ought to be UB -- conceptually it's setting the value to its niche, which is definitely not okay -- but operationally I don't see where we can diagnose the problem.
I'm very much in favour of letting people control niches, but I think u32-with-a-particular-niche and normal u32 need to be different types.
Edit: This seems to be the root of other comments on the RFC too, such as
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels to me like it ought to be UB
There's a lot of discussion of this in rust-lang/unsafe-code-guidelines#84. It's been a while but I think the consensus is that it's very hard to enforce, we don't gain much from making it UB (we can have validity enforced on typed copy and load), and there are valid use cases for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scottmcm thank you for bringing this up, it's important that this is discussed somewhere. However, there is a nice way to represent this in the opsem. The interesting minirust operation here is a "load" - the operation that turns a list of bytes into a value at the given type. For tuples (and structs), that is currently defined like this:
fn decode(Type::Tuple { fields, size }: Self, bytes: List<AbstractByte>) -> Option<Value> {
if bytes.len() != size { throw!(); }
Value::Tuple(
fields.iter().map(|(offset, ty)| {
ty.decode(bytes[offset..][..ty.size()])
}).collect()?,
)
}
Essentially, this is saying that a value consists of all the values of the fields, and that the load is DB if the loads of all the fields are DB. Under this proposal, I believe we would adjust it to be the following:
fn decode(Type::Tuple { fields, size }: Self, bytes: List<AbstractByte>) -> Option<Value> {
if bytes.len() != size { throw!(); }
Value::Tuple(
fields.iter().map(|(offset, ty, niche)| {
let field_val = ty.decode(bytes[offset..][..ty.size()])?;
if Some(niche) = niche {
if !niche.contains(field_val.unwrap_primitive_int()) {
throw_ub!();
}
}
}).collect()?,
)
}
This additionally checks that if the field has a niche annotation, then the value of that field is inside the niche range (this should also help explain why I feel so strongly about the fields getting niche annotations, not the type).
Of course this only matters when there is a memory operation at type Foo
. Your example above does not contain any such memory operation, and so it has no UB. Even something like this would be permitted:
struct Foo(
#[niche(range = 4..8)]
u8,
);
let mut x: Foo = Foo(5);
x.0 = 0;
dbg!(x.0);
The memory operations are at type u8
, and so there is no UB. If it were instead to say dbg!(x)
then there would be a load at type Foo
and we would have UB.
As far as I know, these semantics are completely reasonable from an opsem perspective. It of course remains to be answered if they are the semantics we want, but I can't think of any concrete reason we should disfavor them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @thomcc, that's a good read.
I feel like there's a difference between references and pointers, here, though.
For example, if I had to write that as
unsafe {
let mut x = Some(MeaninglessNumber(10));
let p = ptr::addr_of_mut!(x.as_mut().unwrap_unchecked().0);
p.cast::<u64>().write(42);
}
then I agree that it's logical that there's no UB, same as if I'd done just x.as_mut().unwrap_unchecked() as *mut _
.
Perhaps it's illogical, but I expect more restrictions from references, and expect to have to fall down to pointers to do squicky things like changing the variant of an option by writing through reference to the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is really about this RFC anymore, so moved to Zulip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't think we should be giving out a &mut u64, as this RFC seems like it does, instead of a &mut Not42 .
Just rejecting giving out these references is also an option. But we if allow giving them out, we have to make it unsafe
-- I hope that much is uncontroversial. :) (And that's what the current rustc-private niche attribute does.)
FWIW this is not quite as bad as packed
since creating an &mut u64
reference that points to 42 is obviously perfectly fine in and of itself. It's only writes to that reference that are subtle, and that's where we hit rust-lang/unsafe-code-guidelines#84.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the extent this is related to the RFC, a few points:
-
I do think we should support giving out a
&mut FieldType
from unsafe code, precisely because this allows calling other functions that mutate a field that were designed to operate on the field type. Otherwise, if you want to mutate the field, you have to extract it, operate on it, and write the result back to it. This seems to me like a reasonable operation for unsafe code to have. This isn't at all likepacked
, as @RalfJung noted. With packed, you can end up with an unaligned reference; with niches you cannot. -
Even if we want to provide types that use const generics to represent value ranges, we need some means of creating those types. I would like to give the ecosystem the tools we're using, not just things we've built with those tools. (See the motivation of the RFC.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a more detailed discussion of mutable references. I'm going to mark this "resolved" for now, but feel free to un-resolve if there's anything missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unresolving because to me this isn't so much about about the UCG rules -- I agree it's not against the validity invariants to give &mut
to the field here, and it's possible to use that without triggering UB in unsafe
code -- but because the field type is still basically a lie.
Completely by happenstance, I just stumbled on an interesting example of how the "u32
but not really a u32
" does weird things: rust-lang/rust#103556 (comment).
For the 2 years that git will conveniently trace history (https://github.com/rust-lang/rust/blame/e2267046859c9ceb932abc983561d53a117089f6/library/core/src/num/nonzero.rs#L47) and probably forever, we've been deriving PartialEq
for the NonZero
types. (EDIT: I traced it further and it's since before derive was called derive, which I think I'm good to call forever.) But what does derive
do for a struct
? Takes references to the fields and calls PartialEq
using those references.
And voila, the type lie immediately impacts things. Rather than telling LLVM we're comparing non-zero types by emitting the !range
metadata that would let it take advantage of the rustc_layout_scalar_valid_range_start
-- as happens if you write the obvious implementation yourself https://rust.godbolt.org/z/Ye5xr8P8x -- it thinks it's loading ordinary integers and the resulting code has no range metadata. That missing metadata then, presumably, makes it hard for LLVM to optimize ==
on Option<NonZeroNN>
, resulting in the linked PR to use specialization to try to fix it.
So then this very much is liked packed
in the sense that if you want your derive macro to handle these types well then you'll have to look for it specifically in the macro and handle those cases differently (https://github.com/rust-lang/rust/pull/98446/files#diff-d1ddb1b94e4a7a467f9a396742ceaa3a94e7cdc96055c36a18b8116614682df1R965-R968).
Is that really worth it to be able to write
#[niche = 4]
struct MyType(u32);
instead of a normal newtype?
#[repr(transparent)]
struct MyType(NicheU32<4>);
Personally, I don't think so.
See also, for example, derive(Deserialize)
. For the #[niche]
that's something that either won't work -- because it needs unsafe
to construct -- or needs special handling in the derive macro. Whereas for the newtype it just needs a library implementation for the NicheU32
type, and I'll take normal rust library code over a bespoke path in the proc macro any day.
EDIT: Thanks, @RalfJung, I've corrected my phrasing in the first paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's sound to give out references here, in the same way that it's sound to give out &mut u32s to f32s
FWIW it is definitely not sound to give out &mut u32
to a NonZeroU32
, in the sense defined here.
But it is possible to write UB-free code that takes an &mut u32
to a NonZeroU32
.
@jhpratt "niche" is a real english word that means "space for a thing", like an alcove in a wall, or a place in an ecosystem for a species. It is used within rust because it was already a real word. |
@Lokathor I'm a native English speaker 🙂 I've just never heard it in a programming context, so I suspect others may be thrown off by its usage. |
Regarding the 'niche' terminology, also see this subthread -- I argue that the concept of a "valid range/set of values" is a lot more common in programming and we should design the interface around that. |
Just dropping a suggestion for the "Future possibilities" section of this RFC: Liquid Types :) (a.k.a. "Logically Qualified Data Types") Instead of just providing a range of (in)valid values, it would be great if in the future we could also add match guards that can express more than just continuous ranges, similar to the match guards that exist for Rust pattern matching today (although for liquid subtyping, the guards should not be allowed to have any side-effects!). This has the advantage that it would become easier to reason about types and their optimizations, leading to less runtime checks if the compiler has support for this (i.e. better optimized binaries), as well as the simplification of formally verified programs (this could be useful for static analysis tools like Prusti and Cruseot - or even for an interpreter like miri, too). The above linked paper about liquid types claims a significant reduction in manual proof obligation thanks to liquid types (read: this could translate to Rust as having less |
@joshtriplett This RFC feels like it was at least somewhat inspired by our discussion on LWN. I said there that I felt like the very simplest thing, just specifying a single niche, was very valuable. You obviously have a lot more experience of this process than I do, but I urge that this RFC tries to be uncontroversial even if that means it gets less done than a richer RFC might have. |
Small datapoint: I perfectly know about niches, and was more or less expecting what's in the RFC, and even then when I read #[niche(range = 2..)]
struct Bit(u8); for a while I thought "but why is that value starting at 2 and called #[niche(exclude = 2..)]
struct Bit(u8); or something as explicit. |
- **`unsafe` fields**: If in the future Rust introduces `unsafe` fields, | ||
declaring a niche could internally mark the field as unsafe, taking advantage | ||
of the same machinery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Project Safe Transmute is also very interested in unsafe fields! I think the RFC should perhaps make a stronger statement here, as we shouldn't have two mechanisms for doing the same thing for any extended amount of time.
If this RFC stabilizes prior to unsafe fields, then we should require that, on an edition boundary, uses of:
#[niche(value = 42)]
struct MeaninglessNumber(u64);
...are migrated to:
#[niche(value = 42)]
struct MeaninglessNumber(unsafe u64);
(or whatever the syntax happens to be).
depends on the value of another field). It also would not work as well for | ||
niches on a `union`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also would not work as well for niches on a
union
.
Can you elaborate on this?
We could use a syntax based on patterns, such as `struct S(u8 is 0..=32);` or | ||
`struct S(MyEnum is MyEnum::A | MyEnum::B)`. The `niche` attribute could be | ||
forward-compatible with this, by generating the appropriate patterns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to suggest another alternative: slightly extending the capabilities of enums.
Rust already has a stable mechanism for establishing language-level validity invariants: enums. I can, for instance, already define bool
in stable Rust:
#[repr(u8)]
enum Bool {
False = 0,
True = 1,
}
...and I can even already define NonZeroU8
in stable Rust:
#[repr(u8)]
enum NonZeroU8 {
x01=0x01, x02, x03, x04, x05, x06, x07, x08, x09, x0A, x0B, x0C, x0D, x0E, x0F,
x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x1A, x1B, x1C, x1D, x1E, x1F,
x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x2A, x2B, x2C, x2D, x2E, x2F,
x30, x31, x32, x33, x34, x35, x36, x37, x38, x39, x3A, x3B, x3C, x3D, x3E, x3F,
x40, x41, x42, x43, x44, x45, x46, x47, x48, x49, x4A, x4B, x4C, x4D, x4E, x4F,
x50, x51, x52, x53, x54, x55, x56, x57, x58, x59, x5A, x5B, x5C, x5D, x5E, x5F,
x60, x61, x62, x63, x64, x65, x66, x67, x68, x69, x6A, x6B, x6C, x6D, x6E, x6F,
x70, x71, x72, x73, x74, x75, x76, x77, x78, x79, x7A, x7B, x7C, x7D, x7E, x7F,
x80, x81, x82, x83, x84, x85, x86, x87, x88, x89, x8A, x8B, x8C, x8D, x8E, x8F,
x90, x91, x92, x93, x94, x95, x96, x97, x98, x99, x9A, x9B, x9C, x9D, x9E, x9F,
xA0, xA1, xA2, xA3, xA4, xA5, xA6, xA7, xA8, xA9, xAA, xAB, xAC, xAD, xAE, xAF,
xB0, xB1, xB2, xB3, xB4, xB5, xB6, xB7, xB8, xB9, xBA, xBB, xBC, xBD, xBE, xBF,
xC0, xC1, xC2, xC3, xC4, xC5, xC6, xC7, xC8, xC9, xCA, xCB, xCC, xCD, xCE, xCF,
xD0, xD1, xD2, xD3, xD4, xD5, xD6, xD7, xD8, xD9, xDA, xDB, xDC, xDD, xDE, xDF,
xE0, xE1, xE2, xE3, xE4, xE5, xE6, xE7, xE8, xE9, xEA, xEB, xEC, xED, xEE, xEF,
xF0, xF1, xF2, xF3, xF4, xF5, xF6, xF7, xF8, xF9, xFA, xFB, xFC, xFD, xFE, xFF,
}
The above example is ridiculous, but it doesn't need to be. Rather than building out a parallel mechanism for defining validity ranges, I think we should seriously consider allowing ranged explicit discriminants. The above example would, then, collapse into something like:
#[repr(u8)]
enum NonZeroU8 {
Value = 1..=u8::MAX,
}
In doing this, we avoid the situation of having two parallel language-level mechanisms for defining validity ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this idea, actually -- we can start by requiring a repr
and exactly one variant, but there's the obvious future possibility of eventually supporting things like
#[repr(u32)]
enum char {
_Low = 0..=0xD7FF,
_High = 0xE000..=0x10FFFF,
}
should we ever start supporting multiple niches in the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already support disjoint regions in enums, we simply don't exploit all the niches. So maybe we could just allow ranges in enums then later add niches where profitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we save much complexity with adding ranges to enum discriminants vs adding entirely new type syntax. The complexity is not as much in the type, but in the construction of the type. It is not clear to me how we would initialize such enums without transmute.
Also, if we ever want to allow things like
match s {
x @ 0..=1 => {
/*logic goes here*/
match x {
0 => {}
1 => {}
// look no fallback needed
}
}
_ => {}
}
We'll need something like pattern types anyway
Apologies if this is bikeshedding, but #[niche(value = 42)]
struct MeaninglessNumber(u64);
#[niche(range = 2..)]
struct Bit(u8); I internally read it as "MeaninglessNumber's value can only equal 42" and "the range of Bit is anything above 2". My suggestion is to swap #[niche(value != 42)]
struct MeaninglessNumber(u64);
#[niche(range != 2..)]
struct Bit(u8); I think it makes it a bit more understandable, to me it translates as "MeaninglessNumber has a niche: the value cannot be equal to 42" or "Bit has a niche: it is not in the range Edit: This RFC doesn't actually impact the type system. |
That would require even more new syntax for attributes.
…On Mon, 7 Nov 2022 at 11:44, Larson ***@***.***> wrote:
Apologies if this is bikeshedding, but value = (value that this type
CANNOT be) and range = (range that this type CANNOT be) is a bit
confusing to me. When I see this:
#[niche(value = 42)]struct MeaninglessNumber(u64);
#[niche(range = 2..)]struct Bit(u8);
I internally read it as "MeaninglessNumber's value can only equal 42" and
"the range of Bit is anything above 2".
My suggestion is to swap = for !=:
#[niche(value != 42)]struct MeaninglessNumber(u64);
#[niche(range != 2..)]struct Bit(u8);
I think it makes it a bit more understandable, to me it translates as
"MeaninglessNumber has a niche: the value cannot be equal to 42" or "Bit
has a niche: it is not in the range 2..". My intuition says most people
will be thinking of niches not as "places to fit *another* type's
information" but more as "the acceptable values for *this* type", since
the former is more of a compiler optimization, e.g. you rarely have to
think about it, whereas the latter affects how you use and interact with
the type in practice. Thoughts?
—
Reply to this email directly, view it on GitHub
<#3334 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABGLD2ZC4K5GU3XM4U75UJ3WHEWXPANCNFSM6AAAAAARMIWN5E>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
maybe it would be a fair compromise to just rename to |
Why not just "exclude" and autodetect if it's a range ? |
Is there a rationale for sidestepping the type system here? I feel like knowing which values a type can and cannot hold is the type system's domain, so duplicating that responsibility elsewhere in the compiler doesn't seem like the best idea. It also feels like the requirement to use these types in To phrase it differently, the type system makes it impossible (in safe rust) to construct an enum with an invalid variant, why not extend that functionality for this purpose, where the type system can enforce it? As it stands, this seems like it'd require an entirely new mechanism, one where all construction is impossible in safe rust. Compare niches as they are in this RFC to something like the following, just a simple extension of the enum system to use non-ident variants, some syntax for making many variants at once, and some new derives for fallible and infallible conversion. (I'm not suggesting this exact system, just using it as an example.) #[derive(TryFrom, Into)]
enum Bit as u8 {
0,
1
}
#[derive(TryFrom, Into)]
enum MeaninglessNumber as u64 {
0..42, 43..
}
fn main() -> {
let a: MeaninglessNumber = 250.into(); // Ok
let b: MeaninglessNumber = 42.into(); // Error
match a {
0..42 => println!("smol"),
43.. => println!("lorge")
}
} Because this is handled by the type system, these values would be checked at compile time, making it impossible (in safe code) to use an invalid variant, and, more importantly, possible to use at all. If you want zero-cost(?) unchecked conversions, So, why not implement this as an extension to the type system? |
A much simpler API (which probably already exists in the ecosystem, although without niche support) is to use const generics: struct RangedU32<const MIN: u32, const MAX: u32>(u32);
enum char {
Low(RangedU32<0, 0xD7FF>),
High(RangedU32<0xE000, 10FFFF>),
} |
I attempted this in rust-lang/rust#103724, but it is a performance regression due to the extra field now existing for |
I really like the ideas raised of using enums for this, as they encode the concept of a set of valid values. It feels very natural to define niches in terms of enums with valid ranges, and I feel makes the type system very coherent: even builtin types, like (Pointers are kind of like |
That is only appliable to literals, it is more complex for compile-time constants to be verified. For the former, maybe we can have something like C++'s And for the later, I would like Rust to support exception on |
That is already supported. |
Does matching work? I guess for Option<u8> which is copy.
…On Tue, Dec 6, 2022 at 14:28 Bruno Kolenbrander ***@***.***> wrote:
And for the later, I would like Rust to support exception on const
evaluation so that we can have NonZeroU8::new(12).unwrap().
That is already supported. const Option::unwrap is available on nightly,
and matching + panic works on stable.
—
Reply to this email directly, view it on GitHub
<#3334 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABGLD25DWYBYXMH4SKVRMXTWL6HVTANCNFSM6AAAAAARMIWN5E>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Here's something I stumbled upon relating to niches. On x86_64, we have two different kinds of pml2/pml3s, that change depending on the 7th bit. Both variants are the same size, but they have different behavior with regard to flags and bit locations. A simple implementation could look like: #[repr(transparent)]
pub struct Pml2Entry4KiB {
data: u64,
}
impl Pml2Entry4KiB {
// 4KiB code
}
#[repr(transparent)]
pub struct Pml2Entry2MiB {
data: u64,
}
impl Pml2Entry2MiB {
// 2MiB code
}
pub enum Pml2Entry {
Large(Pml2Entry2MiB),
Small(Pml2Entry4KiB),
} but obviously, this will not work. It would be nice if one could somehow specify which bit should be the discriminator, that way this could benefit from some abstraction. Currently, I just have to have a single struct with matches for the bit on every function. This is certainly a very niche niche (ha get it), but I'd love to see it implemented. |
@oli-obk successfully convinced me that an approach using const generics, rust-lang/rust#103724 , is better than the attribute approach. And he proved that it's feasible to implement. I'm closing this in favor of rust-lang/rust#103724 . |
Add support for niches in a stabilizable way.
Define them as narrowly as possible, using only support that already exists in
the compiler.
Provide an extensive section on future possibilities, covering many of the
additional ideas generated over the years.
Rendered