stabilize Strict Provenance and Exposed Provenance APIs #130350
base: master
rustbot has assigned @Mark-Simulacrum.
The Miri subtree was changed. cc @rust-lang/miri
Portable SIMD is developed in its own repository. If possible, consider making this change to rust-lang/portable-simd instead.
One thing I'm not entirely sure about is: if we ever plan to support actual targets (as opposed to just running under Miri) that cannot implement … Personally, I would prefer the functions to simply not exist if they can't act according to the docs, akin to target-specific atomic types. That way, if a user of a library tries to run the library on a target that doesn't support expose, they are told up front that the library can't be used. Not that I would expect that to come up much, since as far as I know every current target can implement expose, and anyone running code on CHERI or similar would need to audit their libraries anyway (since …).
I'm not sure. But no such target is supported right now, and …
I think I'd prefer that, too (and doing the same with …).
Makes sense. As long as this isn't stabilizing the fact that these functions will always exist on all (even future) targets, it's fine as-is; I was unsure whether that's what it meant. My bad!
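The "target-specific atomic types" precedent mentioned in this exchange works through conditional compilation: code that uses, say, AtomicU64 is gated on the target_has_atomic cfg, so it does not compile at all on targets without 64-bit atomics. A minimal sketch of that existing mechanism, purely to illustrate the analogy (nothing in this PR gates the provenance functions this way); the helper name bump is hypothetical.

```rust
// Only compiled on targets that provide 64-bit atomics; elsewhere, any code
// naming `AtomicU64` or calling `bump` fails to compile instead of misbehaving.
#[cfg(target_has_atomic = "64")]
use std::sync::atomic::{AtomicU64, Ordering};

#[cfg(target_has_atomic = "64")]
fn bump(counter: &AtomicU64) -> u64 {
    // `fetch_add` returns the previous value, so add 1 to get the new one.
    counter.fetch_add(1, Ordering::Relaxed) + 1
}

fn main() {
    #[cfg(target_has_atomic = "64")]
    {
        let counter = AtomicU64::new(41);
        println!("{}", bump(&counter)); // prints 42
    }
}
```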
Also asked on Zulip, but posting here as well because I find Zulip confusing (and so it doesn't get lost):
In particular, if the underlying LLVM semantics aren't fully figured out, should Rust really stabilise the high-level API before that?
Co-authored-by: Matthew Woodcraft
As far as I know, CHERI-like models support "provenance" in hardware directly in a way that is equivalent to this (but generally more optimized, and possibly 16/32/128-bit instead of 64-bit):
The goal of this is to ensure that any accidental out-of-bounds memory access will always trigger an exception, and also that if malicious code takes control of a process, it only has access to memory that can be accessed based on the currently available registers (which in a lot of models is going to be all process memory due to the stack pointer, but could be more restricted if the stack pointer is only restricted to the current function's stack frame, with function calls/returns having special hardware support). Note that this isn't the same concept of "provenance" as the compiler-based version, since it doesn't actually track the originating pointer; rather, it is a weaker version that merely tracks access rights. It is weaker in the sense that compiler-based "provenance" also implicitly tracks the [start, end) of an address in addition to the pointer it originated from. Clearly, on such hardware, pointer-to-int-to-pointer casts will create a pointer that always faults upon use, unless something like the exposed provenance API proposed here is provided and is implemented not as a no-op, but rather using a global map or system calls to make sure that the start and end fields in the register are valid.
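For reference, this is how the exposed provenance API discussed here is used on today's mainstream targets: expose_provenance returns the address and marks the pointer's provenance as exposed, and std::ptr::with_exposed_provenance turns an integer back into a pointer that may pick up such an exposed provenance. A minimal sketch, assuming Rust 1.84 or later (where these functions are stable); the function name round_trip is hypothetical, and the sketch says nothing about how a CHERI-like target would implement the round trip.

```rust
use std::ptr;

/// Sends a reference through a plain integer and reads the value back.
fn round_trip(value: &i32) -> i32 {
    let p: *const i32 = value;
    // Returns the address and exposes `p`'s provenance.
    let addr: usize = p.expose_provenance();
    // `addr` could now be stored in something that only holds integers.
    let q: *const i32 = ptr::with_exposed_provenance(addr);
    // SAFETY: `q` has the same address as `p`, whose provenance was exposed
    // above, and `value` is still borrowed, so the read is in bounds and live.
    unsafe { *q }
}

fn main() {
    let x = 7;
    println!("{}", round_trip(&x)); // prints 7
}
```

Note that the PR's docs describe with_exposed_provenance as best-effort, so this is the pattern to reach for only when the address genuinely has to travel through an integer.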
I'm not sure what the point of your comment is. This PR isn't about CHERI, except insofar as CHERI happens to benefit from some of the APIs provided here. But Rust doesn't even have experimental CHERI support right now so it's mostly off-topic to talk about CHERI here.
I think providing an API that can't be supported on CHERI is the wrong decision, since it means that Rust crates using the API won't compile on it. That doesn't make sense to me: Rust crates should work on all platforms that Rust supports, and Rust should support all important platforms, which CHERI definitely is, given that its memory safety goals are aligned with Rust's. CHERI is the most difficult platform to support for such an API, so it should be the primary consideration for its design. An API that can't be supported on CHERI should not be adopted by Rust unless there is no other possible design, which, as far as I can tell, is not the case here: I think my proposed implementation with a global interval tree (or some variant of it) works on CHERI.
I also think the cost of supporting CHERI or similar efficiently is very low, since the functions to stop exposing provenance would be no-ops on other platforms and the size parameter to with_exposed_provenance would be ignored (except by Miri, which should check them, but it's not a critical component and the additional work seems minimal). So I don't see any reason not to make those changes, unless there are other changes that still support a future CHERI implementation and are better than my set of changes.
Unfortunately, you're too late then. Rust already has … So it's not a question of whether Rust will have operations that don't work on CHERI. It's a question of which operations don't work on CHERI. Note that this PR actually improves the situation for CHERI because it encourages code to migrate away from …
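Migrating away from pointer-integer casts mostly means switching to the Strict Provenance methods. A minimal sketch of two common patterns, assuming Rust 1.84 or later; the helper names are hypothetical. addr() reads an address without exposing provenance (unlike p as usize), and with_addr() computes a new pointer that keeps the original provenance instead of losing it in a round trip through usize.

```rust
/// Previously written as `(p as usize) % align == 0`.
fn is_aligned_to<T>(p: *const T, align: usize) -> bool {
    assert!(align.is_power_of_two());
    // `addr()` reads the address without exposing provenance.
    p.addr() % align == 0
}

/// Previously written as `((p as usize) + n) as *const u8`.
fn advance_bytes(p: *const u8, n: usize) -> *const u8 {
    // `with_addr` keeps `p`'s provenance and only changes the address.
    p.with_addr(p.addr() + n)
}

fn main() {
    let buf = [10u8, 20, 30, 40, 50];
    let q = advance_bytes(buf.as_ptr(), 4);
    // SAFETY: `q` points to `buf[4]`, which is in bounds of `buf`.
    println!("{}", unsafe { *q }); // prints 50

    let word = 0u64;
    let aligned = is_aligned_to(&word as *const u64, std::mem::align_of::<u64>());
    println!("{}", aligned); // prints true
}
```

For a plain offset within one allocation, p.add(n) or p.wrapping_add(n) is simpler still; with_addr is shown because it is the general replacement for doing integer arithmetic on an address.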
And to summarize, the current proposal, including exposed provenance, seems implementable on CHERI but has two fixable flaws:
I am very excited about CHERI, but it is a very early-stage experiment and one can't even buy hardware on the free market. So we're not going to force every Rust programmer on every target to only do things that CHERI supports. Exposed provenance functions are exactly intended to let programmers do things that don't work on CHERI -- that is their entire point.
This is quite wrong. Capabilities are 128-bit values, with the bounds next to the address. There is no tripling of memory. 1 bit per 128 bits of memory is used for tags. The main increase is just that the size of a pointer is bigger, so uses more of your memory, but the memory itself is there and could be used for something else instead. Loading an integer rather than a capability does not give you something with whole address space bounds. It gives you something that is not a capability. If you use it to do a memory access, then the system uses a default capability alongside it, which lives in a system register. For non-CHERI code that will normally have bounds of the available address space, and for CHERI code it will normally be null. And with the exception of an experimental instruction in Morello that is in practice always disabled, as it's a bad idea to have, there is no special instruction to create capabilities out of thin air; they have to be derived from existing ones.
I think "as" casts can be implemented by calling expose_provenance() and with_exposed_provenance(). But this is still not ideal, due to the problems of stopping provenance exposure and of sizes. BTW, I think I was wrong and no size parameters are actually required, as long as with_exposed_provenance() provides pointer bounds as large as those that were passed to expose_provenance() rather than T-sized bounds for *T (either the union or the topmost in the stack, not sure which), since size can instead be restricted using an API that reduces the object size before exposing it or after recreating it. However, such an API might be a bit of a footgun, because it means that when exposing *T one is exposing the whole memory object that the pointer points to rather than just a "T"-sized memory block. OTOH, this is the natural interpretation in some mental models, so maybe just documenting it is fine, perhaps together with convenience methods that combine expose_provenance() and with_exposed_provenance() with restriction to a single T-sized object.
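The std documentation indeed describes the two cast directions as equivalent to the exposed provenance functions: p as usize behaves like p.expose_provenance(), and addr as *const T behaves like ptr::with_exposed_provenance(addr). A small sketch checking that both spellings produce the same values, assuming Rust 1.84 or later; casts_agree is a hypothetical name.

```rust
use std::ptr;

/// Checks that the `as` casts and the exposed provenance functions agree.
fn casts_agree(x: &u32) -> bool {
    let p: *const u32 = x;

    // Pointer to integer: both forms return the address and expose provenance.
    let a1 = p as usize;
    let a2 = p.expose_provenance();

    // Integer to pointer: both forms may pick up a previously exposed provenance.
    let q1 = a1 as *const u32;
    let q2: *const u32 = ptr::with_exposed_provenance(a2);

    // SAFETY: `q2` has the address of `*x`, whose provenance was exposed above.
    let read_back = unsafe { *q2 };
    a1 == a2 && q1 == q2 && read_back == *x
}

fn main() {
    let x = 5u32;
    println!("{}", casts_agree(&x)); // prints true
}
```

The named functions do not change what the program does; their value is that the intent (deliberately using exposed provenance) is visible in the source.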
I think the model you describe is equivalent to mine for the purposes of this discussion. The memory tripling is just for the most naive model, which is unlikely to be used in production. AFAIK, some CHERI proposals use 64-bit capabilities with a complicated compression scheme where only some [start, end) bounds are representable and others are approximated, as well as supporting fewer than 64 bits of address space or being less accurate with lots of address space.
Just for some context here: @jrtc27 is an actual CHERI contributor. We're not, like, hypothesizing whether CHERI would be cool with these APIs existing; those folks have been roped in on many of these discussions along the way. If they're happy with what we're doing, then any concerns raised on their behalf are dubious at best.
Although actually it depends on which capability is picked by expose_provenance(). If the union is picked, then indeed the size is not required, because it will always be as large as possible; but if the topmost is picked, then a size might be useful so that it can pick the topmost capability that is large enough for the requested size, unless one is OK with newly pushed capabilities making it impossible to access some model. I think this needs more thought to decide what the semantics should be, and if no decision is made, then specifying a size parameter on with_exposed_provenance() seems prudent, as it allows picking any semantics later.
How do they propose to implement exposed provenances on CHERI then?
Ralf's explanation is the correct one. CHERI is a breaking change to the way people write C/Rust/whatever code (even if it's compatible with the vast majority of code). No one, including the CHERI devs, expects x64/arm64 devs to suddenly start writing all the programs to be CHERI compliant overnight. The entire point of the strict provenance is to try to get more programmers writing code that has "nice" provenance that has clear semantics in compilers and sanitizers (main benefit) and lowers to more strict models like CHERI (cool bonus). Currently everyone is randomly doing "messy" provenance because we only gave them the uber-powerful as-operators and not a bunch of more specific operations that are clearly "nice" (strict) or "messy" (exposed). The "messy" operators are not invalid per-se, but they pose a problem for analysis and semantics, so we'd rather people avoid using them whenever possible. The exposed provenance APIs exist as a new way of doing the uber-powerful messy operations while clearly indicating that your code has been "migrated" to strict provenance. That is, we generally hope programmers will go "oh i shouldn't be using as casts to/from pointers anymore" and migrate to strict provenance. But some programmers will determine that doing so is hard and we want them to:
Without the exposing APIs, if I see your code is full of "as" casts, I cannot tell whether you haven't tried to migrate your code to strict provenance, or whether you did try and decided to be incompatible. Having a different name for these operations that clearly signals you're aware of the new model is an incredibly valuable signal for everyone to have. It can also be used to, e.g., compile your program more properly for CHERI (or warn, or error).
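One of the specific "nice" operations that replaces a common as cast is std::ptr::without_provenance, which builds a pointer from an integer for use as a sentinel or tag value; the resulting pointer has no provenance and must never be dereferenced. A minimal sketch, assuming Rust 1.84 or later; the tri-state slot encoding (null, address 1 for "busy", anything else a real pointer) and the names BUSY_ADDR, busy and describe are hypothetical.

```rust
use std::ptr;

// Hypothetical encoding: null = empty, address 1 = busy, anything else = ready.
const BUSY_ADDR: usize = 1;

/// Previously written as `1 as *const T`.
fn busy<T>() -> *const T {
    // A pointer with no provenance: fine to compare, never to dereference.
    ptr::without_provenance(BUSY_ADDR)
}

/// Classifies a slot by looking only at its address.
fn describe<T>(slot: *const T) -> &'static str {
    if slot.is_null() {
        "empty"
    } else if slot.addr() == BUSY_ADDR {
        "busy"
    } else {
        "ready"
    }
}

fn main() {
    let value = 3u32;
    println!("{}", describe::<u32>(ptr::null())); // prints empty
    println!("{}", describe(busy::<u32>())); // prints busy
    println!("{}", describe(&value as *const u32)); // prints ready
}
```

Because no integer-to-pointer cast is involved, this code stays within Strict Provenance, and a reader can tell at a glance that the sentinel is never used to access memory.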
This looks great @RalfJung, thanks for putting this together! I'm excited to see us providing better-specified alternatives to more things that … I also love the example demonstrating how to correctly tag pointers. I've wanted to chase optimizations like that in the past, but stopped short because I wasn't fully sure how to do it correctly. I think this new vocabulary will help with that quite a bit!
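The tagging example from the PR's docs is not reproduced here; as an independent minimal sketch of the same idea, assuming Rust 1.84 or later: a u64 is aligned to at least 2 bytes on common targets, so the lowest address bit of a pointer to it is always 0 and can carry a flag. map_addr changes that bit while keeping the pointer's provenance, so the untagged pointer can still be dereferenced. The names tag and untag are hypothetical.

```rust
/// Stores `flag` in the lowest address bit of a pointer to a `u64`.
fn tag(p: *const u64, flag: bool) -> *const u64 {
    // The low bit is only free if `u64` is at least 2-byte aligned.
    assert!(std::mem::align_of::<u64>() >= 2);
    // `map_addr` changes only the address; the provenance of `p` is kept.
    p.map_addr(|a| a | flag as usize)
}

/// Splits a tagged pointer back into the original pointer and the flag.
fn untag(p: *const u64) -> (*const u64, bool) {
    let flag = p.addr() & 1 == 1;
    (p.map_addr(|a| a & !1), flag)
}

fn main() {
    let x = 42u64;
    let tagged = tag(&x, true);
    let (p, flag) = untag(tagged);
    // SAFETY: `p` has the original address and provenance of `&x`.
    println!("{} {}", unsafe { *p }, flag); // prints 42 true
}
```

Doing the same with p as usize and back would route the pointer through an integer and require exposed provenance; map_addr keeps the whole operation within Strict Provenance.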
It won’t be implemented.
For the purposes of this discussion, maybe. But your model has gaping security issues and huge overheads, so calling it CHERI is disingenuous, confusing, and risks people thinking that those are real issues with CHERI. It is generally best to present a faithful approximation as a model rather than something quite different that happens to have similar properties within the context of this discussion.
This may be true for C, which was designed decades ago, but I think that Rust should reasonably strive for something like CHERI (and in general any predictable future change in execution environment) to NOT be a breaking change, and in fact to be something that is seamless to adopt, which means not encouraging crates to depend on APIs that are planned to be unsupported there. One possibility is to design the API with CHERI-like systems in mind; another is to discourage its direct use and instead have crates rely on a wrapper crate that uses the API when available and does something like the global interval tree map construction on CHERI-like architectures. I think it might be better to directly support the API everywhere, since that avoids the burden of having people learn about the third-party crate and avoids the risk of them using the API directly when they shouldn't.
For some embedded hardware it is literally impossible to program it in a CHERI-compatible way. So I'm afraid your utopia is not going to happen. Instead we're doing what we generally do in Rust: we give people a nice tool that covers >90% of the use cases (and fully works on CHERI), and then for the few cases where that's not enough we give people the tools they require to Get The Job Done, even if that means handing them a loaded gun. The Strict Provenance API has been designed with CHERI in mind, and this PR will improve the Rust ecosystem's support for CHERI by pushing people towards using that API. This is the wrong thread to suggest that Rust should be 100% CHERI compatible. This PR does not add any fundamentally new CHERI-incompatible operation to Rust; it just gives a new name to an existing such operation (namely the …). But meanwhile, please stop derailing this PR, or I'll have to lock it to contributors.
🔔 This is now entering its final comment period, as per the review above. 🔔
☔ The latest upstream changes (presumably #131635) made this pull request unmergeable. Please resolve the merge conflicts.
Given that RFC 3559 has been accepted, t-lang has approved the concept of provenance to exist in the language. So I think it's time that we stabilize the strict provenance and exposed provenance APIs, and discuss provenance explicitly in the docs:
I also did a pass over the docs to adjust them, because this is no longer an "experiment". The ptr docs now discuss the concept of provenance in general, and then they go into the two families of APIs for dealing with provenance: Strict Provenance and Exposed Provenance. I removed the discussion of how pointers also have an associated "address space" -- that is not actually tracked in the pointer value, it is tracked in the type, so IMO it just distracts from the core point of provenance. I also adjusted the docs for with_exposed_provenance to make it clear that we cannot guarantee much about this function; it's all best-effort.
There are two unstable lints associated with the strict_provenance feature gate; I moved them to a new strict_provenance_lints feature since I didn't want this PR to have an even bigger FCP. ;)
@rust-lang/opsem Would be great to get some feedback on the docs here. :)
Nominating for @rust-lang/libs-api.
Part of #95228.