Validity of pointers and references to memory not allocated by the compiler #285

Open
jrvanwhy opened this issue Jun 3, 2021 · 5 comments


jrvanwhy commented Jun 3, 2021

Rust's core::ptr documentation currently says:

> For a pointer to be valid, it is necessary, but not always sufficient, that the pointer be dereferenceable: the memory range of the given size starting at the pointer must all be within the bounds of a single allocated object.

This is difficult to satisfy when working with MMIO as well as when working across kernelspace:userspace boundaries. For example, the above constraint prevents a kernel from creating a valid pointer into userspace memory when the address range of the userspace memory is determined at runtime.

This leads to the following questions:

  1. Do we want to support such "fabricated" pointers? I think the consensus is yes, but it doesn't appear to be documented anywhere.
  2. How do bounds checks work for such pointers? Are they always in bounds as long as they do not overlap any compiler-allocated objects?
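As a concrete picture of what a "fabricated" pointer looks like, here is a minimal sketch of a kernel-style helper that turns a runtime-determined address range into a byte slice. The names `UserRange` and `user_bytes` are hypothetical and chosen only for illustration; whether the resulting pointer counts as valid is exactly the open question in this issue.

```rust
use core::slice;

/// Hypothetical description of a userspace buffer whose location is only
/// known at runtime (for example, taken from a system call argument).
#[derive(Clone, Copy)]
struct UserRange {
    addr: usize,
    len: usize,
}

/// Builds a byte slice over a runtime-determined address range.
///
/// # Safety
/// The caller must guarantee that `range` describes readable memory that
/// stays mapped and is not written to for the lifetime `'a`. The compiler
/// cannot check any of this: the pointer is fabricated from an integer.
unsafe fn user_bytes<'a>(range: UserRange) -> &'a [u8] {
    // Integer-to-pointer cast: no compiler-known allocation is involved.
    let ptr = range.addr as *const u8;
    unsafe { slice::from_raw_parts(ptr, range.len) }
}
```

In a hosted test, the range can be simulated with the address of an ordinary local buffer; in a real kernel, the address and length would come from outside the Rust program.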

It's also unclear how this works for a malloc written in Rust. Presumably, that allocator would get raw pointers from a system call like mmap, but it would eventually return something like an "allocated object" the compiler does know about.

Author

jrvanwhy commented Jun 3, 2021

This question has some overlap with #75 and #213. Feel free to close it and let me know if it is a subset of one of those issues.

Member

bjorn3 commented Jun 3, 2021

> It's also unclear how this works for a malloc written in Rust. Presumably, that allocator would get raw pointers from a system call like mmap, but it would eventually return something like an "allocated object" the compiler does know about.

For the global allocator, all allocations and deallocations go through certain symbols that LLVM is told are allocation and deallocation functions: rust-lang/llvm-project@f234a4d

Contributor

Lokathor commented Jun 3, 2021

Also note that within LLVM's memory rules, volatile access bypasses some of the normal access rules and just does "target specific" stuff. While Rust doesn't always go precisely by what LLVM does (sometimes things are more strict in Rust than in LLVM), so far MMIO/volatile have mostly been "whatever LLVM does, I guess".
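For reference, this is roughly what volatile access through a runtime-determined address looks like with `core::ptr::read_volatile` and `core::ptr::write_volatile`. It is a minimal sketch: the helper names `read_reg` and `write_reg` are hypothetical, and the register width of 32 bits is an assumption made for the example.

```rust
use core::ptr;

/// Writes `value` to a 32-bit register at a runtime-determined address.
///
/// # Safety
/// `addr` must be 4-byte aligned and must refer to memory (or a device
/// register) that is valid to write on the target.
unsafe fn write_reg(addr: usize, value: u32) {
    // The volatile write is not elided or reordered with other volatile
    // accesses by the compiler.
    unsafe { ptr::write_volatile(addr as *mut u32, value) }
}

/// Reads a 32-bit register at a runtime-determined address.
///
/// # Safety
/// `addr` must be 4-byte aligned and must refer to memory (or a device
/// register) that is valid to read on the target.
unsafe fn read_reg(addr: usize) -> u32 {
    unsafe { ptr::read_volatile(addr as *const u32) }
}
```

On real hardware `addr` would be a device register address from a datasheet or device tree; a hosted test can only stand in for it with the address of an ordinary variable.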

Author

jrvanwhy commented Jun 3, 2021

> > It's also unclear how this works for a malloc written in Rust. Presumably, that allocator would get raw pointers from a system call like mmap, but it would eventually return something like an "allocated object" the compiler does know about.
>
> For the global allocator, all allocations and deallocations go through certain symbols that LLVM is told are allocation and deallocation functions: rust-lang/llvm-project@f234a4d

Suppose the following happens:

  1. __rust_alloc is called to allocate a small object.
  2. __rust_alloc retrieves a page from mmap, and stores a pointer into that page (from Rust's perspective, this pointer is fabricated).
  3. __rust_alloc does some pointer arithmetic, and returns a pointer into the page. This causes a subset of the page to become an allocated object.
  4. __rust_alloc is called again, and does more arithmetic using the pointer from step 2.

Example question, that I don't think we currently have an answer to: How do we make sense of this from a bounds checking perspective? Presumably, we want the pointer arithmetic in step 4 to step over the allocated object without invoking UB.
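The four steps above can be sketched as a toy bump allocator. This is an illustration under stated assumptions, not how any real `__rust_alloc` implementation works: the type `Bump` and the constant `PAGE` are hypothetical, and `std::alloc::alloc` stands in for `mmap` so the example runs without platform-specific calls (which also means the "page" here is an ordinary Rust allocation rather than an external one).

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Assumed page size for this sketch.
const PAGE: usize = 4096;

/// Toy bump allocator over a single page.
struct Bump {
    base: *mut u8, // step 2: pointer to the whole page, kept for later arithmetic
    used: usize,   // number of bytes already handed out
}

impl Bump {
    /// Step 2: obtain a page. `std::alloc::alloc` is a stand-in for mmap.
    fn new() -> Bump {
        let layout = Layout::from_size_align(PAGE, PAGE).unwrap();
        let base = unsafe { alloc(layout) };
        assert!(!base.is_null(), "page allocation failed");
        Bump { base, used: 0 }
    }

    /// Steps 3 and 4: carve a sub-range out of the page using arithmetic
    /// on `base`, never on a pointer that was returned earlier.
    fn alloc(&mut self, layout: Layout) -> Option<*mut u8> {
        if layout.align() > PAGE {
            return None; // the page itself is only PAGE-aligned
        }
        let mask = layout.align() - 1;
        let start = self.used.checked_add(mask)? & !mask;
        let end = start.checked_add(layout.size())?;
        if end > PAGE {
            return None; // out of space
        }
        self.used = end;
        // `start <= PAGE`, so the offset stays within the page.
        Some(unsafe { self.base.add(start) })
    }
}

impl Drop for Bump {
    fn drop(&mut self) {
        // Return the whole page; individual objects are never freed.
        unsafe { dealloc(self.base, Layout::from_size_align(PAGE, PAGE).unwrap()) }
    }
}
```

The second call to `Bump::alloc` computes its result from `base` and an offset, so the arithmetic in step 4 "steps over" the first object only in the sense that the offset is larger; whether that is well-defined for a page obtained from `mmap` is the question being asked.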

Member

RalfJung commented Jun 5, 2021

> How do bounds checks work for such pointers? Are they always in bounds as long as they do not overlap any compiler-allocated objects?

They have to be in-bounds of whatever the actual bounds of the object are. :)
Just because the compiler cannot know the bounds of the object doesn't mean that there are no bounds. In fact, I'd suggest forgetting about the compiler and thinking about the specification of the Rust Abstract Machine: from that perspective, what happens is that there exist a number of "external" allocations that have not been allocated through the Rust native memory allocation operations (__rust_alloc, or stack variables, or ...) -- but other than that they are completely normal allocations, and in particular they have a size. Inbounds requirements thus work the same for all these kinds of allocations, "external" or not.
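In today's Rust, the in-bounds requirement is attached to operations such as `<*const T>::add`, while `wrapping_add` and `wrapping_sub` carry no such requirement. A minimal sketch of the difference follows; the helper names `element_at` and `address_past` are hypothetical, and an ordinary array stands in for an allocation of any kind.

```rust
/// Offsets `base` by `index` bytes.
///
/// # Safety
/// The result must stay within (or one byte past the end of) the
/// allocation that `base` points into, whatever its real bounds are.
unsafe fn element_at(base: *const u8, index: usize) -> *const u8 {
    unsafe { base.add(index) }
}

/// Computes an address `count` bytes past `base` without asserting that
/// the result is in bounds. The result may only be dereferenced if it
/// ends up inside the allocation `base` was derived from.
fn address_past(base: *const u8, count: usize) -> *const u8 {
    base.wrapping_add(count)
}
```

For an "external" allocation the compiler cannot see the size, so it is up to the programmer to know the real bounds before choosing `add` over `wrapping_add`.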

> Example question, that I don't think we currently have an answer to: How do we make sense of this from a bounds checking perspective? Presumably, we want the pointer arithmetic in step 4 to step over the allocated object without invoking UB.

Things become extra tricky when considering an allocation function written in Rust, and I am not aware of any formal work in this space (implementing an allocator "inside" the C/C++/Rust Abstract Machine), so all I can do here is some educated guessing for how this might be done properly. I think what happens is that a pointer returned by __rust_alloc is "altered" somehow by the Abstract Machine to obtain a different provenance, and that provenance is associated with a separate fresh allocation. Conceptually, this region of memory is now no longer part of the original allocation created "externally" with mmap, and becomes a proper Rust Abstract Machine allocation. (Looks like our allocations can have holes now. Fun.)

The allocator implementation in step 4 must make sure to use pointers with the original provenance; such a pointer will thus still point to the mmap allocation and not to the new allocation that was handed to the Abstract Machine.

(Now I hope you won't ask about free, because there it gets extra-tricky... there we get a pointer with "small" provenance and somehow have to convert it back to a pointer with provenance for the mmap allocation, and we have to re-integrate the Abstract Machine allocation into the mmap allocation and "fill the hole"...)
