Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement intptrcast model #224

Closed
9 of 10 tasks
oli-obk opened this issue Jun 28, 2017 · 40 comments · Fixed by #1851
Closed
9 of 10 tasks

implement intptrcast model #224

oli-obk opened this issue Jun 28, 2017 · 40 comments · Fixed by #1851
Labels
A-interpreter Area: affects the core interpreter A-intptrcast Area: affects int2ptr and ptr2int casts C-project Category: a larger project is being tracked here, usually with checkmarks for individual steps

Comments

@oli-obk
Copy link
Contributor

oli-obk commented Jun 28, 2017

Since right now miri's pointer comparison bails out on &x as *const T < &y as *const T but allows &x as *const T == &y as *const T, I suggest to also allow <. This will not impact ctfe-mode, since ctfe mode disallows any binary ops on pointers into different allocs, even ==.

It's deterministic between multiple miri runs, doesn't leak HashMap ordering and only depends on optimizations and codegen from rustc (but that's already true for ==).

Steps to do:

@oli-obk oli-obk added the C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement label Jun 28, 2017
@RalfJung
Copy link
Member

My plan was to go even further and essentially implement the intptrcast-model. That would immediately make us support all pointer arithmetic.

Essentially, I think at some point miri should (in non-CTFE mode!) guarantee that safe Rust code can not error out (if the unsafe libraries it uses don't contain bugs). So hashing raw pointers and printing them and stuff like that all have to work.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 29, 2017

Sounds reasonable to me. Once a pointer has been observed and thus it's integral representation is available, there's no way in safe code to go from that pointer back to the allocation, so we don't even need to support that direction, right? All we need to ensure is hat the integral representation doesn't change when obtaining it multiple times from a real pointer

We'll also need to generate an integer address when offsetting pointers due to overflow concerns. Overflowing exposes the address essentially.

@RalfJung
Copy link
Member

We'll also need to generate an integer address when offsetting pointers due to overflow concerns. Overflowing exposes the address essentially.

That's only true when you do the offsetting after casting to integer though, is it? The offset operations on pointers don't let safe code observe whether an overflow actually happened. Really, we could just generate an integer address immediately on allocation.

Doing lookup on a Pointer is still more efficient, so that may be a reason to convert back.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 30, 2017

Well... You can do wrapping_offset (safe and stable method), and then do a <= comparison between the resulting pointer and the new one.

let x = 5;
for i in 1.. {
    let ptr = &x as *const i32;
    let ptr2 = ptr.wrapping_offset(i);
    if ptr2 <= ptr {
        println!("Address of `x` is {}", std::usize::max_value() - i);
    }
}

Really, we could just generate an integer address immediately on allocation.

We'd run out of integer addresses eventually (and quickly on 32bit emulation) if we don't reuse integer addresses and create a unique integer address for every allocation.

@RalfJung
Copy link
Member

RalfJung commented Jul 3, 2017

Heh, nice catch.

We'd run out of integer addresses eventually (and quickly on 32bit emulation) if we don't reuse integer addresses and create a unique integer address for every allocation.

"Quickly"? 2^32 is still a large number, I feel.^^
But anyway, I guess there's no problem with lazy allocation. That also gives us a nice single point of control for rejecting everything that could potentially leak the base address when in CTFE mode -- we just always make the allocation of an integer address fail in that mode.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 3, 2017

The problem is, that once you leak the base address of an array, you also leak the address of the last element's last byte. So allocatingand deallocating 100MB 50 times would fill the address space unless we allow reusing it.

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2017

Actually, given what I wrote in https://internals.rust-lang.org/t/types-as-contracts/5562/81, I think it would make sense for miri to both support pointer comparisons on "abstract" pointers and only provide intptrcast optionally. Without intptrcast, there will always be safe code that miri cannot run (like printing the value of a pointer), but currently there still seems to be value in also having a mode that keeps pointers and integers more separate.

@RalfJung RalfJung changed the title Allow pointer comparison even between pointers of different allocs Allow pointer inequality even between pointers of different allocs Oct 1, 2018
@RalfJung RalfJung changed the title Allow pointer inequality even between pointers of different allocs implement intptrcast model Oct 10, 2018
@Amanieu
Copy link
Member

Amanieu commented Feb 6, 2019

This code currently fails on miri:

fn main() {
    let x: Vec<u32> = Vec::new();
    &x == &[];
}
error[E0080]: constant evaluation error: attempted to do invalid arithmetic on pointers that would leak base addresses, e.g., comparing pointers into different allocations
    --> /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/slice/mod.rs:5024:12
     |
5024 |         if self.as_ptr() == other.as_ptr() {
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ attempted to do invalid arithmetic on pointers that would leak base addresses, e.g., comparing pointers into different allocations
     |
     = note: inside call to `<[A] as core::slice::SlicePartialEq<A>><u32>::equal` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/slice/mod.rs:4962:9
     = note: inside call to `core::slice::<impl std::cmp::PartialEq<[B]> for [A]><u32, u32>::eq` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/vec.rs:2069:50
     = note: inside call to `<std::vec::Vec<A> as std::cmp::PartialEq<[B; _]>><u32, u32>::eq` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/cmp.rs:1009:48
note: inside call to `std::cmp::impls::<impl std::cmp::PartialEq<&'b B> for &'a A><std::vec::Vec<u32>, [u32; 0]>::eq` at src/main.rs:3:5
    --> src/main.rs:3:5
     |
3    |     &x == &[];
     |     ^^^^^^^^^
     = note: inside call to `main` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:64:34

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

Blocked on #653

@oli-obk
Copy link
Contributor Author

oli-obk commented May 27, 2019

cc @christianpoveda

Basic (meaning probably incomplete) implementation instructions:

  1. add a fn int_to_ptr(&self, ptr: Scalar) -> InterpResult<'tcx, Pointer> method to Machine (in rustc). This function should have a default impl that matches on Scalar, returns the Pointer if it's the Ptr variant and returns the ReadBytesAsPointer error variant otherwise
  2. find all current sites where ReadBytesAsPointer is being emitted and call M::int_to_ptr(self, scalar) at said sites instead of emitting the error.
  3. Do the same trick for ReadPointerAsBytes (but name the newly added method ptr_to_int(&self, ptr: Pointer) -> InterpResult<'tcx, Scalar> and fully ignore the argument and just emit the error variant)
  4. Up to now, there should be zero change in behaviour by const eval or miri
  5. go back to miri, all further changes should be doable completely within miri
  6. add a FxHashMap<AllocId, u64> to

    miri/src/lib.rs

    Line 324 in 3c930e4

    pub struct Evaluator<'tcx> {
    which should then be used in the implementation of Machine::ptr_to_int inside miri
  7. add a u64 to Evaluator, which is the current number of bytes used up by the intptr "memory".
  8. whenever there's no entry in the FxHashMap during ptr_to_int, do name_of_the_hash_map.insert(ptr.alloc_id, name_of_the_u64);, compute name_of_the_u64 + ptr.offset in a variable, add memory.get(id).bytes.len() to the u64, and return a new Scalar created from the u64 + offset` variable. This will allow us to go the one way and do it deterministically
  9. for the other direction, you need another field in the Evaluator, this time to go from an integer to an AllocId (so the inverse of the other map). But since we may have added integers to the base address, you can't just compute the inverse map as that would not have entries for every single pointer. Instead we need something like a BTreeMap<u64, AllocId> (which also needs to be filled in in ptr_to_int). Unfortunately BTreeMap doesn't quite support our use case. What we want to find is the entry which our integer points into. So I guess we go with a Vec<(u64, AllocId)> and do a binary search on it, which should be just as fast for finding elements as searching in a BTreeMap.
  10. add an if self.seed.is_none() { return err!(...); } to both of the methods to make sure we only do this behavior in nondeterministic mode.

@pvdrz
Copy link
Contributor

pvdrz commented May 27, 2019

Ok I'll start working on this :D

i'll be working on this branch: https://github.com/christianpoveda/rust/tree/intptrcast-model

@pvdrz
Copy link
Contributor

pvdrz commented May 28, 2019

Should I wait for rust-lang/rust#59654?

@oli-obk
Copy link
Contributor Author

oli-obk commented May 28, 2019

nah. whoever comes last needs to rebase ;)

@RalfJung

This comment has been minimized.

@oli-obk

This comment has been minimized.

@pvdrz
Copy link
Contributor

pvdrz commented May 29, 2019

I'm on step 1 and got errors when running tests, the linker is failing and I'm not even sure if its my fault, these are the changes: rust-lang/rust@master...christianpoveda:intptrcast-model

@RalfJung
Copy link
Member

I never run run-make-fulldeps tests...^^

What is your x.py invocation?

@pvdrz
Copy link
Contributor

pvdrz commented May 29, 2019

./x.py test

@RalfJung
Copy link
Member

When just working on the Miri engine, I'd recommend ./x.py test --stage 1 src/test/{ui,run-pass}. That's much faster and still gives you very good test coverage.

The error you are running into looks like a bug, might be worth reporting if you can reproduce it on unchanged master.

@pvdrz
Copy link
Contributor

pvdrz commented Jun 3, 2019

Ok, I've added the helper method into InterpretCx, on which places do we want this cast?

@RalfJung
Copy link
Member

RalfJung commented Jun 3, 2019

Well, many of the places where we use to_bits/to_ptr. I'm afraid figuring out for which exactly to do what is a lot of work, and the reason why I have not implemented this myself yet.

Given that one can transmute between integers and pointers, and that not all valid pointers are even representable as Scalar::Ptr, I think it makes sense to do the conversion as lazy as possible. So the two places where our hands are forced would be:

  • in operator.rs, when performing an integer operation, we should do ptr_to_int instead of dispatching to ptr_op.
  • whenever we access memory, after determining that this is not a ZST, we should do int_to_ptr.

@pvdrz
Copy link
Contributor

pvdrz commented Jun 3, 2019

Ok, I'll start digging then :D

@pvdrz
Copy link
Contributor

pvdrz commented Jun 5, 2019

This is an small update:

  • The int_to_ptr method has been added and it is replacing some of the usages of to_ptr as discussed with @RalfJung.
  • The ptr_to_int method has been added and I will start checking today where it can be used.

After that I think that this part is ready for PR, the other changes must be done in miri AFAIK. Is that ok?

@pvdrz
Copy link
Contributor

pvdrz commented Jun 5, 2019

Ok I might have a problem, both int_to_ptr and ptr_to_int are static on my implementation, and of course they shoudn't. However, I don't know how to use these methods to replace usages of to_ptr/to_bits. For example, here I cannot find a way to recover the Machine or the InterpretCx to call the methods accordingly.

@RalfJung
Copy link
Member

RalfJung commented Jun 5, 2019

The machine hooks should likely just take a MemoryExtra, not self.

@pvdrz
Copy link
Contributor

pvdrz commented Jun 7, 2019

I've changed the Machine hooks and now both return an error. I've also moved the force methods to Memory and forwarded them in InterpretCx.

However, I've been replacing to_ptr by force_ptr in the places we discussed with @RalfJung and most of the time it causes UB or ends up with the following error when compiling rustc:

 a raw memory access tried to access part of a pointer value as raw bytes

Could you take a look? rust-lang/rust@master...christianpoveda:intptrcast-model

Edit: Its already solved, I'm doing some tests now.

@pvdrz
Copy link
Contributor

pvdrz commented Jun 7, 2019

I replaced some of the to_ptr uses by the new force_ptr method. I am trying to do the same with force_bits in https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/operator.rs#L317

Update: I modified operator.rs to use force_bits in the binary_op method as you can see here. If @RalfJung || @oli-obk are ok with these changes I'll update the test accordingly.

@RalfJung
Copy link
Member

That looks odd, there shouldn't be two calls to binary_int_op. The entire point was to keep the code paths nice and uniform.

But I think @oli-obk is right, this is not very nice either. Let's move this refactoring to later, doesn't have to happen now. For now, keep the dispatching the way it was.

bors added a commit that referenced this issue Jun 26, 2019
@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 28, 2019

Some discussion between eddyb and me about miri and threads came to the conclusion that for certain kind of checks it would be helpful to only have the intptrcast part where you can go from ptr to int, but not the other way. I believe that would even cover most real use cases, as I've only seen the other direction in artificial examples so far

@RalfJung
Copy link
Member

I don't understand what you even mean by that. Could you give some concrete examples of programs we should not support and why?

Also see this Zulip discussion.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 28, 2019

well, basically I want to forbid (on 64 bit)

union Foo {
    a: (u32, u32),
    b: usize,
}
let (x, y) = Foo { b: &1 as *const i32 as usize }.a;
let ptr = ((x as usize) << 32) | (y as usize);
let ptr = *(ptr as *const i32);

or any other bit modifying operations beyond those below the alignment (so those that will happen on the integer representation instead of on the abstract representation).

You can do these reading operations, so you can print all bits of a pointer, I just want to forbid int2ptr everywhere.

@RalfJung
Copy link
Member

I'd rather get rid of all the code that performs arithmetic on the Pointer representation. That code is a mess.

Why don't you want to allow that code? I do want to forbid this code which is why I added a test for this, but yours seems fine to me.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 28, 2019

Why don't you want to allow that code?

Essentially to be able to do full thread separation via our symbolic pointers. I want to be able to leverage the knowledge about what is accessible where. If we have integer pointers, that becomes impossible.

@RalfJung
Copy link
Member

I'm afraid you will have to tell me more, I still don't know what this is all about.^^

Can you open a new issue for this specific restriction, and describe in more detail the why and how? My plan is to close this one once everything works.

I should probably put a checklist in the first comment here for what is still on @christianpoveda's TODO list. ;)

@pvdrz
Copy link
Contributor

pvdrz commented Jun 28, 2019

Plz do

@RalfJung
Copy link
Member

Done. :)

@RalfJung RalfJung added the A-intptrcast Area: affects int2ptr and ptr2int casts label Jun 29, 2019
@RalfJung RalfJung added C-project Category: a larger project is being tracked here, usually with checkmarks for individual steps and removed C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement labels Jul 1, 2019
@RalfJung
Copy link
Member

Turns out instead of calling force_ptr in Ref, we can have the invariant that places are actually "normalized" to Pointer on Deref, which happens in rust-lang/rust#63075. So that checkbox is also taken care of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-interpreter Area: affects the core interpreter A-intptrcast Area: affects int2ptr and ptr2int casts C-project Category: a larger project is being tracked here, usually with checkmarks for individual steps
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants