Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Convention for constructing lifetime-bound values from raw pointers #556

Merged
merged 11 commits into from
Feb 4, 2015

Conversation

mzabaluev
Copy link
Contributor

Establish a convention throughout the core libraries for unsafe functions
constructing references out of raw pointers. The goal is to improve usability
while promoting awareness of possible pitfalls with inferred lifetimes.

Remove unnecessary mutability in copy_mut_lifetime while we are at it.

Rendered

NOTE: much of the discussion below concerns earlier revisions where lifetime anchor parameters were proposed.

fn inner_str(&self) -> &[u8] {
unsafe {
let p = ffi::outer_get_inner_str(&self.raw);
let s = std::slice::from_raw_buf(p, libc::strlen(p));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the indentation on this line is a tiny bit off

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@reem
Copy link

reem commented Jan 6, 2015

Unfortunately copy_mut_lifetime is unusable in many of the cases where it would be most useful since the thing whose lifetime you are trying to set is often a mutable (exclusive) borrow into the thing with the lifetime you are trying to copy from.

Here's an example from real code in rust-lazy:

impl<T: Send + Sync> Deref<T> for Thunk<T> {
    fn deref(&self) -> &T {
        self.force();
        match *self.inner {
            // We can use `copy_lifetime` here because the `val` borrow
            // doesn't prevent us using `self`.
            Evaluated(ref val) => unsafe { mem::copy_lifetime(self, val) },

            // We just forced this thunk.
            _ => unsafe { debug_unreachable!() }
        }
    }
}

impl<T: Send + Sync> DerefMut<T> for Thunk<T> {
    fn deref_mut(&mut self) -> &mut T {
        self.force();
        match *&mut*self.inner {
            // We can't use copy_mut_lifetime here because self is already
            // borrowed as &mut by val.
            //
            // Changing it to only require an `&` reference wouldn't change this.
            Evaluated(ref mut val) => unsafe { mem::transmute(val) },

            // We just forced this thunk.
            _ => unsafe { debug_unreachable!() }
        }
    }
}

Changing the copy-from parameter to a non-mutable reference doesn't help these cases.

@mzabaluev
Copy link
Contributor Author

@reem True, but:

  1. With unsafe code, the aliasing would be often beyond the FFI, like in one of the examples.
  2. It will work in implementing unsafe-to-safe helpers like the proposed functions.

Here's an example from grust (which has served as a testbed for this proposal):

pub unsafe fn from_raw_mut<'a, T, Sized? U>(ptr: *mut <T as Wrapper>::Raw,
                                            _life_anchor: &'a U)
                                           -> &'a mut T
    where T: Wrapper
{
    // could be mem::copy_mut_lifetime(_life_anchor, &*(ptr as *mut T))
    mem::transmute(&*(ptr as *mut T))
}

mzabaluev added a commit to gi-rust/grust that referenced this pull request Jan 8, 2015
Instead of obtaining the lifetime from a raw pointer,
provide it explicitly in the manner of rust-lang/rfcs#556.
@alexcrichton
Copy link
Member

@mzabaluev another alternative you may wish to mention is to remove the "lifetime anchor" altogether. I've had mixed experiences in libraries trying to require a lifetime anchor here and there. Sometimes it works out great but other times it ends up just being a pain most of the time.

Along the same lines, do you think it's worth having the anchors (in some fashion, not necessarily the current fashion) over not having any anchors at all?

@mzabaluev
Copy link
Contributor Author

@alexcrichton I should put that in. But pretty much the only context where such functions - no input lifetime parameters, but a lifetime on the return value - can be used without explicit lifetime annotation is return values, isn't it? That may be more pain than the need to use an anchor; in effect, for that to be usable you'd often need a wrapper function to supply the anchor anyway. In my experience, the anchor in many cases will be self or some other 'host' object present in the call context.

Can you point at some negative examples? Also, why it was decided to pass the raw pointers by reference in the first place?

@alexcrichton
Copy link
Member

But pretty much the only context where such functions - no input lifetime parameters, but a lifetime on the return value - can be used without explicit lifetime annotation is return values, isn't it?

The signature I was thinking of was:

fn c_str_to_bytes<'a>(ptr: *const c_char) -> &'a [u8] { ... }

Where in this case 'a is basically any lifetime (and it's even more wildly unsafe!)

Can you point at some negative examples?

One example I have personally looks like:

struct Foo<'a> { ptr: *mut T, lt: marker::ContravariantLifetime<'a> }

impl<'a> Foo<'a> {
    fn bar(&self) -> &'a int { 
        unsafe {
            let ptr = &(*self.ptr).bar;
            // how to create promote `ptr` to the `'a` lifetime?
        }
    }
}

Basically the lifetime is injected via some marker and there's not actual reference with the lifetime I want to return a value from. I do realize though that this is probably rare.

Also, why it was decided to pass the raw pointers by reference in the first place?

Initially these functions looked like the example I have above where the lifetime wasn't tied to anything at all and we were thinking that to help curb at least some of the unsafety that we would tie it to something. We didn't necessarily forsee some of the pitfalls you've outlined in this RFC, however.

@mzabaluev
Copy link
Contributor Author

@alexcrichton

Where in this case 'a is basically any lifetime (and it's even more wildly unsafe!)

Then that lifetime would have to be explicitly annotated, unless it can be inferred from e.g. a return value type when the call is in such context. I have described that as an alternative, with the drawbacks outlined.

In your example, could self be annotated with lifetime 'a as well, given that the type is contravariant to it anyway? So it could look like:

impl<'a> Foo<'a> {
    fn bar(&'a self) -> &'a int { 
        unsafe {
            let ptr = &(*self.ptr).bar;
            std::mem::copy_lifetime(self, &*ptr)
        }
    }
}

@alexcrichton
Copy link
Member

Then that lifetime would have to be explicitly annotated, unless it can be inferred from e.g. a return value type when the call is in such context. I have described that as an alternative, with the drawbacks outlined.

It does not actually require explicit annotation, unlink a generic the compiler will not throw an error if it cannot infer a lifetime, it will just default to 'static I believe.

I have personally found that in 99% of cases that the actual lifetime is being driven by a return value (as the unsafety is normally contained to a small method), which is the source of me questioning the need for an anchor at all.

In your example, could self be annotated with lifetime 'a as well, given that the type is contravariant to it anyway?

Not quite as this means that self has to live for as long as 'a which is not always the case.

@mzabaluev
Copy link
Contributor Author

@alexcrichton: Indeed, I underestimated Rust's type inference. The problem with unanchored return lifetime is that it has even worse potential for "convenient evil" than the current convention. Especially if names for such functions are also shorter.

this means that self has to live for as long as 'a which is not always the case.

In your example, a shorter lifetime on self would mean that the user has borrowed from what already looks like a smart reference. Discouraging that might be considered a feature :)

There may be less need for custom reference types if #709 is implemented: if Foo is an unsized type, &Foo can represent a pointer to a value which cannot be copied or moved out.

@alexcrichton
Copy link
Member

The problem with unanchored return lifetime is that it has even worse potential for "convenient evil" than the current convention.

I've been thinking about this recently and I'm not actually entirely convinced any more that it's so much more dangerous. In my example I gave above, I'm likely to end up using mem::transmute no matter what, in which case we haven't really gained a whole lot of safety.

In other examples where you do have an anchor it may still be convenient to use mem::transmute due to the long function name of _with_lifetime.

I suppose what I've been thinking recently is that inference of the actual lifetime would largely be driven by signatures which should be carefully scrutinized regardless. I am a bit up in the air though, I haven't quite made up my mind!

@mzabaluev
Copy link
Contributor Author

@alexcrichton The question is essentially about how much safety rope do we want to offer with these unsafe methods, above mem::transmute which is the universal can opener. Are there better be just thin adaptations of transmute to produce target types from their constituent data, or do we want the programmer to also explicitly assign the lifetime for additional safety? The answer may be "we want both", just as there is a purpose in mem::copy_lifetime despite the longer name and parameter signature.

@alexcrichton
Copy link
Member

Ok, so to recap a bit, I think we have three routes proposed so far:

  1. The status quote today where judicious use of mem::copy_lifetime may be necessary. We may want to change the signature of copy_mut_lifetime to have a more easy-to-use signature as well.
  2. As proposed in this RFC, we separate out the anchor from the input pointer and have a separate argument. I would propose that we remove the existing functions and replace them with this "extra anchor argument" if this were the case.
  3. Remove all lifetime anchors altogether, requiring that the lifetime is explicitly constrained through some other means.

So with these three options, I'd like to take a look at what these functions are actually used for:

  • slice::from_raw_buf is basically just a "typesafe transmute" where you know precisely your input and output types.
  • ffi::c_str_to_bytes is like from_raw_buf, except that it also encapsulates some useful work (strlen).

I believe that all three options above will satisfy the basic needs of these two pieces of functionality, and the only question is how they communicate the output lifetime. Given this, I want to try to write out the cons of each approach:

  1. As mentioned above, the lifetime of the pointer is not always connected to the lifetime of the data itself. Often a pointer is manifested out of the blue (a return value from a function accesstor). Other times if a pointer is in a C struct, for example, it has the correct lifetime but may have an incorrect type (e.g. *mut T vs *const T). By casting you lose the reference's lifetime information. Finally, an anchor may not always be available (see my example)
  2. Anchors must always be provided, which does suffer from the same con as (1) where an anchor may not always be available. This method also reduces composability between copy_lifetime and the functions above. Strategy (1) requires composition (which the standard library heavily leverages) to acquire the same semantics of this strategy.
  3. As no anchor is available, the output lifetime can be wildly unsafe.

None of these seem to be a clear winner to me, and they all seem to have their footguns. I think that this may largely be a question of what footguns we want to buy into in some sense. I think I'm personally leaning towards option (3) because it provides the least amount of reason to use a raw mem::transmute which may in fact be more important than constraining the returned lifetime.

@mzabaluev
Copy link
Contributor Author

@alexcrichton I'm becoming convinced, too, that option(3) is a valid approach, if different on the balance of usability vs safety. Changing from that to the status quo reduced usability, but fell short in providing useful safety semantics. The documentation can warn about possible pitfalls and suggest practices to avoid them.

That solution invalidates the essential parts of this RFC. I can recycle the (so far) non-controversial bits into another proposal: revert to by-value input signatures and fix mem::copy_mut_lifetime.

mzabaluev added a commit to mzabaluev/rust-rfcs that referenced this pull request Jan 30, 2015
@alexcrichton
Copy link
Member

I think that actually all changes still fall under the purview of this RFC, going by the title:

RFC: Lifetime parameters for unsafe pointer conversions

We're still developing the lifetime parameter conventions! Feel free to just re-word the RFC in-place, I'm totally fine with that! (including the copy_mut_lifetime changes)

@Gankra
Copy link
Contributor

Gankra commented Jan 30, 2015

I am strongly in favour of option 3 at this point in time. raw ptrs already deref to &'static, so if you're using raw ptrs you already have to deal with this problem. Having some APIs randomly try to force you to think about it is pointless book-keeping for someone who has opted into the problems of raw manipulation.

In my experience, the only problem with lifetimes and raw types are when you store raw ptrs and forget to include a marker, and this should be solved by lints and #738.

Rewritten the RFC to reflect the emerging consensus.
@mzabaluev mzabaluev changed the title RFC: Lifetime parameters for unsafe pointer conversions RFC: Convention for constructing lifetime-bound values from raw pointers Jan 31, 2015
@mzabaluev
Copy link
Contributor Author

@alexcrichton I have rewritten the RFC, thanks a lot for your input.

Now, assuming this is accepted, would you or anyone in the core team have time to implement it any time soon? This feels like something I could try to do myself in a branch, but as it probably involves much manual work, I wouldn't want the effort to be duplicated.

@alexcrichton
Copy link
Member

Thanks @mzabaluev! I know I'd at least personally be up for implementing this if you're short on time, but you'd certainly have first dibs :)

@Gankra
Copy link
Contributor

Gankra commented Feb 1, 2015

One thing I've never really been clear on: Is the signature

unsafe fn from_raw_buf<'a, T>(ptr: *const T, len: uint) -> &'a [T]

really any more expressive than

unsafe fn from_raw_buf<T>(ptr: *const T, len: uint) -> &'static [T]

?

As far as I can tell, the former basically always spits out a static lifetime, which usually just unifies to something reasonable at a later function boundary, or when put in something that expects a particular lifetime. I suppose one could manually specify 'a in the function call, but I don't think I've ever seen someone do that. If we specify it as `static, we'd perhaps be lying less about what is being output.

@mzabaluev
Copy link
Contributor Author

@gankro The lifetime parameter may help catch errors when specified explicitly. Also semantically, I prefer the function signature saying "the lifetime is anything you want it to be" rather than "we say it's static, but you gotta remember it can be shorter than that".

mzabaluev added a commit to mzabaluev/rust-c-str that referenced this pull request Feb 1, 2015
The emerging consensus in Rust [RFC PR #556][RFC] is to allow freely
inferred lifetimes.

[RFC]: rust-lang/rfcs#556
@alexcrichton
Copy link
Member

I've talked this over with the rest of the team, and we've got pretty broad support for this change, so I'm going to merge this now. Thanks again for writing this all up @mzabaluev!

@alexcrichton alexcrichton merged commit 184daee into rust-lang:master Feb 4, 2015
Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 6, 2015
 New functions, `slice::from_raw_parts` and `slice::from_raw_parts_mut`,
are added to implement the lifetime convention as agreed in rust-lang/rfcs#556.
The functions `slice::from_raw_buf` and `slice::from_raw_mut_buf` are
left deprecated for the time being.

Holding back on changing the signature of `std::ffi::c_str_to_bytes` as consensus in rust-lang/rfcs#592 is building to replace it with a composition of other functions.

Contribution to rust-lang#21923.
bors added a commit to rust-lang/rust that referenced this pull request Feb 6, 2015
New functions, `slice::from_raw_parts` and `slice::from_raw_parts_mut`,
are added to implement the lifetime convention as agreed in rust-lang/rfcs#556.
The functions `slice::from_raw_buf` and `slice::from_raw_mut_buf` are
left deprecated for the time being.

Holding back on changing the signature of `std::ffi::c_str_to_bytes` as consensus in rust-lang/rfcs#592 is building to replace it with a composition of other functions.

Contribution to #21923.
@Centril Centril added A-convention Proposals relating to documentation conventions. A-raw-pointers Proposals relating to raw pointers. A-unsafe Unsafe related proposals & ideas A-references Proposals related to references labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-convention Proposals relating to documentation conventions. A-raw-pointers Proposals relating to raw pointers. A-references Proposals related to references A-unsafe Unsafe related proposals & ideas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants