-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autoreferencing Copy
Types
#2111
Conversation
text/0000-autoref-copy.md
Outdated
|
||
fn main() { | ||
let mut x = 5; | ||
let y = u8Ref(x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this rather unexpected. I thought this RFC meant that Copy
types are copied at the site and create a reference to the copy. That is how such a function call looks to me, I'm moving the value into the function, but since it's Copy
, it will be copied and I retain the original.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the owner of x
, you would have to trust u8Ref(x)
not to mutate x
.
(sorry for the misunderstanding)
That x
is borrowed is indeed surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@repax: this RFC only allows coercion from T
-> &T
, not &mut T
, so u8Ref
can't mutate x
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the lifetime of the borrow was limited to a function call I think I would be less confused:
let mut x = 5;
let y = foo(x);
// ^ autoreference of `x` occurs here
x = 7; // `y` is unrelated to `x` so this is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cramertj is there a guarantee that we'll never get a Copy-able UnsafeCell like wrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rust-lang/rust#25053 was active not that long ago, seems there's at least some support for making UnsafeCell implement Copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the copy-ability of UnsafeCell
is a somewhat distinct question. The last time we discussed it in depth, the conclusion was that we ought to address with a lint. i.e., we should have the ability to make things copy, but warn if they are actually copied. =) I still think this makes sense, and it would apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am torn about whether to treat this auto-ref as borrowing the actual variable. I too find it somewhat surprising; I think I might prefer that we always create a temporary (with the standard lifetime). It would certainly suffice for the cases that are motivating the RFC; it seems like those cases where it would not suffice are also kind of confusing and unexpected to me (i.e., a reference that lives longer than the current statement).
It also seems more compatible with possibly extending this "auto-ref" to types beyond Copy
types. At least the way we had previously talked about that, we wanted foo(x)
to be loosely equivalent to { let y = x; foo(&y) }
, but with better error messages indicating that one can add an &x
if one must retain ownership. I am still interested in thinking that over, but it seems incompatible with the approach of this RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm currently also leaning towards making a copy every time instead of borrowing. That seems like:
- the only way to prevent the combination of Copy+UnsafeCell and this RFC from allowing "implicit mutability", which seems like a terrible footgun, and
- the only way to prevent the "found T, expected &T" errors we get today from turning into borrow checker errors (which are typically much harder to understand and fix, no matter how good the error message)
If that extra copy is somehow undesirable, iiuc you can always add the explicit & even after this RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make an implicit copy, but promise that the optimizer will almost always eliminate that copy in practice, except in cases like UnsafeCell?
text/0000-autoref-copy.md
Outdated
`Copy` types will autoreference: at | ||
[coercion sites](https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md#coercions), | ||
when a reference is expected, but an owned | ||
`Copy` type is found, an `&` will be automatically inserted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this will work?
trait Trait {}
impl Trait for u8 {}
let x: &Trait = 0u8;
// A reference to `Trait` is expected and `u8: Copy` is provided (but `u8 != Trait`) => a `&` is still inserted
// let x: &Trait = &0u8;
// Now coercion `&u8 -> &Trait` can apply.
// Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my answer below, I would expect this to work since coercions are supposed to apply transitively.
The question about multiple references again. Does this work? fn f(arg: &&&u8) {}
f(0u8); // Desugared into `f(&&&u8)`
// Note that reference types themselves `&T` are `Copy`. This needs to be clarified in the RFC text. |
|
Yes, but not all currently-existing coercions are applied transitively. The original coercions RFC says that they should be, though, so that is the behavior I would expect. |
Any use cases for wanting to pass a Copy value by reference instead of by value? The example given is not really convincing; an u8 is one byte and gets most likely passed via one register anyway. Its also inconsistent with non Copy types which really should not get autoreferencing. |
@est31 The concern here isn't the performance implications-- the concern is making it easier and more ergonomic to work with Although it's more of a long-term goal, and I've brought it up in the unresolved questions, I think that it would be good to provide some more explanation of this process in the RFC itself. I'll amend the RFC soon to add more details. |
I know that the RFC intends to make passing Copy types per reference easier. My question was: Why would you want to pass a Copy type per reference instead of per value?
I'm not sure whether its good to allow comparing &T with T. In C this means something much different. |
Usually, you wouldn't. However, cases often show up in generic or autogenerated code where a function or type expects
We already implement operators for |
Is it possible to design this feature in a general manner, not limited in "auto-reference"? I would expect a trait in std library which exposes this feature to be user-extendable. Such as:
I wouldn't mind if it is a lang item. Whenever we need an "implicit type conversion", compiler will call this method. For Maybe it is dangerous to allow users to define implicit type conversions. |
text/0000-autoref-copy.md
Outdated
`Copy` type is found, an `&` will be automatically inserted. | ||
|
||
If the `Copy` value is a temporary, its lifetime will be promoted to meet the | ||
required lifetime of the reference (where possible). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this sentence at odds with the behavior specific in RFC 66 etc. Specifically, the sentence as written suggests to me that we will infer the appropriate lifetime for the temporary, which would be quite at odds with the rest of the language. I think I would say it more like this:
If the value to be borrowed is an rvalue (e.g., a constant like 1
), then a temporary will be created for it, just as if one explicitly wrote the &
(e.g., &1
). This temporary will have the same lifetime that would result from an explicit &
, which generally means that it lives until the end of the innermost enclosing statement (until the next ;
) -- but in some cases it may live longer. See RFC 66 and @nikomatsakis's amendment for more details.
Note that a consequence of this is that in some cases there will be implicit borrows that might be surprising. An example:
fn foo(x: &i32) -> &i32 { x }
let mut x = 22
let p = foo(x); // implicitly: borrows `x`
x += 1; // ERROR -- `x` is borrowed
print(p);
As has been mentioned before, we should consider the compatibility with the discarding ownership idea. |
I think if we interpret this RFC as "make a copy and borrow the copy" (which I think we should anyway) then the discarding ownership idea is actually a strict superset of this proposal, since that idea is basically "do a move and borrow the move" for all types (and it's already the case that "moving" a Copy type means copying it). So I'd imagine we can easily do that later. Hopefully I'm not missing some important complication there. |
In addition to solving the |
`T: Copy`: | ||
- `T` to `&mut T`: This conversion has the potential to introduce hidden | ||
mutations. With this change, passing a variable to a function could allow | ||
the function to change the value of the variable, even if no `&mut` is present. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As already mentioned elsewhere I think, using &
does not exclude this. For example, Cell
could well be Copy
, and it is possible for a user crate to soundly implement a Copy
Cell
today.
I think this RFC should take the possibility of Copy
interior mutable types into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically, this is not possible, because UnsafeCell
is not Copy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't this get fixed recently? Or am I imagining it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not Copy
yet, but AFAIK there is any good reason for that. So this RFC should work under the assumption that UnsafeCell
can be Copy
.
EDIT: Also see rust-lang/rust#25053
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find @RalfJung's logic compelling -- we should consider interior mutability. That said, when we last discussed the Copy question, we had thought to add a lint to reflect the "can copy" vs "should copy" distinction (similar to "must use"). I thought this was a good plan, but it never happened. If it had -- and of course it still can! -- it could certainly apply here.
mutations. With this change, passing a variable to a function could allow | ||
the function to change the value of the variable, even if no `&mut` is present. | ||
- `&mut T` and `&T` to `T`: This conversion would cause extra copies to occur | ||
which could be difficult to identify when attempting to optimize code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the fact that copying is already implicit in many places, I don't think this is a strong concern. I would argue that if copying a type is actually noticeable in cost, it should not be Copy
-- then the clone
calls will make it obvious where the cost is incurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree with everything you said, it seems like the PoR is for [i128; 1_000_000]
to be Copy
...
We've generally tried to avoid "open-ended" magic, preferring instead that overloaded operators be constrained by the type signature to do fairly harmless things. For example, the |
There have been a lot of interesting responses to this RFC. I'll do my best to summarize and respond here. Copying values firstMany people reading the RFC expected to see values copied before being autoreferenced. Initially, I was opposed to this idea since it seems counter-intuitive to some of the RFC's goals, such as making it easier to work with (potentially large) Additionally, introducing removable copies could make it hard for users to optimize their code. If a user wanted to prevent excess copies, they would need to search each function call to check where items have been passed by-value that could have been passed by-reference. This seems like a frustrating and error-prone process. That said, copying values first would avoid the problem of unexpected borrows. Personally, i don't find this to be too motivating, as I think such cases would be rare in practice, and good error messages would make them easy to fix. Interaction with
|
This already happens now. If there's a generic function taking a |
Uhm, no, |
Agree with @oli-obk . C++ has copy by default everywhere, and it sucks bad for performance. Rust therefore has move by default everywhere. But there is one exception for Rust, types that implement If you pass copy types per-value they get copied, not moved. This is a decision done in a trade off between ergonomics and performance. The proposed copy-then-autoref behaviour is the same behaviour we have right now for by-value passing of arguments: fn bar(_v: [u32; 64]) {
}
fn foo() {
let a = [0; 64];
bar(a); // Creates a copy of a
bar(a); // Creates a second copy of a
} With copy-then-reference, this would look like: fn bar(_v: &[u32; 64]) {
}
fn foo() {
let a = [0; 64];
bar(a); // Creates a copy of a, then references it
bar(a); // Creates a second copy of a, then references it
} I definitely support a guaranteed optimisation or a lint that warns you if no optimisation could be performed, its a great idea. But I think its not really neccessary, mainly because the main motivation for this RFC seems to be to make generic interfaces that take references more ergonomic to use for Copy types, e.g. the usual "indexing HashMap<u32,T>" case. |
So @eddyb was talking to me on IRC this morning. He was advocating for a more general approach to autoref -- in particular, I have long been against this, in part stemming from prior experience where I found it helpful to have an idea of where moves occur without having to look at the definition of callees (this was back in the days of "cross-borrows", in which we would silently coerce However, people often bring it up, and talking to @eddyb I started to wonder if I was being too harsh. It would be nice, after all, if I could do I do think that this starts to erode a certain clarity that we have around moves and borrows. It might impede learnability -- this is a bit hard for me to determine. I know that when I teach about moves, I almost always get questions about why (I've also not thought through how this interactions with generics and other coercions. Certainly not knowing, in general, what is a move and what is not might make lots of stuff more complicated.) Regardless, it's worth considering as another way to build on this RFC. If we wanted to go this way, then it would suggest some value towards not making |
@nikomatsakis I'm mostly supportive of this RFC as it limits auto-referencing to a small subset of cases, but I think extending this beyond Copy types and creating references without asking for all types except a small minority would be really bad. Being explicit about the difference between references and non-references seems part of what a systems language is for me. As someone with a C/C++ background, I think it was one of the mistakes of C++ to have introduced references without making creation of them happen with any discernible syntax. That being said, I do definitely favour adding a |
I think the reason this was a mistake is that it lets the callee change the value of my variable. That's not the case here, it would only be done for shared references and types with no interior mutability. For me, that makes all the difference. |
Operators are traits, coercing on them would be doing the same thing we do with method receivers which is quite a slippery slope. I'd rather wait for a better specialization that allows us to make impl<T: Borrow<i32>, U: Borrow<i32>> Add<T> for U |
That's exactly why we should coerce on them - they are similar enough to method receivers, where we already do coercions.
Transitive traits are far off in the future - I'm not aware of any plans to implement them. |
Thanks for the correction, that was a misconception on my part. If we can't specify it in the impl then auto-derefing and auto-refing for operators is an interesting idea. I see it as complementing this rfc since there is no intersection between what that solves and what this rfc solves. EDIT: But with intersection impls we could have EDIT2: Also if we start coercing operators then we can now add two |
One motivation for this RFC was that if/when we add coercion to operators, copy types will autoref in operators. I addressed this in the unresolved questions section of the RFC. |
The reason the LHS of operators isn't a coercion site is because it breaks expected type propagation (it's actually very close to a standard "generics break coercions" case). The RHS of operators is currently a normal coercion site, but we could change that somewhat. |
I somewhat agree with @arielb1, and I'm worried we may not have taken a full view of all the trade offs in this decision. There are four obvious points we could sit on in the space of "automatically referencing arguments":
All of these have downsides. Its not obvious to me that this RFC is the best position (right now, any of them but the last seems potentially viable to me). I believe we went with 2 instead of 3 here because of this concern: Let's say I do That's a valid concern, but its also a concern to me that we're going to exacerbate the existing pedagogical difference between |
@arielb1 Woah, that's a surprise for me. I noticed that if you add another impl for the same type, the compiler will no longer coerce and instead error with "trait not implemented", as in this example. Is this all by design? @withoutboats Again the point here is not to make an absolute decision but to experiment with a minimal version so that we may be better informed to decide where we want to sit. This RFC is not simply option two, it's precisely the intersection of the last two options. |
@leodasvacas I believe each of those is actually forward compatible with all subsequent items (that is, we could also do 3, and relax it to 4, without breaking any code). |
@withoutboats I agree that semantically moving is forwards compatible with not semantically moving. However semantically copying is forwards incompatible with not semantically copying. Consider that |
I don't like this (because I don't like implicitness). |
Reading these comments has been really interesting. Great stuff, everyone! This feature is clearly useful. However, it's also clearly not completely thought through to its final ramifications, including interactions with other potential changes. Postpone? eRFC? Extended period behind a feature gate? |
I too am wary. I think that there are obvious pain-points being addressed here, but it's also not obvious that this is the most elegant fix. Also, interestingly, I've been observing the type errors I get lately, and it seems that most of the time the error I get is actually when I have a That said, I did propose to merge, and I did so primarily because I felt like this is the kind of thing where we need to gain some "real-time" experience. Now I'm rethinking that and wondering if this is a case where an eRFC would be appropriate, but one with a broader scope: that is, basically, some kind of experimentation aiming to reduce the problem of "&T vs T" mismatches writ large. That would probably cover all of the above:
The goal would be to try out different models (on a strictly opt-in, experimental basis, obviously) and explore the tradeoffs. Ideally we would have @joshtriplett actively involved =) since he seems to have lots of scenarios in mind where the difference between ref and not-ref is vital to performance (I'd like to see examples and try to run various proposals across them and see what happens). The eRFC process seems suitable because we could come back with a proposed set of rules, once we have more data and a better feeling for what they ought to be.
Well, the code will compile, but I still don't think all those variants are forwards compatible. In particular, there are subtle changes with respect to when destructors run. For example, if we say that |
@nikomatsakis I think that's entirely the wrong direction to head into. Sure, experimentation is great, but I already know now that I don't want these things in the language, ever. If you are trying to teach Rust to people who don't even know the difference between T and &T then removing that difference in a few subtle cases where its barely noticeable won't help with those people learning the language! And for autoclone... that's just a tiny little papercut and its definitely not worth the special case and added complexity. |
@nikomatsakis I wholeheartedly approve of this plan for experimentation -- it's essentially what I was hoping to do for coercion/generics interaction as well. I think it's clear that there are some real problems here and some plausible solutions (not just in this RFC), but hands-on experience would be helpful in pinning down the tradeoffs. There is a procedural question here: the RFCs this would apply to currently specify substantial technical detail. I'm not sure that we'd want to merge them as-is, but I also don't think it makes sense to write a fresh eRFC at this late date. I wonder if the right approach is "Closed as experiment-first" or some such? |
@est31 The difference between In my experience teaching rust, people understand |
I would be ok with closing the RFCs as "approved for experimentation" and opening a tracking issue that has links to both of them for posterity (or, maybe even better, opening up a GH repo -- perhaps in rust-lang-nursery -- that includes their text, and where future plans can be laid, but I'm not sure about that). (One thing i'm not clear about, incidentally-- if @arielb1 or @cramertj were to delete their GH accounts, presumably, it would mean the text of these RFCs disappears?) |
@leodasvacas &i32 vs i32 doesn't make much difference semantically, but it does have performance implications. Apart from that, it's useful to be able to build a mental model of how things work, even if it doesn't capture every detail of the actual implementation. For example, in Python, my mental model is that every value is a pointer to a structure that contains a hashmap of properties and methods. In Java, everything is a pointer except for the primitive types. I think when you try to confuse these, it could cause problems. |
I wonder if not LLVM is smart enough to optimise away references to primitive types in local expressions (e.g. iterators). If so, then this isn't so much of a problem. For complex |
Difference between i32 and &i32 is only really “meaningless” in scenarios
where the reference is pointing at a regular memory. It could also very
well point to MMIO region or even some unmapped pages that get mapped on
demand/access.
In cases like these auto-deref becomes implicit auto-code-execution, and
with this RFC people wanting to keep the explicitness would have to resort
to raw pointers and therefore -- littering their code with unsafe blocks
all over.
…On Sep 14, 2017 10:52 PM, "repax" ***@***.***> wrote:
&i32 vs i32 doesn't make much difference semantically, but it does have
performance implications.
I wonder if not LLVM is smart enough to optimise away references to
primitive types in local expressions (e.g. iterators). If so, then this
isn't so much of a problem. For complex T types references are probably
more performant anyhow.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#2111 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AApc0p8TnlHQRBKGFPCmxnFCAC6NziTkks5siYQBgaJpZM4O4hPT>
.
|
@nagisa Rust already assumes that successful reads are side-effect-free. The compiler is free to insert spurious reads or reorder reads around other effectful operations, and it tells LLVM so. In corner cases like you describe, you already have to use more explicit operations. |
Well yes, you would use the volatile read and write functions, but that
does not prevent such a reference from being automatically dereferenced by
user mistake.
I mean, sure, I don't disagree that there would be better and less
footgunny ways to write down the same thing (a wrapper type perhaps) -- I'm
just pointing out that the difference between two types is not always...
Meaningless.
On Sep 15, 2017 10:26, "Ralf Jung" <[email protected]> wrote:
@nagisa <https://github.com/nagisa> Rust already assumes that successful
reads are side-effect-free. The compiler is free to insert spurious reads
or reorder reads around other effectful operations, and it tells LLVM so.
In corner cases like you describe, you already have to use more explicit
operations.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2111 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AApc0i810pJXWqV8mRT4Jn3edRvnoQj5ks5siiafgaJpZM4O4hPT>
.
|
The compiler is free to insert spurious reads. For example, if you write fn foo(x: &i32) {
if some_condition { let x = *x; ... }
} then it is fine for this code to be translated into fn foo(x: &i32) {
let x = *x;
if some_condition { ... }
} So, the guarantee you are trying to protect against is already non-existent. (Or, if LLVM doesn't do this despite all the annotations we give it, the guarantee is non-existent in many of the Rust semantics that are being proposed.) |
@rfcbot fcp postpone OK, having thought more on this, I feel better and better about moving towards an experimentation model here, as I described earlier. Therefore -- although FCP completed, I am going to make an alternate proposal, and move to postpone this RFC, with the intention of immediate experimentation. In other words, the intention is to do the following:
|
@nikomatsakis you have to cancel the existing FCP before proposing a new one. That said: given that this decision effects multiple in-flight RFCs, and that we don't plan to open a separate eRFC (rather, just a direct tracking issue), I'm going to go ahead and close out the relevant RFCs and get the tracking issue set up. Thanks, all, for the illuminating discussion so far! Hopefully trying out some of these ideas behind a flag will give us additional clarity on the tradeoffs, before we come back with a fresh RFC. |
This RFC allows autoreferencing of T where T: Copy.
Example:
Rendered