-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Placement by return #2884
base: master
Are you sure you want to change the base?
RFC: Placement by return #2884
Conversation
I've questions about https://github.com/rust-lang/rfcs/blob/c34989cc81c0dbd96ca65a50fe987ad09f5a6251/text/0000-placement-by-return.md#guide-level-explanation Is there any good reason why We cannot improve the situation for |
There's no fundamental reason, no. It just requires more complex analysis (GCE versus GCE + NRVO), and rules that have to be nailed down as to when NRVO does or doesn't apply. This RFC tries to be minimalistic. |
After reading the RFC, I get the understanding that guaranteed copy elision doesn't actually guarantee copy elision for small (roughly register sized) types? This seems problematic, as the guarantee is no longer a guarantee, as it doesn't apply in some cases (though cases where the difference is basically not measurable). |
The downsides of future-possibility Split the discriminant from the payload can be mitigated by applying that split only at function boundaries. Rust already has two separate ideas of how values are represented- one for passing/returning them, and another for storing in memory when they have their address taken- that's the whole idea behind So if you want to return DST enum variants, you can return the discriminant in a register, and the actual unwrapped variant values the usual way (i.e. via a return pointer for large ones, or yet another register for small ones). Now you can "emplace" any or all of those variants using the same techniques as structs and arrays. |
@CAD97 also because placement new works by basically passing a hidden out pointer, I don't think that RVO is actually an optimization for types smaller than a word. On the main proposal, it would be best if we get the infallible case for RVO from a function, because can build some useful apis around that, like: That said, we could also make some more explicit apis for These explicit apis are harder to use, but they also completely side-step the issue of how to do fallible operations, because the "RVO"ed value isn't returned in the With the code in the playground, you could write something like this: let big_value = Box::try_write_new(|uninit| {
let value = some_fallible_op()?;
let mut init = uninit.write(BigValue::new(value));
init.modify(); // we can now safely use `BigValue` through `init`
Ok(init)
})?; Which is pretty nice. This will work if we have 3 guarantees,
I think that these guarantees are simple enough to make (especially since they seem to already work for all the simple cases I tried), so we could easily move forward with a proposal. With this, we can support fallible construction, and the simple case of We can support more elaborate schemes in the future, after we have some sort of basic RVO guarantees. |
Awesome to see another stab at this!
This is a really clever point 👍
One thing I've seen from that in C++ is that the usual advice became to essentially just always use Hmm, is there some way we could "overload"
When I first read this design choice it felt clearly-correct, but I ended up wondering more about it over the course of reading the RFC. I wonder if a different kind of constructor could be less overall churn (somehow) than new versions of anything that puts things in a Bx or vector or whatever. Maybe there'd be a way that only the types large enough to really care about this would have to make the new kind of constructor, but ordinary things wouldn't. Although as I type that, I guess any generic wrapper-constructor (like |
Interesting. What I heard was that emplacement should theoretically always be at least as fast as insertion, and it is often faster, but there are also a bunch of cases where emplacement is actually slower, won't even compile, introduces an exception safety issue or even introduces subtle UB (Effective Modern C++ covers all this in Item 42). So we just write whichever makes the code more readable unless we care enough to benchmark it. However, most of the reasons for those annoying cases (implicit type conversions, implicit expensive copies, |
If you're worried about length then |
Since you mentioned "generator" here, the RFC could be simplified a lot (hiding the state machine) if it can be explained in terms of #2033 generator. From what I understand, every let layout = Layout::for_value(&dst_object);
let slot = yield layout;
slot.copy_from_nonoverlapping(&raw dst_object as _, layout.size());
return <*mut Dst>::from_raw_parts(slot, ptr::metadata(&raw dst_object)); so e.g. fn with_multiple_exit_points() -> [i32] {
while keep_going() {
if got_cancellation() {
return [];
}
}
let n = 100;
[1; n]
} gets desugared as #2033 generators like fn with_multiple_exit_points___gen()
-> impl Generator<*mut u8, Yield=Layout, Return=*mut [i32]>
{
move |_: *mut u8| {
while keep_going() {
if got_cancellation() {
// BEGIN REWRITE -----------------------------------------------
let slot = yield Layout::array::<i32>(0).unwrap();
let p = slot as *mut i32;
return ptr::slice_from_raw_parts_mut(p, 0);
// END REWRITE -------------------------------------------------
}
}
let n = 100;
// BEGIN REWRITE -----------------------------------------------
let slot = yield Layout::array::<i32>(n).unwrap();
let p = slot as *mut i32;
for i in 0..n {
unsafe {
p.add(i).write(1);
}
}
return ptr::slice_from_raw_parts_mut(p, n);
// END REWRITE -------------------------------------------------
}
} and then pub struct ReadUnsizedReturnWithFinish<G> {
layout: Layout,
generator: G,
}
impl<T, G> ReadUnsizedReturnWithFinish<G>
where
T: ?Sized,
G: Generator<*mut u8, Yield = Layout, Return = *mut T> + Unpin,
{
fn new(mut generator: G) -> Self {
match Pin::new(&mut generator).resume(ptr::null_mut()) {
GeneratorState::Yielded(layout) => Self { layout, generator },
_ => panic!("generator completed without returning a layout"),
}
}
pub fn finish(mut self, slot: *mut u8) -> *mut T {
match Pin::new(&mut self.generator).resume(slot) {
GeneratorState::Complete(ptr) => ptr,
_ => panic!("generator returned too many layouts"),
}
}
pub fn layout(&self) -> Layout {
self.layout
}
} which can be used directly in the desugaring... fn function_that_calls_my_function() -> str {
println!("Hi there!");
my_function()
}
fn function_that_calls_my_function___gen()
-> impl Generator<*mut u8, Yield = Layout, Return = *mut str>
{
move |_: *mut u8| {
println!("Hi there!");
// BEGIN REWRITE -----------------------------------------------
let state = ReadUnsizedReturnWithFinish::new(my_function___gen());
let slot = yield state.layout();
return state.finish(slot);
// END REWRITE -------------------------------------------------
}
} and fn write_unsized_return_with___gen<T: ?Sized>(layout: Layout, f: impl FnOnce(*mut u8) -> *mut T)
-> impl Generator<*mut u8, Yield = Layout, Return = *mut T>
{
move |_: *mut u8| f(yield layout)
} |
That's a good question. On a specification level, GCE would take the form of guarantees that any code observing the address of the variable being returned (eg trait methods) would "see" the address stay the same. This might have implications for eg Pin types. However, upholding these guarantees for register-sized types would probably be terrible for performance (compared to just passing the return value in the rax register).
Can you elaborate? In particular, how do you think the following code should be handled, layout-wise? fn bar() -> Result<SomeType, Error> {
let my_data : Result<SomeType, Error> = get_data();
foo(&my_data);
my_data
} @ KrishnaSannasi
You can already do something similar (and safer) using the current proposal: let big_value = || {
let value = some_fallible_op()?;
Ok(Box::new_with(|| BigValue::new(value))
}() The hard case is when
Yeah, I had the same reaction. It gets even worse if we nail down semantics for placement return of One workaround would be to use the macros I proposed (eg But the fundamental problem is that a permanent solution would require non-transparent semantics; that is, a way to write a function thay says "I'm pretending to take a X, but I actually take a closure returning X" (in such a way that they're indistiguishable in calling code); something like I'm not sure the Rust community is willing to accept that kind of feature. |
But Assume, for example, that
We can specialize this for other representations of Now if someone writes
This also suggests a more precise framing for @CAD97's question about guarantees. We can restrict the guarantee of "has the same address before and after the return move" to types whose function-return ABI goes through memory. With a specified ABI, this would become a real answer; today it is a less-vague version of "roughly register sized." |
@rpjohnst So if I'm understanding your proposed semantics, you're saying that That would be one possible design, yeah. The downside is that it clashes with NRVO in non-obvious ways. In any case, it's beyond the scope of this RFC. |
I don't agree that it makes sense to punt fallible allocator support. After all, we surely want to support them eventually. Any design that supports fallible allocators can almost certainly be easily modified to support infallible ones, but the reverse isn't true. If this RFC is accepted, but we later discover that supporting fallible allocators requires a completely different design, we'll end up having to maintain two new sets of APIs in all the collections, on top of the existing non-placement-aware APIs. One set of duplicate APIs will already be a (well-justified) burden for language learners; there's no need to add another! Therefore, we shouldn't accept the RFC without at least collectively agreeing that fallibility can be done with the same base design. But if we've managed to agree on that, it's not much further of a step to just put it in the RFC. For what it's worth, I think fallibility can be done with the same base design, in the way @rpjohnst and others have been discussing. |
@kennytm There are pros and cons to using the generator syntax like you mention. On one hand, it's a little more concise; on the other hand, it requires keeping another hypothetical feature in mind; I'm not sure this helps readability. On the other other hand, I'm thinking if I wrote this RFC from the ground up now, I would use the yield syntax, so maybe I should do that anyway; I'm just not sure the readability improvements (if any) are worth rewriting half the RFC. It's understandable as it is. I'll fix the typos you mentioned. @comex To be fair, I think this RFC already does an adequate job showing that fallible emplacement isn't a dead end. It already includes two possible strategies for fallible emplacement, and mentions some pros and cons for each strategy. I'm going to explore the solution space a little deeper, but I don't think this RFC should commit to a future strategy or spend too much of its complexity budget on detailing a future extension. |
So, I'm not 100% able to follow this discussion, but from a user perspective, it's not super great to have there be a huge performance gap between IMHO, I'd rather be explicit if the magic isn't applied uniformly. In other words, the closure you pass in should take an argument of |
I don't think that we can fix this without changing the semanics of
This is kinda of what I proposed earlier, but in a more type-safe way. If we just passed a
Note: LLVM already performs this optimization, all we are doing here is making the guarantee that this optimization will always (reliably) be performed. |
You are right about the ordering-- I didn't even consider that part. I guess that my main concern is the semantics of movement here. Assuming we omit the "bad" case and boil stuff down to: Box::with(|| {
let x = initial_state;
f(&mut x);
x
}) Then conceptually, I see no difference between this and: let x = initial_state;
f(&mut x);
Box::new(x) Basically, you're arguing that there's a semantic ordering between function calls, and that Rust should automagically coerce function types into this special generator type. Whereas, I feel like it is more like semantically changing the meaning of bindings, and how the compiler could specially interpret let bindings, function parameters, and return values as referring to the same memory addresses for non- Because really, what you're proposing is treating it like this for a small subset of use cases. But the main issue is that this is adding new APIs that would advise against this being applied to other use cases in the future. Obviously, it makes sense to not strictly define that Rust never moves data around ever, but given the way Rust's ownership system works, I can't see a clear way to explain to a user why return values are special and parameters are not, especially when we have to specifically access the code inside the function in order to make the special changes that permit this behaviour. Not 100% sure if what I'm saying makes sense, but it's mostly where I'm at right now as far as thinking about this. EDIT: Essentially, you're saying that the provided closure is the one that must be modified to make the no-copy elision work, and not ever |
Note that this proposal explicitly does not guarantee NRVO, so your example would not do anything emplaced, and Yes, this does create a theoretical performance penalty for extracting a temporary value. But in practice, this shouldn't bite many people; if I don't think we should try to guarantee NRVO for your example. Why? Because NRVO is complicated to specify, and it's easily rewritable to not need it: let mut boxed = Box::with(|| initial_state);
f(&mut *boxed);
boxed In fact, we could probably get away without actually doing anything special for (Edit: to be clear, I'm not against having nrvo and friends as an optimization; I just think guaranteeing it is not necessary.) |
Aside from guarantees, there are more complex cases cases for NVRO too, so an
In In I suppose an As an aside, we'd exploit better NRVO optimizations, and "dissociated" NRVO enums, lots in cryptography where folks often return types like |
There is a difference, because at least some part of the code in Of course, one could argue that in your second snippet, the compiler could reorder the calls so that But the semantics aren't trivial. tl;dr The fundamental reason the RFC is written that way is that space must be allocated before data can be emplaced into it; and returning data from closures that are called inside of the functions allocating the data is the simplest way Rust can express that (yet). |
In release mode it doesn't. |
I see… But even in debug mode, rustc is apparently able to differentiate between a closure whose type can escape to downstream crates and one which can't, and mark only the latter However, it's a moot point if sharing is disabled in release mode anyway. |
🤔 The RFC and comment mentioned async trait very few times and no one actually elaborated on it. So I was thinking about the "dyn async traits" blog series (1, 2, 3, 4, 5). As of nikomatsakis/dyner#2 the dynamic version is always returning a In terms of this RFC we want to make the dynamic impl<'a, Item> AsyncIter for dyn AsyncIter<Item = Item> + 'a {
type Item = Item;
type Next<'this> = dyn Future<Output = Option<Item>> + 'this // <- associate with unsized type
where
Self: 'this,
Item: 'this;
fn next(&mut self) -> Self::Next<'_> { // <- use unsized return
let (data_ptr, vtable) = (self as *mut _).to_raw_parts();
(vtable.next)(data_ptr) // <- delegate to function returning DST
}
} The question is how do we represent the type of struct AsyncIter_VTable {
...
next: fn(*mut ()) -> dyn Future<Output = Option<Item>> + '_, // ??????
} In this RFC such functions becomes a state machine structure, whose size and operations depend entirely on the function itself. AFAIK the size of this structure is close to the actual stack size needed, so we can struct ErasedUnsizedFn<Args, Return: ?Sized> {
self_layout: Layout,
drop: unsafe fn(*mut ()),
init: unsafe fn(*mut (), Args),
start: unsafe fn(*mut ()) -> Layout,
finish: unsafe fn(*mut (), *mut u8) -> &mut Return,
}
struct AsyncIter_VTable {
...
next: &'static ErasedUnsizedFn<(*mut (),), dyn Future<Output = Option<Item>> + '_>,
}
impl<'a, Item> AsyncIter for dyn AsyncIter<Item = Item> + 'a {
...
fn next(&mut self) -> Self::Next<'_> {
let (data_ptr, vtable) = (self as *mut _).to_raw_parts();
unsafe {
let gen: *mut () = alloca(vtable.next.self_layout);
(vtable.next.init)(gen, (data_ptr,));
defer!((vtable.next.drop)(gen));
let layout = (vtable.next.start)(gen);
write_unsized_return_with(layout, |slot: *mut u8| {
(vtable.next.finish)(gen, slot)
})
}
}
} The coroutine nature of this RFC forces the arguments For struct AsyncIter_VTable {
...
next_vtable: &'static Future_VTable,
next: fn(*mut u8, *mut ()),
// ^~~~~~~ ^~~~~~~
// slot self
//
// for large struct this is ABI-compatible with fn(*mut ()) -> S.
// for small struct fitting into registers we may need to add a shim.
}
impl<'a, Item> AsyncIter for dyn AsyncIter<Item = Item> + 'a {
...
fn next(&mut self) -> Self::Next<'_> {
let (data_ptr, vtable) = (self as *mut _).to_raw_parts();
unsafe {
write_unsized_return_with(vtable.next_vtable.layout, |slot: *mut u8| {
(vtable.next)(slot, data_ptr);
&mut *<*mut _>::from_raw_parts(slot, vtable.next_vtable)
})
}
}
} |
What is the status of this RFC? Has another one taken over since the last message? These optimizations seem very important for the language, maybe it's not the ideal solution, but I'd like to know what the state of the art is. |
MIR optimization has continued to improve, and I think "destination propagation" serves some of the intended benefit of placement by return. The async trait case is making its own progress as well, though there the discussion is currently mostly around a " Guaranteeing placement remains a complicated question with a lot of moving parts, so at the time being it seems more beneficial that people's effort is focused on providing move elision best effort where it is easier to provide rather than the legwork to guarantee it where the cost benefit trade-off is more complicated. |
Can you give any link to talks/resources about MIR optimization? I guess that destination propagation is the "hidden-way" of doing what this RFC suggests, am I right? |
I am currently trying to remove |
I'm very much not the kind of person to be talking about the technicals of anything
On the one hand, I can see such an attribute fitting in like On the other hand, I'm reminded of both the noise of things like Java's ...made worse by how I often find (And, given that, even for CLI utilities, I need to put most of my code in
Given how many misunderstand how to use
See my prior comment about my concern with Rust unnecessarily becoming too |
Continuing discussion on zulip. |
I posted a sketch of an alternative to the generator design, that addresses some of the pain points of this RFC for functions that return unsized types, on IRLO. |
Rust should maybe add some method resembling
that makes reset methods unnecessary by ensuring zero copies. There is considerable boilerplate like the
|
Such function would need to abort if |
And #1736 |
For the "return Result" usecase, I wanted to point out a possible angle. Notice that the desugaring into two functions is equivalent to desugaring into a single function with signature pub trait ReadUnsizedReturnWithFinish<T: ?Sized> {
pub fn layout(&self) -> Layout;
pub fn finish(self, &mut MaybeUninit<T>);
} My understanding is that most of the RFC is about adding compiler magic so that functions that return In that reading, the solution to manage impl Box<T> {
fn emplace(x: impl ReadUnsizedReturnWithFinish<T>) -> Box<T>;
}
fn foo() -> Result<impl ReadUnsizedReturnWithFinish<T>, E> { ... }
fn use_foo() -> Result<(), E> {
let box: Box<T> = Box::emplace(foo()?);
...
} More generally, this RFC feels pretty similar to |
I remember there is another (pre-)RFC proposing to implement this using generator, which is the implementation detail of async fn. |
This RFC seems to only guarantee the copy elision in the return value of functions, however, a generalized guaranteed copy elision, in terms of C++, will include the following case: let v = S{field1:xxx, field: yyy}; This directly initialized the |
There is no need for an RFC to handle that case, as that temporary is not observable (unlike in C++). It can be eliminated as an optimization without language changes. |
Should tuples be added to that? Or are they already considered a "struct literal"?
|
Rendered.
Glossary: