
Generator size: unwinding and drops force extra generator state allocation #59123

Closed
Nemo157 opened this issue Mar 12, 2019 · 42 comments · Fixed by #61922
Labels
A-async-await Area: Async & Await · A-coroutines Area: Coroutines · AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. · T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Nemo157
Member

Nemo157 commented Mar 12, 2019

#![feature(generators, generator_trait)]

use std::ops::Generator;

struct Foo([u8; 1024]);

impl Drop for Foo {
    fn drop(&mut self) {}
}

fn simple() -> impl Generator<Yield = (), Return = ()> {
    static || {
        let first = Foo([0; 1024]);
        let _second = first;
        yield;
    }
}

fn complex() -> impl Generator<Yield = (), Return = ()> {
    static || {
        let first = Foo([0; 1024]);
        { foo(); fn foo() {} }
        let _second = first;
        yield;
    }
}

fn main() {
    dbg!(std::mem::size_of_val(&simple()));
    dbg!(std::mem::size_of_val(&complex()));
}

The two generators returned by simple and complex should be equivalent, but complex takes twice as much space:

[foo.rs:29] std::mem::size_of_val(&simple()) = 1028
[foo.rs:30] std::mem::size_of_val(&complex()) = 2056

Dumping out the MIR (with rustc 1.34.0-nightly (f66e4697a 2019-02-20)) shows an issue with how unwinding from foo interacts with the two stack slots for first and _second: because a dynamic drop flag is used, first is considered "live" along the path that goes through the yield, even though the drop flag is guaranteed to be false there. (The graph below shows the basic blocks, the pseudo-code run in each, and which variables are live when exiting each block):

MIR graph
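
For intuition, here is a hand-written, source-level analogy for what the dynamic drop flag does (this is not the actual MIR or compiler output; Option plays the role of the flag): a conditional drop of first stays reachable after the suspension point, so a simple liveness analysis keeps first in the generator state even though the drop can never actually run there.

struct Foo([u8; 1024]);
impl Drop for Foo { fn drop(&mut self) {} }

fn drop_flag_sketch() {
    // Some(..) plays the role of "drop flag = true".
    let mut first = Some(Foo([0; 1024]));
    // The move clears the flag: `first` becomes None.
    let _second = first.take();
    // A yield would sit here. Because `first` is still mentioned below, a
    // simple liveness analysis keeps it live across the suspension point...
    // ...even though this conditional drop is always a no-op by now.
    drop(first);
}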

@Nemo157
Member Author

Nemo157 commented Mar 12, 2019

@rustbot modify labels: A-generators and T-compiler.

@rustbot rustbot added A-coroutines Area: Coroutines T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 12, 2019
@matprec
Contributor

matprec commented Mar 23, 2019

Is this related to #52924?

@Nemo157
Member Author

Nemo157 commented Mar 23, 2019

@MSleepyPanda inasmuch as it’s about generators being too big. The specific optimisation proposed there won’t help here, as first and _second are both live over the same yield point. What should really happen is that first is not kept live across the yield at all: it should be allocated on the resume function’s stack instead of in the generator state. (And then some sort of copy-elision optimisation might eliminate that allocation and use the allocation in the generator state directly, but IMO that’s less important (and probably more difficult) than ensuring the memory usage is reduced.)

@cramertj
Member

cramertj commented Apr 2, 2019

cc @tmandry

@cramertj cramertj added the A-async-await Area: Async & Await label Apr 25, 2019
@andreytkachenko

Another example, without drop:

#![feature(generators, generator_trait)]

use std::ops::Generator;

struct Foo([u8; 1024]);

fn simple() -> impl Generator<Yield = (), Return = ()> {
    static || {
        let first = Foo([0; 1024]);
        let _second = first;
        yield;
    }
}

fn complex() -> impl Generator<Yield = (), Return = ()> {
    static || {
        fn foo(_: &mut Foo) {}
        
        let mut first = Foo([0; 1024]);
        foo(&mut first);
        let mut second = first;
        foo(&mut second);
        let mut third = second;
        foo(&mut third);
        let mut _fourth = third;
        
        yield;
    }
}

fn main() {
    dbg!(std::mem::size_of_val(&simple()));
    dbg!(std::mem::size_of_val(&complex()));
}

outputs

[src/main.rs:32] std::mem::size_of_val(&simple()) = 4
[src/main.rs:33] std::mem::size_of_val(&complex()) = 3076

@nikomatsakis nikomatsakis added the AsyncAwait-Polish Async-await issues that are part of the "polish" area label Jun 4, 2019
@tmandry
Member

tmandry commented Jun 12, 2019

I was thinking we can solve this by adding the following rules to our MaybeStorageLive dataflow analysis (possibly being renamed to RequiresStorage); a rough sketch of the rules as a transfer function follows the list:

  1. (Existing) StorageLive(x) => mark x live
  2. (Existing) StorageDead(x) => mark x dead
  3. If a local is moved from, and has never been mutably borrowed, mark it dead
  4. If (any part of) a local is initialized, mark it live
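
Roughly, as a self-contained transfer function (the Local and Stmt types and all names here are made up for illustration; the real analysis operates on MIR statements and a bitset of locals):

use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Local(u32);

enum Stmt {
    StorageLive(Local),
    StorageDead(Local),
    MoveFrom(Local),   // the local is moved out of
    InitPartOf(Local), // any part of the local is (re)initialized
}

fn transfer(live: &mut HashSet<Local>, ever_mut_borrowed: &HashSet<Local>, stmt: &Stmt) {
    match stmt {
        // 1. (Existing) StorageLive(x) => mark x live
        Stmt::StorageLive(x) => { live.insert(*x); }
        // 2. (Existing) StorageDead(x) => mark x dead
        Stmt::StorageDead(x) => { live.remove(x); }
        // 3. Moved from, and never mutably borrowed => mark it dead
        Stmt::MoveFrom(x) if !ever_mut_borrowed.contains(x) => { live.remove(x); }
        Stmt::MoveFrom(_) => {}
        // 4. (Any part of) the local is initialized => mark it live
        Stmt::InitPartOf(x) => { live.insert(*x); }
    }
}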

We must not optimize away storage of locals that are mutably borrowed, because as @matthewjasper notes in #61430, it isn't decided that the following is UB:

let mut x = String::new();
let p = &mut x as *mut String;
let y = x;
unsafe { p.write(String::new()); }

It's an open question whether we can say "the local hasn't been mutably borrowed up to here" when evaluating rule 3. I'd prefer to make the optimization as smart as we can, but MIR probably allows borrowing a moved-from value and mutating it.

Is this sound?

cc @cramertj @eddyb @matthewjasper @RalfJung @Zoxc

@eddyb
Member

eddyb commented Jun 12, 2019

The "mutably" of "mutably borrowed" is a red herring IMO, unless you want to check for Freeze, which will conservatively default to "may contain interior mutability" once generic parameters are thrown into the mix.

@RalfJung
Member

RalfJung commented Jun 12, 2019

@tmandry Interesting. How bad would it be to relax this to "if a local is moved from and never has had its address taken"? Then we can be sure without any assumptions about Stacked Borrows that direct accesses to the local variable are the only way to observe it, and those will be prevented after a move. This would also alleviate @eddyb's concern I think.

Also, what is the granularity here? Without Stacked Borrows assumptions we can only do this on a per-local level, not on a per-field-of-a-struct level. Taking the address of one field leaks the address of other fields (if layout assumptions are made).

@cramertj
Member

Taking the address of one field leaks the address of other fields (if layout assumptions are made).

Oh, really? I suppose that makes sense in a "list of things you can do" sense, but it's surprising and seems like it would make optimizations like SROA hard.

@RalfJung
Member

Well, Stacked Borrows might restrict this. But then we get into rust-lang/unsafe-code-guidelines#134. So for now I'd say as compiler authors we should assume that it does leak the entire local.

@tmandry
Member

tmandry commented Jun 12, 2019

The "mutably" of "mutably borrowed" is a red herring IMO, unless you want to check for Freeze, which will conservatively default to "may contain interior mutability" once generic parameters are thrown into the mix.

Oh right, well Freeze is an option. I don't see why we need to rule out types with generic parameters, at least based on this definition. But if you want to maintain flexibility on the definition then I suppose we could.

if a local is moved from and never has had its address taken

Yes, but how would we enforce this? A borrow can be passed to another function, which can convert to a pointer.

I'd rather not disable the optimization when a borrow escapes the function. One practical reason for this is that std::mem::size_of_val(&x) is used to benchmark the size of futures, and I don't want that to have an impact on optimizations. I don't think Rust futures should have their own uncertainty principle.

Also, what is the granularity here? Without Stacked Borrows assumptions we can only do this on a per-local level, not on a per-field-of-a-struct level. Taking the address of one field leaks the address of other fields (if layout assumptions are made).

I'm only proposing to do this on a per-local level, for now.

@cramertj
Member

@tmandry

Yes, but how would we enforce this? A borrow can be passed to another function, which can convert to a pointer.

I think that @RalfJung was including & as an operator which would "take the address" of a local. That said, I agree with you that we can do better than this and only include & if T: ?Freeze.

@RalfJung
Member

RalfJung commented Jun 12, 2019

Yes, but how would we enforce this? A borrow can be passed to another function, which can convert to a pointer.

"the address taken" = "has the & operator executed". That should be trivial to enforce?

Basically what @cramertj said. Doing "better than this" requires committing to parts of Stacked Borrows. I am not sure if @rust-lang/lang is ready for that. For sure it shouldn't happen en passant as part of a PR like this.

@tmandry
Member

tmandry commented Jun 12, 2019

"the address taken" = "has the & operator executed". That should be trivial to enforce?

Then yes, that's easy to enforce.

Doing "better than this" requires committing to parts of Stacked Borrows.

Can you expand on this? How would checking for Freeze require committing to Stacked Borrows?

@cramertj
Member

cramertj commented Jun 12, 2019

@tmandry I'm guessing @RalfJung is referring to the fact that it commits us to things like this being UB:

let mut x = String::new();
let addr_x: *const String = reference_to_pointer(&x);
drop(x);
unsafe { ptr::write(addr_x as *mut String, String::new()); }

@RalfJung
Member

RalfJung commented Jun 12, 2019

@cramertj indeed that's what I mean. This would be the first new exploitation of "reference-based UB" since... forever, I think. It should happen deliberately and prominently and have its own discussion.

@tmandry
Member

tmandry commented Jun 12, 2019

@RalfJung Hmm okay, then I think we should start having that discussion soon (in a different issue, probably).

This relates to the maturity of await. It's common practice to debug the size of your futures by calling size_of_val on the inner futures. In this state of the world, doing so would increase the size of your outer future! I'm uncomfortable shipping a feature in that state, at least without a plan for making it better.

@RalfJung
Member

RalfJung commented Jun 12, 2019

@tmandry notice that the moment your future is generic or has a Cell local, taking a shared reference will have to pessimize such optimizations due to the necessary is_frozen check.

@RalfJung
Member

Also, won't it usually await the inner future, so anyway its address has been taken mutably?

@cramertj
Member

is generic

I'm not sure I understand this bit-- couldn't the optimization be monomorphization-dependent?

@matthewjasper
Contributor

(possibly being renamed to RequiresStorage):

FWIW I'm probably reverting all changes to the generator transform in that PR. The MaybeStorageLive analysis is also used for determining what variables need to have StorageLive when we enter resume, so it probably shouldn't be modified.

@tmandry
Member

tmandry commented Jun 12, 2019

Also, won't it usually await the inner future, so anyway its address has been taken mutably?

@RalfJung What I'm trying to optimize is this:

let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
x.await

which today expands to

let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
{
  let mut pinned = x;  // <--- This move
  loop {
    // use `&mut pinned`
    yield;
  }
}

The move let mut pinned = x; doubles the size of the future that contains it today, because we reserve space for both x and pinned. What we need is for the generator transform to recognize that x does not need to be saved across the yield at all.

Based on our discussion, it seems pretty straightforward to allow this optimization when the size_of_val line is not present.

@cramertj
Member

For what it's worth, that move (let mut pinned = x;) is just to prevent the user from using x after calling .await, and could be omitted if we had some way to mark a local as "moved" without actually moving it (drop doesn't work because it can cause the value to move to a different place in memory).

@tmandry
Member

tmandry commented Jun 12, 2019

A secondary question I have is, can we rely on the ordering between the move and the borrow? e.g.

let mut x = String::new();
move_from(x);
// ...later...
x = bar();
let addr_x = &x as *const String;
move_from(x);
unsafe { ptr::write(addr_x as *mut String, String::new()); }

Since the first instance of move_from(x) happens before any borrow, can we consider x as not needing storage between that and when it is re-assigned? (Clearly, today we cannot do this for the second instance of move_from(x).)

I don't want to get bogged down in this question though, since I'm unlikely to rely on it at first anyway.

@cramertj
Member

For what it's worth, that move (let mut pinned = x;) is just to prevent the user from using x after calling .await, and could be omitted if we had some way to mark a local as "moved" without actually moving it (drop doesn't work because it can cause the value to move to a different place in memory).

Actually i think we can do this today-- ptr::drop_in_place followed by mem::forget XD

@tmandry
Member

tmandry commented Jun 12, 2019

@cramertj Hmm, good to know, but this also affects the size of join! because we are moving all the sub-futures into a single joined future.

@cramertj
Member

@tmandry we can get away with a similar thing there by only putting references in the new state machine. It's higher overhead because of the additional pointers, but it'll prevent ever having to copy into a new location, and save space for large types, even those whose address had been taken.
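
A rough sketch of the idea (simplified: assumes Unpin sub-futures, hand-rolls the polling with std::future::poll_fn, and uses a made-up join_by_ref name; the real join! in futures-rs is more involved): only &mut references to the sub-futures, plus small output slots, end up in the new state machine, so the sub-futures themselves are never moved or duplicated.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

async fn join_by_ref<A: Future + Unpin, B: Future + Unpin>(
    a: &mut A,
    b: &mut B,
) -> (A::Output, B::Output) {
    // Only these Option slots (and the two references) live in this
    // future's state, not the sub-futures themselves.
    let mut out_a = None;
    let mut out_b = None;
    std::future::poll_fn(|cx: &mut Context<'_>| {
        if out_a.is_none() {
            if let Poll::Ready(v) = Pin::new(&mut *a).poll(cx) {
                out_a = Some(v);
            }
        }
        if out_b.is_none() {
            if let Poll::Ready(v) = Pin::new(&mut *b).poll(cx) {
                out_b = Some(v);
            }
        }
        if out_a.is_some() && out_b.is_some() {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    })
    .await;
    (out_a.unwrap(), out_b.unwrap())
}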

@RalfJung
Member

Still catching up, some comments on the way:

I'm not sure I understand this bit-- couldn't the optimization be monomorphization-dependent?

Oh so everything we are discussing here happens post-monomorphization? Never mind that part then.

Based on our discussion, it seems pretty straightforward to allow this optimization when the size_of_val line is not present.

Agreed. And if it's just size_of_val we could let the analysis know that it doesn't actually escape the pointer. (Or we could do inlining and then that becomes obvious.)

Even with full Stacked Borrows, we'd still have an "uncertainty principle" (the physicist in me dies a little in the face of this very inaccurate analogy but whatever ;) when there is any interior mutability in a local of the future. That still seems bad enough for debugging, does it not?

@cramertj
Member

Actually i think we can do this today-- ptr::drop_in_place followed by mem::forget XD

Aaaand this doesn't work because it requires that the binding be mutable:

macro_rules! drop_with_guaranteed_move_elision {
    ($val:ident) => {
        unsafe {
            std::ptr::drop_in_place(&mut $val);
            std::mem::forget($val);
        }
    }
}

fn main() {
    let mut x = "foo".to_string(); // errors if `x` isn't `mut`
    drop_with_guaranteed_move_elision!(x);
}

@tmandry
Member

tmandry commented Jun 12, 2019

Agreed. And if it's just size_of_val we could let the analysis know that it doesn't actually escape the pointer. (Or we could do inlining and then that becomes obvious.)

Even with full Stacked Borrows, we'd still have an "uncertainty principle" (the physicist in me dies a little in the face of this very inaccurate analogy but whatever ;) when there is any interior mutability in a local of the future. That still seems bad enough for debugging, does it not?

Mm yeah, I guess the true fix for this rough edge is MIR inlining, with a special case for size_of_val being an acceptable interim solution.

Still, with regard to optimization I would like to support as many cases as possible.

@RalfJung
Member

RalfJung commented Jun 13, 2019

Still, with regard to optimization I would like to support as many cases as possible.

Sure you do, we all do. Just remember that you are paying for this optimization with the blood and sweat of Rust programmers, because you made their programs UB. ;)


One thing I am wondering about: in a situation like

let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
{
  let mut pinned = x;  // <--- This move
  loop {
    // use `&mut pinned`
    yield;
  }
}

why can't MIR building insert the StorageDead(x) right after the move? That would also make the optimization trivial. It wouldn't change the MIR semantics at all. It would, however, change the surface language semantics by adjusting StorageDead generation (which so far I have no good idea how to even specify, but with the impact it is having we will have to specify it some day).

@RalfJung
Member

Aaaand this doesn't work because it requires that the binding be mutable:

Closures have a similar problem with bindings having to be mutable for some stuff, and that's why we have "unique" borrows in the compiler. So maybe a similar technique could be used here?

A secondary question I have is, can we rely on the ordering between the move and the borrow? e.g.

Since the first instance of move_from(x) happens before any borrow, can we consider x as not needing storage between that and when it is re-assigned?

let mut x = String::new();
move_from(x);
// ...later...
x = bar();
let addr_x = &x as *const String;
move_from(x);
unsafe { ptr::write(addr_x as *mut String, String::new()); }

You can rely on there not being any access to x between the first move_from and x = bar, yes. The address has not been observed yet and hence the program should be unable to notice that x actually did not have a stable address all the time.

This is subtle but I see nothing wrong with this reasoning.^^

@tmandry
Member

tmandry commented Jun 13, 2019

Sure you do, we all do. Just remember that you are paying for this optimization with the blood sweat of Rust programmers because you made their programs UB. ;)

Fair enough :) Personally I'm just as wary of painting ourselves into a corner in terms of classes of optimizations we can do, though. One is a higher immediate cost, and the other is a smaller cost which will be paid by all future users of Rust. But anyway, I'll stop waxing philosophical about optimizations for now ;)

why can't MIR building insert the StorageDead(x) right after the move?

To me it seems just as hard as doing the analysis, because you have to make sure not to do this in instances where x has already been borrowed. (And make sure to issue StorageLive(x) again before re-assignment, but that seems easy.)

This is subtle but I see nothing wrong with this reasoning.

Great. In MIR is it possible to write code that takes the address of a local after it's been moved from (before it's reinitialized)? If so it should be easy to handle, we just mark locals as needing storage as soon as any part of them is borrowed.

@Matthias247
Contributor

Agreed. And if it's just size_of_val we could let the analysis know that it doesn't actually escape the pointer. (Or we could do inlining and then that becomes obvious.)
Even with full Stacked Borrows, we'd still have an "uncertainty principle" (the physicist in me dies a little in the face of this very inaccurate analogy but whatever ;) when there is any interior mutability in a local of the future. That still seems bad enough for debugging, does it not?

Mm yeah, I guess the true fix for this rough edge is MIR inlining, with a special case for size_of_val being an acceptable interim solution.

Still, with regard to optimization I would like to support as many cases as possible.

I guess this might already be understood, but size_of_val shouldn't be the only statement that gets looked at and improved in the interim. It was mainly an instrument to surface the issue; the size inflation seems to happen in other places too.

There is quite a bit of code around which moves futures just in order to make sure users can't fiddle with them anymore, e.g. join! in futures-rs.

With @tmandry's last optimization I already got a not-too-large future-based program from a 70kB Future down to a 30kB Future without further changes. However, by e.g. replacing some await-based mechanisms with manual combinators I brought it down further, to less than 10kB - and I'm pretty sure there is a lot more potential, since the program is actually pretty tiny.

I can't comment on what can be done in the compiler to improve this, but I definitely encourage all further investigations and improvements on this. The impact seems to be huge!

@RalfJung
Member

To me it seems just as hard as doing the analysis, because you have to make sure not to do this in instances where x has already been borrowed. (And make sure to issue StorageLive(x) again before re-assignment, but that seems easy.)

No, that's not what I mean. The idea is to aggressively insert StorageDead after moves, to make such programs UB. What I am more worried about is that the program might move out only in some conditional branches, and what that would mean for the storage of those locals.

The reason I think we need this is that validity of pointers is not the only concern here. Currently, the following (entirely safe) code will definitely return true:

let mut x = String::new();
let addr_x = &x as *const _ as usize;
drop(x);
// later
x = String::new();
let addr_x2 = &x as *const _ as usize;
return addr_x == addr_x2;

If we want to do optimizations like yours here (and I am totally sympathetic to that), we have to explain in the "Rust Abstract Machine" (and in Miri) why this program might return false. And the answer cannot be "there is UB", because this is safe code.

This is a topic that @nikomatsakis, @eddyb and me have touched on several times already, without ever hashing out a full plan. But in the current state of affairs, the only mechanism we have to "defeat" pointer equality tests like the above is to make sure that this is not the same allocation any more.

So, one thing we might do is to do StorageDead(x); StorageLive(x); immediately after every move. This "re-allocates" x and thus definitely kills any existing pointers and also "defeats" pointer comparisons. The immediate StorageLive is to keep the liveness state in sync in both branches of a conditional (which might or might not be relevant -- unfortunately LLVM's semantics for these intrinsics is less than clear). I guess the StorageLive could be moved down in cases where there is no merging control flow, which should give you your optimization in many cases.
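
At the source level, the idea looks roughly like this (illustration only; the Storage* statements are MIR statements shown as comments, and the fn and variable names are made up):

fn sketch() {
    let x = String::from("hi");
    // MIR: StorageLive(x) was emitted when x came into scope.
    let addr_before = &x as *const String as usize;
    let y = x; // move out of x
    // Proposed MIR insertion here: StorageDead(x); StorageLive(x);
    // That "re-allocates" x: any pointer derived from x before the move is
    // now dangling in the abstract machine, and if x were re-initialized it
    // could legitimately end up at a different address, which explains a
    // `false` result for address comparisons like the one above.
    drop(y);
    let _ = addr_before;
}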

Great. In MIR is it possible to write code that takes the address of a local after it's been moved from (before it's reinitialized)? If so it should be easy to handle, we just mark locals as needing storage as soon as any part of them is borrowed.

Maybe. Better assume so. Just look out for any & in the MIR.

@tmandry
Member

tmandry commented Jun 14, 2019

So, one thing we might do is to do StorageDead(x); StorageLive(x); immediately after every move.

Okay, that sounds like something worth pursuing, but it also seems like it would have potential fallout we need to check for. I'm not sure I understand all the implications myself. Given that I have an optimization pass mostly working without it, maybe it would be good to set up a separate issue to track all the considerations involved. EDIT: Opened #61849


Also, I forgot to mention.. just adding StorageDead(x) isn't enough because of drop flags. @eddyb wanted to add a MIR sanity check that every use (including drop) is dominated by StorageLive(x). Adding StorageDead(x) after every move would cause that sanity check to fail when we're branching on drop flags.

We would need something that adds StorageLive(x) in a strategic place, in order to both pass this check and make the optimization "just work" based on StorageLive/StorageDead alone.

@cramertj
Member

Moving to "deferred" by the same logic as #52924 (comment). @rust-lang/lang feel free to put the labels back if you disagree.

@cramertj cramertj added AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. and removed AsyncAwait-Polish Async-await issues that are part of the "polish" area labels Jun 14, 2019
@tmandry
Member

tmandry commented Jun 14, 2019

I'd personally like to see this fixed before stabilization. It can cause the same exponential size effects as we were seeing before #60187, albeit in different contexts.

I'm hoping to have a fix up for review soon.

@eddyb
Member

eddyb commented Jun 18, 2019

@RalfJung But wasn't the point of "Operand::Move doesn't invalidate borrows" that source-level moves don't invalidate borrows?
We could certainly emit pairs of llvm.lifetime.{end,start} calls after an Operand::Move, without adding pairs of Storage{Dead,Live} statements into the MIR, if the goal is to make it UB to reuse an old pointer. But I thought you wanted to allow reinitialization after a move, with an old pointer?

@RalfJung
Member

The only thing I want is a precise definition of the semantics in a way that can be dynamically checked (e.g. in Miri), and ideally I also want the semantics to not be full of weird special cases. ;)

We could certainly emit pairs of llvm.lifetime.{end,start} calls after an Operand::Move, without adding pairs of Storage{Dead,Live} statements into the MIR, if the goal is to make it UB to reuse an old pointer.

How would that work? Wouldn't that mean that legal MIR code (that uses some kind of trick to reinitialize after a move) becomes UB in LLVM?

@eddyb
Member

eddyb commented Jun 18, 2019

How would that work? Wouldn't that mean that legal MIR code (that uses some kind of trick to reinitialize after a move) becomes UB in LLVM?

No, I was referring to the case where we want the semantics of Operand::Move to be that they invalidate any outstanding borrows, similar to Storage{Dead,Live} but without bloating the MIR/impacting analyses which rely on some sort of dominance relationship.

I don't understand your position now. Are you saying you don't mind if source-level moves invalidate borrows, you just don't want it to be encoded into Operand in MIR, but rather something more like StorageDead? That would make sense, I just kept thinking you were worried about source-level moves.

@RalfJung
Member

Let's continue at #61849 (comment).

Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2019
…, r=matthewjasper

Don't store locals that have been moved from in generators

This avoids reserving storage in generators for locals that are moved
out of (and not re-initialized) prior to yield points. Fixes rust-lang#59123.

This adds a new dataflow analysis, `RequiresStorage`, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are:

1. StorageLive(x) => mark x live
2. StorageDead(x) => mark x dead
3. If a local is moved from, _and has never had its address taken_, mark it dead
4. If (any part of) a local is initialized, mark it live

This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout.

Here's the size in bytes of all testcases included in the change, before and after the change:

async fn test    |Size before |Size after
-----------------|------------|----------
single           | 1028       | 1028
single_with_noop | 2056       | 1032
joined           | 5132       | 3084
joined_with_noop | 8208       | 3084

generator test              |Size before |Size after
----------------------------|------------|----------
move_before_yield           | 1028       | 1028
move_before_yield_with_noop | 2056       | 1032
overlap_move_points         | 3080       | 2056

## Future work

Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, _**and either has never had its address taken, or is Freeze and has never been mutably borrowed**_, mark it dead." This was discussed at length in rust-lang#59123 and then rust-lang#61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully.

A more immediate priority for me is inlining `std::mem::size_of_val(&x)` so it becomes apparent that the address of `x` is not taken. This way, using `size_of_val` to look at the size of your inner futures does not affect the size of your outer future.

cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc
bors added a commit that referenced this issue Jul 2, 2019
…jasper
