Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadratic slowdown for lookup of bounds on associated types. #65510

Open
eddyb opened this issue Oct 17, 2019 · 9 comments
Open

Quadratic slowdown for lookup of bounds on associated types. #65510

eddyb opened this issue Oct 17, 2019 · 9 comments
Labels
A-associated-items Area: Associated items (types, constants & functions) A-traits Area: Trait system I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Oct 17, 2019

There are two factors here, which we've spotted combined in the wild:

  1. we hoist all type X: ...; bounds to the trait-level, i.e.:
    trait Trait {
        type X: A + B;
        type Y: C + D;
    }
    is sugar for:
    trait Trait
    where
        Self::X: A,
        Self::X: B,
        Self::Y: C,
        Self::Y: D,
    {
        type X;
        type Y;
    }
    • I doubt we can do much about this, perhaps group/index the bounds?
  2. lookup of <_ as Foo>::X: Bar in Foo's where clauses appears to be quadratic
    • this wouldn't have a noticeable impact without 1.
    • probably easier to fix, I would've assumed it was already linear

And here's our stress test - apologies for the macro, but it takes a lot to push it into multiple seconds (the version in the wild had more sophisticated, and actually useful, macros):

macro_rules! stress {
    (type $name:ident: $($i:expr,)*) => {
        type $name: $(From<[(); $i]> + Into<[(); $i]> +)*;
    };
    (fn $($underscore:tt)*) => {
        const _: () = {
            $({fn _f<T: Trait>() {
                #[derive(Copy, Clone)]
                struct Foo<X>(X);
                let $underscore = Foo(T::X::default()).clone();
            }})*
        };
    };
}

trait Trait {
    type X: Copy + Default;

    // The bounds would normally be on many separate
    // associated types, but that makes no difference.
    stress!(type _Unused:
        00, 01, 02, 03, 04, 05, 06, 07, 08, 09,
        10, 11, 12, 13, 14, 15, 16, 17, 18, 19,

        // Uncomment to (almost) quadruple compile time:
        //20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
        //30, 31, 32, 33, 34, 35, 36, 37, 38, 39,

        // Add more to raise the total time to minutes.
    );
}

stress!(fn
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
);

Using -Zself-profile and summarize, we get this (times are total "self" times):

Query 20 (00-19) 40 (00-39) Slowdown
type_op_prove_predicate 2.52s 9.17s 3.64x
typeck_tables_of 1.78s 5.83s 3.28x
check_item_well_formed 1.43s 4.47s 3.13x

cc @rust-lang/wg-traits

@eddyb eddyb added A-traits Area: Trait system I-compiletime Issue: Problems and improvements with respect to compile times. A-associated-items Area: Associated items (types, constants & functions) labels Oct 17, 2019
@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 17, 2019
@eddyb
Copy link
Member Author

eddyb commented Oct 20, 2019

Looking at a RUSTC_LOG=rustc::traits, a type_op_prove_predicate query involving the X associated type, e.g. WF(<T as Trait>::X) or <T as Trait>::X: Sized, appears to be doing a lot of work for _Unused's bounds, far more than I expected to happen.

However, this explains why an individual query can reach 300ms, it's doing way more than I thought.

EDIT: specifically for WF, it looks like WF(<T as Trait>::X) requires that all predicates on Trait (which includes the ones on _Unused) hold.
It sort of makes sense, except... wouldn't T: Trait, which is already required by WF, be enough to guarantee WF(<T as Trait>::X)?

EDIT2: disabling that part of WF gets me an overall 2.5x to 3x speedup on the stress test, and the only test failures appear to be less redundant error messages, not anything compiling when it shouldn't.
I haven't looked yet at the query timings or the debug log, I'll try the original testcase now.

EDIT3: the speedup for cargo check on the original testcase is 8.3x, with the speedup for the type_op_prove_predicate self-time being 64x.

@eddyb
Copy link
Member Author

eddyb commented Oct 21, 2019

To explain more one of the places I believe the quadratic behavior is coming from:

  1. WF(<T as Trait>::X) requires all predicates_of(Trait).instantiate([T])
    • this is what I've experimented with removing, assuming it'd be sound to do so
  2. each resulting <T as Trait>::Foo: Bar predicate has to be proven and doesn't get cached
    • maybe we can add a cache here?
  3. proving each <T as Trait>::Foo: Bar is linear in the size of predicates_of(Trait)
    • this one is harder to solve but we could partition predicates by details inference variables can't represent, like the choice of trait Bar in the example above

@jackmott
Copy link

In case another example is helpful as a stress test, or otherwise, I have a library that was very much affected by this issue. Before the linked commit, the library would take minutes to compile, after the refactor, which grouped trait bounds into marker traits, it took only a few seconds to compile:

arduano/simdeez@8ffc9ed

@eddyb
Copy link
Member Author

eddyb commented Nov 27, 2019

Assuming my initial approach (#66020) won't land, I've been looking for alternatives.

Something that does appear to make a dent is making the trait selection/evaluation caches use ty::ParamEnvAnd - this means that they can be used even when there are bounds in scope.
That's not dealing with WF in particular, but AFAICT the expensive WF check is part of something larger that is being cached now, and also that kind of caching could benefit a lot more code.

If this pans out I'll open a PR to do more perf testing (EDIT: opened #66821).

bors added a commit that referenced this issue Nov 28, 2019
[WIP] [DO NOT MERGE] combine #66020 and #66821.

That is, the two fixes for #65510, and only for perf testing purposes.
The fact that they both work to a comparable extent, while touching different parts of the trait system, made me curious if there would be any gains from having both.

r? @nikomatsakis
bors added a commit that referenced this issue Dec 2, 2019
rustc: allow non-empty ParamEnv's in global trait select/eval caches.

*Based on #66963*

This appears to alleviate the symptoms of #65510 locally (without fixing WF directly), and is potentially easier to validate as sound (since it's a more ad-hoc version of queries we already have).

I'm opening this PR primarily to test the effects on perf.

r? @nikomatsakis cc @rust-lang/wg-traits
bors added a commit that referenced this issue Dec 11, 2019
rustc: allow non-empty ParamEnv's in global trait select/eval caches.

*Based on #66963*

This appears to alleviate the symptoms of #65510 locally (without fixing WF directly), and is potentially easier to validate as sound (since it's a more ad-hoc version of queries we already have).

I'm opening this PR primarily to test the effects on perf.

r? @nikomatsakis cc @rust-lang/wg-traits
@Mark-Simulacrum
Copy link
Member

@eddyb this is the issue I was thinking of in @jonas-schievink's PR that interns the associated items, I think.

@eddyb
Copy link
Member Author

eddyb commented Feb 6, 2020

Hmm I doubt that would make a big difference, as the problem is "baked into" predicates_of ahead of time.

The stress test even only has 2 associated types.

@jonas-schievink
Copy link
Contributor

Ah I see. The PR shouldn't affect this case since it only accelerates access to assoc. items, not their bounds.

@Mark-Simulacrum
Copy link
Member

Yes, sorry, I did not mean that as "that PR should fix this" -- I think I misremembered what this issue was talking about.

@estebank
Copy link
Contributor

#73905 will have a significant positive impact on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-associated-items Area: Associated items (types, constants & functions) A-traits Area: Trait system I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants