
[WIP] Combine run-pass tests into a single crate #49142

Closed
wants to merge 11 commits from the run-fast-pass branch

Conversation

@Zoxc (Contributor) commented Mar 18, 2018

This combines run-pass tests into larger crates.

Some timings:

| System | Before | After |
|---|---|---|
| Windows | 4.9 min | 2.4 min |
| Fedora VM | 3 min | 1.2 min |
| Travis (probably unreliable numbers) | 9.3 min | 5.3 min |
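The combining idea can be pictured with a small sketch (hypothetical code, not the PR's actual compiletest changes; `combine` and its module-per-test scheme are purely illustrative): each test file becomes a module of one big crate, and a generated `main` calls each test's entry point in turn.

```rust
// Illustrative sketch only: build one combined crate source out of several
// run-pass test sources. `tests` pairs a test name with its source text.
fn combine(tests: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for &(name, source) in tests {
        // Wrapping each test in its own module keeps item names from clashing,
        // which a naive source concatenation would not do.
        out.push_str(&format!("mod {} {{\n{}\n}}\n", name, source));
    }
    // A single generated `main` runs every test's entry point in sequence.
    out.push_str("fn main() {\n");
    for &(name, _) in tests {
        out.push_str(&format!("    {}::main();\n", name));
    }
    out.push_str("}\n");
    out
}
```

The win comes from paying rustc's startup, parsing, and codegen overhead once per group instead of once per test file.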

@Zoxc force-pushed the run-fast-pass branch 2 times, most recently from 31ccebb to f646fc3 on March 18, 2018 19:16
@kennytm added the S-waiting-on-author label (Status: This is awaiting some action (such as code changes or more information) from the author.) on Mar 18, 2018
@Zoxc force-pushed the run-fast-pass branch 10 times, most recently from 9f17513 to 6d7c17b on March 20, 2018 03:04
@Zoxc (Contributor Author) commented Mar 20, 2018

cc @rust-lang/compiler
cc @rust-lang/infra

@kennytm (Member) commented Mar 20, 2018

To future contributors: when will `// no-combine` be needed? To future reviewers: how can you tell when a run-pass test should require `// no-combine`, and will there be any silent breakage rendering the test invalid if `// no-combine` is missing?
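For illustration, one could imagine a heuristic along these lines flagging tests that need the directive (the `needs_no_combine` helper and its rules are hypothetical, not from the PR):

```rust
// Hypothetical heuristic: a run-pass test probably cannot be combined when it
// depends on crate-level state that a combined crate cannot reproduce, such
// as inner attributes or per-test compiler flags.
fn needs_no_combine(source: &str) -> bool {
    source.lines().any(|line| {
        let line = line.trim();
        // `#![...]` is a crate-level (inner) attribute, e.g. `#![feature(...)]`.
        line.starts_with("#![")
            // Per-test rustc flags apply to the whole compilation.
            || line.starts_with("// compile-flags:")
    })
}
```

A real implementation would work on the parsed AST rather than raw lines, but the shape of the question, "does this test assume it owns the whole crate?", is the same.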

BTW I think the same strategy could be applied to doc tests.

@kennytm added the T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) and T-infra (Relevant to the infrastructure team, which will review and decide on the PR/issue.) labels Mar 20, 2018
@alexcrichton (Member)

I'm personally not totally convinced we should do this. We did this a long time ago, and it quickly became untenable: tests were never written with such a combined compilation in mind, and it caused conflicts across various tests. Even here, for example, a lot of tests are tagged with `// no-combine`, which adds friction when writing a test case. It's already a pain to have to throw `// ignore-emscripten` on half the tests, and this seems like "yet another thing" that is only discovered on CI because it's not enabled locally.

The wins here are nice, but modest enough that they're not necessarily worth the increased friction in the test suite and the maintenance burden on our end.

@nikomatsakis (Contributor)

Hmm. This combines run-pass tests eagerly unless we annotate them with `// no-combine`? Indeed, as @alexcrichton suggests, this does sound like it will lead to a lot of confusing failures. :/

If we are going to do this, why would we not do it locally?

@Zoxc (Contributor Author) commented Mar 20, 2018

> we did this a long time ago and it became untenable quickly as tests were never written for such a compilation and it would cause conflicts on various tests

I am curious as to why it didn't work out previously. Do you remember any concrete problems?

We could infer `no-combine` on more tests. We could also suggest adding `no-combine` if a test fails when combined but passes otherwise.

> The wins here are nice but seem modest enough that it's not necessarily worth the increased friction in the test suite and maintenance on our end

100% faster tests is not a modest gain. Incremental compilation will reduce the time spent on compilation, making testing the overall bottleneck, so faster tests matter. On AppVeyor, 2/3 of the time is spent on testing.

@nikomatsakis (Contributor)

We discussed this in the @rust-lang/compiler meeting. It seems like the gains on tests are potentially substantial, if @Zoxc's numbers are representative. On the other hand, this involves a fair amount of complexity in the compiler itself (changes to feature gating system etc), and causes our testing code to stray further from the "real thing" that users will use.

Some concerns that were raised:

  • How likely are we to miss bugs that only occur with no-combine (and hence don't get tested)?
  • When a test failure occurs, how clear is it from travis why it failed?
  • How often will // no-combine tags be required and how easy is it to tell?
    • @Zoxc proposes re-running failing tests in isolation.
  • To what extent would we get the same gains from a "long-lived" compiler process, such as what the RLS could probably use? (We have no idea how easy that would be to do, though.)

@pietroalbini (Member)

Just for reference, while discussing this on IRC I did some analysis on splitting tests based on the feature flags they use (grouping tests with the same set of feature flags).

I personally think grouping based on the sets of features the tests use (without an explicit blacklist) is better than grouping as many tests as possible together with all the feature flags enabled.
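That grouping idea can be sketched roughly as follows (hypothetical helper names; a real implementation would use rustc's own attribute parsing rather than string matching, and this simplification only handles one gate per `#![feature(...)]` line):

```rust
use std::collections::{BTreeSet, HashMap};

// Collect the set of feature gates a test source enables.
fn feature_set(source: &str) -> BTreeSet<String> {
    source
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            // Turn `#![feature(nll)]` into "nll"; skip every other line.
            let inner = line.strip_prefix("#![feature(")?.strip_suffix(")]")?;
            Some(inner.to_string())
        })
        .collect()
}

// Bucket tests by their exact feature set, so each combined crate enables
// only the features its member tests actually use.
fn group_by_features<'a>(
    tests: &[(&'a str, &str)],
) -> HashMap<BTreeSet<String>, Vec<&'a str>> {
    let mut groups: HashMap<BTreeSet<String>, Vec<&'a str>> = HashMap::new();
    for &(name, source) in tests {
        groups.entry(feature_set(source)).or_default().push(name);
    }
    groups
}
```

Using a `BTreeSet` as the key makes two tests with the same gates land in the same bucket regardless of the order their `#![feature]` lines appear in.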

@alexcrichton (Member)

> I am curious as to why it didn't work out previously. Do you remember any concrete problems?

This was way, way long ago (years before 1.0), so my memory is a little hazy. If I recall correctly, though, some issues were:

  • Back then we concatenated on Windows but not on other platforms. This meant you would often hit weird failures on Windows that you couldn't debug without picking up a Windows machine.
  • Back then it was a source concatenation rather than a compiler-level concatenation: a script generated one large source file which rustc compiled. This had tons of bugs related to name resolution and the like.
  • Tests quickly became very difficult to write in a way that would actually pass on all platforms. Most were unidiomatic or bent over backwards to be testable in all configurations.

Some of these are solved by the strategy taken here (compiler support). That said, I don't think the pros outweigh the cons: we already have a systemic problem of "my PR fails on CI but not locally", and changing the testing strategy on CI further inevitably widens that gap and makes tests harder and harder to write.

@nikomatsakis (Contributor)

> I personally think grouping based on the sets of features the tests use (without an explicit blacklist) is better than grouping as many tests as possible together with all the feature flags enabled.

That makes a lot of sense, and would certainly be less intrusive.

@Zoxc force-pushed the run-fast-pass branch 2 times, most recently from 338d469 to d6c335e on March 23, 2018 06:19
@bors (Contributor) commented Mar 23, 2018

☔ The latest upstream changes (presumably #49308) made this pull request unmergeable. Please resolve the merge conflicts.

@Zoxc force-pushed the run-fast-pass branch 2 times, most recently from b286ae3 to 15a2f7e on March 24, 2018 17:15
@Zoxc force-pushed the run-fast-pass branch 2 times, most recently from ab20fff to 563e8ca on March 24, 2018 19:44
@Zoxc (Contributor Author) commented Mar 24, 2018

I've made some changes. I now group tests by features, run all tests with the same feature set together, and no longer rely on rustc supporting modular features. rustc now asserts that the feature set inside a test module is equal to the feature set enabled by the crate. It also errors on attributes applied to test modules. I've fixed my path rewriting to work with macro expansion, so macros no longer break tests. I also run tests in individual processes, which avoids tests messing up global state and allows rerunning them individually. This doesn't seem to slow tests down by much; on Windows I'm up to 2.5 minutes for run-pass.
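The process-per-test part of this scheme can be sketched with a hypothetical helper (not the PR's actual harness code): in the combined binary, `program` would be `std::env::current_exe()` and the argument a test name the binary dispatches on, so a test that corrupts global state (environment variables, working directory, signal handlers) cannot poison the tests that run after it.

```rust
use std::process::Command;

// Run one test in a child process and report whether it exited successfully.
// Isolation comes from the OS process boundary, not from the test itself.
fn run_isolated(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

Since the combined crate is compiled once and only process spawning is repeated, the per-test overhead stays small compared to invoking rustc per test.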

@bors (Contributor) commented Mar 24, 2018

☔ The latest upstream changes (presumably #49337) made this pull request unmergeable. Please resolve the merge conflicts.

@Zoxc (Contributor Author) commented Mar 25, 2018

I've added a whitelist of attributes allowed in combined tests, to catch attributes which could have a crate-level effect.
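Such a check might look roughly like this (the whitelist contents and helper name are illustrative, not the PR's actual list): attributes outside the list might have crate-level effects, so a test carrying one is rejected from combining instead of silently changing behavior.

```rust
// Hypothetical whitelist of attribute names considered safe inside a
// combined test; everything else is rejected up front.
const ALLOWED_ATTRS: &[&str] = &["allow", "warn", "deny", "cfg", "derive", "inline", "repr"];

// Check a single attribute name, e.g. "allow" from `#[allow(dead_code)]`.
fn check_attr(name: &str) -> Result<(), String> {
    if ALLOWED_ATTRS.contains(&name) {
        Ok(())
    } else {
        Err(format!("attribute `{}` is not allowed in a combined test", name))
    }
}
```

A whitelist fails closed: a newly introduced attribute is rejected until someone decides it is safe, which matches the goal of catching crate-level effects rather than discovering them on CI.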

@shepmaster (Member)

Ping from triage, @Zoxc ! Looks like you have failing tests; will you have time to address them sometime soon?

@bors (Contributor) commented Apr 2, 2018

☔ The latest upstream changes (presumably #49252) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis (Contributor)

@Zoxc

> I've made some changes.

These changes definitely seem to be going in the right direction, from my POV. I'm much happier with "asserting" that things are equal vs changing the behavior...

@alexcrichton (Member)

I'm personally still against changing the tests in such a blanket fashion. I do agree that this will result in a faster test suite, but the wins here are only a handful of minutes, and I feel that the tradeoff isn't worth it given the amount of new code to maintain and the possible impact on the test suite.

@Zoxc (Contributor Author) commented Apr 10, 2018

This also applies the same strategy to documentation examples in rustdoc. Using it there seems less risky, as language edge cases aren't being tested. The test suite seems to spend a lot of time building those examples; I don't have any numbers for that yet, though.

Even if we don't end up enabling this for run-pass/compiletest, I'd still like the code for it to land, as it makes it much easier to change compiletest to compile multiple tests in a single rustc process once rustc gets proper support for that.

@alexcrichton (Member)

It's true that rustdoc may be a better target here, but those are also notoriously flaky tests whose generation is very difficult to tweak. That means we'd definitely want this off by default to start with, and we may want to take it slow on enabling it for CI builds.

@pietroalbini (Member)

Ping from triage @Zoxc! What's the status of this PR?


@Zoxc (Contributor Author) commented Apr 23, 2018

This PR needs some more work. I'll just close it for now.

@Zoxc closed this Apr 23, 2018
@pietroalbini added the S-inactive label (Status: Inactive and waiting on the author. This is often applied to closed PRs.) and removed the S-waiting-on-author label Apr 23, 2018