Speed up SparseBitMatrix use in RegionValues. #52250

Merged 1 commit on Jul 22, 2018

nnethercote
Contributor

In practice, these matrices range from 10% to 90%+ full once they are
filled in, so the dense representation is better.

This reduces the runtime of Check Nll builds of inflate by 32%, and
several other benchmarks by 1--5%.

It also increases max-rss of clap-rs by 30% and a couple of others by
up to 5%, while decreasing max-rss of coercions by 14%. I think the
speed-ups justify the max-rss increases.

r? @nikomatsakis

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 11, 2018
@nnethercote
Contributor Author

Here are the instruction count improvements exceeding 1%:

inflate-check
        avg: -32.4%     min: -32.4%     max: -32.4%
style-servo-check
        avg: -4.5%      min: -4.5%      max: -4.5%
clap-rs-check
        avg: -2.3%      min: -2.3%      max: -2.3%
coercions-check
        avg: -2.3%      min: -2.3%      max: -2.3%
sentry-cli-check
        avg: -1.5%      min: -1.5%      max: -1.5%
webrender-check
        avg: -1.5%      min: -1.5%      max: -1.5%
cargo-check
        avg: -1.5%      min: -1.5%      max: -1.5%
encoding-check
        avg: -1.5%      min: -1.5%      max: -1.5%
ripgrep-check
        avg: -1.3%      min: -1.3%      max: -1.3%
regex-check
        avg: -1.2%      min: -1.2%      max: -1.2%

Here are the max-rss changes exceeding 1%:

clap-rs-check
        avg: 29.5%      min: 29.5%      max: 29.5%
coercions-check
        avg: -14.8%     min: -14.8%     max: -14.8%
inflate-check
        avg: 5.7%       min: 5.7%       max: 5.7%
regression-31157-check
        avg: 1.6%       min: 1.6%       max: 1.6%
syn-check
        avg: 1.0%       min: 1.0%       max: 1.0%
helloworld-check
        avg: -1.0%      min: -1.0%      max: -1.0%
regex-check
        avg: 1.0%       min: 1.0%       max: 1.0%

@nnethercote
Contributor Author

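Here are some measurements of how full the BitMatrix instances get for inflate. Each line gives the number of matrices with a particular shape, that number's share and cumulative share of all matrices, then the matrix dimensions, the total number of bits, and how many of those bits are set: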

(  1)       24 (46.2%, 46.2%): after: 2 x 4 = 8; 6 used (75%)
(  2)        4 ( 7.7%, 53.8%): after: 2 x 7 = 14; 12 used (85.71%)
(  3)        2 ( 3.8%, 57.7%): after: 4 x 5 = 20; 12 used (60%)
(  4)        1 ( 1.9%, 59.6%): after: 120 x 835 = 100200; 22099 used (22.05%)
(  5)        1 ( 1.9%, 61.5%): after: 16 x 29 = 464; 250 used (53.87%)
(  6)        1 ( 1.9%, 63.5%): after: 2 x 37 = 74; 72 used (97.29%)
(  7)        1 ( 1.9%, 65.4%): after: 87 x 463 = 40281; 4158 used (10.32%)
(  8)        1 ( 1.9%, 67.3%): after: 4 x 8 = 32; 24 used (75%)
(  9)        1 ( 1.9%, 69.2%): after: 40 x 52 = 2080; 865 used (41.58%)
( 10)        1 ( 1.9%, 71.2%): after: 42 x 454 = 19068; 1902 used (9.97%)
( 11)        1 ( 1.9%, 73.1%): after: 18 x 51 = 918; 522 used (56.86%)
( 12)        1 ( 1.9%, 75.0%): after: 12 x 58 = 696; 503 used (72.27%)
( 13)        1 ( 1.9%, 76.9%): after: 132 x 506 = 66792; 15589 used (23.33%)
( 14)        1 ( 1.9%, 78.8%): after: 18 x 9 = 162; 96 used (59.25%)
( 15)        1 ( 1.9%, 80.8%): after: 12 x 38 = 456; 319 used (69.95%)
( 16)        1 ( 1.9%, 82.7%): after: 2 x 15 = 30; 28 used (93.33%)
( 17)        1 ( 1.9%, 84.6%): after: 4912 x 40782 = 200321184; 39886050 used (19.91%)
( 18)        1 ( 1.9%, 86.5%): after: 2 x 23 = 46; 44 used (95.65%)
( 19)        1 ( 1.9%, 88.5%): after: 101 x 501 = 50601; 5036 used (9.95%)
( 20)        1 ( 1.9%, 90.4%): after: 17 x 133 = 2261; 1063 used (47.01%)
( 21)        1 ( 1.9%, 92.3%): after: 24 x 172 = 4128; 450 used (10.90%)
( 22)        1 ( 1.9%, 94.2%): after: 52 x 202 = 10504; 5022 used (47.81%)
( 23)        1 ( 1.9%, 96.2%): after: 81 x 338 = 27378; 14638 used (53.46%)
( 24)        1 ( 1.9%, 98.1%): after: 18 x 90 = 1620; 1052 used (64.93%)
( 25)        1 ( 1.9%,100.0%): after: 11 x 15 = 165; 130 used (78.78%)

Note the very large matrix at (17), which dominates.
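To get a feel for what these shapes cost densely, here is a small calculation. It assumes (hypothetically) that each row is padded to whole 64-bit words; the real `BitMatrix` layout may differ slightly. The helper names are invented for illustration.

```rust
/// Bytes for a dense bit matrix, assuming each row is padded to whole u64 words.
fn dense_matrix_bytes(rows: usize, columns: usize) -> usize {
    let words_per_row = (columns + 63) / 64;
    rows * words_per_row * 8
}

/// Fraction of bits set, as in the "used (..%)" column above.
fn fullness(rows: usize, columns: usize, used: usize) -> f64 {
    used as f64 / (rows * columns) as f64
}
```

For (17), `dense_matrix_bytes(4912, 40782)` gives 25,070,848 bytes (about 23.9 MiB), and `fullness(4912, 40782, 39886050)` gives about 0.1991, matching the 19.91% listed.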

style-servo is broadly similar, though it has a number of large matrices rather than being dominated by a single one.

@nnethercote
Contributor Author

I just measured html5ever with NLL as well. It reduces its instruction count by 35%, and its max-rss by 10%.

@nnethercote
Contributor Author

The max-rss increase for clap-rs is because of one very large BitMatrix:

  after: 25897 x 24965 = 646518605; 94492819 used (14.61%)

This is 77.5 MiB, and it gets doubled because it gets cloned here:

let mut inferred_values = self.liveness_constraints.clone();

I tried getting rid of that clone (which would greatly reduce the max-rss increase) by transferring ownership of the BitMatrix from self.liveness_constraints to self.inferred_values, which required making liveness_constraints an Option<RegionValues>. But it caused test failures: it looks like liveness_constraints is used for producing error messages after inferred_values is created.
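The difference between the two approaches can be sketched with hypothetical stand-in types (`RegionValues` here is just a word vector and `Context` is invented for illustration; neither is the compiler's real type). Cloning keeps two full copies alive at once, while `Option::take` moves the value out and leaves `None` behind, which is what breaks any later reader such as error reporting.

```rust
/// Hypothetical stand-in for the compiler's `RegionValues`.
#[derive(Clone, Debug, PartialEq)]
struct RegionValues {
    words: Vec<u64>,
}

/// Hypothetical context that owns the liveness constraints.
struct Context {
    liveness_constraints: Option<RegionValues>,
}

impl Context {
    /// Current approach: clone, so two full copies are alive at once.
    fn inferred_by_clone(&self) -> RegionValues {
        self.liveness_constraints.clone().expect("liveness constraints present")
    }

    /// Attempted approach: move the values out, leaving `None` behind.
    fn inferred_by_take(&mut self) -> RegionValues {
        self.liveness_constraints.take().expect("liveness constraints present")
    }

    /// A later consumer (e.g. error reporting) that still needs the originals.
    fn liveness_for_diagnostics(&self) -> Option<&RegionValues> {
        self.liveness_constraints.as_ref()
    }
}
```

After `inferred_by_take`, `liveness_for_diagnostics` returns `None`, mirroring the test failures described above.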

Anyway, even if the clone remains, some benchmarks take more memory but some take less, so it's basically a wash on that front, and the speed improvements are large enough to make this compelling.

@nnethercote
Contributor Author

I was able to speed up inflate and clap-rs a bit more by further optimizing BitVector::merge.
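A sketch of what a word-at-a-time union looks like, assuming a bit vector backed by `Vec<u64>`. This is an illustrative stand-in, not the actual `BitVector::merge` from bitvec.rs. Returning whether anything changed lets a caller, such as a propagation loop, tell when it has stopped making progress.

```rust
/// Hypothetical minimal bit vector; not rustc's actual `BitVector`.
struct BitVector {
    words: Vec<u64>,
}

impl BitVector {
    fn new(num_bits: usize) -> Self {
        BitVector { words: vec![0; (num_bits + 63) / 64] }
    }

    /// Sets `bit`; returns true if it was not already set.
    fn insert(&mut self, bit: usize) -> bool {
        let (word, mask) = (bit / 64, 1u64 << (bit % 64));
        let old = self.words[word];
        self.words[word] = old | mask;
        (old & mask) == 0
    }

    fn contains(&self, bit: usize) -> bool {
        (self.words[bit / 64] & (1u64 << (bit % 64))) != 0
    }

    /// Unions `other` into `self` 64 bits at a time; returns true if any bit changed.
    fn merge(&mut self, other: &BitVector) -> bool {
        assert_eq!(self.words.len(), other.words.len());
        let mut changed = false;
        for (dst, &src) in self.words.iter_mut().zip(&other.words) {
            let new = *dst | src;
            changed |= new != *dst;
            *dst = new;
        }
        changed
    }
}
```

Merging the same source twice returns `true` the first time and `false` the second, since no new bits appear.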

@nikomatsakis
Contributor

Hmm, this change will interact poorly with @davidtwco's changes in #52190, because I think that in that context we don't know the number of region variables when we allocate the RegionValues.

We might want a kind of hybrid -- maybe we want to modify DenseMatrix to use an IndexVec<BitSet> instead of one big allocation?

(The family of bitset types also needs a bit of cleanup... this change though might allow us to remove the "buf vs slice" distinction which would simplify things.)

@nikomatsakis
Contributor

@nnethercote note that the final values will probably also be affected by rebasing over #51987, which sort of modifies that clone. (The clone is removed, but a variant of it remains.)

We could probably free the liveness matrix at some point, though it wouldn't affect peak memory usage. It would potentially require a bit of work on the diagnostic side.

@nikomatsakis
Contributor

Hmm, #51987 also reduces inflate-check's running time dramatically (by 43%). I would not however expect these two to "multiply" -- rather I suspect the benefits of this PR may be subsumed by #51987, since it reduces dramatically the number of sparse matrix merges that we do.

@nikomatsakis
Contributor

Probably worth testing, in any case.

@nnethercote
Contributor Author

Yes, the benefit here is entirely from making matrix merges faster.

I guess I'll wait until #51987 and #52190 play out and see if this PR still makes sense. This PR has a large effect for a small change; hopefully those two other PRs have an effect as big or bigger.

@nnethercote
Contributor Author

I just got "try" privileges, so I'm going to test them in this PR.

@bors try

@bors
Contributor

bors commented Jul 13, 2018

@nnethercote: 🔑 Insufficient privileges: not in try users


@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 13, 2018
@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 13, 2018
@bors
Contributor

bors commented Jul 13, 2018

☔ The latest upstream changes (presumably #51987) made this pull request unmergeable. Please resolve the merge conflicts.

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 14, 2018
@nikomatsakis
Contributor

OK, so #51987 has landed -- @nnethercote do you have thoughts on whether it makes sense to continue with this PR?

@nnethercote nnethercote changed the title Use BitMatrix instead of SparseBitMatrix in RegionValues. Speed up SparseBitMatrix use in RegionValues. Jul 18, 2018
@nikomatsakis
Contributor

Hmm, those results look great! One concern though: in the branch I'm working on, I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change. I'm thinking about how to solve this -- one way might be to split up the matrices into pieces. So for example we could store one matrix for points and then a separate matrix for regions.

@nnethercote
Contributor Author

> I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change.

If the number of rows is growing, it should be fine as is. If the number of columns is growing, then that's different... it should be possible to make the number of columns in SparseBitMatrix extensible, though the added flexibility will likely shrink the size of the wins here.
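The rows-versus-columns distinction is easiest to see with a hypothetical flat, row-major layout: one big allocation, each row padded to whole 64-bit words (`FlatBitMatrix` is invented for illustration). New rows just append words at the end, but new columns can change the row stride, so every existing row has to be copied to a new position.

```rust
/// Hypothetical flat, row-major dense bit matrix backed by one allocation.
struct FlatBitMatrix {
    num_rows: usize,
    num_columns: usize,
    words_per_row: usize,
    words: Vec<u64>,
}

impl FlatBitMatrix {
    fn new(num_rows: usize, num_columns: usize) -> Self {
        let words_per_row = (num_columns + 63) / 64;
        FlatBitMatrix { num_rows, num_columns, words_per_row, words: vec![0; num_rows * words_per_row] }
    }

    /// Index of the word holding (row, column), plus the bit mask within it.
    fn locate(&self, row: usize, column: usize) -> (usize, u64) {
        assert!(row < self.num_rows && column < self.num_columns);
        (row * self.words_per_row + column / 64, 1u64 << (column % 64))
    }

    fn insert(&mut self, row: usize, column: usize) {
        let (index, mask) = self.locate(row, column);
        self.words[index] |= mask;
    }

    fn contains(&self, row: usize, column: usize) -> bool {
        let (index, mask) = self.locate(row, column);
        (self.words[index] & mask) != 0
    }

    /// Growing rows: existing words keep their positions; zeros are appended.
    fn grow_rows(&mut self, new_num_rows: usize) {
        assert!(new_num_rows >= self.num_rows);
        self.num_rows = new_num_rows;
        self.words.resize(new_num_rows * self.words_per_row, 0);
    }

    /// Growing columns: the row stride may change, so every row is copied.
    fn grow_columns(&mut self, new_num_columns: usize) {
        assert!(new_num_columns >= self.num_columns);
        let new_words_per_row = (new_num_columns + 63) / 64;
        let mut new_words = vec![0; self.num_rows * new_words_per_row];
        for row in 0..self.num_rows {
            let old_start = row * self.words_per_row;
            let new_start = row * new_words_per_row;
            new_words[new_start..new_start + self.words_per_row]
                .copy_from_slice(&self.words[old_start..old_start + self.words_per_row]);
        }
        self.num_columns = new_num_columns;
        self.words_per_row = new_words_per_row;
        self.words = new_words;
    }
}
```

With one allocation per row, as this PR uses, adding rows is also cheap, but each row is still created at a fixed column width, so growing the column count would mean resizing every allocated row.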

@nikomatsakis
Contributor

It's the number of columns that changes, yes, but I suspect I may be able to finesse it by breaking things into two matrices -- one for "points" and one for "regions" -- and allocating them at separate times. (In other words, we'd wait to allocate the region matrix until we know its proper size.)

To that end, I think we should probably land this PR, and I can try to rebase over it.

@nikomatsakis
Contributor

@bors r+

@bors
Contributor

bors commented Jul 19, 2018

📌 Commit 9bfd1c17620a88e8d24f3bcd7710976522c202ee has been approved by nikomatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 19, 2018
@bors
Contributor

bors commented Jul 20, 2018

⌛ Testing commit 9bfd1c17620a88e8d24f3bcd7710976522c202ee with merge 4ebbaa1809e39d08dfc02e86b3c59612e61a7eed...

@bors
Contributor

bors commented Jul 20, 2018

💔 Test failed - status-appveyor

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 20, 2018
Using a `BTreeMap` to represent rows in the bit matrix is really slow.
This patch changes things so that each row is represented by a
`BitVector`. This is a less sparse representation, but a much faster
one.

As a result, `SparseBitSet` and `SparseChunk` can be removed.

Other minor changes in this patch:

- It renames `BitVector::insert()` as `merge()`, which matches the
  terminology in the other classes in bitvec.rs.

- It removes `SparseBitMatrix::is_subset()`, which is unused.

- It reinstates `RegionValueElements::num_elements()`, which rust-lang#52190 had
  removed.

- It removes a low-value `debug!` call in `SparseBitMatrix::add()`.
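A minimal sketch of the row representation described in this commit message. It assumes rows are stored in a plain `Vec` (standing in for rustc's `IndexVec`) and that each row is allocated lazily as a dense array of `u64` words. The names echo `SparseBitMatrix::add()` and the `merge()` terminology mentioned here, but the signatures are assumptions, not the actual rustc code.

```rust
/// Hypothetical sketch: rows are allocated on first use, and each allocated
/// row is a dense bit vector of `num_columns` bits (no BTreeMap involved).
struct SparseBitMatrix {
    num_columns: usize,
    rows: Vec<Option<Vec<u64>>>,
}

impl SparseBitMatrix {
    fn new(num_columns: usize) -> Self {
        SparseBitMatrix { num_columns, rows: Vec::new() }
    }

    fn words_per_row(&self) -> usize {
        (self.num_columns + 63) / 64
    }

    /// Returns row `row`, allocating it (and padding `rows` with `None`) on demand.
    fn row_mut(&mut self, row: usize) -> &mut Vec<u64> {
        let words = self.words_per_row();
        if row >= self.rows.len() {
            self.rows.resize(row + 1, None);
        }
        self.rows[row].get_or_insert_with(|| vec![0; words])
    }

    /// Sets (row, column); returns true if the bit was not already set.
    fn add(&mut self, row: usize, column: usize) -> bool {
        assert!(column < self.num_columns);
        let mask = 1u64 << (column % 64);
        let word = &mut self.row_mut(row)[column / 64];
        let newly_set = (*word & mask) == 0;
        *word |= mask;
        newly_set
    }

    fn contains(&self, row: usize, column: usize) -> bool {
        match self.rows.get(row) {
            Some(Some(words)) => (words[column / 64] & (1u64 << (column % 64))) != 0,
            _ => false,
        }
    }

    /// Unions row `read` into row `write` a whole word at a time;
    /// returns true if `write` changed.
    fn merge(&mut self, read: usize, write: usize) -> bool {
        if read == write || !matches!(self.rows.get(read), Some(Some(_))) {
            return false;
        }
        // Move the destination row out so the source row can be borrowed.
        let mut dst = std::mem::take(self.row_mut(write));
        let src = self.rows[read].as_ref().expect("checked above");
        let mut changed = false;
        for (d, &s) in dst.iter_mut().zip(src.iter()) {
            let new = *d | s;
            changed |= new != *d;
            *d = new;
        }
        self.rows[write] = Some(dst);
        changed
    }
}
```

Compared with a `BTreeMap` per row, a lookup is just an index plus a shift, and a row merge is a linear pass over `u64` words, at the cost of allocating the full column width for every row that is touched.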
@nnethercote
Contributor Author

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 20, 2018
@nnethercote
Contributor Author

Apparently I have "try" permissions but not "retry" permissions. @nikomatsakis, can you reapprove this? Thanks.

@kennytm
Member

kennytm commented Jul 20, 2018

@nnethercote retry works only if you haven't pushed anything new. Pushing a new commit does require re-r+.

@nikomatsakis
Contributor

@bors delegate=nnethercote

@bors
Contributor

bors commented Jul 21, 2018

✌️ @nnethercote can now approve this pull request

@nikomatsakis
Contributor

@bors r+

@bors
Contributor

bors commented Jul 21, 2018

📌 Commit 798209e has been approved by nikomatsakis

@bors
Contributor

bors commented Jul 22, 2018

⌛ Testing commit 798209e with merge a57d5d7...

bors added a commit that referenced this pull request Jul 22, 2018
@bors
Contributor

bors commented Jul 22, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing a57d5d7 to master...
