Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTree: lazily locate leaves in rangeless iterators #86031

Merged
merged 1 commit into from
Aug 2, 2021

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented Jun 5, 2021

BTree iterators always locate both the first and last leaf edge and often only need either one, i.e., whenever they are traversed in a single direction, like in for-loops and in the common use of iter().next() or iter().next_back() to retrieve the first or last key/value-pair (#62924). It's fairly easy to avoid because the iterators with this disadvantage already are quite separate from other iterators.

r? @Mark-Simulacrum

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 5, 2021
@rust-log-analyzer

This comment has been minimized.

@ssomers
Copy link
Contributor Author

ssomers commented Jun 5, 2021

About alloc's own benches:

  • They're generally positive, but who knows what that means in other code.
  • The btree::map::iter_X benches obviously benefit, but they're quite artificial now since they merely invoke iter().
  • In small maps, or intersection benches that can quickly bail out, the "descend" to the leaf node didn't take long anyway and the extra flags that take care of avoiding it slow down.
  • The btree::map::first_and_last_X_stable benches show that legacy iter().next() calls are twice as fast, but they are still considerably slower than the nightly first and last functions promoted in Tracking issue for map_first_last: first/last methods on BTreeSet and BTreeMap #62924. No idea why.
 name                                           mid ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 btree::map::clone_fat_val_100                  10,772       11,452                680    6.31%   x 0.94
 btree::map::clone_fat_val_100_and_clear        10,746       10,737                 -9   -0.08%   x 1.00
 btree::map::clone_fat_val_100_and_drain_all    44,356       44,764                408    0.92%   x 0.99
 btree::map::clone_fat_val_100_and_drain_half   31,685       31,179               -506   -1.60%   x 1.02
 btree::map::clone_fat_val_100_and_into_iter    9,913        10,070                157    1.58%   x 0.98
 btree::map::clone_fat_val_100_and_pop_all      54,122       54,185                 63    0.12%   x 1.00
 btree::map::clone_fat_val_100_and_remove_all   45,772       46,705                933    2.04%   x 0.98
 btree::map::clone_fat_val_100_and_remove_half  29,832       29,763                -69   -0.23%   x 1.00
 btree::map::clone_slim_100                     4,303        4,309                   6    0.14%   x 1.00
 btree::map::clone_slim_100_and_clear           4,255        4,231                 -24   -0.56%   x 1.01
 btree::map::clone_slim_100_and_drain_all       6,091        6,150                  59    0.97%   x 0.99
 btree::map::clone_slim_100_and_drain_half      5,370        5,307                 -63   -1.17%   x 1.01
 btree::map::clone_slim_100_and_into_iter       4,303        4,240                 -63   -1.46%   x 1.01
 btree::map::clone_slim_100_and_pop_all         6,321        6,144                -177   -2.80%   x 1.03
 btree::map::clone_slim_100_and_remove_all      7,097        6,901                -196   -2.76%   x 1.03
 btree::map::clone_slim_100_and_remove_half     5,590        5,575                 -15   -0.27%   x 1.00
 btree::map::clone_slim_10k                     181,173      182,340             1,167    0.64%   x 0.99
 btree::map::clone_slim_10k_and_clear           179,972      185,704             5,732    3.18%   x 0.97
 btree::map::clone_slim_10k_and_drain_all       375,875      372,945            -2,930   -0.78%   x 1.01
 btree::map::clone_slim_10k_and_drain_half      316,386      313,505            -2,881   -0.91%   x 1.01
 btree::map::clone_slim_10k_and_into_iter       184,300      184,380                80    0.04%   x 1.00
 btree::map::clone_slim_10k_and_pop_all         406,045      406,948               903    0.22%   x 1.00
 btree::map::clone_slim_10k_and_remove_all      537,710      482,480           -55,230  -10.27%   x 1.11
 btree::map::clone_slim_10k_and_remove_half     482,560      491,320             8,760    1.82%   x 0.98
 btree::map::find_rand_100                      9            11                      2   22.22%   x 0.82
 btree::map::find_rand_10_000                   52           53                      1    1.92%   x 0.98
 btree::map::find_seq_100                       11           12                      1    9.09%   x 0.92
 btree::map::find_seq_10_000                    36           35                     -1   -2.78%   x 1.03
 btree::map::first_and_last_0_nightly           9            9                       0    0.00%   x 1.00
 btree::map::first_and_last_0_stable            24           24                      0    0.00%   x 1.00
 btree::map::first_and_last_100_nightly         33           48                     15   45.45%   x 0.69
 btree::map::first_and_last_100_stable          236          102                  -134  -56.78%   x 2.31
 btree::map::first_and_last_10k_nightly         68           60                     -8  -11.76%   x 1.13
 btree::map::first_and_last_10k_stable          259          121                  -138  -53.28%   x 2.14
 btree::map::insert_rand_100                    38           38                      0    0.00%   x 1.00
 btree::map::insert_rand_10_000                 38           38                      0    0.00%   x 1.00
 btree::map::insert_seq_100                     46           46                      0    0.00%   x 1.00
 btree::map::insert_seq_10_000                  93           95                      2    2.15%   x 0.98
 btree::map::iter_0                             1,254        1,755                 501   39.95%   x 0.71
 btree::map::iter_1                             4,013        1,755              -2,258  -56.27%   x 2.29
 btree::map::iter_100                           4,388        1,724              -2,664  -60.71%   x 2.55
 btree::map::iter_10k                           5,952        1,755              -4,197  -70.51%   x 3.39
 btree::map::iter_1m                            8,179        1,755              -6,424  -78.54%   x 4.66
 btree::map::iteration_1000                     4,092        3,850                -242   -5.91%   x 1.06
 btree::map::iteration_100000                   526,900      464,360           -62,540  -11.87%   x 1.13
 btree::map::iteration_20                       65           60                     -5   -7.69%   x 1.08
 btree::map::iteration_mut_1000                 4,250        4,011                -239   -5.62%   x 1.06
 btree::map::iteration_mut_100000               532,400      476,360           -56,040  -10.53%   x 1.12
 btree::map::iteration_mut_20                   66           62                     -4   -6.06%   x 1.06
 btree::map::range_included_excluded            393,385      406,255            12,870    3.27%   x 0.97
 btree::map::range_included_included            401,615      402,760             1,145    0.29%   x 1.00
 btree::map::range_included_unbounded           137,408      139,191             1,783    1.30%   x 0.99
 btree::map::range_unbounded_unbounded          45,399       42,853             -2,546   -5.61%   x 1.06
 btree::map::range_unbounded_vs_iter            45,758       18,328            -27,430  -59.95%   x 2.50
 btree::set::clone_100                          1,408        1,401                  -7   -0.50%   x 1.00
 btree::set::clone_100_and_clear                1,410        1,411                   1    0.07%   x 1.00
 btree::set::clone_100_and_drain_all            2,960        2,950                 -10   -0.34%   x 1.00
 btree::set::clone_100_and_drain_half           2,309        2,304                  -5   -0.22%   x 1.00
 btree::set::clone_100_and_into_iter            1,436        1,557                 121    8.43%   x 0.92
 btree::set::clone_100_and_pop_all              2,704        2,713                   9    0.33%   x 1.00
 btree::set::clone_100_and_remove_all           3,706        3,561                -145   -3.91%   x 1.04
 btree::set::clone_100_and_remove_half          2,410        2,469                  59    2.45%   x 0.98
 btree::set::clone_10k                          170,150      166,192            -3,958   -2.33%   x 1.02
 btree::set::clone_10k_and_clear                170,544      170,363              -181   -0.11%   x 1.00
 btree::set::clone_10k_and_drain_all            326,732      323,010            -3,722   -1.14%   x 1.01
 btree::set::clone_10k_and_drain_half           282,630      281,396            -1,234   -0.44%   x 1.00
 btree::set::clone_10k_and_into_iter            171,660      182,752            11,092    6.46%   x 0.94
 btree::set::clone_10k_and_pop_all              314,910      310,033            -4,877   -1.55%   x 1.02
 btree::set::clone_10k_and_remove_all           455,900      415,210           -40,690   -8.93%   x 1.10
 btree::set::clone_10k_and_remove_half          446,190      438,070            -8,120   -1.82%   x 1.02
 btree::set::difference_random_100_vs_100       1,009        1,077                  68    6.74%   x 0.94
 btree::set::difference_random_100_vs_10k       2,128        2,116                 -12   -0.56%   x 1.01
 btree::set::difference_random_10k_vs_100       69,723       76,655              6,932    9.94%   x 0.91
 btree::set::difference_random_10k_vs_10k       217,273      227,112             9,839    4.53%   x 0.96
 btree::set::difference_staggered_100_vs_100    1,111        1,266                 155   13.95%   x 0.88
 btree::set::difference_staggered_100_vs_10k    2,050        2,080                  30    1.46%   x 0.99
 btree::set::difference_staggered_10k_vs_10k    110,005      126,791            16,786   15.26%   x 0.87
 btree::set::intersection_100_neg_vs_100_pos    17           28                     11   64.71%   x 0.61
 btree::set::intersection_100_neg_vs_10k_pos    18           30                     12   66.67%   x 0.60
 btree::set::intersection_100_pos_vs_100_neg    17           28                     11   64.71%   x 0.61
 btree::set::intersection_100_pos_vs_10k_neg    18           30                     12   66.67%   x 0.60
 btree::set::intersection_10k_neg_vs_100_pos    18           29                     11   61.11%   x 0.62
 btree::set::intersection_10k_neg_vs_10k_pos    19           32                     13   68.42%   x 0.59
 btree::set::intersection_10k_pos_vs_100_neg    18           29                     11   61.11%   x 0.62
 btree::set::intersection_10k_pos_vs_10k_neg    19           32                     13   68.42%   x 0.59
 btree::set::intersection_random_100_vs_100     717          582                  -135  -18.83%   x 1.23
 btree::set::intersection_random_100_vs_10k     1,882        1,896                  14    0.74%   x 0.99
 btree::set::intersection_random_10k_vs_100     1,861        1,852                  -9   -0.48%   x 1.00
 btree::set::intersection_random_10k_vs_10k     166,006      164,202            -1,804   -1.09%   x 1.01
 btree::set::intersection_staggered_100_vs_100  641          617                   -24   -3.74%   x 1.04
 btree::set::intersection_staggered_100_vs_10k  1,770        1,847                  77    4.35%   x 0.96
 btree::set::intersection_staggered_10k_vs_10k  63,001       61,322             -1,679   -2.67%   x 1.03
 btree::set::is_subset_100_vs_100               626          621                    -5   -0.80%   x 1.01
 btree::set::is_subset_100_vs_10k               1,441        1,349                 -92   -6.38%   x 1.07
 btree::set::is_subset_10k_vs_100               2            2                       0    0.00%   x 1.00
 btree::set::is_subset_10k_vs_10k               63,428       61,306             -2,122   -3.35%   x 1.03

@ssomers ssomers marked this pull request as draft June 5, 2021 17:52
@ssomers

This comment has been minimized.

@ssomers ssomers force-pushed the btree_lazy_iterator branch from 2dab5be to 9e1ac54 Compare June 6, 2021 12:28
@ssomers ssomers marked this pull request as ready for review June 6, 2021 12:28
@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 7, 2021
@bors
Copy link
Contributor

bors commented Jun 7, 2021

⌛ Trying commit 9e1ac543d134a115fe253691eb9b137192560b7a with merge ae3cf1a90b2081be4fab0ca50bf8e1c75877a29d...

@bors
Copy link
Contributor

bors commented Jun 7, 2021

☀️ Try build successful - checks-actions
Build commit: ae3cf1a90b2081be4fab0ca50bf8e1c75877a29d (ae3cf1a90b2081be4fab0ca50bf8e1c75877a29d)

@rust-timer
Copy link
Collaborator

Queued ae3cf1a90b2081be4fab0ca50bf8e1c75877a29d with parent 35fff69, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (ae3cf1a90b2081be4fab0ca50bf8e1c75877a29d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 7, 2021
@ssomers ssomers marked this pull request as draft June 9, 2021 10:54
@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 9, 2021
@ssomers ssomers force-pushed the btree_lazy_iterator branch from 9e1ac54 to 0272c3a Compare June 21, 2021 07:20
@ssomers ssomers marked this pull request as ready for review June 21, 2021 07:51
@ssomers
Copy link
Contributor Author

ssomers commented Jun 21, 2021

Last commits fix variance and try to maintain #[inline] placement, which seems to dominate performance.

@ssomers ssomers marked this pull request as draft June 21, 2021 19:33
@ssomers

This comment has been minimized.

@ssomers ssomers force-pushed the btree_lazy_iterator branch from 0272c3a to 1e977a7 Compare June 26, 2021 11:43
@ssomers ssomers marked this pull request as ready for review June 26, 2021 11:43
@bors
Copy link
Contributor

bors commented Jul 8, 2021

☔ The latest upstream changes (presumably #86520) made this pull request unmergeable. Please resolve the merge conflicts.

@ssomers ssomers force-pushed the btree_lazy_iterator branch from 1e977a7 to 35d02e2 Compare July 8, 2021 20:34
@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

Seems generally OK, though I wonder if the performance regressions are largely down to slightly more complicated code. The theoretical improvement from avoiding the extra traversal (in practice usually just one, I imagine; most iterators are likely traversed at least a few times) is worth the possible compile time overheads...

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 1, 2021
@bors
Copy link
Contributor

bors commented Aug 1, 2021

⌛ Trying commit 35d02e2 with merge d3ac1a943e62f13e360607ac74b33332a0916f76...

@bors
Copy link
Contributor

bors commented Aug 1, 2021

☀️ Try build successful - checks-actions
Build commit: d3ac1a943e62f13e360607ac74b33332a0916f76 (d3ac1a943e62f13e360607ac74b33332a0916f76)

@rust-timer
Copy link
Collaborator

Queued d3ac1a943e62f13e360607ac74b33332a0916f76 with parent 8d57c0a, future comparison URL.

@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 1, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (d3ac1a943e62f13e360607ac74b33332a0916f76): comparison url.

Summary: This benchmark run did not return any significant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Aug 1, 2021
@Mark-Simulacrum
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Aug 1, 2021

📌 Commit 35d02e2 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 1, 2021
@ssomers
Copy link
Contributor Author

ssomers commented Aug 1, 2021

avoiding the extra traversal (in practice usually just one, I imagine; most iterators are likely traversed at least a few times)

I don't understand what you're referring to (but I don't have to). It's always about just one (multi-level) traversal, and one that's going to be done anyway if iterating to the end. For huge trees, iterating to the end could save multiple memory fetches of nodes that get bored in the cache and leave, before they're read for real, but then it's a huge tree anyway, and just traversing to the fixed first leaf node is wasteful.

As to the slightly more complicated code, it should allow reverting #81486.

@bors
Copy link
Contributor

bors commented Aug 1, 2021

⌛ Testing commit 35d02e2 with merge cd5a90f...

@bors
Copy link
Contributor

bors commented Aug 2, 2021

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing cd5a90f to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 2, 2021
@bors bors merged commit cd5a90f into rust-lang:master Aug 2, 2021
@rustbot rustbot added this to the 1.56.0 milestone Aug 2, 2021
@ssomers ssomers deleted the btree_lazy_iterator branch August 2, 2021 08:54
ssomers added a commit to ssomers/rust that referenced this pull request Aug 2, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 16, 2021
…r=Mark-Simulacrum

BTree: merge the complication introduced by rust-lang#81486 and rust-lang#86031

Also:
- Deallocate the last few tree nodes as soon as an `into_iter` iterator steps beyond the end, instead of waiting around for the drop of the iterator (just to share more code).
- Symmetric code for backward iteration.
- Mark unsafe the methods on dying handles, modelling dying handles after raw pointers: it's the caller's responsibility to use them safely.

r? `@Mark-Simulacrum`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants