Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Set Difference size_hint lower bound #530

Merged
merged 1 commit into from
Jun 17, 2024

Conversation

ToMe25
Copy link
Contributor

@ToMe25 ToMe25 commented Jun 7, 2024

This PR makes the Set Difference iterator generate a non-zero lower bound in some situations.

Specifically, it now returns a non-zero lower bound if the difference method is called on a larger set with a smaller set.
That is because in those cases the fact that sets can't really contains duplicates* guarantees that a minimum of self.len() - other.len() items will be returned by the iterator.

* Well, they can, but that is already documented to be causing a random mess

This implementation has the disadvantage that a single next() call may reduce the lower bound by more than one.
Every size hint generated, even the first largest one, is guaranteed to be correct, but it may be confusing if one next() call causes the lower bound to drop by more than one.

This could be avoided by storing the minimum number of resulting elements in the iterator and subtracting one each time next() is called, but I don't think its worth the added complexity.

This implementation isn't perfect, but it is spec compliant as far as I
can tell
@ToMe25
Copy link
Contributor Author

ToMe25 commented Jun 7, 2024

My motivation for this was that I'm a perfectionist and get annoyed by this kind of thing, but I still did A LOT of benchmarking.

The result of my benchmarking was that while the changes to collecting the various iterators to a HashSet were as expected, the changes to collecting to Vec were kinda weird.
For some reason this change improved the performance of collecting an Itersection to a Vec, even though this change doesn't affect Intersection at all.
This also drastically decreases the performance of collecting to a Vec, which I believe to be solely due to the fact that the inlined lower bound is no longer a constant.

The summary of my results is this:
Collecting to a HashSet gets faster by ~20% in the affected cases, and remains unchanged in the other cases.
Collecting to a Vec gets faster or slower by ~20-50% seemingly at random, but overall slower on average.

Here my exact results(click me):
Name Before After Diff (%)
set_iter_collect_difference_large_small_hashset 55,917.77 44,172.59 -21.00%
set_iter_collect_difference_large_small_vec 22,996.61 34,235.16 48.87%
set_iter_collect_difference_small_large_hashset 2,785.54 2,737.30 -1.73%
set_iter_collect_difference_small_large_vec 2,071.05 1,598.02 -22.84%
set_iter_collect_intersection_large_small_hashset 2,862.06 2,860.77 -0.05%
set_iter_collect_intersection_large_small_vec 2,403.75 1,630.63 -32.16%
set_iter_collect_intersection_small_large_hashset 2,866.35 2,885.31 0.66%
set_iter_collect_intersection_small_large_vec 2,490.41 1,623.99 -34.79%
set_iter_collect_symmetric_difference_large_small_hashset 60,269.34 47,027.25 -21.97%
set_iter_collect_symmetric_difference_large_small_vec 24,543.87 34,968.29 42.47%
set_iter_collect_symmetric_difference_small_large_hashset 60,668.83 45,848.87 -24.43%
set_iter_collect_symmetric_difference_small_large_vec 24,864.28 34,386.12 38.30%
set_iter_collect_union_large_small_hashset 42,177.86 42,743.26 1.34%
set_iter_collect_union_large_small_vec 31,568.72 30,323.12 -3.95%
set_iter_collect_union_small_large_hashset 42,909.41 42,364.49 -1.27%
set_iter_collect_union_small_large_vec 25,763.16 30,302.66 17.62%

The tests were run with a "large" 1000 element set and a "small" 100 element set with an overlap of 50 elements.
The large_small/small_large portion of the name indicates which set the operation was called on.
large_small translates to large.$op(small).

That means that from a performance point of view, whether this change is worth it depends on whether collecting to a Vec or a HashSet is more common.

@ToMe25
Copy link
Contributor Author

ToMe25 commented Jun 11, 2024

One way to potentially get rid of most of the collect::<Vec<T>>::() performance penalty of this change would be to implement a specialized version for Vecs.

I'm not entirely sure how trait based specialization works though, so I can't guarantee that this will work.
Another disadvantage would be that this would add a significant amount of extra code complexity.

The advantage is, though, that doing so should allow making collecting into a Vec faster than without this PR, since the initial first allocation could be larger.
I'm not certain, however, if this outweighs the performance penalty that the trait based specialization probably brings with it.
Nor am I sure whether this would be worth the added complexity.

@Amanieu
Copy link
Member

Amanieu commented Jun 17, 2024

The overhead of a saturating add is tiny compared to memory allocation so that is most likely not the direct cause of the regression. Finding the actual cause would require looking at the exact assembly code generated by rustc and comparing the two versions along with profiling information.

However I think it is still better to merge this for now since it does provide a more accurate lower bound for the iterator size.

@bors r+

@bors
Copy link
Contributor

bors commented Jun 17, 2024

📌 Commit f4361bc has been approved by Amanieu

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Jun 17, 2024

⌛ Testing commit f4361bc with merge 65c553d...

@bors
Copy link
Contributor

bors commented Jun 17, 2024

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 65c553d to master...

@bors bors merged commit 65c553d into rust-lang:master Jun 17, 2024
24 checks passed
@ToMe25 ToMe25 deleted the fix_set_diff_size_hint branch June 17, 2024 14:28
bors added a commit that referenced this pull request Jun 18, 2024
Implement XxxAssign operations on HashSets

This PR primarily implements the XxxAssign operation traits for `HashSet`.

My primary motivation to do so is for convenience, but depending on the situation they can provide a significant performance improvement as well.\*

In my tests, which may not be ideal because I don't have much benchmarking experience, the assigning operations are, with the exception of `Sub`, a minimum of 25% faster.\*
Note that when swapping the large and the small set around, some of these are significantly slower than the non-assigning variant.
Therefore using them is likely only worth it performance wise, if you already know which set is larger, and the right one of the sets just so happens to be the one you don't need to keep.

 \* Results may have changed due to #530 being merged

Here my exact benchmark results, done with the newly added benchmark suit:

<!DOCTYPE html>

VER | LSIZE | SSIZE | OP | NS/ITER | DIFF (%) | COMMENT
-- | -- | -- | -- | -- | -- | --
1 | 1000 | 100 | and | 5,682.88 |   |  
1 | 1000 | 100 | or | 41,427.82 |   |  
1 | 1000 | 100 | xor | 57,404.27 |   |  
1 | 1000 | 100 | subls | 56,262.53 |   |  
1 | 1000 | 100 | subsl | 751.42 |   |  
1 | 1000 | 2 | and | 100.16 |   |  
1 | 1000 | 2 | or | 40,435.09 |   |  
1 | 1000 | 2 | xor | 59,058.05 |   |  
1 | 1000 | 2 | subls | 58,668.34 |   |  
1 | 1000 | 2 | subsl | 18.89 |   |  
1 | 1000 | 100 | or_ass | 32,888.49 | -20.61% | unconditional insert
2 | 1000 | 100 | or_ass | 29,397.04 | -29.04% | !contains insert
3 | 1000 | 100 | or_ass | 32,399.65 | -21.79% | extend iter().cloned()
4 | 1000 | 100 | or_ass | 30,693.33 | -25.91% | get_or_insert_owned
5 | 1000 | 100 | or_ass | 33,722.59 | -18.60% | calc intersection; extend rhs.iter() !intersection contains; Requires S: Clone
1 | 1000 | 100 | add_ass | 30,114.17 | -26.66% | !contains insert
1 | 1000 | 100 | xor_ass | 32,309.85 | -43.72% | contains remove else insert
2 | 1000 | 100 | xor_ass | 40,058.48 | -30.22% | extract_if rhs contains; extend !removed contains
3 | 1000 | 100 | xor_ass | 31,801.04 | -44.60% | raw_entry().from_key() replace_entry_with / insert
4 | 1000 | 100 | xor_ass | 31,935.07 | -44.37% | raw_entry().from_key_hashed_nocheck() replace_entry_with / insert_hashed_nocheck
5 | 1000 | 100 | xor_ass | 31,843.33 | -44.53% | self.map.table.get.is_none self.map.table.insert else self.map.table.remove_entry
1 | 1000 | 100 | subls_ass | 33,366.13 | -40.70% | contains remove
1 | 1000 | 100 | subsl_ass | 10,686.02 | 1322.11% | contains remove
2 | 1000 | 100 | subls_ass | 36,351.69 | -35.39% | retain !contains
2 | 1000 | 100 | subsl_ass | 3,939.67 | 424.30% | retain !contains
3 | 1000 | 100 | subls_ass | 32,012.82 | -43.10% | unconditional remove
3 | 1000 | 100 | subsl_ass | 9,908.76 | 1218.67% | unconditional remove
4 | 1000 | 100 | subls_ass | 36,232.13 | -35.60% | self.map.retain !contains
4 | 1000 | 100 | subsl_ass | 3,939.35 | 424.25% | self.map.retain !contains
5 | 1000 | 100 | subls_ass | 31,879.32 | -43.34% | if rhs smaller self unconditional remove else retain !contains
5 | 1000 | 100 | subsl_ass | 3,946.98 | 425.27% | if rhs smaller self unconditional remove else retain !contains
1 | 1000 | 2 | add_ass | 28,324.95 | -29.27% |  
2 | 1000 | 2 | or_ass | 28,322.62 | -29.96% |  
1 | 1000 | 2 | xor_ass | 29,107.31 | -50.71% |  
3 | 1000 | 2 | xor_ass | 29,026.82 | -50.85% |  
1 | 1000 | 2 | subls_ass | 29,310.04 | -50.04% |  
1 | 1000 | 2 | subsl_ass | 4,212.56 | 22200.48% |  
2 | 1000 | 2 | subls_ass | 34,074.85 | -41.92% |  
2 | 1000 | 2 | subsl_ass | 66.43 | 251.67% |  
3 | 1000 | 2 | subls_ass | 29,340.86 | -49.99% |  
3 | 1000 | 2 | subsl_ass | 5,972.25 | 31515.93% |  
5 | 1000 | 2 | subls_ass | 29,460.49 | -49.78% |  
5 | 1000 | 2 | subsl_ass | 65.32 | 245.79% |  

In addition to the Assigning operators this PR changes a few more things:
 * It changes the allocator bound on the non-assigning set operations to `A: Allocator + Default`.
 * I also added a benchmark suit for the set operations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants