-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[ty] ensure IntersectionBuilder does not have multiple identical InnerIntersectionBuilders
#22055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Diagnostic diff on typing conformance testsNo changes detected when running ty on typing conformance tests ✅ |
|
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
invalid-argument-type |
0 | 2 | 2 |
invalid-return-type |
0 | 0 | 3 |
| Total | 0 | 2 | 5 |
CodSpeed Performance ReportMerging #22055 will not alter performanceComparing Summary
Footnotes
|
| // create a union of intersections. | ||
| intersections: Vec<InnerIntersectionBuilder<'db>>, | ||
| /// Stores hash values of `intersections` to prevent adding identical `InnerIntersectionBuilder`s. | ||
| intersection_hashes: FxHashSet<u64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems risky to rely solely on the hash value, given the possibility of hash collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially since my understanding is that FxHasher is optimized for speed, not for collision-resistance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the hash function were ideal, the probability of collisions would be almost negligible in this case (where the number of elements is only about 100).
The collision probability is calculated using the same formula as the birthday paradox. If the hash space is
Since
The problem is that fxhash is not an ideal hash function. In this case, the actual effective hash space may be much smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a cryptographic hash function seems too expensive, so it's better to simply use an equality check to remove duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could intersections be a FxIndexMap ir is the issue that we keep updating the entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use hashmaps because we're using intersections as a mutable iterator.
|
It looks like pydantic has a huge performance improvement here, but most other benchmarks regress: https://codspeed.io/astral-sh/ruff/branches/mtshiba%3Aintersection-hash?utm_source=github&utm_medium=comment&utm_content=header. I'd be interested in seeing if we still see the huge performance improvement on pydantic after #22044 has been merged. |
|
Can you rebase on |
a3cacc3 to
ffe7422
Compare
Hmm, it seems that the performance improvement on pydantic has disappeared after merging #22044.
|
659334e to
2dd406c
Compare
|
This change no longer appears to have significant standalone benefits, so I'm sort-of inclined to say it should again just be part of #17371 rather than being a standalone PR |
Summary
From #17371
See the comments in #17371 for the motivation for this change.
Test Plan
N/A