-
Notifications
You must be signed in to change notification settings - Fork 888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use thrust::identity
as hash functions for byte pair encoding
#13665
Conversation
CMake changes will be reverted once rapidsai/rapids-cmake#435 is merged. |
thrust::identity
as hash functions for byte pair encoding
@@ -119,7 +119,7 @@ std::unique_ptr<detail::merge_pairs_map_type> initialize_merge_pairs_map( | |||
|
|||
merge_pairs_map->insert(iter, | |||
iter + input.size(), | |||
cuco::murmurhash3_32<cudf::hash_value_type>{}, | |||
thrust::identity<cudf::hash_value_type>{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the nature of this change? Is it just that it's pointless to be initializing the map with an actual hashed value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. Previously, this was hashing a hashed value.
Also, so that the same hasher is used for both insert
and find
-- a bug that this fixes too.
/merge |
This PR bump the cuco version to the latest. Opened rapidsai/cudf#13665 to verify it won't break cudf. Opened rapidsai/raft#1641 to verify it won't break raft. rapidsai/cugraph#3692 to be addressed once this PR is merged. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Robert Maynard (https://github.com/robertmaynard) URL: #435
Description
This PR fixes a minor issue that distinct hash functions are used for
insert
andfind
in byte pair encoding. It also verifies that the latest changes in cuco won't break cudf.Checklist