Use thrust::identity as hash functions for byte pair encoding #13665

Merged (10 commits) on Jul 10, 2023

Conversation

@PointKernel PointKernel commented Jul 5, 2023

Description

This PR fixes a minor issue in byte pair encoding where different hash functions were used for insert and find. It also verifies that the latest changes in cuco won't break cudf.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@PointKernel PointKernel added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Jul 5, 2023
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Jul 5, 2023
@PointKernel

CMake changes will be reverted once rapidsai/rapids-cmake#435 is merged.

@PointKernel PointKernel changed the title from "[DO NOT MERGE] Test the latest cuo" to "Use thrust::identity as hash functions for byte pair encoding" Jul 7, 2023
@PointKernel PointKernel added bug Something isn't working non-breaking Non-breaking change 3 - Ready for Review Ready for review by team and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Jul 7, 2023
@PointKernel PointKernel marked this pull request as ready for review July 7, 2023 01:12
@PointKernel PointKernel requested review from a team as code owners July 7, 2023 01:12
@@ -119,7 +119,7 @@ std::unique_ptr<detail::merge_pairs_map_type> initialize_merge_pairs_map(

   merge_pairs_map->insert(iter,
                           iter + input.size(),
-                          cuco::murmurhash3_32<cudf::hash_value_type>{},
+                          thrust::identity<cudf::hash_value_type>{},

What's the nature of this change? Is it just that it's pointless to be initializing the map with an actual hashed value?


That's correct. Previously, this was hashing an already-hashed value.
The change also ensures that the same hasher is used for both insert and find, which fixes a bug.
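
To illustrate why the hasher must match between insert and find, here is a minimal host-side sketch. It is not the cuco or libcudf code: `toy_map`, `identity_hash`, and `multiplicative_hash` are hypothetical names, and the multiplicative hash merely stands in for a mixing hash such as murmurhash3. The table uses open addressing with linear probing, so a lookup that starts from a different slot than the insert did can hit an empty slot and report a miss.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical identity hasher: suitable when keys are already hash values.
struct identity_hash {
  std::uint32_t operator()(std::uint32_t key) const { return key; }
};

// Hypothetical mixing hasher (multiplicative), a stand-in for murmurhash3.
struct multiplicative_hash {
  std::uint32_t operator()(std::uint32_t key) const { return key * 2654435761u; }
};

// Toy open-addressing map with linear probing. The hasher is supplied per
// call, so insert and find can (incorrectly) be given different hashers.
class toy_map {
 public:
  explicit toy_map(std::size_t capacity) : slots_(capacity) {}

  template <typename Hash>
  bool insert(std::uint32_t key, int value, Hash hash) {
    std::size_t pos = hash(key) % slots_.size();
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      auto& slot = slots_[pos];
      if (!slot.has_value()) {
        slot = entry{key, value};
        return true;
      }
      if (slot->key == key) { return false; }  // key already present
      pos = (pos + 1) % slots_.size();
    }
    return false;  // table is full
  }

  template <typename Hash>
  std::optional<int> find(std::uint32_t key, Hash hash) const {
    std::size_t pos = hash(key) % slots_.size();
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      auto const& slot = slots_[pos];
      if (!slot.has_value()) { return std::nullopt; }  // empty slot ends the probe
      if (slot->key == key) { return slot->value; }
      pos = (pos + 1) % slots_.size();
    }
    return std::nullopt;
  }

 private:
  struct entry {
    std::uint32_t key;
    int value;
  };
  std::vector<std::optional<entry>> slots_;
};
```

In this sketch, inserting key 1 with `multiplicative_hash` into a 64-slot table and then looking it up with `identity_hash` starts the probe at a different, empty slot, so the lookup misses even though the key is stored. Using `identity_hash` for both calls finds it, and avoids re-hashing keys that are already hash values.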

@PointKernel PointKernel self-assigned this Jul 7, 2023
@github-actions github-actions bot removed the CMake CMake build issue label Jul 10, 2023
@PointKernel

/merge

@rapids-bot rapids-bot bot merged commit 3c51c9e into rapidsai:branch-23.08 Jul 10, 2023
53 checks passed
@PointKernel PointKernel deleted the test-cuco branch July 10, 2023 23:28
rapids-bot bot pushed a commit to rapidsai/rapids-cmake that referenced this pull request Jul 12, 2023
This PR bumps the cuco version to the latest.

Opened rapidsai/cudf#13665 to verify it won't break cudf.
Opened rapidsai/raft#1641 to verify it won't break raft.

rapidsai/cugraph#3692 to be addressed once this PR is merged.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)

URL: #435