Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64-bit indexing for DeviceSegmentedReduce #414

Merged
merged 21 commits into from
Sep 27, 2023

Conversation

jecs
Copy link
Contributor

@jecs jecs commented Sep 7, 2023

Description

  • Added NumSegmentsT parameter to DeviceSegmentedReduce methods, so now num_segments can be of any integral type
  • Added OffsetT type resolution in a manner similar to DeviceReduce: set to std::common_type_t of BeginOffsetIterT::value_type and EndOffsetIterT::value_type. (see detail/choose_offset.cuh)
  • In a manner similar to DeviceReduce, we use static_cast to convert num_segments to OffsetT before dispatching
  • OffsetT type resolution uses std::common_type_t, std::void_t, and one inline constexpr variable (is_iterator_v).

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

* Added NumSegmentsT parameter to methods, so now num_segments can be
  of any integral type
* Added OffsetT type resolution in a manner similar to DeviceReduce:
  set to std::common_type_t of BeginOffsetIterT::value_type and
  EndOffsetIterT::value_type. (see detail/choose_offset.cuh)
* In a manner similar to DeviceReduce, we use static_cast to convert
  num_segments to OffsetT before dispatching
* OffsetT type resolution uses std::common_type_t, std::void_t, and
  one inline constexpr variable (is_iterator_v).
@jecs jecs requested review from a team as code owners September 7, 2023 18:03
@jecs jecs requested review from elstehle and griwes and removed request for a team September 7, 2023 18:03
@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 7, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jecs
Copy link
Contributor Author

jecs commented Sep 7, 2023

Hi, should I create an Issue and add it to the description? I originally submitted this PR to the old Cub repository.

Copy link
Collaborator

@gevtushenko gevtushenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! I've left a few comments, but before proceeding with the review, we'll need a test case that covers 64-bit problem sizes. Could you please add one? In order to keep moderate memory usage, you could try to rely on counting or constant iterators instead of actually allocating huge inputs.

cub/cub/device/dispatch/dispatch_reduce.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
@jecs
Copy link
Contributor Author

jecs commented Sep 8, 2023

Thank you for the contribution! I've left a few comments, but before proceeding with the review, we'll need a test case that covers 64-bit problem sizes. Could you please add one? In order to keep moderate memory usage, you could try to rely on counting or constant iterators instead of actually allocating huge inputs.

Thank you for the feedback! Yes, sure thing. I'll write a quick test using some kind of counter iterator- or constant iterator-based approach.

Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution, I am so sorry about our support situation

cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
* Removed NumSegmentsT portion
* Added catch2 test
* Modified previous test slightly to use correct type for
  num_segments
* Made type deduction SFINAE-friendly
@jecs
Copy link
Contributor Author

jecs commented Sep 11, 2023

@miscco @senior-zero Ready. (I hope.) I walked back 64-bit num_segments. I hope to submit that in a new Pull Request in the near future. For the test, I simply copied the previous test and modified it slightly.

@jecs jecs requested a review from miscco September 12, 2023 20:56
@miscco
Copy link
Collaborator

miscco commented Sep 13, 2023

/ok to test

@jecs
Copy link
Contributor Author

jecs commented Sep 13, 2023

Whoops! I deleted the offending lines and ran C++11 tests on my end. Everything seems to work now. Getting your code to be compliant with several standards is tricky business.

@jecs jecs requested a review from gevtushenko September 13, 2023 13:43
@miscco
Copy link
Collaborator

miscco commented Sep 13, 2023

/ok to test

@jecs
Copy link
Contributor Author

jecs commented Sep 13, 2023

All set?

@gevtushenko
Copy link
Collaborator

All set?

Not yet, I have a few comments. Will complete the review soon.

Copy link
Collaborator

@gevtushenko gevtushenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the work! There are a few minor changes that has to be made before we merge this.

cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
cub/cub/detail/choose_offset.cuh Outdated Show resolved Hide resolved
* Incorporated suggested changes to
  cub/test/catch2_test_device_segmented_reduce_iterators_64bit.cu
  and greatly simplified the test itself.
* Ran clang-format of LLVM 16 using cub/.clang-format on modified
  files
@jecs
Copy link
Contributor Author

jecs commented Sep 14, 2023

So, did I misuse clang-format? It seems to have changed a lot more than expected, but the files look clean. It's my first time using the command. Are there options to avoid formatting comments? I used a standalone clang-format binary from LLVM 16.

@jecs jecs requested a review from gevtushenko September 14, 2023 19:27
@gevtushenko
Copy link
Collaborator

@jecs your repo seems to be protected. I've filed a PR with a few fixes: jecs#1 for your branch

@gevtushenko
Copy link
Collaborator

/ok to test

@jecs jecs requested a review from gevtushenko September 20, 2023 18:49
@jecs
Copy link
Contributor Author

jecs commented Sep 21, 2023

@senior-zero I added you as a collaborator on my fork in case you'd like to make more changes.

@gevtushenko
Copy link
Collaborator

/ok to test

@gevtushenko gevtushenko merged commit 94b358b into NVIDIA:main Sep 27, 2023
@jecs
Copy link
Contributor Author

jecs commented Sep 27, 2023

@senior-zero @miscco Thank you both so much for your excellent feedback and for guiding me through the process! I learned a lot from this experience. It's my first time submitting a pull request to a public project like this. And thank you for teaching me about the "forbidden fruit" hidden within ::detail and also for taking the wheel towards the end.

@gevtushenko
Copy link
Collaborator

@jecs thank you for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants