[hipDNN] [FIX] suboptimal tensor filling by umarinkovic · Pull Request #3471 · ROCm/rocm-libraries

umarinkovic · 2025-12-18T16:32:34Z

Motivation

Technical Details

Added fast-path branches for packed tensors: when a packed tensor is filled with a fixed or random value, the fill no longer computes indices for every single element.
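To illustrate the idea, here is a minimal sketch of a packed fast path next to a strided fallback. `ToyTensor`, `rowMajorStrides`, and this `fillWithValue` are hypothetical stand-ins written for this example, not the hipDNN types or signatures, and the packed check assumes a plain row-major layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor: dims, strides, and a flat buffer.
struct ToyTensor {
    std::vector<std::int64_t> dims;
    std::vector<std::int64_t> strides;
    std::vector<float> data;
};

// Row-major strides for a packed layout (innermost dimension has stride 1).
inline std::vector<std::int64_t> rowMajorStrides(const std::vector<std::int64_t>& dims) {
    std::vector<std::int64_t> strides(dims.size(), 1);
    for (std::size_t i = dims.size(); i-- > 1;) {
        strides[i - 1] = strides[i] * dims[i];
    }
    return strides;
}

inline void fillWithValue(ToyTensor& t, float value) {
    assert(t.dims.size() == t.strides.size());

    std::int64_t count = 1;
    for (std::int64_t d : t.dims) count *= d;

    if (t.strides == rowMajorStrides(t.dims)) {
        // Packed: the first `count` buffer slots are exactly the tensor
        // elements, so no per-element index computation is needed.
        std::fill_n(t.data.begin(), count, value);
        return;
    }

    // Strided fallback: walk a multi-index and compute each offset.
    std::vector<std::int64_t> index(t.dims.size(), 0);
    for (std::int64_t n = 0; n < count; ++n) {
        std::int64_t offset = 0;
        for (std::size_t d = 0; d < index.size(); ++d) {
            offset += index[d] * t.strides[d];
        }
        t.data[static_cast<std::size_t>(offset)] = value;

        // Advance the multi-index like an odometer, innermost dimension first.
        for (std::size_t d = index.size(); d-- > 0;) {
            if (++index[d] < t.dims[d]) break;
            index[d] = 0;
        }
    }
}
```

In this sketch the strided branch leaves padding slots in the buffer untouched, which is the behaviour a non-packed test case would want to check.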

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

BrianHarrisonAMD · 2025-12-18T18:40:58Z

Thanks for the contribution!

CMiservaAMD · 2025-12-18T19:06:08Z

I'll investigate why the CI is failing on this PR. I'll merge latest develop into this PR to clear that flag and keep an eye on it.

Thanks again for your contribution.

SamuelReeder

Thanks, the changes look good! Could you please add a sparse tensor test case for each modified method in TestTensor.cpp? The else branches are currently uncovered since it looks like these fill methods are only tested with packed tensors. I can also push these tests if you prefer.

mousdahl-amd · 2025-12-19T21:13:40Z

I like this. This made me look into our ITensorIterator and TensorView a bit and now I wonder if we couldn't have packed / unpacked versions of them

umarinkovic · 2025-12-22T08:52:07Z

> Thanks, the changes look good! Could you please add a sparse tensor test case for each modified method in TestTensor.cpp? The else branches are currently uncovered since it looks like these fill methods are only tested with packed tensors. I can also push these tests if you prefer.

Sure, I'll add the test cases, no problem.

umarinkovic · 2025-12-24T15:55:47Z

> I like this. This made me look into our ITensorIterator and TensorView a bit and now I wonder if we couldn't have packed / unpacked versions of them

I've looked into optimizing this as well. It seems to me that the most practical way to do so would be to introduce an IsPacked template argument to the iterator and the entire hierarchy of classes. How would you feel about that change? The packed/strided distinction seems to me to be a compile-time property anyway: if you pass only a list of dims to the Tensor, it's bound to be packed. The constructor that takes explicit strides could be followed by an assert to make sure the strides don't end up producing a packed Tensor. That way the tensors are differentiated in their constructors, and the "packedness" is therefore determined at compile time.

This change would definitely allow for a more optimized approach to everything.
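A rough sketch of what such an IsPacked template argument could look like is shown here. `ToyView`, `makePacked`, and `makeStrided` are hypothetical names invented for this illustration; the real Tensor, TensorView, and ITensorIterator classes are not reproduced, and the sketch assumes C++17 and a row-major definition of "packed".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Dims = std::vector<std::int64_t>;

// Row-major strides for a packed layout (innermost dimension has stride 1).
inline Dims rowMajorStrides(const Dims& dims) {
    Dims strides(dims.size(), 1);
    for (std::size_t i = dims.size(); i-- > 1;) {
        strides[i - 1] = strides[i] * dims[i];
    }
    return strides;
}

// Hypothetical view type. IsPacked is part of the type, so the indexing
// code is selected at compile time rather than checked per element.
template <bool IsPacked>
struct ToyView {
    Dims dims;
    Dims strides;

    // Maps the n-th element (in row-major order) to its memory offset.
    std::int64_t offsetOf(std::int64_t linear) const {
        if constexpr (IsPacked) {
            return linear;  // packed: logical position equals memory offset
        } else {
            std::int64_t offset = 0;
            for (std::size_t d = dims.size(); d-- > 0;) {
                offset += (linear % dims[d]) * strides[d];
                linear /= dims[d];
            }
            return offset;
        }
    }
};

// Dims only: the result is packed by construction.
inline ToyView<true> makePacked(Dims dims) {
    Dims strides = rowMajorStrides(dims);
    return {std::move(dims), std::move(strides)};
}

// Explicit strides: assert that they do not describe a packed layout,
// so the two kinds of view stay distinct.
inline ToyView<false> makeStrided(Dims dims, Dims strides) {
    assert(dims.size() == strides.size());
    assert(strides != rowMajorStrides(dims));
    return {std::move(dims), std::move(strides)};
}
```

The trade-off in this sketch is that `ToyView<true>` and `ToyView<false>` are different types, so any interface that accepts both would itself need to be templated.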

mousdahl-amd · 2025-12-24T19:49:27Z

I like that. We've got a backlog item to revisit / rework Tensor and the iterators, and we've added this note to it.

#3710)

## Motivation

Optimizing the tensor filling functions started a discussion about optimizing tensor iteration in general: #3471 (comment)

## Technical Details

After some deliberation, the approach taken here (using std::variant inside the iterator to represent the different types of indexing) reflects both the desire to improve iteration in the case of packed tensors and the need to maintain the existing API. A fully templated approach would be more optimal but would require API changes to the ITensor class itself: either making it templated or, at the very least, changing the definition of its iterator-related methods.

## Test Plan

Ran ninja check inside the build folder of hipDNN.

## Test Result

```
[185/187] Running all tests via ctest
Test project /therock/output/build/ml-libs/hipDNN/build
    Start 1: hipdnn_data_sdk_tests
1/7 Test #1: hipdnn_data_sdk_tests ............ Passed 1.30 sec
    Start 2: hipdnn_backend_tests
2/7 Test #2: hipdnn_backend_tests ............. Passed 1.29 sec
    Start 3: hipdnn_frontend_tests
3/7 Test #3: hipdnn_frontend_tests ............ Passed 0.03 sec
    Start 4: hipdnn_test_sdk_tests
4/7 Test #4: hipdnn_test_sdk_tests ............ Passed 4.32 sec
    Start 5: hipdnn_plugin_sdk_tests
5/7 Test #5: hipdnn_plugin_sdk_tests .......... Passed 0.03 sec
    Start 6: public_hipdnn_backend_tests
6/7 Test #6: public_hipdnn_backend_tests ...... Passed 0.33 sec
    Start 7: public_hipdnn_frontend_tests
7/7 Test #7: public_hipdnn_frontend_tests ..... Passed 0.26 sec

100% tests passed, 0 tests failed out of 7

Label Time Summary:
integration_test = 0.59 sec*proc (2 tests)
unit_test = 6.96 sec*proc (5 tests)

Total Test time (real) = 7.56 sec
```

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: BrianHarrisonAMD <169072757+BrianHarrisonAMD@users.noreply.github.com>
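As an illustration of the std::variant approach described under Technical Details, the sketch below keeps one iterator type while choosing the indexing strategy at construction time. `PackedCursor`, `StridedCursor`, and `ToyOffsetIterator` are hypothetical names for this example only; the actual change in #3710 is not reproduced here, and the packed check assumes row-major strides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

using Dims = std::vector<std::int64_t>;

// Row-major strides for a packed layout (innermost dimension has stride 1).
inline Dims rowMajorStrides(const Dims& dims) {
    Dims strides(dims.size(), 1);
    for (std::size_t i = dims.size(); i-- > 1;) {
        strides[i - 1] = strides[i] * dims[i];
    }
    return strides;
}

// Packed case: the memory offset is just a running counter.
struct PackedCursor {
    std::int64_t offset = 0;
    void advance() { ++offset; }
};

// Strided case: track a multi-index and update the offset incrementally.
struct StridedCursor {
    Dims dims;
    Dims strides;
    Dims index;
    std::int64_t offset = 0;

    void advance() {
        for (std::size_t d = dims.size(); d-- > 0;) {
            offset += strides[d];
            if (++index[d] < dims[d]) return;
            offset -= index[d] * strides[d];  // roll this dimension back to 0
            index[d] = 0;                     // and carry into the next one
        }
    }
};

// One iterator type for callers; the variant hides which cursor is active.
class ToyOffsetIterator {
public:
    ToyOffsetIterator(Dims dims, Dims strides) {
        assert(dims.size() == strides.size());
        if (strides == rowMajorStrides(dims)) {
            cursor_ = PackedCursor{};
        } else {
            Dims index(dims.size(), 0);
            cursor_ = StridedCursor{std::move(dims), std::move(strides),
                                    std::move(index), 0};
        }
    }

    bool isPacked() const { return std::holds_alternative<PackedCursor>(cursor_); }

    std::int64_t offset() const {
        return std::visit([](const auto& c) { return c.offset; }, cursor_);
    }

    void advance() {
        std::visit([](auto& c) { c.advance(); }, cursor_);
    }

private:
    std::variant<PackedCursor, StridedCursor> cursor_;
};
```

Compared with a template parameter, this design pays for a `std::visit` dispatch on each step, but the iterator's type and public interface stay the same for packed and strided tensors.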

This pull request builds on #3267 by providing the "validation" infrastructure, the means to compare a set of `Outputs`. The design of the validation infrastructure is relatively straightforward:

- Each SIGNATURE should come with a `validate()` implementation, which should be implemented in a similar way to the other functions/types from `testing.hpp`.
- `validate()` returns a `ValidationReport`, which is a structure that keeps all relevant information about comparing the tensors from two `Outputs`. Note that, crucially, `validate()` should not do any reporting by itself. Rather, glue logic should be implemented by the user to turn `ValidationReport` into a relevant error message.
- You can see this glue code for CK-Builder itself in `testing_utils.hpp`, in its `MatchesReference()`. This functionality is relatively barebones right now; it will be expanded upon in a different PR to keep the scope of this one down.

The comparison is done on the GPU (using an atomic for now), to keep tests relatively quick.

Some notable items from this PR:

- To help compare the tensors and with writing tests, I've written a generic function `tensor_foreach` which invokes a callback on every element of a tensor.
- For that it was useful that the `TensorDescriptor` has a rank which is known at compile time, so I've changed the implementation of `TensorDescriptor` for that. I felt like it was a better approach than keeping it dynamic, for multiple reasons:
  - This is C++ and we should use static typing where possible and useful. This way, we don't have to implement runtime assertions about the tensor rank.
  - We already know the rank of tensors statically, as it can be derived from the SIGNATURE.
  - It simplifies the implementation of `tensor_foreach` and other comparison code.
- There are a lot of new tests for validating the validation implementation, validating validation validation tests (Only 3 recursive levels though...).
For a few of those functions, I felt like it would be useful to expose them to the user.
- Doc comments everywhere.

[ROCm/composable_kernel commit: e6e7dc2]

umarinkovic requested review from a team as code owners December 18, 2025 16:32

github-actions Bot added the project: hipblas, project: rocblas, and project: hipdnn labels Dec 18, 2025

umarinkovic added 2 commits December 18, 2025 16:33

optimize fillWithValue for packed tensors

4e44f5d

optimize fillWithRandomValues for packed tensors

7d040b8

umarinkovic force-pushed the fix/hipdnn_fill_tensor branch from 3d76dbb to 7d040b8 December 18, 2025 16:33

assistant-librarian Bot added the external contribution label Dec 18, 2025

fix typo in element count

f8128df

SamuelReeder linked an issue Dec 18, 2025 that may be closed by this pull request

[hipDNN]: Suboptimal packed tensor filling #3051

Closed

BrianHarrisonAMD removed the project: hipblas and project: rocblas labels Dec 18, 2025

CMiservaAMD removed request for a team December 18, 2025 18:51

Merge branch 'develop' into fix/hipdnn_fill_tensor

26bc697

SamuelReeder requested changes Dec 19, 2025

Merge branch 'develop' into fix/hipdnn_fill_tensor

8aece2c

Added tests for non-packed codepaths

0d2f629

umarinkovic requested a review from SamuelReeder December 22, 2025 15:53

SamuelReeder approved these changes Dec 23, 2025

SamuelReeder merged commit 8258a7a into ROCm:develop Dec 23, 2025
33 of 36 checks passed

SamuelReeder mentioned this pull request Dec 23, 2025

[hipDNN]: Suboptimal packed tensor filling #3051

Closed

umarinkovic mentioned this pull request Jan 8, 2026

[enhancement] differentiating strided vs packed tensors when iterating #3710

Merged

1 task

SamuelReeder merged 6 commits into ROCm:develop from umarinkovic:fix/hipdnn_fill_tensor

5 participants