Skip to content

[hipDNN] [FIX] suboptimal tensor filling #3471

Merged
SamuelReeder merged 6 commits into
ROCm:developfrom
umarinkovic:fix/hipdnn_fill_tensor
Dec 23, 2025
Merged

[hipDNN] [FIX] suboptimal tensor filling #3471
SamuelReeder merged 6 commits into
ROCm:developfrom
umarinkovic:fix/hipdnn_fill_tensor

Conversation

@umarinkovic
Copy link
Copy Markdown
Contributor

Motivation

Issue: #3051

Technical Details

Added branches for filling up packed tensors that don't compute indices for every single element when filling up the tensor with a fixed/random element.

Submission Checklist

@umarinkovic umarinkovic force-pushed the fix/hipdnn_fill_tensor branch from 3d76dbb to 7d040b8 Compare December 18, 2025 16:33
@assistant-librarian assistant-librarian Bot added the external contribution Code contribution from users community.. label Dec 18, 2025
@BrianHarrisonAMD
Copy link
Copy Markdown
Contributor

Thanks for the contribution!

@CMiservaAMD CMiservaAMD removed request for a team December 18, 2025 18:51
@CMiservaAMD
Copy link
Copy Markdown
Contributor

CMiservaAMD commented Dec 18, 2025

I'll investigate why the CI is failing on this PR. I'll merge latest develop into this PR to clear that flag and keep an eye on it.

Thanks again for your contribution.

Copy link
Copy Markdown
Contributor

@SamuelReeder SamuelReeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the changes look good! Could you please add a sparse tensor test case for each modified method in TestTensor.cpp? The else branches are currently uncovered since it looks like these fill methods are only tested with packed tensors. I can also push these tests if you prefer.

@mousdahl-amd
Copy link
Copy Markdown
Contributor

I like this. This made me look into our ITensorIterator and TensorView a bit and now I wonder if we couldn't have packed / unpacked versions of them

@umarinkovic
Copy link
Copy Markdown
Contributor Author

Thanks, the changes look good! Could you please add a sparse tensor test case for each modified method in TestTensor.cpp? The else branches are currently uncovered since it looks like these fill methods are only tested with packed tensors. I can also push these tests if you prefer.

Sure, I'll add the test cases, no problem.

@SamuelReeder SamuelReeder merged commit 8258a7a into ROCm:develop Dec 23, 2025
33 of 36 checks passed
@umarinkovic
Copy link
Copy Markdown
Contributor Author

I like this. This made me look into our ITensorIterator and TensorView a bit and now I wonder if we couldn't have packed / unpacked versions of them

I've looked into optimizing this as well, seems to me like the most practical way to do so would be to introduce an IsPacked template argument to the iterator and the entire hierarchy of classes. How would you feel about that change? The packed/strided dynamic seems to me to be a compile-time thing anyway, you pass a list of dims to the Tensor, it's bound to be packed. The constructor with strides passed explicitly could be followed by an assert to make sure the strides don't end up producing a packed Tensor, that way the tensors are differentiated in their constructors, therefore the "packedness" is determined compile-time.

This change would definitely allow for a more optimized approach to everything.

@mousdahl-amd
Copy link
Copy Markdown
Contributor

I like that. We've got a backlog item to revisit / rework Tensor and the iterators, and we've added this note to it.

assistant-librarian Bot pushed a commit that referenced this pull request Jan 5, 2026
This pull request builds on #3267 by proving the "validation" infrastructure, the means to compare a set of `Outputs`.

The design of the validation infrastructure is relatively straight forward:
- Each SIGNATURE should come with a `validate()` implementation, which should be implemented in a similar way that the other functions/types from `testing.hpp` are implemented.
- `validate()` returns a `ValidationReport`, which is a structure that keeps all relevant information about comparing the tensors from two `Outputs`. Note that crucially, `validate()` should not do any reporting by itself. Rather, glue logic should be implemented by the user to turn `ValidationReport` into a relevant error message.
- You can see this clue code for CK-Builder itself in `testing_utils.hpp`, its `MatchesReference()`. This functionality is relatively barebones right now, it will be expanded upon in a different PR to keep the scope of this one down.

The comparison is done on the GPU (using an atomic for now), to keep tests relatively quick. Some notable items from this PR:
- To help compare the tensors and with writing tests, I've written a generic function `tensor_foreach` which invokes a callback on every element of a tensor.
- For that it was useful that the `TensorDescriptor` has a rank which is known at compile-time, so I've changed the implementation of `TensorDescriptor` for that. I felt like it was a better approach than keeping it dynamic, for multiple reasons:
  - This is C++ and we should use static typing where possible and useful. This way, we don't have to implement runtime assertions about the tensor rank.
  - We know already know the rank of tensors statically, as it can be derived from the SIGNATURE.
  - It simpifies the implementation of `tensor_foreach` and other comparison code.
- There are a lot of new tests for validating the validation implementation, validating validation validation tests (Only 3 recursive levels though...). For a few of those functions, I felt like it would be useful to expose them to the user.
- Doc comments everywhere.
BrianHarrisonAMD added a commit that referenced this pull request Jan 30, 2026
#3710)

## Motivation

Optimizing the tensor filling functions started a discussion about
optimizing tensor iteration in general:
#3471 (comment)

## Technical Details

After some deliberation, the approach taken here (using std::variant
inside the iterator to represent the different types of indexing)
reflects both the desire the improve iteration in the case of packed
tensors while also maintaining the existing API.

A fully templated approach would be more optimal but would require API
changes to the ITensor class itself, whether making it templated or
changing the definition of its iterator-related methods at the very
least.

## Test Plan

Ran ninja check inside the build folder of hipDNN.

## Test Result

```
[185/187] Running all tests via ctest
Test project /therock/output/build/ml-libs/hipDNN/build
    Start 1: hipdnn_data_sdk_tests
1/7 Test #1: hipdnn_data_sdk_tests ............   Passed    1.30 sec
    Start 2: hipdnn_backend_tests
2/7 Test #2: hipdnn_backend_tests .............   Passed    1.29 sec
    Start 3: hipdnn_frontend_tests
3/7 Test #3: hipdnn_frontend_tests ............   Passed    0.03 sec
    Start 4: hipdnn_test_sdk_tests
4/7 Test #4: hipdnn_test_sdk_tests ............   Passed    4.32 sec
    Start 5: hipdnn_plugin_sdk_tests
5/7 Test #5: hipdnn_plugin_sdk_tests ..........   Passed    0.03 sec
    Start 6: public_hipdnn_backend_tests
6/7 Test #6: public_hipdnn_backend_tests ......   Passed    0.33 sec
    Start 7: public_hipdnn_frontend_tests
7/7 Test #7: public_hipdnn_frontend_tests .....   Passed    0.26 sec

100% tests passed, 0 tests failed out of 7

Label Time Summary:
integration_test    =   0.59 sec*proc (2 tests)
unit_test           =   6.96 sec*proc (5 tests)

Total Test time (real) =   7.56 sec
```

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: BrianHarrisonAMD <169072757+BrianHarrisonAMD@users.noreply.github.com>
ammallya pushed a commit that referenced this pull request Feb 3, 2026
This pull request builds on #3267 by proving the "validation" infrastructure, the means to compare a set of `Outputs`.

The design of the validation infrastructure is relatively straight forward:
- Each SIGNATURE should come with a `validate()` implementation, which should be implemented in a similar way that the other functions/types from `testing.hpp` are implemented.
- `validate()` returns a `ValidationReport`, which is a structure that keeps all relevant information about comparing the tensors from two `Outputs`. Note that crucially, `validate()` should not do any reporting by itself. Rather, glue logic should be implemented by the user to turn `ValidationReport` into a relevant error message.
- You can see this clue code for CK-Builder itself in `testing_utils.hpp`, its `MatchesReference()`. This functionality is relatively barebones right now, it will be expanded upon in a different PR to keep the scope of this one down.

The comparison is done on the GPU (using an atomic for now), to keep tests relatively quick. Some notable items from this PR:
- To help compare the tensors and with writing tests, I've written a generic function `tensor_foreach` which invokes a callback on every element of a tensor.
- For that it was useful that the `TensorDescriptor` has a rank which is known at compile-time, so I've changed the implementation of `TensorDescriptor` for that. I felt like it was a better approach than keeping it dynamic, for multiple reasons:
  - This is C++ and we should use static typing where possible and useful. This way, we don't have to implement runtime assertions about the tensor rank.
  - We know already know the rank of tensors statically, as it can be derived from the SIGNATURE.
  - It simpifies the implementation of `tensor_foreach` and other comparison code.
- There are a lot of new tests for validating the validation implementation, validating validation validation tests (Only 3 recursive levels though...). For a few of those functions, I felt like it would be useful to expose them to the user.
- Doc comments everywhere.

[ROCm/composable_kernel commit: e6e7dc2]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external contribution Code contribution from users community.. project: hipdnn

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[hipDNN]: Suboptimal packed tensor filling

5 participants