Skip to content

[enhancement] differentiating strided vs packed tensors when iterating#3710

Merged
BrianHarrisonAMD merged 9 commits into
ROCm:developfrom
umarinkovic:enhancement/itensor_iterator_variant
Jan 30, 2026
Merged

[enhancement] differentiating strided vs packed tensors when iterating#3710
BrianHarrisonAMD merged 9 commits into
ROCm:developfrom
umarinkovic:enhancement/itensor_iterator_variant

Conversation

@umarinkovic
Copy link
Copy Markdown
Contributor

Motivation

Optimizing the tensor filling functions started a discussion about optimizing tensor iteration in general:
#3471 (comment)

Technical Details

After some deliberation, the approach taken here (using std::variant inside the iterator to represent the different types of indexing) reflects both the desire the improve iteration in the case of packed tensors while also maintaining the existing API.

A fully templated approach would be more optimal but would require API changes to the ITensor class itself, whether making it templated or changing the definition of its iterator-related methods at the very least.

Test Plan

Ran ninja check inside the build folder of hipDNN.

Test Result

[185/187] Running all tests via ctest
Test project /therock/output/build/ml-libs/hipDNN/build
    Start 1: hipdnn_data_sdk_tests
1/7 Test #1: hipdnn_data_sdk_tests ............   Passed    1.30 sec
    Start 2: hipdnn_backend_tests
2/7 Test #2: hipdnn_backend_tests .............   Passed    1.29 sec
    Start 3: hipdnn_frontend_tests
3/7 Test #3: hipdnn_frontend_tests ............   Passed    0.03 sec
    Start 4: hipdnn_test_sdk_tests
4/7 Test #4: hipdnn_test_sdk_tests ............   Passed    4.32 sec
    Start 5: hipdnn_plugin_sdk_tests
5/7 Test #5: hipdnn_plugin_sdk_tests ..........   Passed    0.03 sec
    Start 6: public_hipdnn_backend_tests
6/7 Test #6: public_hipdnn_backend_tests ......   Passed    0.33 sec
    Start 7: public_hipdnn_frontend_tests
7/7 Test #7: public_hipdnn_frontend_tests .....   Passed    0.26 sec

100% tests passed, 0 tests failed out of 7

Label Time Summary:
integration_test    =   0.59 sec*proc (2 tests)
unit_test           =   6.96 sec*proc (5 tests)

Total Test time (real) =   7.56 sec

Submission Checklist

@umarinkovic umarinkovic requested a review from a team as a code owner January 8, 2026 15:01
@assistant-librarian assistant-librarian Bot added the external contribution Code contribution from users community.. label Jan 8, 2026
@umarinkovic
Copy link
Copy Markdown
Contributor Author

Some notes:

Using std::variant trims one branch in the case of incrementing a packed tensor and adds one branch in the case of a non-packed tensor. It also adds a branch to the dereferencing of the iterators. In general the branches added by std::variant are predictable but so were the trimmed branches so it seems like a net-zero at first glance. The real benefit is that packed tensors don't compute their index over and over again when dereferencing the iterator, which is one inner product of indices and strides per iteration.

The other approach of making the iterator a templated class with the IsPacked parameter would require the API to change in a non-trivial way, either modifying ITensor to also be a templated class which imo seems counterintuitive, or at the very least getting rid of its virtual begin(), end() etc. methods and shadowing them from more concrete implementations. This would be marginally better for performance but it would require ITensor to use the most universal iterator which is the current implementation. So the optimization would be available to those who explicitly ask for it but anyone just using ITensor to iterate would be stuck with the unoptimized version.

Side question, what is the purpose of overriding assignment operators in the ITensorIterator class? Since the class holds a reference which cannot be reassigned, it effectively calls the copy-assignment operator of the underlying tensor. From my POV it seems kind of dangerous, no? Although I'm not entirely familiar with the logic behind it so I might be mistaken. I noticed that because that overridden operator forced me to use tensor pointers instead of tensor references inside my index structs.

@adickin-amd
Copy link
Copy Markdown
Contributor

Side question, what is the purpose of overriding assignment operators in the ITensorIterator class? Since the class holds a reference which cannot be reassigned, it effectively calls the copy-assignment operator of the underlying tensor. From my POV it seems kind of dangerous, no? Although I'm not entirely familiar with the logic behind it so I might be mistaken. I noticed that because that overridden operator forced me to use tensor pointers instead of tensor references inside my index structs.

Great observation, we don't need those assignment operators overridden and they can be removed. I went through several development iterations trying to understand how to build the iterator well, its possible at some point they were valid. Thank you for catching this. Do you want to remove it as part of this PR or shall I open up a new issue to deal with this later?

@umarinkovic
Copy link
Copy Markdown
Contributor Author

Do you want to remove it as part of this PR or shall I open up a new issue to deal with this later?

I can address it in this PR, I think it also makes for a cleaner implementation of this solution if references can be used in place of pointers.

@adickin-amd adickin-amd requested a review from Copilot January 9, 2026 15:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes tensor iteration by differentiating between packed (contiguous memory) and strided (non-contiguous) tensors. The optimization uses std::variant to represent different indexing strategies while maintaining the existing API.

Changes:

  • Refactored ITensorIterator to use std::variant containing either LinearIndex (for packed tensors) or CompositeIndex (for strided tensors)
  • Simplified fillWithValue() and fillRandom() methods to use the new iterator implementation instead of conditional logic
  • Removed copy assignment and move constructor tests that are no longer applicable
  • Updated and added tests to validate both packed and strided tensor iteration paths

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/Tensor.hpp Refactored iterator to use variant-based indexing and simplified tensor filling methods
projects/hipdnn/data_sdk/tests/utilities/TestTensorIterator.cpp Removed obsolete tests and added new tests for linear and composite index access patterns
projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp Removed copy assignment test that is no longer valid

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/Tensor.hpp Outdated
@adickin-amd
Copy link
Copy Markdown
Contributor

I did some debugging of why the linux builds failed and remembered that windows has all of our clang-tidy checks disabled during the build since we had major compilation time increases if we didn't. We should have added a warning to mention this so I will log an issue for this.

In order to fix the build you'll need to build with -DENABLE_CLANG_TIDY=ON so you can see the build errors. If you encounter issues getting the errors fixed, ill be happy to help you out.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

I did some debugging of why the linux builds failed and remembered that windows has all of our clang-tidy checks disabled during the build since we had major compilation time increases if we didn't. We should have added a warning to mention this so I will log an issue for this.

In order to fix the build you'll need to build with -DENABLE_CLANG_TIDY=ON so you can see the build errors. If you encounter issues getting the errors fixed, ill be happy to help you out.

Hi, I ran clang-tidy during the build. However, at least on my end I see a bunch of errors unrelated to the changes I made in this PR. Even running clang-tidy on just these files produces an error in Tensor.hpp with regards to the #pragma directive. That being said, I did identify all the clang-tidy warnings/errors caused by my changes and fixed them.

@umarinkovic umarinkovic force-pushed the enhancement/itensor_iterator_variant branch from 996c4cd to d668444 Compare January 12, 2026 11:42
Copy link
Copy Markdown
Contributor

@adickin-amd adickin-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great optimization. Thank you for your contribution. LGTM

@adickin-amd
Copy link
Copy Markdown
Contributor

I did some debugging of why the linux builds failed and remembered that windows has all of our clang-tidy checks disabled during the build since we had major compilation time increases if we didn't. We should have added a warning to mention this so I will log an issue for this.
In order to fix the build you'll need to build with -DENABLE_CLANG_TIDY=ON so you can see the build errors. If you encounter issues getting the errors fixed, ill be happy to help you out.

Hi, I ran clang-tidy during the build. However, at least on my end I see a bunch of errors unrelated to the changes I made in this PR. Even running clang-tidy on just these files produces an error in Tensor.hpp with regards to the #pragma directive. That being said, I did identify all the clang-tidy warnings/errors caused by my changes and fixed them.

Thank you for fixing the tidy errors despite the additional tidy warnings you got This is likely due to a clang-tidy version mismatch, we are on version 20 and newer versions have warnings for the pragma directives. I have an incoming PR which will exclude these new clang-tidy warnings so we can use newer versions without issues.

@adickin-amd
Copy link
Copy Markdown
Contributor

Looks like windows doesnt like the deleted assignment operators on the CI. Attached is the relevant logs. Ill look to debug it when I can.

2026-01-12T23:53:25.7935554Z [hipDNN] ccache B:\build\core\clr\dist\lib\llvm\bin\clang++.exe -DCOMPONENT_NAME=\"hipdnn_frontend\" -DFMT_HEADER_ONLY -DSPDLOG_FMT_EXTERNAL -DUSE_PROF_API=1 -D__HIPCC__ -D__HIP_PLATFORM_AMD__ -D__HIP_PLATFORM_AMD__=1 -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/include -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/test_sdk/include -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/frontend/include -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/backend/include -IB:/build/ml-libs/hipDNN/build/backend/include -isystem B:/build/third-party/googletest/dist/include -isystem B:/build/third-party/flatbuffers/dist/include -isystem B:/build/third-party/spdlog/dist/include -isystem B:/build/third-party/nlohmann-json/dist/include -isystem B:/build/third-party/fmt/dist/include -isystem B:/build/core/clr/dist/include -DWIN32 -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -fms-extensions -fms-compatibility -D_ENABLE_EXTENDED_ALIGNED_STORAGE  -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier --hip-path=B:/build/core/clr/dist --hip-device-lib-path=B:/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=gnu++17 -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Werror -Wall -Wextra -Wpedantic -Wshadow -Wnon-virtual-dtor -Wold-style-cast -Wcast-align -Woverloaded-virtual -Wconversion -Wsign-conversion -Wnull-dereference -Wdouble-promotion -Wformat=2 -Winit-self -Wunreachable-code -Wno-error=unused-command-line-argument -Wswitch-default -MD -MT data_sdk/tests/CMakeFiles/hipdnn_data_sdk_tests.dir/utilities/TestTensorView.cpp.obj -MF data_sdk\tests\CMakeFiles\hipdnn_data_sdk_tests.dir\utilities\TestTensorView.cpp.obj.d -o data_sdk/tests/CMakeFiles/hipdnn_data_sdk_tests.dir/utilities/TestTensorView.cpp.obj -c C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp
2026-01-12T23:53:25.7945806Z [hipDNN] In file included from C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp:4:
2026-01-12T23:53:25.7946774Z [hipDNN] In file included from B:/build/third-party/googletest/dist/include\gtest/gtest.h:55:
2026-01-12T23:53:25.7947603Z [hipDNN] In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\memory:14:
2026-01-12T23:53:25.7948576Z [hipDNN] In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\xmemory:15:
2026-01-12T23:53:25.7950177Z [hipDNN] C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\xutility:1481:13: error: object of type 'hipdnn_data_sdk::utilities::TensorViewIterator<float>' cannot be assigned because its copy assignment operator is implicitly deleted
2026-01-12T23:53:25.7951469Z [hipDNN]  1481 |         _It = _STD forward<_UIter>(_UIt);
2026-01-12T23:53:25.7951799Z [hipDNN]       |             ^
2026-01-12T23:53:25.7953228Z [hipDNN] C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\algorithm:3803:10: note: in instantiation of function template specialization 'std::_Seek_wrapped<hipdnn_data_sdk::utilities::TensorViewIterator<float>, hipdnn_data_sdk::utilities::TensorViewIterator<float> &>' requested here
2026-01-12T23:53:25.7954750Z [hipDNN]  3803 |     _STD _Seek_wrapped(_Dest, _UDest);
2026-01-12T23:53:25.7955065Z [hipDNN]       |          ^
2026-01-12T23:53:25.7957113Z [hipDNN] C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp:513:10: note: in instantiation of function template specialization 'std::transform<hipdnn_data_sdk::utilities::TensorViewIterator<float>, hipdnn_data_sdk::utilities::TensorViewIterator<float>, (lambda at C:\home\runner\_work\rocm-libraries\rocm-libraries\projects\hipdnn\data_sdk\tests\utilities\TestTensorView.cpp:513:60)>' requested here
2026-01-12T23:53:25.7959416Z [hipDNN]   513 |     std::transform(view.begin(), view.end(), view.begin(), [](float val) { return val * 2.0f; });
2026-01-12T23:53:25.7959940Z [hipDNN]       |          ^
2026-01-12T23:53:25.7961435Z [hipDNN] C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/include\hipdnn_data_sdk/utilities/TensorView.hpp:62:30: note: copy assignment operator of 'TensorViewIterator<float>' is implicitly deleted because field '_iter' has a deleted copy assignment operator
2026-01-12T23:53:25.7962800Z [hipDNN]    62 |     ITensorIterator<IsConst> _iter;
2026-01-12T23:53:25.7963132Z [hipDNN]       |                              ^
2026-01-12T23:53:25.7964046Z [hipDNN] C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/include\hipdnn_data_sdk/utilities/Tensor.hpp:94:22: note: 'operator=' has been explicitly marked deleted here
2026-01-12T23:53:25.7965135Z [hipDNN]    94 |     ITensorIterator& operator=(const ITensorIterator& other) = delete;
2026-01-12T23:53:25.7965596Z [hipDNN]       |                      ^
2026-01-12T23:53:25.7965782Z 
2026-01-12T23:53:25.7965867Z [hipDNN] 1 error generated.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

umarinkovic commented Jan 13, 2026

Looks like windows doesnt like the deleted assignment operators on the CI. Attached is the relevant logs. Ill look to debug it when I can.

std::transform on Windows (MSVC) uses the copy assignment operator down the line I guess. The options I see here are either disable that test on Windows from compiling or removing it entirely, or writing it in a way that doesn't use std::transform, although it seems to me that the entire purpose of the test is to test the behavior of std::transform. Alternatively, we could add the copy/move assignment operators back but imho that would also require switching from using references to only using pointers since references can't be re-assigned after initialization.

Here's the test causing the error:

TEST(TestTensorView, StdTransform)
{
Tensor<float> tensor({4});
TensorView<float> view(tensor);
// Fill with initial values
std::iota(view.begin(), view.end(), 1.0f);
// Double all values
std::transform(view.begin(), view.end(), view.begin(), [](float val) { return val * 2.0f; });
// Verify
int idx = 0;
for(float& value : view)
{
EXPECT_FLOAT_EQ(value, static_cast<float>((idx + 1) * 2));
++idx;
}
}

@BrianHarrisonAMD
Copy link
Copy Markdown
Contributor

Looks like windows doesnt like the deleted assignment operators on the CI. Attached is the relevant logs. Ill look to debug it when I can.

std::transform on Windows (MSVC) uses the copy assignment operator down the line I guess. The options I see here are either disable that test on Windows from compiling or removing it entirely, or writing it in a way that doesn't use std::transform, although it seems to me that the entire purpose of the test is to test the behavior of std::transform. Alternatively, we could add the copy/move assignment operators back but imho that would also require switching from using references to only using pointers since references can't be re-assigned after initialization.

Here's the test causing the error:

TEST(TestTensorView, StdTransform)
{
Tensor<float> tensor({4});
TensorView<float> view(tensor);
// Fill with initial values
std::iota(view.begin(), view.end(), 1.0f);
// Double all values
std::transform(view.begin(), view.end(), view.begin(), [](float val) { return val * 2.0f; });
// Verify
int idx = 0;
for(float& value : view)
{
EXPECT_FLOAT_EQ(value, static_cast<float>((idx + 1) * 2));
++idx;
}
}

I think we likely need to re-add a proper version of the operators that were removed.
When we get a chance we can take a look and resolve this before merging.

Thanks for the contribution, and highlighting this issue!

@umarinkovic
Copy link
Copy Markdown
Contributor Author

I think we likely need to re-add a proper version of the operators that were removed. When we get a chance we can take a look and resolve this before merging.

Thanks for the contribution, and highlighting this issue!

No problem. If you'd like I can address that in this PR.

@BrianHarrisonAMD
Copy link
Copy Markdown
Contributor

I think we likely need to re-add a proper version of the operators that were removed. When we get a chance we can take a look and resolve this before merging.
Thanks for the contribution, and highlighting this issue!

No problem. If you'd like I can address that in this PR.

Sorry just saw your response.
Yea if you are willing, then that would be appreciated!

Otherwise, we can follow-up when we are free.

Thanks!

@umarinkovic umarinkovic force-pushed the enhancement/itensor_iterator_variant branch from 2628c49 to d336ecf Compare January 27, 2026 20:49
@umarinkovic
Copy link
Copy Markdown
Contributor Author

@BrianHarrisonAMD

Hi, thanks for your response. I've changed the vanilla C++ ITensor& references to std::reference_wrapper<ITensor>, which now allows for proper assignment to occur. I've also re-added the tests regarding the assignment operators. I ran the build and tests on Linux successfully, but will still need the CI to verify on Windows.

@BrianHarrisonAMD
Copy link
Copy Markdown
Contributor

BrianHarrisonAMD commented Jan 27, 2026

@BrianHarrisonAMD

Hi, thanks for your response. I've changed the vanilla C++ ITensor& references to std::reference_wrapper<ITensor>, which now allows for proper assignment to occur. I've also re-added the tests regarding the assignment operators. I ran the build and tests on Linux successfully, but will still need the CI to verify on Windows.

Updated since I know some CI changes have happened since this was posted.
I also kicked off the workflow run.
Thanks really appreciate the follow-up, and contribution!

If Windows CI is looking good, then I think we are good to go.

@umarinkovic
Copy link
Copy Markdown
Contributor Author

@BrianHarrisonAMD

Hi, from what I see looks like CI passed. Is this ready for merging?

Copy link
Copy Markdown
Contributor

@adickin-amd adickin-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the latest changes, this looks good to me!

Copy link
Copy Markdown
Contributor

@BrianHarrisonAMD BrianHarrisonAMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes, and updates!

LGTM.

@BrianHarrisonAMD BrianHarrisonAMD merged commit 5c38133 into ROCm:develop Jan 30, 2026
20 checks passed
@umarinkovic
Copy link
Copy Markdown
Contributor Author

umarinkovic commented Jan 30, 2026 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external contribution Code contribution from users community.. project: hipdnn

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants