[enhancement] differentiating strided vs packed tensors when iterating by umarinkovic · Pull Request #3710 · ROCm/rocm-libraries

umarinkovic · 2026-01-08T15:01:51Z

Motivation

Optimizing the tensor filling functions started a discussion about optimizing tensor iteration in general:
#3471 (comment)

Technical Details

After some deliberation, the approach taken here (using std::variant inside the iterator to represent the different types of indexing) reflects both the desire the improve iteration in the case of packed tensors while also maintaining the existing API.

A fully templated approach would be more optimal but would require API changes to the ITensor class itself, whether making it templated or changing the definition of its iterator-related methods at the very least.

Test Plan

Ran ninja check inside the build folder of hipDNN.

Test Result

[185/187] Running all tests via ctest
Test project /therock/output/build/ml-libs/hipDNN/build
    Start 1: hipdnn_data_sdk_tests
1/7 Test #1: hipdnn_data_sdk_tests ............   Passed    1.30 sec
    Start 2: hipdnn_backend_tests
2/7 Test #2: hipdnn_backend_tests .............   Passed    1.29 sec
    Start 3: hipdnn_frontend_tests
3/7 Test #3: hipdnn_frontend_tests ............   Passed    0.03 sec
    Start 4: hipdnn_test_sdk_tests
4/7 Test #4: hipdnn_test_sdk_tests ............   Passed    4.32 sec
    Start 5: hipdnn_plugin_sdk_tests
5/7 Test #5: hipdnn_plugin_sdk_tests ..........   Passed    0.03 sec
    Start 6: public_hipdnn_backend_tests
6/7 Test #6: public_hipdnn_backend_tests ......   Passed    0.33 sec
    Start 7: public_hipdnn_frontend_tests
7/7 Test #7: public_hipdnn_frontend_tests .....   Passed    0.26 sec

100% tests passed, 0 tests failed out of 7

Label Time Summary:
integration_test    =   0.59 sec*proc (2 tests)
unit_test           =   6.96 sec*proc (5 tests)

Total Test time (real) =   7.56 sec

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

umarinkovic · 2026-01-08T15:23:25Z

Some notes:

Using std::variant trims one branch in the case of incrementing a packed tensor and adds one branch in the case of a non-packed tensor. It also adds a branch to the dereferencing of the iterators. In general the branches added by std::variant are predictable but so were the trimmed branches so it seems like a net-zero at first glance. The real benefit is that packed tensors don't compute their index over and over again when dereferencing the iterator, which is one inner product of indices and strides per iteration.

The other approach of making the iterator a templated class with the IsPacked parameter would require the API to change in a non-trivial way, either modifying ITensor to also be a templated class which imo seems counterintuitive, or at the very least getting rid of its virtual begin(), end() etc. methods and shadowing them from more concrete implementations. This would be marginally better for performance but it would require ITensor to use the most universal iterator which is the current implementation. So the optimization would be available to those who explicitly ask for it but anyone just using ITensor to iterate would be stuck with the unoptimized version.

Side question, what is the purpose of overriding assignment operators in the ITensorIterator class? Since the class holds a reference which cannot be reassigned, it effectively calls the copy-assignment operator of the underlying tensor. From my POV it seems kind of dangerous, no? Although I'm not entirely familiar with the logic behind it so I might be mistaken. I noticed that because that overridden operator forced me to use tensor pointers instead of tensor references inside my index structs.

adickin-amd · 2026-01-08T17:06:42Z

Side question, what is the purpose of overriding assignment operators in the ITensorIterator class? Since the class holds a reference which cannot be reassigned, it effectively calls the copy-assignment operator of the underlying tensor. From my POV it seems kind of dangerous, no? Although I'm not entirely familiar with the logic behind it so I might be mistaken. I noticed that because that overridden operator forced me to use tensor pointers instead of tensor references inside my index structs.

Great observation, we don't need those assignment operators overridden and they can be removed. I went through several development iterations trying to understand how to build the iterator well, its possible at some point they were valid. Thank you for catching this. Do you want to remove it as part of this PR or shall I open up a new issue to deal with this later?

umarinkovic · 2026-01-08T17:10:33Z

Do you want to remove it as part of this PR or shall I open up a new issue to deal with this later?

I can address it in this PR, I think it also makes for a cleaner implementation of this solution if references can be used in place of pointers.

Copilot

Pull request overview

This PR optimizes tensor iteration by differentiating between packed (contiguous memory) and strided (non-contiguous) tensors. The optimization uses std::variant to represent different indexing strategies while maintaining the existing API.

Changes:

Refactored ITensorIterator to use std::variant containing either LinearIndex (for packed tensors) or CompositeIndex (for strided tensors)
Simplified fillWithValue() and fillRandom() methods to use the new iterator implementation instead of conditional logic
Removed copy assignment and move constructor tests that are no longer applicable
Updated and added tests to validate both packed and strided tensor iteration paths

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/Tensor.hpp	Refactored iterator to use variant-based indexing and simplified tensor filling methods
projects/hipdnn/data_sdk/tests/utilities/TestTensorIterator.cpp	Removed obsolete tests and added new tests for linear and composite index access patterns
projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp	Removed copy assignment test that is no longer valid

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

adickin-amd · 2026-01-09T22:43:00Z

I did some debugging of why the linux builds failed and remembered that windows has all of our clang-tidy checks disabled during the build since we had major compilation time increases if we didn't. We should have added a warning to mention this so I will log an issue for this.

In order to fix the build you'll need to build with -DENABLE_CLANG_TIDY=ON so you can see the build errors. If you encounter issues getting the errors fixed, ill be happy to help you out.

umarinkovic · 2026-01-12T11:42:01Z

I did some debugging of why the linux builds failed and remembered that windows has all of our clang-tidy checks disabled during the build since we had major compilation time increases if we didn't. We should have added a warning to mention this so I will log an issue for this.

In order to fix the build you'll need to build with -DENABLE_CLANG_TIDY=ON so you can see the build errors. If you encounter issues getting the errors fixed, ill be happy to help you out.

Hi, I ran clang-tidy during the build. However, at least on my end I see a bunch of errors unrelated to the changes I made in this PR. Even running clang-tidy on just these files produces an error in Tensor.hpp with regards to the #pragma directive. That being said, I did identify all the clang-tidy warnings/errors caused by my changes and fixed them.

adickin-amd

This is a great optimization. Thank you for your contribution. LGTM

adickin-amd · 2026-01-12T23:24:11Z

I did some debugging of why the linux builds failed and remembered that windows has all of our clang-tidy checks disabled during the build since we had major compilation time increases if we didn't. We should have added a warning to mention this so I will log an issue for this.
In order to fix the build you'll need to build with -DENABLE_CLANG_TIDY=ON so you can see the build errors. If you encounter issues getting the errors fixed, ill be happy to help you out.

Hi, I ran clang-tidy during the build. However, at least on my end I see a bunch of errors unrelated to the changes I made in this PR. Even running clang-tidy on just these files produces an error in Tensor.hpp with regards to the #pragma directive. That being said, I did identify all the clang-tidy warnings/errors caused by my changes and fixed them.

Thank you for fixing the tidy errors despite the additional tidy warnings you got This is likely due to a clang-tidy version mismatch, we are on version 20 and newer versions have warnings for the pragma directives. I have an incoming PR which will exclude these new clang-tidy warnings so we can use newer versions without issues.

adickin-amd · 2026-01-13T17:02:11Z

Looks like windows doesnt like the deleted assignment operators on the CI. Attached is the relevant logs. Ill look to debug it when I can.

2026-01-12T23:53:25.7935554Z [hipDNN] ccache B:\build\core\clr\dist\lib\llvm\bin\clang++.exe -DCOMPONENT_NAME=\"hipdnn_frontend\" -DFMT_HEADER_ONLY -DSPDLOG_FMT_EXTERNAL -DUSE_PROF_API=1 -D__HIPCC__ -D__HIP_PLATFORM_AMD__ -D__HIP_PLATFORM_AMD__=1 -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/include -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/test_sdk/include -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/frontend/include -IC:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/backend/include -IB:/build/ml-libs/hipDNN/build/backend/include -isystem B:/build/third-party/googletest/dist/include -isystem B:/build/third-party/flatbuffers/dist/include -isystem B:/build/third-party/spdlog/dist/include -isystem B:/build/third-party/nlohmann-json/dist/include -isystem B:/build/third-party/fmt/dist/include -isystem B:/build/core/clr/dist/include -DWIN32 -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -fms-extensions -fms-compatibility -D_ENABLE_EXTENDED_ALIGNED_STORAGE  -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier --hip-path=B:/build/core/clr/dist --hip-device-lib-path=B:/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=gnu++17 -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Werror -Wall -Wextra -Wpedantic -Wshadow -Wnon-virtual-dtor -Wold-style-cast -Wcast-align -Woverloaded-virtual -Wconversion -Wsign-conversion -Wnull-dereference -Wdouble-promotion -Wformat=2 -Winit-self -Wunreachable-code -Wno-error=unused-command-line-argument -Wswitch-default -MD -MT data_sdk/tests/CMakeFiles/hipdnn_data_sdk_tests.dir/utilities/TestTensorView.cpp.obj -MF data_sdk\tests\CMakeFiles\hipdnn_data_sdk_tests.dir\utilities\TestTensorView.cpp.obj.d -o data_sdk/tests/CMakeFiles/hipdnn_data_sdk_tests.dir/utilities/TestTensorView.cpp.obj -c C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp
2026-01-12T23:53:25.7945806Z [hipDNN] In file included from C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp:4:
2026-01-12T23:53:25.7946774Z [hipDNN] In file included from B:/build/third-party/googletest/dist/include\gtest/gtest.h:55:
2026-01-12T23:53:25.7947603Z [hipDNN] In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\memory:14:
2026-01-12T23:53:25.7948576Z [hipDNN] In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\xmemory:15:
2026-01-12T23:53:25.7950177Z [hipDNN] C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\xutility:1481:13: error: object of type 'hipdnn_data_sdk::utilities::TensorViewIterator<float>' cannot be assigned because its copy assignment operator is implicitly deleted
2026-01-12T23:53:25.7951469Z [hipDNN]  1481 |         _It = _STD forward<_UIter>(_UIt);
2026-01-12T23:53:25.7951799Z [hipDNN]       |             ^
2026-01-12T23:53:25.7953228Z [hipDNN] C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\include\algorithm:3803:10: note: in instantiation of function template specialization 'std::_Seek_wrapped<hipdnn_data_sdk::utilities::TensorViewIterator<float>, hipdnn_data_sdk::utilities::TensorViewIterator<float> &>' requested here
2026-01-12T23:53:25.7954750Z [hipDNN]  3803 |     _STD _Seek_wrapped(_Dest, _UDest);
2026-01-12T23:53:25.7955065Z [hipDNN]       |          ^
2026-01-12T23:53:25.7957113Z [hipDNN] C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp:513:10: note: in instantiation of function template specialization 'std::transform<hipdnn_data_sdk::utilities::TensorViewIterator<float>, hipdnn_data_sdk::utilities::TensorViewIterator<float>, (lambda at C:\home\runner\_work\rocm-libraries\rocm-libraries\projects\hipdnn\data_sdk\tests\utilities\TestTensorView.cpp:513:60)>' requested here
2026-01-12T23:53:25.7959416Z [hipDNN]   513 |     std::transform(view.begin(), view.end(), view.begin(), [](float val) { return val * 2.0f; });
2026-01-12T23:53:25.7959940Z [hipDNN]       |          ^
2026-01-12T23:53:25.7961435Z [hipDNN] C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/include\hipdnn_data_sdk/utilities/TensorView.hpp:62:30: note: copy assignment operator of 'TensorViewIterator<float>' is implicitly deleted because field '_iter' has a deleted copy assignment operator
2026-01-12T23:53:25.7962800Z [hipDNN]    62 |     ITensorIterator<IsConst> _iter;
2026-01-12T23:53:25.7963132Z [hipDNN]       |                              ^
2026-01-12T23:53:25.7964046Z [hipDNN] C:/home/runner/_work/rocm-libraries/rocm-libraries/projects/hipdnn/data_sdk/include\hipdnn_data_sdk/utilities/Tensor.hpp:94:22: note: 'operator=' has been explicitly marked deleted here
2026-01-12T23:53:25.7965135Z [hipDNN]    94 |     ITensorIterator& operator=(const ITensorIterator& other) = delete;
2026-01-12T23:53:25.7965596Z [hipDNN]       |                      ^
2026-01-12T23:53:25.7965782Z 
2026-01-12T23:53:25.7965867Z [hipDNN] 1 error generated.

umarinkovic · 2026-01-13T22:31:03Z

Looks like windows doesnt like the deleted assignment operators on the CI. Attached is the relevant logs. Ill look to debug it when I can.

std::transform on Windows (MSVC) uses the copy assignment operator down the line I guess. The options I see here are either disable that test on Windows from compiling or removing it entirely, or writing it in a way that doesn't use std::transform, although it seems to me that the entire purpose of the test is to test the behavior of std::transform. Alternatively, we could add the copy/move assignment operators back but imho that would also require switching from using references to only using pointers since references can't be re-assigned after initialization.

Here's the test causing the error:

rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp

Lines 520 to 538 in e076ed7

    
           TEST(TestTensorView, StdTransform) 
        
           { 
        
               Tensor<float> tensor({4}); 
        
               TensorView<float> view(tensor); 
        
               // Fill with initial values 
        
               std::iota(view.begin(), view.end(), 1.0f); 
        
               // Double all values 
        
               std::transform(view.begin(), view.end(), view.begin(), [](float val) { return val * 2.0f; }); 
        
               // Verify 
        
               int idx = 0; 
        
               for(float& value : view) 
        
               { 
        
                   EXPECT_FLOAT_EQ(value, static_cast<float>((idx + 1) * 2)); 
        
                   ++idx; 
        
               } 
        
           }

BrianHarrisonAMD · 2026-01-14T20:13:02Z

Looks like windows doesnt like the deleted assignment operators on the CI. Attached is the relevant logs. Ill look to debug it when I can.

std::transform on Windows (MSVC) uses the copy assignment operator down the line I guess. The options I see here are either disable that test on Windows from compiling or removing it entirely, or writing it in a way that doesn't use std::transform, although it seems to me that the entire purpose of the test is to test the behavior of std::transform. Alternatively, we could add the copy/move assignment operators back but imho that would also require switching from using references to only using pointers since references can't be re-assigned after initialization.

Here's the test causing the error:

rocm-libraries/projects/hipdnn/data_sdk/tests/utilities/TestTensorView.cpp

Lines 520 to 538 in e076ed7

TEST(TestTensorView, StdTransform)

{

Tensor<float> tensor({4});

TensorView<float> view(tensor);

// Fill with initial values

std::iota(view.begin(), view.end(), 1.0f);

// Double all values

std::transform(view.begin(), view.end(), view.begin(), [](float val) { return val * 2.0f; });

// Verify

int idx = 0;

for(float& value : view)

{

EXPECT_FLOAT_EQ(value, static_cast<float>((idx + 1) * 2));

++idx;

}

}

I think we likely need to re-add a proper version of the operators that were removed.
When we get a chance we can take a look and resolve this before merging.

Thanks for the contribution, and highlighting this issue!

umarinkovic · 2026-01-16T15:57:33Z

I think we likely need to re-add a proper version of the operators that were removed. When we get a chance we can take a look and resolve this before merging.

Thanks for the contribution, and highlighting this issue!

No problem. If you'd like I can address that in this PR.

BrianHarrisonAMD · 2026-01-27T00:45:21Z

I think we likely need to re-add a proper version of the operators that were removed. When we get a chance we can take a look and resolve this before merging.
Thanks for the contribution, and highlighting this issue!

No problem. If you'd like I can address that in this PR.

Sorry just saw your response.
Yea if you are willing, then that would be appreciated!

Otherwise, we can follow-up when we are free.

Thanks!

…rom ITensorITerator

…ead of pointer

…ITensorIterator and its Index types

umarinkovic · 2026-01-27T21:19:13Z

@BrianHarrisonAMD

Hi, thanks for your response. I've changed the vanilla C++ ITensor& references to std::reference_wrapper<ITensor>, which now allows for proper assignment to occur. I've also re-added the tests regarding the assignment operators. I ran the build and tests on Linux successfully, but will still need the CI to verify on Windows.

BrianHarrisonAMD · 2026-01-27T23:58:43Z

@BrianHarrisonAMD

Hi, thanks for your response. I've changed the vanilla C++ ITensor& references to std::reference_wrapper<ITensor>, which now allows for proper assignment to occur. I've also re-added the tests regarding the assignment operators. I ran the build and tests on Linux successfully, but will still need the CI to verify on Windows.

Updated since I know some CI changes have happened since this was posted.
I also kicked off the workflow run.
Thanks really appreciate the follow-up, and contribution!

If Windows CI is looking good, then I think we are good to go.

umarinkovic · 2026-01-30T15:21:43Z

@BrianHarrisonAMD

Hi, from what I see looks like CI passed. Is this ready for merging?

adickin-amd

Reviewed the latest changes, this looks good to me!

BrianHarrisonAMD

Thanks for the fixes, and updates!

LGTM.

umarinkovic · 2026-01-30T17:12:41Z

[laugh] Uros Marinkovic reacted to your message:

…

________________________________ From: BrianHarrisonAMD ***@***.***> Sent: Friday, January 30, 2026 5:12:05 PM To: ROCm/rocm-libraries ***@***.***> Cc: Uros Marinkovic ***@***.***>; Author ***@***.***> Subject: Re: [ROCm/rocm-libraries] [enhancement] differentiating strided vs packed tensors when iterating (PR #3710) CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. @BrianHarrisonAMD approved this pull request. Thanks for the fixes, and updates! LGTM. — Reply to this email directly, view it on GitHub<#3710 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BU7IDG2ESCWOV5CDCQ4OXLL4JOGGLAVCNFSM6AAAAACRCNDWQSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTOMRZGQ3TOOJUGY>. You are receiving this because you authored the thread.Message ID: ***@***.***>

umarinkovic requested a review from a team as a code owner January 8, 2026 15:01

github-actions Bot added the project: hipdnn label Jan 8, 2026

assistant-librarian Bot added the external contribution Code contribution from users community.. label Jan 8, 2026

adickin-amd requested a review from Copilot January 9, 2026 15:54

Copilot AI reviewed Jan 9, 2026

View reviewed changes

Comment thread projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/Tensor.hpp

Comment thread projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/Tensor.hpp Outdated

Comment thread projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/Tensor.hpp

adickin-amd reviewed Jan 9, 2026

View reviewed changes

Comment thread projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/Tensor.hpp

umarinkovic force-pushed the enhancement/itensor_iterator_variant branch from 996c4cd to d668444 Compare January 12, 2026 11:42

adickin-amd approved these changes Jan 12, 2026

View reviewed changes

umarinkovic added 7 commits January 27, 2026 09:51

Differentiated strided vs packed tensor iterators by using std::variant

0c5e3d6

Added differentiated tests for linear/strided index access

3bfadb9

Delete copy/move assignment operators and their corresponding tests f…

3c40a4f

…rom ITensorITerator

Changed LinearIndex and CompositeIndex to use a tensor reference inst…

84d45ff

…ead of pointer

Addressing clang-tidy errors

6f30e08

Changed TensorType in ITensorIterator to be a std::reference_wrapper

9cea976

Added default copy and move assignment operators and constructors to …

d336ecf

…ITensorIterator and its Index types

umarinkovic force-pushed the enhancement/itensor_iterator_variant branch from 2628c49 to d336ecf Compare January 27, 2026 20:49

Adding assignment operator tests that were removed

fee51fe

Merge branch 'develop' into enhancement/itensor_iterator_variant

5a65abd

adickin-amd approved these changes Jan 30, 2026

View reviewed changes

BrianHarrisonAMD approved these changes Jan 30, 2026

View reviewed changes

BrianHarrisonAMD merged commit 5c38133 into ROCm:develop Jan 30, 2026
20 checks passed

Conversation

umarinkovic commented Jan 8, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Uh oh!

umarinkovic commented Jan 8, 2026

Uh oh!

adickin-amd commented Jan 8, 2026

Uh oh!

umarinkovic commented Jan 8, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

adickin-amd commented Jan 9, 2026

Uh oh!

umarinkovic commented Jan 12, 2026

Uh oh!

adickin-amd left a comment

Choose a reason for hiding this comment

Uh oh!

adickin-amd commented Jan 12, 2026

Uh oh!

adickin-amd commented Jan 13, 2026

Uh oh!

umarinkovic commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BrianHarrisonAMD commented Jan 14, 2026

Uh oh!

umarinkovic commented Jan 16, 2026

Uh oh!

BrianHarrisonAMD commented Jan 27, 2026

Uh oh!

umarinkovic commented Jan 27, 2026

Uh oh!

BrianHarrisonAMD commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

umarinkovic commented Jan 30, 2026

Uh oh!

adickin-amd left a comment

Choose a reason for hiding this comment

Uh oh!

BrianHarrisonAMD left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

umarinkovic commented Jan 30, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

umarinkovic commented Jan 13, 2026 •

edited

Loading

BrianHarrisonAMD commented Jan 27, 2026 •

edited

Loading