Skip to content

Add unit tests for CustomSchedule.py and refactors for scale out of schedules#2554

Closed
talumbau wants to merge 5 commits into
users/common/cvs_devfrom
users/talumbau/make_tests_pass
Closed

Add unit tests for CustomSchedule.py and refactors for scale out of schedules#2554
talumbau wants to merge 5 commits into
users/common/cvs_devfrom
users/talumbau/make_tests_pass

Conversation

@talumbau
Copy link
Copy Markdown
Contributor

@talumbau talumbau commented Nov 8, 2025

Motivation

We are about to land a significant number of custom schedules for various tile sizes and transpose orientations. By using the convention "a schedule is described in a function" we can reasonably scale the code out, instead of a chain of if/elif blocks that spans hundreds or thousands of lines. Initially we keep the existing schedules in functions inside CustomSchedule.py, but we can refactor as needed.

Technical Details

Adds basic unit tests that follow pytest conventions.

Test Plan

py.test Tests/unit/test_CustomSchedule.py

Test Result

pass

Submission Checklist

from the returned value in matrixInstructionToMIParameters but this test
was never updated. Removing from the test allows the test to pass again.
- Functions are a scalable way to organize custom schedule
  implementations. Takes all existing schedule implementations and puts
  each in its own function. Over time, we can factor out the functions
  to a separate module, if desired.
@talumbau talumbau requested a review from a team as a code owner November 8, 2025 16:16
@msujon-AMD msujon-AMD requested a review from b-shi November 8, 2025 16:22
@msujon-AMD
Copy link
Copy Markdown
Collaborator

Separating the tiles in custom schedule is a good idea, but can we use existing tox testing framework for CMS kernels? We will add CI tests on this branch soon. We can just add parameters (e.g., tiles, sizes, etc.) in this file: tensilelite/Tensile/Tests/common/gemm/gfx950/custom_mainloop_scheduling.yaml. If it becomes too large, we can have a directory and add the tests in separate files as well.

@msujon-AMD
Copy link
Copy Markdown
Collaborator

Here is the command to run tox tests locally:
tox -v --workdir /tmp/.tensile-tox -e py3 -- Tensile/Tests -m gfx950
or you can try this:
pytest-3 ./Tensile/Tests/ -v --durations=0 -m "gfx950"

@talumbau
Copy link
Copy Markdown
Contributor Author

talumbau commented Nov 8, 2025

Thanks Majed. I think the custom_mainloop_scheduling.yaml file is essential in order to be able to commit the code. In that file, we can force the code generation of a particular MT size and then put UseCustomMainLoopScheduling = [1] so that you will get an error if the code generator fails. We definitely need that. Here is my experience using tox for this case:

First, sorry for my ignorance but I don't understand how the testing can work as-is when the tox.ini specifies a py35 and py36 environment, but rocisa requires at least 3.8. So I added an appropriate Python 3.9 environment for tox to use.

Then I was ready to run tox and ran as follows based on the README (focusing just on the CMS testing yaml):

TENSILELITE_CLIENT_ARGS="--build-type Debug --gpu-targets gfx950 --clean" tox -e py39 -- Tensile/Tests -k custom_mainloop_scheduling.yaml

The result was:

============= 1 passed, 60 warnings in 809.55s (0:13:29) ==========

So it was 13 minutes to codegen everything and verify we are good. The Python unit test takes about 3 or 4 seconds. I think the unit tests are complementary to the more comprehensive testing with tox. The unit test allows us to refactor Python code and make sure we have a basic "sanity check" that we can execute quickly as we make changes. The tox tests would be a final "I think I've got this right. Let's make sure codegen passes and get the PR up for review".

Do you see what I mean?

@msujon-AMD
Copy link
Copy Markdown
Collaborator

msujon-AMD commented Nov 8, 2025

Thanks Majed. I think the custom_mainloop_scheduling.yaml file is essential in order to be able to commit the code. In that file, we can force the code generation of a particular MT size and then put UseCustomMainLoopScheduling = [1] so that you will get an error if the code generator fails. We definitely need that. Here is my experience using tox for this case:

First, sorry for my ignorance but I don't understand how the testing can work as-is when the tox.ini specifies a py35 and py36 environment, but rocisa requires at least 3.8. So I added an appropriate Python 3.9 environment for tox to use.

Then I was ready to run tox and ran as follows based on the README (focusing just on the CMS testing yaml):

TENSILELITE_CLIENT_ARGS="--build-type Debug --gpu-targets gfx950 --clean" tox -e py39 -- Tensile/Tests -k custom_mainloop_scheduling.yaml

The result was:

============= 1 passed, 60 warnings in 809.55s (0:13:29) ==========

So it was 13 minutes to codegen everything and verify we are good. The Python unit test takes about 3 or 4 seconds. I think the unit tests are complementary to the more comprehensive testing with tox. The unit test allows us to refactor Python code and make sure we have a basic "sanity check" that we can execute quickly as we make changes. The tox tests would be a final "I think I've got this right. Let's make sure codegen passes and get the PR up for review".

Do you see what I mean?

We can have different yaml files for each tiles under gfx950/cms/ and run only that yaml files for development. I am wondering if we have different tests in different places, it may complicate the testing for developers. What do you think?

@msujon-AMD msujon-AMD closed this Nov 8, 2025
@msujon-AMD msujon-AMD reopened this Nov 8, 2025
@msujon-AMD msujon-AMD added the gfx950 run CI on gfx950 label Nov 8, 2025
- Using the tox environment "unit", you can quickly call unit tests once
  you have a build of tensile-client already available.

- Add documentation to the README advertising this fact.
@talumbau talumbau requested a review from a team as a code owner November 8, 2025 23:40
@talumbau
Copy link
Copy Markdown
Contributor Author

talumbau commented Nov 8, 2025

We can have different yaml files for each tiles under gfx950/cms/ and run only that yaml files for development. I am wondering if we have different tests in different places, it may complicate the testing for developers. What do you think?

I definitely understand! I agree we should reduce the possibility of confusion where we can. I updated the tox.ini to add an additional target for just unit testing and I added a line to README to describe how it works. So the user can call:

tox -e py3 -- Tensile/Tests -m common

and then follow that up with

tox -e unit -- Tensile/Tests/unit

if they are interested in that testing path. PTAL!

return module, numCodePath


def _get_schedule_256x256x64_16bit(kernel, isNN, isNT, isTT, isTN, useLDSTr, TLDS):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to have parameters for each of isNN, ..., isTN and just infer them from kernel via

    transA = kernel["ProblemType"]["TransposeA"]
    transB = kernel["ProblemType"]["TransposeB"]
    isNN = transA == False and transB == False
    isNT = transA == False and transB == True
    isTT = transA == True and transB == True
    isTN = transA == True and transB == False

or we could have a shared helper function which returns these values if we want to avoid duplicating code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes good call. I added the helpers and used them to make a consistent calling convention. PTAL.

 - use helpers for _get_schedule... functions to make a consistent
   calling convention.
@talumbau
Copy link
Copy Markdown
Contributor Author

Converted this to draft for now and split the commits as follows:

CustomSchedule.py changes only - #2574

Unit testing - #2575

@talumbau
Copy link
Copy Markdown
Contributor Author

Closing. CustomSchedule.py was added in #2574. Unit test enablement was originally scheduled in #2575 but is now closed in favor of #2765. Unit test for CustomSchedule.py is still in #2575 but will have to be added in a follow up.

@talumbau talumbau closed this Nov 21, 2025
ammallya pushed a commit that referenced this pull request Feb 3, 2026
`#include <bit>` led a building failure on RHEL8.

`<bit>` is C++20 header file. It is not supported on RHEL8.

[ROCm/composable_kernel commit: 1d89415]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants