
[hipblaslt] Custom mainloop schedule (CMS) validator scaffolding #2840

Merged
newling merged 9 commits into ROCm:hipblaslt_common_cms_dev from newling:hipblaslt_cms_validator
Nov 26, 2025


@newling
Contributor

@newling newling commented Nov 21, 2025

(top commit is the validator, others set up testing)

Motivation

This introduces scaffolding for adding rules that validate the correctness of the custom mainloop schedules (CMS) that developers handcraft in Tensile. The motivation is to automatically detect scheduling issues in bespoke instruction schedules. Rules will include the obvious ones, like getting the order of the instructions that increment the global pointer for A (GRIncA) correct, and more subtle ones, like ensuring that there are no RAW and WAR race conditions as data moves (HBM -> LDS -> registers). These will be added later.

Note: A non-goal here is to create something that completely replaces careful human verification of CMS schedules.

Design

This PR adds basic functionality to the ScheduleInfo class. Only 1 rule is added; more can be added by developers in parallel. See the internal doc for suggested rules.

By default, all custom schedules are validated before codegen. It is possible to override the validation, if a CMS writer knows their schedule is valid and the rules are flagging a false positive.

Test Plan

Initial unit tests added. Note: only the top commit(s) here are specific to the validator; they sit on top of other commits from @talumbau that set up the test infra.

@newling newling force-pushed the hipblaslt_cms_validator branch from 106e556 to fc6bd96 Compare November 21, 2025 19:22
@newling newling changed the title from "Hipblaslt cms validator" to "[hipblaslt] Custom mainloop schedule (CMS) validator scaffolding" Nov 21, 2025
@newling newling marked this pull request as ready for review November 21, 2025 19:29
@newling newling requested review from a team as code owners November 21, 2025 19:29
@sebvince
Contributor

Great idea !

@newling
Contributor Author

newling commented Nov 24, 2025

Great idea !

Thanks to @talumbau for getting this rolling!

I'm going to start looking at rules that ensure GRA/GRB are not too early, but will leave this PR as it currently is.

Contributor

@talumbau talumbau left a comment


I like this as a starting point. Hopefully we can get #2765 merged in first to make this PR feel a bit cleaner. PTAL at my suggestions and let me know what you think.

Comment thread projects/hipblaslt/tensilelite/Tensile/Components/CustomSchedule.py Outdated
@newling newling force-pushed the hipblaslt_cms_validator branch from feb37d4 to ab9c009 Compare November 25, 2025 01:36
@@ -464,8 +545,8 @@ def _get_schedule_256x192x64_16bit(kernel, useLDSTr, TLDS):
55, SWaitCnt(dscnt=6, vlcnt=-1, vscnt=-1, comment="wait for LRB0-1"),
63, SWaitCnt(dscnt=6, vlcnt=-1, vscnt=-1, comment="wait for LRB0-2"),
70, SWaitCnt(dscnt=-1, vlcnt=12, vscnt=-1, comment="for LRB1"),
71, SWaitCnt(dscnt=11, vlcnt=-1, vscnt=-1, comment="wait for LRB0-3"),
Contributor Author


Switched, verified by @jfactory07 to be valid

Contributor

@talumbau talumbau left a comment


Overall, LGTM. Needs to be rebased, as CustomSchedule.py is unfortunately in a state of flux. Here is something that is up to you: I don't want to make it too easy to skip validation. So I personally would make the attribute __skipValidation__ and make skipValidation a property with a setter and getter. Then, if the setter gets called, you get a big log message that says "VALIDATION SKIPPED" or something similar. I just think we should log to the output if someone is skipping validation. WDYT?
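The property-with-logging idea in this comment could look roughly like the sketch below. It is not the code that was merged: `ScheduleInfoSketch`, the logger name, and the backing attribute `_skipValidation` are hypothetical stand-ins chosen for illustration (the comment itself proposes `__skipValidation__` as the backing name).

```python
import logging

# Hypothetical logger name; the real module may log differently.
logger = logging.getLogger("cms_validator_sketch")


class ScheduleInfoSketch:
    """Hypothetical stand-in for ScheduleInfo; only the skip flag is modeled."""

    def __init__(self):
        # Validation is on by default, as stated in the PR description.
        self._skipValidation = False

    @property
    def skipValidation(self) -> bool:
        return self._skipValidation

    @skipValidation.setter
    def skipValidation(self, value: bool) -> None:
        if value:
            # Make the opt-out loud so it is visible in the output.
            logger.warning("VALIDATION SKIPPED for custom mainloop schedule")
        self._skipValidation = bool(value)
```

With this shape, a schedule author can still opt out by writing `info.skipValidation = True`, but doing so always leaves a warning in the log.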

from typing import Dict


def verifyAscendingOrder(scheduleInfo, context: Dict = {}):
Contributor


yes I like the free-standing functions here as a convention. Once we have a collection we can move to its own module.
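As a rough illustration of the free-standing-function convention, the sketch below registers one rule as a plain function and runs every registered rule over a schedule. Everything here is an assumption made for the example: the `(slot, label)` schedule representation, the names `check_ascending_slots`, `RULES`, and `validate`, and the rule's exact behavior are hypothetical and are not taken from CustomSchedule.py.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical representation: (mainloop slot index, instruction label) pairs.
Schedule = Sequence[Tuple[int, str]]
# A rule returns an error message on failure, or None if the schedule passes.
Rule = Callable[[Schedule], Optional[str]]


def check_ascending_slots(schedule: Schedule) -> Optional[str]:
    """Return an error message if slot indices ever decrease, else None."""
    for (prev_slot, prev_name), (slot, name) in zip(schedule, schedule[1:]):
        if slot < prev_slot:
            return f"{name} at slot {slot} follows {prev_name} at slot {prev_slot}"
    return None


# New rules are added by appending another free-standing function.
RULES: List[Rule] = [check_ascending_slots]


def validate(schedule: Schedule) -> List[str]:
    """Run every registered rule and collect the failure messages."""
    errors = []
    for rule in RULES:
        message = rule(schedule)
        if message is not None:
            errors.append(message)
    return errors
```

Keeping each rule as an independent function means several developers can add rules in parallel without touching a shared class, and the collection can later be moved into its own module.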

@newling newling force-pushed the hipblaslt_cms_validator branch from ab9c009 to a195df7 Compare November 25, 2025 20:17
@newling
Contributor Author

newling commented Nov 25, 2025

Overall, LGTM. Needs to be rebased, as CustomSchedule.py is unfortunately in a state of flux. Here is something that is up to you: I don't want to make it too easy to skip validation. So I personally would make the attribute __skipValidation__ and make skipValidation a property with a setter and getter. Then, if the setter gets called, you get a big log message that says "VALIDATION SKIPPED" or something similar. I just think we should log to the output if someone is skipping validation. WDYT?

Good idea. I've made it a bit trickier to skip validation, along these lines.

talumbau and others added 9 commits November 25, 2025 22:38
from the returned value in matrixInstructionToMIParameters but this test
was never updated. Removing from the test allows the test to pass again.
- Using the tox environment "unit", you can quickly call unit tests once
  you have a build of tensile-client already available.

- Add documentation to the README advertising this fact.
Signed-off-by: James Newling <james.newling@gmail.com>
@newling newling force-pushed the hipblaslt_cms_validator branch from 7633b69 to cddb483 Compare November 26, 2025 04:38
@newling newling merged commit c74f484 into ROCm:hipblaslt_common_cms_dev Nov 26, 2025
27 checks passed
b-shi pushed a commit that referenced this pull request Dec 12, 2025
## Motivation

This introduces scaffolding for adding rules to validate the correctness of
CMS (custom mainloop schedules) in Tensile, that developers handcraft.
The motivation is to automatically detect scheduling issues in bespoke
instruction schedules. Rules will include the obvious ones like getting
the order of instructions to increment the global pointer for A (GRIncA)
correctly, and more subtle ones like ensuring that there are no RAW and
WAR race conditions as data moves (HBM -> LDS -> registers). These will
be added later.

Note: A non-goal here is to create something that completely replaces
careful human verification of CMS schedules.

## Design 

This PR adds basic functionality to the ScheduleInfo class. Only 1 rule is
added; more can be added by developers in parallel. See the internal doc
for suggested rules.

By default, all custom schedules are validated before codegen. It is
possible to override the validation, if a CMS writer knows their
schedule is valid and the rules are flagging a false positive.

## Test Plan

Initial unit tests added. Note: only the top commit(s) here are specific
to the validator; they sit on top of other commits from @talumbau that set
up the test infra. Note: unit tests are not currently enabled on any branch,
but soon will be.

## To run locally

 tox -e unit -- Tensile/Tests/unit/test_CustomSchedule.py


---------

Signed-off-by: James Newling <james.newling@gmail.com>
Co-authored-by: T.J. Alumbaugh <talumbau@amd.com>
ammallya pushed a commit that referenced this pull request Feb 3, 2026
-added parameter to change group count for grouped_gemm examples.

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>

[ROCm/composable_kernel commit: 5c4f52a]
@newling newling deleted the hipblaslt_cms_validator branch April 3, 2026 21:39

3 participants