Add validation tests for `amd-smi` CLI output by HRISHIKESHTHULA-AMD · Pull Request #3805 · ROCm/TheRock

HRISHIKESHTHULA-AMD · 2026-03-06T07:31:18Z

Motivation

Add tests for `amd-smi list` output.

Technical Details

`amd-smi list` is tested with different output options.

Test Plan

`amd-smi list` is tested with no option and with `--json`, `--csv`, and `--file`.
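As a rough illustration of what validating the machine-readable modes involves, the sketch below parses `--json` and `--csv` output and reports entries that lack required fields. It assumes the JSON output is a list of per-GPU objects and the CSV output has a header row followed by one row per GPU; the field names in `REQUIRED_FIELDS` and all helper names are hypothetical, not taken from the PR.

```python
import csv
import io
import json

# Hypothetical per-GPU fields; the real keys emitted by `amd-smi list`
# in JSON/CSV mode may differ and should be checked against the tool.
REQUIRED_FIELDS = ("gpu", "bdf", "uuid")


def parse_json_output(text: str) -> list[dict]:
    """Parse JSON output, assuming a list of per-GPU objects."""
    data = json.loads(text)
    if isinstance(data, dict):
        # Tolerate a single object by wrapping it in a list.
        data = [data]
    return data


def parse_csv_output(text: str) -> list[dict]:
    """Parse CSV output, assuming a header row and one row per GPU."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_fields(rows: list[dict]) -> list[str]:
    """Describe every required field that is absent or empty in any row."""
    problems = []
    for index, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            if field not in row or row[field] in ("", None):
                problems.append(f"row {index}: missing {field}")
    return problems
```

For `--file`, the same parsers could be applied to the contents of the written file instead of stdout.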

Test Result

6 tests passed. https://github.com/ROCm/TheRock/actions/runs/22840058512

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

Adds a new CI test to validate amd-smi list output across human/JSON/CSV modes (including --file output), and wires it into the GitHub Actions test matrix so it runs on Linux.

Changes:

Add test_amdsmi_cli.py pytest suite to validate required GPU fields in amd-smi list output across multiple output formats.
Register a new amdsmi_cli job in the GitHub Actions test configuration matrix to execute the new pytest test on Linux.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
build_tools/github_actions/test_executable_scripts/test_amdsmi_cli.py	New pytest-based end-to-end validation for `amd-smi list` output modes (stdout/file, human/JSON/CSV).
build_tools/github_actions/fetch_test_configurations.py	Adds new `amdsmi_cli` test job entry to CI test matrix.


Copilot · 2026-03-06T07:36:38Z

+    th = os.getenv("THEROCK_BIN_DIR")
+    if not th:
+        pytest.skip("THEROCK_BIN_DIR not set; skipping amdsmi tests")
+    p = Path(th) / "amd-smi"


THEROCK_BIN_DIR can be a relative path in CI (e.g. ./build/bin). Using Path(th) / 'amd-smi' without resolve() makes this test depend on the current working directory; running the test from a different cwd can fail to find the binary. Consider resolving THEROCK_BIN_DIR (and/or constructing THEROCK_DIR like other test scripts) before checking existence.

Suggested change

-    p = Path(th) / "amd-smi"
+    bin_dir = Path(th).resolve()
+    p = bin_dir / "amd-smi"
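A minimal sketch of the suggested fix, using descriptive names instead of `th` and `p`. `find_amd_smi_binary` is a hypothetical helper; it returns `None` instead of calling `pytest.skip` so the sketch stays free of pytest, and the caller decides whether to skip or fail (the existence assertion would also live in the caller).

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def find_amd_smi_binary(environ: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Locate amd-smi under THEROCK_BIN_DIR as an absolute path.

    Returns None when THEROCK_BIN_DIR is unset so the caller can skip
    (for example with pytest.skip) instead of failing.
    """
    bin_dir_value = environ.get("THEROCK_BIN_DIR")
    if not bin_dir_value:
        return None
    # resolve() anchors a relative value such as ./build/bin to the current
    # working directory once, so a later chdir cannot change the result.
    bin_dir = Path(bin_dir_value).resolve()
    return bin_dir / "amd-smi"
```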

Copilot · 2026-03-06T07:36:38Z

+    cmd = [str(amd_smi), "list"] + args
+    proc = subprocess.run(cmd, capture_output=True, text=True)
+    return proc.returncode, proc.stdout, proc.stderr


_run_amd_smi runs the binary without setting cwd (and without explicitly propagating any environment tweaks). Since CI commonly sets THEROCK_BIN_DIR as a relative path, calling amd-smi from a different working directory can break. Recommend running with an explicit cwd (e.g. repo root like other scripts) and/or resolving the binary path to an absolute path before invoking subprocess.
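A sketch of what this recommendation might look like, assuming a hypothetical `run_amd_smi_list` helper that takes the same `amd_smi_path` and `modifiers` inputs as `_run_amd_smi`. It resolves the binary to an absolute path and passes an explicit `cwd` to `subprocess.run`; the `timeout` is an extra assumption, not part of the PR, so a hung CLI cannot stall the job.

```python
import logging
import subprocess
from pathlib import Path
from typing import Optional


def run_amd_smi_list(
    amd_smi_path: Path,
    modifiers: list[str],
    cwd: Optional[Path] = None,
    timeout_seconds: float = 60,
) -> tuple[int, str, str]:
    """Run `amd-smi list <modifiers>` and return (returncode, stdout, stderr)."""
    # An absolute binary path keeps the call independent of the cwd.
    binary = Path(amd_smi_path).resolve()
    cmd = [str(binary), "list", *modifiers]
    logging.info("++ Exec [%s]$ %s", cwd or Path.cwd(), " ".join(cmd))
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Callers could pass the repository root as `cwd`, as the comment notes other scripts do.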

Copilot · 2026-03-06T07:36:38Z

            "windows": 1,
        },
    },
+


There is trailing whitespace on the blank line before the new amdsmi_cli entry. Please remove it to avoid lint/noise in future diffs.

Suggested change


sriasapu · 2026-03-06T08:25:46Z

+    th = os.getenv("THEROCK_BIN_DIR")
+    if not th:
+        pytest.skip("THEROCK_BIN_DIR not set; skipping amdsmi tests")
+    p = Path(th) / "amd-smi"


Can we use meaningful names instead of short ones?

sriasapu · 2026-03-06T08:43:19Z

+    return blocks
+
+
+def _validate_human_block(block_text: str) -> list[str]:


`_validate_human_block`?
"Human block" is misleading; could we use `_validate_gpu_block()` or something similar instead?

sriasapu · 2026-03-06T08:49:45Z

It would be good to add the PR description as well.


geomin12 · 2026-03-09T17:07:58Z

identical to this?

TheRock/tests/test_rocm_sanity.py

Line 167 in af5cb3a

def test_amdsmi_suite(self):

That said, I am fine with this being separate, as it can be used in other repos.
cc: @jayhawk-commits

I checked the script you mentioned; that is library testing of amdsmi, whereas this is CLI testing.

meant to link this one:

TheRock/tests/test_rocm_sanity.py

Line 167 in 0478f9e

def test_amdsmi_suite(self):

Even so, I would think CLI testing is a sanity check and should be added there.

geomin12 · 2026-03-09T17:08:24Z

+    Skips the test via pytest if `THEROCK_BIN_DIR` is not set. Asserts that
+    the expected `amd-smi` binary exists at the resolved path.
+
+    Args:
+        None
+
+    Returns:
+        pathlib.Path: Path to the `amd-smi` binary.


This looks like Claude-generated documentation. I think this function is self-explanatory, so these comments can be removed.

geomin12 · 2026-03-09T17:09:58Z

+
+    The function invokes the binary via subprocess.run and captures text
+    output for assertions in the tests.
+
+    Args:
+        amd_smi_path (pathlib.Path): Path to the `amd-smi` binary.
+        modifiers (list[str]): Arguments to pass after `amd-smi list`.
+
+    Returns:
+        tuple[int, str, str]: Return code, stdout text, stderr text.


Same here, this looks like Claude-generated code.

This can be removed, as the function is self-explanatory.

geomin12 · 2026-03-09T17:12:37Z

    },
+    "amdsmi_cli": {
+        "job_name": "amdsmi_cli",
+        "fetch_artifact_args": "--tests",


Let's add --base-only, as we only need base packages.

geomin12 · 2026-03-09T17:15:14Z

This might be a good opportunity to combine with the amdsmi tests: since this just validates output, we might as well use the GPU for the amdsmi tests in parallel.

@HRISHIKESHTHULA-AMD, we currently run amdsmi tests in https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/test_executable_scripts/test_amdsmi.py. Is there an opportunity to combine them? This script will install identical artifacts, so there is no need for two separate jobs doing identical artifact extraction.

I went through the script you mentioned. Its description says "this script must be run manually by developers inside a privileged ROCm environment or container", so it seems difficult to combine a manually triggered script with a CI-triggered one. Please share your thoughts on this.

meant to link this one:

TheRock/tests/test_rocm_sanity.py

Line 167 in 0478f9e

def test_amdsmi_suite(self):

Addressed in this comment.
Since the recently added test_sanity.py calls the sanity tests of different components, I've integrated the amd-smi CLI sanity tests with it.

The non-sanity CLI tests in the same amd-smi CLI script won't be executed by test_sanity.py, but will be executed by the amdsmi_cli job.

Please review and share your thoughts.
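The split described above relies on a custom pytest marker. Below is a hedged sketch of registering it in `conftest.py` through pytest's `pytest_configure` hook and `config.addinivalue_line`; the marker description string is made up, and the two selection expressions mirror the `-m "not not_sanity"` and `-m not_sanity` arguments that appear in this PR.

```python
# conftest.py (sketch): register the custom marker so pytest does not emit
# PytestUnknownMarkWarning (or fail under --strict-markers).
# The description text after the colon is hypothetical.
NOT_SANITY_MARKER = "not_sanity: amd-smi CLI checks excluded from the sanity run"


def pytest_configure(config):
    # addinivalue_line appends to the "markers" ini option, which is how a
    # conftest or plugin registers a custom marker.
    config.addinivalue_line("markers", NOT_SANITY_MARKER)


# Marker expressions for the two invocations:
#   sanity run (test_sanity.py):  pytest ... -m "not not_sanity"
#   amdsmi_cli job:               pytest tests/test_amdsmi_cli.py -m not_sanity
SANITY_SELECTION = ["-m", "not not_sanity"]
CLI_JOB_SELECTION = ["-m", "not_sanity"]
```

Registering the marker keeps the two selections explicit: every test runs in exactly one of the two invocations.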

Please review the latest update, made per the latest review comments.


geomin12 · 2026-03-14T04:11:48Z

not sure this is needed?

geomin12 · 2026-03-14T04:12:31Z

+    "amdsmi_cli": {
+        "job_name": "amdsmi_cli",
+        "fetch_artifact_args": "--base-only",
+        "timeout_minutes": 15,
+        "test_script": "pytest tests/test_amdsmi_cli.py -m not_sanity -o log_cli=true --log-cli-level=INFO",
+        "platform": ["linux"],
+        "total_shards_dict": {
+            "linux": 1,
+        },
+    },


This can be removed! Since the sanity checks now test this, we don't need to run these tests twice.

geomin12 · 2026-03-14T04:13:52Z

+def _run_pytest(
+    cmd: list[str], *, cwd: Path, env: dict[str, str], check: bool
+) -> subprocess.CompletedProcess[str]:
+    logging.info("++ Exec [%s]$ %s", cwd, " ".join(cmd))
+    return subprocess.run(cmd, cwd=cwd, env=env, check=check, text=True)
+
+


This can be removed. I would consider checking amdsmi_cli a "sanity check"; if amd-smi failed, I would expect bigger problems.

geomin12 · 2026-03-14T04:13:56Z

+# Default sanity behavior: run everything except tests marked as not_sanity.
+phase_cmd = cmd + ["-m", "not not_sanity"]
+_run_pytest(phase_cmd, cwd=THEROCK_DIR, env=env, check=True)


This can be removed. I would consider checking amdsmi_cli a "sanity check"; if amd-smi failed, I would expect bigger problems.

geomin12 · 2026-03-14T04:14:15Z

+    assert gpu_blocks, "No GPU blocks found in amd-smi output"
+
+
+@pytest.mark.not_sanity


This can be removed. I would consider checking amdsmi_cli a "sanity check"; if amd-smi failed, I would expect bigger problems.


geomin12

LGTM, but we do not have any Linux signal right now. We will wait until the machines are back.

geomin12

I added a label, so this will trigger gfx94X tests (and check sanity). Once the signal is proven, this can be landed.

geomin12

Is this the PR to review, or #4004? We should close the PR that is no longer needed.

HRISHIKESHTHULA-AMD · 2026-03-25T04:01:27Z


#4004 was raised more recently than this PR; it contains this branch's changes plus additional amd-smi tests, so both PRs are valid.
Since the review of this PR is already done, can we merge it once approved, so that its tests at least cover amd-smi sanity until #4004 is reviewed and approved?

madkasul · 2026-03-25T05:20:12Z

@HRISHIKESHTHULA-AMD, I noticed there is a workflow that runs both the elevated and non-elevated amd-smi tests.
Please cross-check that your script does not duplicate it.
Ref: rocm-systems/.github/workflows/amdsmi-build.yml

HRISHIKESHTHULA-AMD · 2026-03-25T05:57:55Z


@madkasul, as confirmed by @phani544, the developers are not testing the CLI yet.

geomin12

Much cleaner. Thanks for the updates, and great work!

Add validation tests for amd-smi CLI output

5e38611

HRISHIKESHTHULA-AMD requested a review from Copilot March 6, 2026 07:31

HRISHIKESHTHULA-AMD assigned sriasapu Mar 6, 2026

github-project-automation Bot added this to TheRock Triage Mar 6, 2026

github-project-automation Bot moved this to TODO in TheRock Triage Mar 6, 2026

Copilot started reviewing on behalf of HRISHIKESHTHULA-AMD March 6, 2026 07:32

Copilot AI reviewed Mar 6, 2026

Merge branch 'main' of https://github.com/ROCm/TheRock into users/hrithula/smi_test

ae69d07

sriasapu reviewed Mar 6, 2026

sriasapu assigned HRISHIKESHTHULA-AMD and unassigned sriasapu Mar 6, 2026

HRISHIKESH THULA added 5 commits March 6, 2026 17:52

Refactor amd-smi path resolution and update function signatures for clarity

7edbb8d

Add logging for amd-smi command execution and output

bac9e22

Add parameterized test cases for amd-smi output modes

e7bebb2

Update logging level for amd-smi command execution in test_amdsmi_cli.py

9c80073

command enhance

30c2a1d

HRISHIKESHTHULA-AMD assigned sriasapu Mar 6, 2026

HRISHIKESHTHULA-AMD requested a review from sriasapu March 6, 2026 17:39

Refactor assertions for clarity and consistency in test_amdsmi_cli.py

7a9f5db

sriasapu approved these changes Mar 9, 2026

Merge branch 'main' of https://github.com/ROCm/TheRock into users/hrithula/smi_test

1588c09

HRISHIKESHTHULA-AMD marked this pull request as ready for review March 9, 2026 14:15

rahulc-gh mentioned this pull request Mar 9, 2026

Bump rocm-systems submodule from 93bc019 to 093b66caa3 #3754

Merged

HRISHIKESHTHULA-AMD assigned geomin12 Mar 9, 2026

HRISHIKESHTHULA-AMD requested a review from geomin12 March 9, 2026 16:58

geomin12 reviewed Mar 9, 2026

HRISHIKESH THULA added 3 commits March 11, 2026 10:57

Update fetch_test_configurations.py to use --base-only for amdsmi_cli and refactor test_amdsmi_cli.py for improved clarity and functionality

3f23ba4

Merge branch 'main' of https://github.com/ROCm/TheRock into users/hrithula/smi_test

daca4b5

Merge branch 'main' of https://github.com/ROCm/TheRock into users/hrithula/smi_test

b002cc5

HRISHIKESH THULA added 9 commits March 13, 2026 18:26

Add missing newline in test_rocm_sanity.py and fix marker formatting in conftest.py

e45209f

Fix formatting of markers in conftest.py

8ccca35

Remove timeout parameter from pytest commands in test_sanity_check.yml

31f476b

Add timeout parameter to pytest commands in test_sanity_check.yml

f39a95e

Merge origin/main: keep local test_sanity_check.yml (preserve sanity workflow split)

c65ba70

Refactor amd-smi CLI tests

f2eec65

clean up

8bfba37

Refactor code for improved readability and consistency in test scripts

69cc284

Enhance logging in _run_amd_smi function to include return code and output

053d2bb

HRISHIKESHTHULA-AMD requested a review from geomin12 March 13, 2026 20:37

geomin12 reviewed Mar 14, 2026

HRISHIKESH THULA added 3 commits March 14, 2026 11:19

Remove amdsmi_cli tests and related configurations; refactor amd-smi handling in ROCm sanity tests

d3aa5f6

Merge branch 'main' of https://github.com/ROCm/TheRock into users/hrithula/smi_test

3a678d0

Refactor test_sanity.py by removing the _run_pytest function and directly executing pytest command

a794b26

HRISHIKESHTHULA-AMD requested a review from geomin12 March 14, 2026 06:06

geomin12 reviewed Mar 17, 2026

HRISHIKESHTHULA-AMD requested a review from geomin12 March 20, 2026 05:01

geomin12 added the test:hipcub label Mar 20, 2026 (for pull requests, runs full tests for only hipcub and other labeled projects)

geomin12 reviewed Mar 20, 2026

HRISHIKESHTHULA-AMD requested a review from geomin12 March 24, 2026 05:34

HRISHIKESHTHULA-AMD mentioned this pull request Mar 24, 2026

Test amd-smi sanity #4004

Closed


geomin12 reviewed Mar 24, 2026

HRISHIKESHTHULA-AMD requested a review from geomin12 March 25, 2026 03:47

geomin12 approved these changes Mar 27, 2026

HRISHIKESHTHULA-AMD closed this Apr 9, 2026

github-project-automation Bot moved this from TODO to Done in TheRock Triage Apr 9, 2026

HRISHIKESHTHULA-AMD deleted the users/hrithula/smi_test branch April 9, 2026 05:46
