Skip to content

Deb and rpm install sanity check#3126

Merged
nunnikri merged 1 commit into
mainfrom
users/jonatluu/install_test
Apr 28, 2026
Merged

Deb and rpm install sanity check#3126
nunnikri merged 1 commit into
mainfrom
users/jonatluu/install_test

Conversation

@jonatluu
Copy link
Copy Markdown
Contributor

@jonatluu jonatluu commented Jan 28, 2026

Motivation

This pull request enhances the native Linux package build and test workflows by introducing automated install verification for built packages. It adds new workflow inputs, implements conditional install testing, and creates a dedicated workflow to validate package installation across multiple OS profiles. These changes improve the reliability and traceability of package builds, especially for prerelease and nightly releases.

Technical Details

Summary

  • test_native_linux_packages_install.yml (new file):
    • Full repo-based install and verification workflow for native packages
    • Supports DEB (Ubuntu 24.04) and RPM (RHEL 10, SLES 16) via OS profile
    • Configurable test types: sanity (repo install + basic verification) and
      full (sanity + RDHC verification)
    • Sets up package manager repo from package_install_url, installs ROCm
      packages, verifies key components and installed package list
    • Uses a shared Python venv across steps for consistent dependency management
    • Supports optional GPG key verification for prerelease builds
    • Install test invocation using pytest

Test plan

  • Trigger build_native_linux_packages.yml for a dev/nightly build and
    verify package_install_url is correctly emitted to job outputs
  • Trigger test_native_linux_packages_install.yml with DEB repo URL on
    ubuntu2404 and confirm amdrocm and amdrocm-core-sdk install cleanly
  • Trigger with RPM repo URL on rhel10 and sles16 and confirm install

Test Result

No Verify nightly:
https://github.com/ROCm/TheRock/actions/runs/23602292215
https://github.com/ROCm/TheRock/actions/runs/23610612186

Verify nightly:
https://github.com/ROCm/TheRock/actions/runs/23785348518 (rpm)
https://github.com/ROCm/TheRock/actions/runs/23778928188 (deb)

Full Test workflow:
https://github.com/ROCm/TheRock/actions/runs/24353145372
https://github.com/ROCm/TheRock/actions/runs/24571097705, https://github.com/ROCm/TheRock/actions/runs/24569164778

marbre
marbre previously requested changes Jan 28, 2026
Copy link
Copy Markdown
Member

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of #3124! Will not further review this one.

@ScottTodd
Copy link
Copy Markdown
Member

Duplicate of #3124! Will not further review this one.

Looks like this one is sent from a branch in the shared repository instead of a fork. Neither PR has a sufficiently descriptive PR title though (which install script? verify how?)

@nunnikri
Copy link
Copy Markdown
Contributor

Thanks @marbre for your review. Copying your comments from #3124 for better tracking.
Marius Commented:
There are some design and logic issues here. Furthermore the PR title and description is not appropriate. This PR is not limited to adding a script and rather aims to "Add testing for native packages" as it also touched the workflow. Furthermore, pre-submit is failing and there are no test results linked.

Take a look into how we test ROCm wheels and especially PyTorch wheels. Testing should go to a separate job. @HereThereBeDragons can give further guidance. Will review again once this was addressed and Laura's reviewed.

This has some design flaws from my perspective.

Testing should probably be run in a separate job and should depend on building. This would allow to run the job in a container. Furthermore, this it would be preferred to use a matrix to spawn different distributions.
The job always run ins a manylinux docker container. Spawning a manylinux container in a manylinux container is not what we want and it should rather match on of the OS profiles.
Deb testing wont work if installing in the manylinux container. Instead an Ubuntu or Debian container needs to be spawned.
@HereThereBeDragons can give some guidance here as some of the designed are most likely used / considered in the upcoming weekly CI.

[Response:] The description is slightly red herring. This is not the full test of native packages. But just a sanity check before uploading to s3. @jonatluu lets update the description and file name to reflect that. either to package sanity check or something like that. Hope sanity check is fine to add in the build job. @marbre @HereThereBeDragons

Regarding docker, the native build package job is run on a Ubunutu24 machine. So the rpm package verification cant be done on that machine, hence the manylinux docker in started for rpm verification.

@jonatluu jonatluu changed the title install script Deb and rpm install sanity check Jan 30, 2026
@jonatluu jonatluu marked this pull request as draft January 30, 2026 18:34
@jonatluu jonatluu marked this pull request as ready for review January 30, 2026 22:29
Comment on lines +172 to +187
# Install using apt
cmd = ["sudo", "apt", "install", "-y"] + package_paths

print(f"\nRunning: {' '.join(cmd)}\n")

try:
result = subprocess.run(
cmd,
check=True,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
)
print(result.stdout)
print("\n✅ DEB packages installed successfully")
return True
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing should probably be run in a separate job and should depend on building. This would allow to run the job in a container. Furthermore, this it would be preferred to use a matrix to spawn different distributions. The job always run ins a manylinux docker container. Spawning a manylinux container in a manylinux container is not what we want and it should rather match on of the OS profiles. Deb testing wont work if installing in the manylinux container. Instead an Ubuntu or Debian container needs to be spawned. @HereThereBeDragons can give some guidance here as some of the designed are most likely used / considered in the upcoming weekly CI.

[Response:] The description is slightly red herring. This is not the full test of native packages. But just a sanity check before uploading to s3. @jonatluu lets update the description and file name to reflect that. either to package sanity check or something like that. Hope sanity check is fine to add in the build job. @marbre @HereThereBeDragons

Regarding docker, the native build package job is run on a Ubunutu24 machine. So the rpm package verification cant be done on that machine, hence the manylinux docker in started for rpm verification.

In my opinion this is more than a "sanity check" as it actually installs the packages. I would expect a sanity check for these packages to check that files exist, have certain file names, are certain sizes, etc. (static analysis that doesn't actually affect the current system state). The fact that this "sanity check" script has a --uninstall option tips it over the edge IMO.

We have a similar "sanity check by installing" step for the Python packages right now:

- name: Sanity check Python packages
run: |
piprepo build "${{ env.PACKAGES_DIR }}/dist"
pip install rocm[libraries,devel]==${{ inputs.package_version }} \
--extra-index-url "${{ env.PACKAGES_DIR }}/dist/simple/"
rocm-sdk test
# TODO(#1559): upload packages to artifacts S3 bucket and/or a dedicated Python packages bucket

I'm in the process of moving to that to https://github.com/ROCm/TheRock/blob/main/.github/workflows/test_rocm_wheels.yml and being able to independently trigger the test job on any type of runner with any input packages (without waiting for packages to be built) is very useful.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ScottTodd for the feedback. Reading through your comments I am proposing a slightly modified flow. Could you let us know your thoughts and preferred model which matches the envision for ci/cd system.

Proposal 1:
build_native_packages.yml:

  1. Build packages
  2. Upload to GitHub Artifacts
  3. Upload to S3 STAGING bucket ← Modified
  4. Trigger promotion test workflow (waits for completion)
  5. If tests pass → promote to production S3
  6. Workflow dispatch for full test of ROCm (No waiting for results)

promotion_test_native_packages.yml:

  1. Download from artifacts
  2. Run tests
  3. Report success/failure

full_test_native_packages.yml:

  1. Install from s3 bucket
  2. Run tests
  3. Report success/failure

Proposal 2:
build_native_packages.yml:

  1. Build packages
  2. Upload to GitHub Artifacts
  3. Upload to S3 bucket
  4. Workflow dispatch for full test of ROCm (No waiting for results)

full_test_native_packages.yml:

  1. Install from s3 bucket
  2. Run tests
  3. Report success/failure

Copy link
Copy Markdown
Member

@ScottTodd ScottTodd Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response.

First off, we shouldn't use "GitHub Artifacts" at all. Everything should go through S3 where we can set our own retention policies, add our own index pages, control billing centrally, etc.

The "upload to staging" approach is okay as that is what existing release workflows do. I'd like to rework all of our release workflows to this instead though:

  1. Build artifacts/packages/etc. in one job (on a CPU runner)
  2. Upload those files to a therock-*-artifacts bucket chosen by
    def retrieve_bucket_info(
    github_repository: str | None = None,
    workflow_run_id: str | None = None,
    workflow_run: dict | None = None,
    ) -> tuple[str, str]:
    """Given a github repository and a workflow run, retrieves bucket information.
    This is intended to segment artifacts by repository and trust level, with
    artifacts split across several buckets:
    * therock-ci-artifacts
    * therock-ci-artifacts-external
    * therock-artifacts-internal
    * therock-dev-artifacts
    * therock-nightly-artifacts
    * therock-release-artifacts
  3. If running in a release pipeline, copy those files to staging S3 in a therock-*-packages or therock-*-python bucket
  4. Download and test those files (on a GPU runner)
  5. If running in a release pipeline and tests pass, copy those files to production S3 in a therock-*-packages or therock-*-python

That way, we'll be able to build and test on CI workflows too, and release workflows will look just like CI workflows... but with extra "push to prod" steps.

I'd say try to follow what we do for PyTorch and JAX for now, and I can come through and refactor to that approach when I get to it. See also #3177, where I've added table cells for "native Linux packages".

Comment thread .github/workflows/build_native_linux_packages.yml Outdated
@jonatluu jonatluu marked this pull request as draft February 4, 2026 16:46
@jonatluu jonatluu marked this pull request as ready for review February 6, 2026 21:57
@jonatluu jonatluu marked this pull request as draft February 9, 2026 20:45
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch 3 times, most recently from c695f85 to fb91ac2 Compare March 6, 2026 03:31
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch 5 times, most recently from 03955a2 to 088e244 Compare March 11, 2026 02:21
@nunnikri nunnikri force-pushed the users/jonatluu/install_test branch from 088e244 to c8b7f88 Compare March 11, 2026 02:49
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch 4 times, most recently from 36bcaf9 to 5fd1605 Compare March 17, 2026 20:42
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch 4 times, most recently from 8616314 to ba7f968 Compare April 9, 2026 22:28
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch 4 times, most recently from 4d91665 to ee26838 Compare April 16, 2026 04:56
@nunnikri nunnikri requested a review from marbre April 16, 2026 17:15
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch from ee26838 to 51542de Compare April 17, 2026 07:49
@nunnikri nunnikri requested a review from ScottTodd April 17, 2026 17:51
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch from 51542de to cc934d3 Compare April 22, 2026 04:32
@nunnikri nunnikri force-pushed the users/jonatluu/install_test branch from cc934d3 to 6ef3306 Compare April 24, 2026 18:59
@arvindcheru arvindcheru force-pushed the users/jonatluu/install_test branch from 17de1e7 to 408fc0b Compare April 26, 2026 16:44
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing but I can't tell the current status of this PR. Are you looking for a review here or is this still in development?

  • The PR description is out of date (there are no changes to build_native_linux_packages.yml on the branch right now)
  • This review branch has been repeatedly force-pushed to: Image

Comment thread .github/workflows/test_native_linux_packages_install.yml Outdated
Comment thread .github/workflows/test_native_linux_packages_install.yml Outdated
Comment thread .github/workflows/test_native_linux_packages_install.yml
Comment thread .github/workflows/test_native_linux_packages_install.yml
Comment thread .github/workflows/test_native_linux_packages_install.yml Outdated
Comment thread .github/workflows/test_native_linux_packages_install.yml Outdated
@nunnikri
Copy link
Copy Markdown
Contributor

nunnikri commented Apr 27, 2026

I started reviewing but I can't tell the current status of this PR. Are you looking for a review here or is this still in development?

  • The PR description is out of date (there are no changes to build_native_linux_packages.yml on the branch right now)
  • This review branch has been repeatedly force-pushed to: Image

My bad @ScottTodd Switching between PR's i am still missed to update the PR description. But thanks your time and comments. Will get all comments addressed asp.
Just noticed Cluade is sometimes using force push. Will take care of that going forward

Comment thread .github/workflows/build_native_linux_packages.yml Outdated
Comment thread .github/workflows/test_native_linux_packages_install.yml Outdated
Comment thread .github/workflows/test_native_linux_packages_install.yml Outdated
Comment thread .github/workflows/test_native_linux_packages_install.yml
Comment thread .github/workflows/test_native_linux_packages_install.yml Outdated
Comment thread .github/workflows/test_native_linux_packages_install.yml
@nunnikri nunnikri requested a review from ScottTodd April 28, 2026 18:27
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements!

@nunnikri nunnikri force-pushed the users/jonatluu/install_test branch from a507b31 to 5dbabbb Compare April 28, 2026 21:37
Adds a reusable workflow for testing native Linux package installation
across Ubuntu/Debian (deb) and RHEL/SLES (rpm) OS profiles.

- Derives package type, GPU architecture, and container image via
  get_url_repo_params.py in a prepare_install_context job.
- Supports optional repository and ref inputs for custom checkout targets.
- Installs prerequisites per distro and runs native_linux_package_install_test.py.
- Reports pass/fail with a structured test report step.
@nunnikri nunnikri force-pushed the users/jonatluu/install_test branch from 5dbabbb to 12a77f5 Compare April 28, 2026 21:39
@nunnikri nunnikri dismissed marbre’s stale review April 28, 2026 21:40

Changes were requested in the first version. Now the workflow has improved and all concerns are already addressed

@nunnikri
Copy link
Copy Markdown
Contributor

Had to squash the changes to a single merge using git push --force-with-lease to avoid merge conflicts. A git merge commit was another option. But it will show 50+ commit history in the main.

@nunnikri nunnikri merged commit 1eb0283 into main Apr 28, 2026
12 checks passed
@nunnikri nunnikri deleted the users/jonatluu/install_test branch April 28, 2026 21:51
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Apr 28, 2026
@ScottTodd
Copy link
Copy Markdown
Member

Had to squash the changes to a single merge using git push --force-with-lease to avoid merge conflicts. A git merge commit was another option. But it will show 50+ commit history in the main.

I regularly git merge upstream/main in my own work to sync. That's what the "Update branch" button in the UI does. We merge with squash and merge by default, so the commit history on the PR doesn't matter for the merge. Force pushing during review breaks many review workflows that depend on being able to see what changed from commit to commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants