Deb and rpm install sanity check #3126
Looks like this one is sent from a branch in the shared repository instead of a fork. Neither PR has a sufficiently descriptive PR title though (which install script? verify how?)
Thanks @marbre for your review. Copying your comments from #3124 for better tracking.
Take a look at how we test ROCm wheels and especially PyTorch wheels. Testing should go to a separate job. @HereThereBeDragons can give further guidance. Will review again once this has been addressed and Laura has reviewed. This has some design flaws from my perspective. Testing should probably be run in a separate job and should depend on building. This would allow running the job in a container. Furthermore, it would be preferable to use a matrix to spawn different distributions.
[Response:] The description is a bit of a red herring. This is not the full test of native packages, just a sanity check before uploading to S3. @jonatluu let's update the description and file name to reflect that, either to "package sanity check" or something like that. Hope it is fine to add the sanity check in the build job. @marbre @HereThereBeDragons
Regarding Docker, the native package build job runs on an Ubuntu 24 machine, so the rpm package verification can't be done on that machine. Hence the manylinux Docker container is started for rpm verification.
    # Install using apt
    cmd = ["sudo", "apt", "install", "-y"] + package_paths

    print(f"\nRunning: {' '.join(cmd)}\n")

    try:
        result = subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        print(result.stdout)
        print("\n✅ DEB packages installed successfully")
        return True
Testing should probably be run in a separate job and should depend on building. This would allow running the job in a container. Furthermore, it would be preferable to use a matrix to spawn different distributions. The job always runs in a manylinux Docker container. Spawning a manylinux container in a manylinux container is not what we want; it should rather match one of the OS profiles. Deb testing won't work if installing in the manylinux container. Instead, an Ubuntu or Debian container needs to be spawned. @HereThereBeDragons can give some guidance here, as some of the designs are most likely used / considered in the upcoming weekly CI.
[Response:] The description is a bit of a red herring. This is not the full test of native packages, just a sanity check before uploading to S3. @jonatluu let's update the description and file name to reflect that, either to "package sanity check" or something like that. Hope it is fine to add the sanity check in the build job. @marbre @HereThereBeDragons
Regarding Docker, the native package build job runs on an Ubuntu 24 machine, so the rpm package verification can't be done on that machine. Hence the manylinux Docker container is started for rpm verification.
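For illustration only, the workaround described here (an Ubuntu host starting a manylinux container to check the RPMs) could be assembled roughly as below. The image name, the mount point, and the helper name are assumptions and are not taken from the PR. `rpm -i --test` is a dry run that reports conflicts without installing, which may or may not be what the actual script does.

```python
from pathlib import Path

# Assumed image; the comment above only says "manylinux docker".
DEFAULT_IMAGE = "quay.io/pypa/manylinux_2_28_x86_64"


def build_rpm_check_command(package_dir, rpm_names, image=DEFAULT_IMAGE):
    """Return a `docker run` argv that dry-run installs RPMs in a container."""
    host_dir = Path(package_dir).absolute()
    mount = "/packages"  # hypothetical mount point inside the container
    rpm_paths = [f"{mount}/{name}" for name in rpm_names]
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount}:ro",  # read-only bind mount of the packages
        image,
        "rpm", "-i", "--test", *rpm_paths,  # --test: check only, do not install
    ]


cmd = build_rpm_check_command("/tmp/rpms", ["example-1.0.x86_64.rpm"])
print(" ".join(cmd))
```

Returning the argument list instead of running it keeps the sketch testable on a machine without Docker; the caller could pass the list to `subprocess.run(cmd, check=True)`.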
In my opinion this is more than a "sanity check" as it actually installs the packages. I would expect a sanity check for these packages to check that files exist, have certain file names, are certain sizes, etc. (static analysis that doesn't actually affect the current system state). The fact that this "sanity check" script has a --uninstall option tips it over the edge IMO.
We have a similar "sanity check by installing" step for the Python packages right now:
TheRock/.github/workflows/build_windows_python_packages.yml
Lines 77 to 84 in 8466141
I'm in the process of moving that to https://github.com/ROCm/TheRock/blob/main/.github/workflows/test_rocm_wheels.yml and being able to independently trigger the test job on any type of runner with any input packages (without waiting for packages to be built) is very useful.
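A static check of the kind described in this comment (files exist, match an expected name pattern, and are not suspiciously small, all without touching system state) might look like the sketch below. The function name, the suffix-based name check, and the size threshold are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

MIN_SIZE_BYTES = 1024  # assumed threshold, purely illustrative


def static_package_check(package_dir, expected_suffix, min_size=MIN_SIZE_BYTES):
    """Return a list of problems found without installing anything."""
    problems = []
    packages = sorted(Path(package_dir).glob(f"*{expected_suffix}"))
    if not packages:
        problems.append(f"no {expected_suffix} files in {package_dir}")
    for pkg in packages:
        size = pkg.stat().st_size
        if size < min_size:
            problems.append(f"{pkg.name}: only {size} bytes")
    return problems


# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tiny.deb").write_bytes(b"x" * 10)
    (Path(tmp) / "ok.deb").write_bytes(b"x" * 2048)
    problems = static_package_check(tmp, ".deb")
    empty_problems = static_package_check(tmp, ".rpm")
print(problems)
```

Because nothing is installed, a check like this could run inside the build job on any host, leaving install tests to a separate job.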
Thanks @ScottTodd for the feedback. Reading through your comments, I am proposing a slightly modified flow. Could you let us know your thoughts and the preferred model that matches the vision for the CI/CD system?
Proposal 1:
build_native_packages.yml:
- Build packages
- Upload to GitHub Artifacts
- Upload to S3 STAGING bucket ← Modified
- Trigger promotion test workflow (waits for completion)
- If tests pass → promote to production S3
- Workflow dispatch for full test of ROCm (No waiting for results)
promotion_test_native_packages.yml:
- Download from artifacts
- Run tests
- Report success/failure
full_test_native_packages.yml:
- Install from s3 bucket
- Run tests
- Report success/failure
Proposal 2:
build_native_packages.yml:
- Build packages
- Upload to GitHub Artifacts
- Upload to S3 bucket
- Workflow dispatch for full test of ROCm (No waiting for results)
full_test_native_packages.yml:
- Install from s3 bucket
- Run tests
- Report success/failure
Sorry for the late response.
First off, we shouldn't use "GitHub Artifacts" at all. Everything should go through S3 where we can set our own retention policies, add our own index pages, control billing centrally, etc.
The "upload to staging" approach is okay as that is what existing release workflows do. I'd like to rework all of our release workflows to this instead though:
- Build artifacts/packages/etc. in one job (on a CPU runner)
- Upload those files to a therock-*-artifacts bucket chosen by TheRock/build_tools/github_actions/github_actions_utils.py (Lines 506 to 520 in 1df317b)
- If running in a release pipeline, copy those files to staging S3 in a therock-*-packages or therock-*-python bucket
- Download and test those files (on a GPU runner)
- If running in a release pipeline and tests pass, copy those files to production S3 in a therock-*-packages or therock-*-python bucket
That way, we'll be able to build and test on CI workflows too, and release workflows will look just like CI workflows... but with extra "push to prod" steps.
I'd say try to follow what we do for PyTorch and JAX for now, and I can come through and refactor to that approach when I get to it. See also #3177, where I've added table cells for "native Linux packages".
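To make the proposed ordering concrete, here is a minimal sketch of the flow listed above, written as a function that returns the steps that would run. The function name and step labels are hypothetical. In a real workflow the test result is only known after the test job finishes, so treat `tests_pass` as a stand-in for that job's outcome.

```python
def plan_steps(is_release, tests_pass):
    """List the steps of the build/test/promote flow for one pipeline run."""
    steps = ["build (CPU runner)", "upload to artifacts bucket"]
    if is_release:
        # Release pipelines stage the files before testing.
        steps.append("copy to staging bucket")
    steps.append("download and test (GPU runner)")
    if is_release and tests_pass:
        # Only release pipelines with passing tests push to production.
        steps.append("copy to production bucket")
    return steps


for release in (False, True):
    print(release, plan_steps(release, tests_pass=True))
```

The CI and release paths share the build, upload, and test steps, which matches the point that release workflows would look like CI workflows with extra "push to prod" steps.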
ScottTodd left a comment
I started reviewing but I can't tell the current status of this PR. Are you looking for a review here or is this still in development?
My bad @ScottTodd. While switching between PRs I missed updating the PR description. Thanks for your time and comments. Will get all comments addressed ASAP.
ScottTodd left a comment
Thanks for the improvements!
Adds a reusable workflow for testing native Linux package installation across Ubuntu/Debian (deb) and RHEL/SLES (rpm) OS profiles.
- Derives package type, GPU architecture, and container image via get_url_repo_params.py in a prepare_install_context job.
- Supports optional repository and ref inputs for custom checkout targets.
- Installs prerequisites per distro and runs native_linux_package_install_test.py.
- Reports pass/fail with a structured test report step.
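As a rough sketch of the kind of derivation the commit message describes, the package type could be inferred from an OS profile name as shown below. This is not the code in get_url_repo_params.py; the prefix lists and function name are assumptions based only on the distro families named above.

```python
# Hypothetical profile prefixes; the real profile names may differ.
DEB_PREFIXES = ("ubuntu", "debian")
RPM_PREFIXES = ("rhel", "sles")


def package_type_for(os_profile):
    """Map an OS profile name such as 'ubuntu2404' to 'deb' or 'rpm'."""
    profile = os_profile.lower()
    if profile.startswith(DEB_PREFIXES):
        return "deb"
    if profile.startswith(RPM_PREFIXES):
        return "rpm"
    raise ValueError(f"unsupported OS profile: {os_profile}")


print(package_type_for("ubuntu2404"))
```

Raising on unknown profiles makes a misconfigured matrix entry fail early instead of silently picking the wrong installer.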
Changes were requested in the first version. The workflow has since improved and all concerns have been addressed.
Had to squash the changes into a single commit and push with git push --force-with-lease to avoid merge conflicts. A git merge commit was another option, but it would show 50+ commits in the history of main.
I regularly |


Motivation
This pull request enhances the native Linux package build and test workflows by introducing automated install verification for built packages. It adds new workflow inputs, implements conditional install testing, and creates a dedicated workflow to validate package installation across multiple OS profiles. These changes improve the reliability and traceability of package builds, especially for prerelease and nightly releases.
Technical Details
Summary
test_native_linux_packages_install.yml (new file):
- sanity (repo install + basic verification) and full (sanity + RDHC verification)
- package_install_url, installs ROCm packages, verifies key components and installed package list
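The "installed package list" verification mentioned in the summary could be approximated as below. This is a sketch, not the PR's test script: the helper name is hypothetical, and the listing is assumed to contain one package name per line, as produced by a command such as `dpkg-query -W -f='${Package}\n'` or `rpm -qa --qf '%{NAME}\n'`.

```python
def missing_packages(installed_listing, expected):
    """Return expected package names absent from a one-name-per-line listing."""
    installed = {
        line.strip() for line in installed_listing.splitlines() if line.strip()
    }
    return [name for name in expected if name not in installed]


# Example listing; in practice this would be captured from the package manager.
listing = "amdrocm\namdrocm-core-sdk\nlibc6\n"
print(missing_packages(listing, ["amdrocm", "amdrocm-core-sdk"]))  # prints []
```

Keeping the comparison separate from the command that produces the listing lets the same check serve both deb and rpm profiles.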
Test plan
- build_native_linux_packages.yml for a dev/nightly build and verify package_install_url is correctly emitted to job outputs
- test_native_linux_packages_install.yml with DEB repo URL on ubuntu2404 and confirm amdrocm and amdrocm-core-sdk install cleanly
Test Result
No Verify nightly:
https://github.com/ROCm/TheRock/actions/runs/23602292215
https://github.com/ROCm/TheRock/actions/runs/23610612186
Verify nightly:
https://github.com/ROCm/TheRock/actions/runs/23785348518 (rpm)
https://github.com/ROCm/TheRock/actions/runs/23778928188 (deb)
Full Test workflow:
https://github.com/ROCm/TheRock/actions/runs/24353145372
https://github.com/ROCm/TheRock/actions/runs/24571097705, https://github.com/ROCm/TheRock/actions/runs/24569164778