Update run-readme-pr-linuxaarch64.yml to use correct runner #1469
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1469
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 1a46c9a with merge base b2d8f2a, one job has failed (new failure).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -11,7 +11,7 @@ jobs:
   test-readme-cpu:
     uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
     with:
-      runner: linux-aarch64
+      runner: linux.arm64.m7g.4xlarge
       gpu-arch-type: cuda
       gpu-arch-version: "12.1"
       timeout: 60
Try passing docker-image: "pytorch/manylinuxaarch64-builder:cuda12.1-main".
It looks like the error
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
is caused by the job using the default image, docker-image=pytorch/conda-builder:cuda12.1, which is not correct for the linux.arm64.m7g.4xlarge runner.
It doesn't look like it can find that docker-image verbatim; testing with the 12.6 version found in pt/pt.
If you are using linux_job_v2.yml, you can try the latest image, pytorch/manylinux2_28_aarch64-builder:cuda12.6.
It doesn't look like the CUDA version, manylinux2_28_aarch64-builder:cuda12.6, is there, but the CPU variant, :cpu-aarch64-main, with linux_job_v2 seems to be on the right track.
Now we're just down to a missing devtoolset-10-binutils, which is curious since pt/pt uses v10 for aarch64.
Edit: Resolved; the pip installs were unnecessary.
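For reference, a rough sketch of where this thread seems to have landed, going only by the comments and commit messages (CPU-only, linux_job_v2.yml, CPU aarch64 builder image). The `permissions` values and the `script` body are assumptions for illustration, not copied from the merged workflow file:

```yaml
jobs:
  test-readme-cpu:
    # Assumption: the permissions linux_job_v2.yml needs; values not taken from this PR.
    permissions:
      id-token: write
      contents: read
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    with:
      runner: linux.arm64.m7g.4xlarge
      # CPU variant of the aarch64 builder image; the cuda12.6 tag could not be found.
      docker-image: "pytorch/manylinux2_28_aarch64-builder:cpu-aarch64-main"
      gpu-arch-type: cpu
      timeout: 60
      script: |
        # Hypothetical placeholder; the real README test commands go here.
        uname -m  # prints the machine architecture of the container
```

Pinning `docker-image` explicitly is what avoids the amd64 default image being pulled onto an arm64 host.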
FYI @mikekgfb: we are looking into it.
* Update run-readme-pr-linuxaarch64.yml to use correct runner
* Move to linux.arm64.m7g.4xlarge
* Explicitly overriding the docker-image
* Bumping Cuda version to 12.6
* Updating GPU Arch type
* Testing various linux_job combos: v2 cuda, v2 cpu, v1 cpu
* Adding permissions to linux job v2
* Switch everything to CPU linux v2
* Test with devtoolset-11
* Remove devtoolset install
* Removing devtoolset from commands
#1350 used linux-aarch64 as the runner when we should be using linux.arm64.2xlarge for aarch64 instead.