Update run-readme-pr-linuxaarch64.yml to use correct runner #1469

Merged (12 commits) on Jan 23, 2025

@Jack-Khuu (Contributor) commented Jan 22, 2025

#1350 used linux-aarch64 as the runner, when we should be using linux.arm64.2xlarge for aarch64 instead.

pytorch-bot bot commented Jan 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1469

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1a46c9a with merge base b2d8f2a (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) Jan 22, 2025

@@ -11,7 +11,7 @@ jobs:
   test-readme-cpu:
     uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
     with:
-      runner: linux-aarch64
+      runner: linux.arm64.m7g.4xlarge
       gpu-arch-type: cuda
       gpu-arch-version: "12.1"
       timeout: 60

@atalman commented Jan 23, 2025

Try passing

docker-image: "pytorch/manylinuxaarch64-builder:cuda12.1-main"

The error

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

looks related to the job using the pytorch/conda-builder:cuda12.1 docker-image by default, which is not correct for the linux.arm64.m7g.4xlarge runner.
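
A minimal sketch of the override suggested in this comment, assuming `docker-image` is passed as an input under `with:` next to the inputs already visible in the diff. The image tag is the one quoted in the comment and is not verified here, so read this as the shape of the change and not as a known-good configuration.

```yaml
jobs:
  test-readme-cpu:
    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
    with:
      runner: linux.arm64.m7g.4xlarge
      # Assumption: replaces the default pytorch/conda-builder:cuda12.1 image,
      # which the warning reports as linux/amd64, with an aarch64 image so the
      # container platform matches the arm64 host.
      docker-image: "pytorch/manylinuxaarch64-builder:cuda12.1-main"
      gpu-arch-type: cuda
      gpu-arch-version: "12.1"
      timeout: 60
```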

@Jack-Khuu (Contributor, Author) commented Jan 23, 2025

It doesn't look like it can find that docker-image verbatim; testing with the 12.6 version found in pt/pt (pytorch/pytorch).

Copy link

If you are using linux_job_v2.yml, you can try the latest image, pytorch/manylinux2_28_aarch64-builder:cuda12.6.

@Jack-Khuu (Contributor, Author) commented Jan 23, 2025

It doesn't look like the CUDA variant manylinux2_28_aarch64-builder:cuda12.6 is there, but the CPU variant :cpu-aarch64-main with linux_job_v2 seems to be on the right track.

Now we're just down to a missing devtoolset-10-binutils, which is curious since pt/pt uses v10 for aarch64.
Edit: Resolved; the pip installs were unnecessary.
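
The combination this comment converges on might look like the fragment below. It is a sketch under stated assumptions, not the merged workflow: the `permissions` block and its keys are inferred from the "Adding permissions to linux job v2" commit title, `gpu-arch-type: cpu` is assumed from "Switch everything to CPU linux v2", and the image name joins the repository from the earlier suggestion with the `:cpu-aarch64-main` tag mentioned here.

```yaml
jobs:
  test-readme-cpu:
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    # Assumption: exact permission keys are not shown in the thread.
    permissions:
      id-token: write
      contents: read
    with:
      runner: linux.arm64.m7g.4xlarge
      # CPU aarch64 image; the thread reports the cuda12.6 tag could not be found.
      docker-image: "pytorch/manylinux2_28_aarch64-builder:cpu-aarch64-main"
      # Assumption: CPU-only job, so no gpu-arch-version is set.
      gpu-arch-type: cpu
      timeout: 60
```

Per the edit note above, no devtoolset install step is included.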

@Jack-Khuu (Contributor, Author) commented

fyi @mikekgfb: we're looking into it

@Jack-Khuu merged commit 3ce9c8e into main Jan 23, 2025
61 of 62 checks passed
@Jack-Khuu deleted the Jack-Khuu-patch-33 branch January 24, 2025 23:48
vmpuri pushed a commit that referenced this pull request Feb 4, 2025
* Update run-readme-pr-linuxaarch64.yml to use correct runner

* Move to linux.arm64.m7g.4xlarge

* Explicitly overriding the docker-image

* Bumping Cuda version to 12.6

* Updating GPU Arch type

* Testing various linux_job combos: v2 cuda, v2 cpu, v1 cpu

* Adding permissions to linux job v2

* Switch everything to CPU linux v2

* Test with devtoolset-11

* Remove devtoolset install

* Removing devtoolset from commands