Add workflow for JAX wheel builds #1033

Merged
charleshofer merged 7 commits into main from add-jax-ci
Aug 1, 2025

@charleshofer
Contributor

Adds a workflow for building JAX with release tarballs

@charleshofer
Contributor Author

Need to merge #1033 before this one

@marbre
Member

marbre commented Jul 15, 2025

> Need to merge #1033 before this one

This one is #1033 :)

@charleshofer
Contributor Author

Sorry. Need to merge ROCm/rocm-jax#58 first.

@gabeweisz

@ScottTodd - any feedback on this?

@@ -0,0 +1,117 @@
name: Build Linux PyTorch Wheels
Member

Update workflow name to match the file name

Suggested change
-name: Build Linux PyTorch Wheels
+name: Build Linux JAX Wheels

Also, are these builds "portable", e.g. by building under a manylinux docker container? I recently renamed some workflows to show that more clearly: #1023. It doesn't look like it from a read of https://github.com/ROCm/rocm-jax/blob/master/build/ci_build, so are these Linux packages only runnable on Ubuntu (or whichever distro is used for the build)?

Contributor Author

Yes, these are built under a manylinux image. For a number of reasons, our build scripts are a little oniony, but we create a manylinux image with ROCm here and then do the build inside it: https://github.com/ROCm/rocm-jax/blob/master/jax_rocm_plugin/build/rocm/ci_build#L57

Comment thread .github/workflows/build_linux_jax_wheels.yml
Comment on lines +101 to +104
      - name: Checkout TheRock
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          path: TheRock
Member

I'd prefer to omit path for the workflow's own repository and instead add path as needed for other repositories like rocm/rocm-jax above.

Contributor Author

I flipped this around. I just want to point out that the only thing we use from checked-out TheRock is the ./build_tools/third_party/s3_management/manage.py. We use the tarball to actually do the build.

      - name: Build JAX Wheels
        run: |
          cd rocm-jax
          python3 build/ci_build \
Member

It might make sense to move this script over to TheRock, similar to what we have for PyTorch. Wdyt?

Contributor Author

No. This is the script that all CI teams use to build JAX. TheRock aren't the only users. DevOps team uses this when they build stuff against regular ROCm, and we use this in our CI on the JAX team.

Member

It might make more sense then to check out TheRock in rocm-jax, similar to what is done in rocm-libraries?

Contributor Author

You guys wanted the opposite above, so I changed it. I can change it back if you want?

When this workflow checks out TheRock, all it uses is ./build_tools/third_party/s3_management/manage.py. The actual build uses the tarball that you give it.

Member

> You guys wanted the opposite above, so I changed it. I can change it back if you want?

I have not been involved in all prior discussions and might be missing context here.

If it was agreed on earlier, then this might be okay as is.

Comment thread .github/workflows/build_linux_jax_wheels.yml Outdated
Comment thread .github/workflows/build_linux_jax_wheels.yml Outdated
@charleshofer charleshofer marked this pull request as ready for review July 25, 2025 18:15
@stellaraccident
Collaborator

How close are we to getting this landed? Any details we can defer until we get it in and some runs under our belt?

@charleshofer
Contributor Author

> How close are we to getting this landed? Any details we can defer until we get it in and some runs under our belt?

I've addressed everyone's comments on this, I think. Just waiting on an approval.

@ScottTodd ScottTodd requested review from ScottTodd and marbre July 28, 2025 16:18
@ScottTodd
Member

Ah, please click "re-request review" (if you can; otherwise just ping with @username) when you are ready for another round of review. That helps PRs get added back to review dashboards like https://github.com/pulls/review-requested

@charleshofer
Contributor Author

Yeah, I'm not seeing the "re-request review" option where it normally is. But it's ready for another round @marbre.

@ScottTodd
Member

I clicked it for you:
image
image

It looked closer to this before:
image

Comment on lines 106 to 102
Member

@marbre any concerns with using the same S3 bucket, subdir, etc. for JAX wheels?

Member

> @marbre any concerns with using the same S3 bucket, subdir, etc. for JAX wheels?

I am still undecided. We will move creating regular releases (and writing to the nightly releases bucket) to another repo, but this does not necessarily conflict with the proposed changes. What is a bit unfortunate is the fact that the build script is not in TheRock and that it gets called in a release workflow. Furthermore, upstream changes to the script will not trigger a CI run in TheRock.

Furthermore, with #1110, we plan to land a gating mechanism soon. Untested wheels (and I consider the JAX wheels untested for now) will go to a staging subdir in the bucket to allow testing. If tests pass, they get copied to the final location. This is something we might want to have for JAX wheels as well, or at least we will want it reworked after #1110 has landed.
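
For illustration, the staging-then-promote flow described above might look roughly like the sketch below. It assumes local directories stand in for the bucket subdirs, and `promote_if_tests_pass` and `run_tests` are hypothetical names. It is not the mechanism from #1110, just a way to picture the gating step.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def promote_if_tests_pass(
    staging_dir: Path,
    final_dir: Path,
    run_tests: Callable[[Path], bool],
) -> list[str]:
    """Copy wheels from staging to final only if run_tests passes for all of them.

    Returns the names of the promoted wheels (empty if the gate failed).
    """
    wheels = sorted(staging_dir.glob("*.whl"))
    # Gate on the whole batch: a single failing wheel blocks the promotion.
    if not wheels or not all(run_tests(wheel) for wheel in wheels):
        return []
    final_dir.mkdir(parents=True, exist_ok=True)
    for wheel in wheels:
        shutil.copy2(wheel, final_dir / wheel.name)
    return [wheel.name for wheel in wheels]


# Temporary directories stand in for the staging and final bucket subdirs.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "staging"
    final = Path(tmp) / "final"
    staging.mkdir()
    # Hypothetical wheel name with placeholder contents.
    (staging / "example_pkg-1.0-py3-none-any.whl").write_bytes(b"placeholder")
    promoted = promote_if_tests_pass(
        staging, final, run_tests=lambda wheel: wheel.stat().st_size > 0
    )
    print(promoted)
```

Gating on the whole batch rather than per wheel is a design assumption here: it keeps the final location from holding a partially tested set of packages.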

Yes, without adding the package itself to manage.py, the index will not include the JAX wheel. Furthermore, it should be checked if additional packages must be added to update_dependencies.py.

  • If the build process outputs (or depends on) wheels other than JAX, is there a chance that they could conflict with what the PyTorch build produces?

Yes, running the workflow to see what it produces and what it pushes to the dev is indeed something we should check.

Member

> What is a bit unfortunate is the fact that the build script is not in TheRock and that it gets called in a release workflow. Furthermore, upstream changes to the script will not trigger a CI run in TheRock.

Well it's the same for PyTorch. The scripts we have in TheRock are wrapping https://github.com/pytorch/pytorch/blob/main/setup.py. We're bootstrapping here, the bulk of support belongs upstream.

Contributor Author

So, do I need to do anything here to switch the bucket or anything? Or is this good as-is for now?

Contributor Author

I did add the packages that JAX needs to the list though. I think I've got everything in the right place.
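
The earlier remark that the index will not include a wheel unless its package is added to manage.py suggests an allowlist-style index. The sketch below illustrates that idea only. It assumes the index is filtered by project name; `ALLOWED_PROJECTS`, `wheels_for_index`, and all filenames are hypothetical and are not taken from manage.py.

```python
import re

# Hypothetical allowlist; the real list lives in manage.py and is not reproduced here.
ALLOWED_PROJECTS = {"example-runtime", "example-framework"}


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheels_for_index(filenames: list[str], allowed: set[str]) -> list[str]:
    """Keep only the wheels whose project name is on the allowlist."""
    kept = []
    for filename in filenames:
        if not filename.endswith(".whl"):
            continue
        # The distribution name is the first dash-separated field of a wheel filename.
        project = filename.split("-", 1)[0]
        if normalize(project) in allowed:
            kept.append(filename)
    return kept


uploaded = [
    "example_runtime-1.0-py3-none-linux_x86_64.whl",
    "example_framework-2.0-cp312-cp312-linux_x86_64.whl",
    "new_package-0.1-cp312-cp312-linux_x86_64.whl",  # uploaded, but not on the allowlist
]
print(wheels_for_index(uploaded, ALLOWED_PROJECTS))
```

Under that assumption, a wheel can be built and uploaded successfully and still be missing from the index, which is why the package list needs updating alongside the workflow.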

Comment on lines 30 to 28
Member

Have you run this workflow to test it? Can you share the logs?

If you wanted to test in this repository with the self-hosted runners, you could add a placeholder workflow first like with #828. This looks far enough along to land as-is, test, and then send new PRs for improvements.

@charleshofer can you try with the self-hosted runner here and post the log? We would really like to land this.

Contributor Author

I've just run things through act to make sure the syntax was okay. I think it'd be easier to land this one rather than create another PR.

Member

It's hard to review a change to a workflow file without being able to see any logs of the workflow running, especially for a new workflow. I can make an attempt to parse the code and walk through it by eye, but there are details that I can't foresee from just the code, like how long each step takes, if there are permissions issues, if there are syntax errors, etc.

Contributor Author

Will remember for future 🫡

Comment on lines 20 to 23
Member

This input doesn't appear to be used yet. Safe to remove?

Contributor Author

Yeah, I'll remove this

Comment on lines 135 to 136
Member

Oh interesting, these package names encode the ROCm version? That isn't part of a version identifier or part of the index URL that the packages come from? As we rework packaging for PyTorch too, let's keep an eye on this.
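
To make the distinction concrete, here is a small sketch that reports whether a ROCm marker sits in a wheel's distribution name or in the local version label (the part after `+`). It assumes the standard wheel filename layout (`name-version[-build]-python-abi-platform.whl`); `rocm_hint` and both filenames are hypothetical and are not actual build outputs.

```python
def split_wheel_name(wheel_filename: str) -> tuple[str, str]:
    """Return (distribution, version) from a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_filename}")
    distribution, version = wheel_filename[: -len(".whl")].split("-")[:2]
    return distribution, version


def rocm_hint(wheel_filename: str) -> str:
    """Report where a 'rocm' marker appears: 'name', 'version', or 'none'."""
    distribution, version = split_wheel_name(wheel_filename)
    if "rocm" in distribution.lower():
        return "name"
    # A local version label is whatever follows '+' in the version string.
    if "+" in version and "rocm" in version.split("+", 1)[1].lower():
        return "version"
    return "none"


# Hypothetical filenames, for illustration only.
print(rocm_hint("example_rocm7_plugin-0.1.0-cp312-cp312-manylinux_2_28_x86_64.whl"))
print(rocm_hint("example-2.0.0+rocm7.0.0-cp312-cp312-manylinux_2_28_x86_64.whl"))
```

The first case changes the package's identity (a different project per ROCm version), while the second keeps one project and varies only the version, which is the difference worth tracking as packaging is reworked.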

Comment thread build_tools/third_party/s3_management/update_dependencies.py Outdated
@charleshofer charleshofer merged commit 53a722f into main Aug 1, 2025
1 check passed
@charleshofer charleshofer deleted the add-jax-ci branch August 1, 2025 16:33
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Aug 1, 2025
charleshofer added a commit that referenced this pull request Sep 29, 2025
## Motivation

Fixes problems with the JAX CI workflow that was created in #1033 

## Technical Details

Fixes minor problems that make the workflow crash, and ensures some
command-line tools that we need are installed.

## Test Plan

Make sure the workflow works when run with Actions:
https://github.com/ROCm/TheRock/actions/workflows/build_linux_jax_wheels.yml

---------

Co-authored-by: Scott Todd <scott.todd0@gmail.com>