ci(fix): Wheel build #2192

Merged
ko3n1g merged 21 commits into main from ko3n1g/ci/wheel
Feb 5, 2026
Conversation

@ko3n1g
Contributor

@ko3n1g ko3n1g commented Feb 3, 2026

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores
    • Updated continuous integration workflow with enhanced security controls and improved build parameterization.
    • Refreshed container image and re-enabled build-test-publish pipeline features.
    • Added configuration inputs for flexible infrastructure and test data management.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Updated the GitHub Actions workflow to re-enable the previously commented-out build-test-publish-wheel jobs with enhanced configuration. The pre-flight workflow reference was updated from v0.64.2 to v0.69.1, new configuration inputs were added, and multiple job parameters and secrets were restored along with an updated container image and runner configuration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GitHub Actions Workflow Configuration (.github/workflows/build-test-publish-wheel.yml) | Re-enabled build-test-publish-wheel and summary jobs with updated pre-flight workflow version (v0.69.1), added configuration inputs for runner prefixes and test data paths, introduced NVIDIA_MANAGEMENT_ORG_PAT secret, restored multiple job parameters and secrets, updated container image to PyTorch 25.11, and adjusted conditional logic in summary job. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'ci(fix): Wheel build' directly relates to the main change: reintroducing and fixing the build-test-publish-wheel workflow with updated configurations and job parameters. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | This PR contains only CI/workflow configuration changes to .github/workflows/build-test-publish-wheel.yml, which are infrastructure-level changes that do not affect algorithms, numerics, convergence, or performance. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/build-test-publish-wheel.yml:
- Around line 72-95: The summary job build-test-publish-wheel-summary is calling
gh run view but lacks actions:read permission and masks errors with "|| echo 0",
causing false success; update the job to include permissions: { actions: read }
and set GH_TOKEN: ${{ github.token }} in the Result step's environment, remove
the "|| echo 0" error-masking so gh run view errors surface, and ensure both gh
run view invocations use the GH_TOKEN env so authorization failures are reported
instead of being treated as zero failed jobs.

Comment on lines +72 to +95
  build-test-publish-wheel-summary:
    needs: [pre-flight, build-test-publish-wheel]
    if: |
      (
        needs.pre-flight.outputs.docs_only == 'true'
        || needs.pre-flight.outputs.is_deployment_workflow == 'true'
        || always()
      )
      && !cancelled()
    runs-on: ubuntu-latest
    steps:
      - name: Result
        run: |
          FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0

          # if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then
          # echo "✅ All previous jobs completed successfully"
          # exit 0
          # else
          # echo "❌ Found $FAILED_JOBS failed job(s)"
          # # Show which jobs failed
          # gh run view $GITHUB_RUN_ID --json jobs --jq '.jobs[] | select(.status == "completed" and .conclusion != "success") | .name'
          # exit 1
          # fi
          if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then
            echo "✅ All previous jobs completed successfully"
            exit 0
          else
            echo "❌ Found $FAILED_JOBS failed job(s)"
            # Show which jobs failed
            gh run view $GITHUB_RUN_ID --json jobs --jq '.jobs[] | select(.status == "completed" and .conclusion != "success") | .name'
            exit 1
          fi

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "build-test-publish-wheel.yml" -type f

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 119


🏁 Script executed:

cat -n .github/workflows/build-test-publish-wheel.yml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 4293


🌐 Web query:

GitHub Actions gh cli run view required permissions actions:read

💡 Result:

To run gh run view inside a GitHub Actions workflow, authenticate gh with GH_TOKEN and ensure the workflow token has Actions read permission.

jobs:
  view:
    runs-on: ubuntu-latest
    permissions:
      actions: read          # required for viewing workflow runs
      contents: read         # commonly needed if you also use actions/checkout
      # checks: read         # only needed if you use features that fetch annotations (e.g., gh run watch)
    steps:
      - uses: actions/checkout@v4
      - run: gh run view ${{ github.run_id }} --log
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Notes:

  • In GitHub Actions, each step that uses gh must set GH_TOKEN to a token with the required permissions/scopes. [1]
  • Grant/limit GITHUB_TOKEN permissions via the permissions: block. [2]
  • If you also need annotations (commonly via gh run watch), you may need checks: read in addition to actions: read. [3]
  • actions/checkout recommends contents: read. [4]

Sources: [1] [2] [3] [4]


🏁 Script executed:

# Verify if there's any job-level permission override in the summary job
sed -n '73,96p' .github/workflows/build-test-publish-wheel.yml | grep -i "permissions"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53


gh run view fails silently due to missing actions:read permission.

The workflow's permission block (lines 28–30) grants only id-token: write and contents: read. The build-test-publish-wheel-summary job has no job-level permission override and calls gh run view on line 86 with error masking (|| echo 0). When the call fails, the fallback only prints 0; FAILED_JOBS is left empty, and the ${FAILED_JOBS:-0} default in the test that follows treats it as 0. The job therefore exits with status 0, incorrectly reporting success even though the check itself failed. The second gh run view call on line 94 is also unprotected and will fail without actions:read permission.
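
The masking can be reproduced locally without GitHub. This is a minimal sketch, not code from the PR: `failing_gh_run_view` and `masked_check` are hypothetical names, and the function simply simulates a `gh run view` call that exits non-zero.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a `gh run view` call that fails, e.g. because
# the token is missing or lacks permission. Not part of the PR.
failing_gh_run_view() {
  echo "simulated authorization failure" >&2
  return 1
}

masked_check() {
  local failed_jobs
  # Same shape as the Result step. On failure the command substitution
  # yields an empty string, and `echo 0` merely prints "0" to stdout;
  # nothing assigns 0 to the variable.
  failed_jobs=$(failing_gh_run_view 2>/dev/null) || echo 0

  # The :-0 default substitutes 0 for the empty value, so the test passes.
  if [ "${failed_jobs:-0}" -eq 0 ]; then
    echo "pass (failed_jobs='${failed_jobs}')"
  else
    echo "fail"
  fi
}

masked_check
```

Running the sketch prints a stray `0` followed by `pass (failed_jobs='')`, which shows that the success verdict comes from the empty-value default rather than from any job data.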

Add permissions: {actions: read} to the job and set GH_TOKEN: ${{ github.token }} in the step to grant the necessary access. Replace the error-masking || echo 0 with strict failure handling to catch and report permission errors.
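
The strict handling recommended here can be sketched in isolation as follows. The names `fake_gh_run_view` and `strict_check` are hypothetical, the first argument is an assumed way to simulate the exit status of `gh`, and the sketch uses `return 1` where the workflow step would use `exit 1` so that it can run inside a function.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for `gh run view ... --jq '... | length'`.
# $1: exit status to simulate, $2: failed-job count to print on success.
fake_gh_run_view() {
  local rc=$1 count=$2
  if [ "$rc" -ne 0 ]; then
    echo "simulated gh failure" >&2
    return "$rc"
  fi
  echo "$count"
}

strict_check() {
  local failed_jobs
  # A failed lookup is reported and propagated instead of being
  # silently treated as "zero failed jobs".
  failed_jobs=$(fake_gh_run_view "$@") || { echo "gh run view failed"; return 1; }

  if [ "$failed_jobs" -eq 0 ]; then
    echo "all jobs succeeded"
  else
    echo "found $failed_jobs failed job(s)"
    return 1
  fi
}

strict_check 0 0                            # lookup works, no failures
strict_check 1 0 || echo "exit status: $?"  # lookup itself fails
```

With this shape, a lookup failure and a genuine job failure both produce a non-zero status, so neither can be mistaken for a clean run.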

Suggested fix
  build-test-publish-wheel-summary:
    needs: [pre-flight, build-test-publish-wheel]
    if: |
      (
        needs.pre-flight.outputs.docs_only == 'true'
        || needs.pre-flight.outputs.is_deployment_workflow == 'true'
        || always()
      )
      && !cancelled()
+   permissions:
+     actions: read
+     contents: read
    runs-on: ubuntu-latest
    steps:
      - name: Result
+       env:
+         GH_TOKEN: ${{ github.token }}
        run: |
-         FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0
+         FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || { echo "gh run view failed"; exit 1; }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  build-test-publish-wheel-summary:
    needs: [pre-flight, build-test-publish-wheel]
    if: |
      (
        needs.pre-flight.outputs.docs_only == 'true'
        || needs.pre-flight.outputs.is_deployment_workflow == 'true'
        || always()
      )
      && !cancelled()
    permissions:
      actions: read
      contents: read
    runs-on: ubuntu-latest
    steps:
      - name: Result
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || { echo "gh run view failed"; exit 1; }
          if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then
            echo "✅ All previous jobs completed successfully"
            exit 0
          else
            echo "❌ Found $FAILED_JOBS failed job(s)"
            # Show which jobs failed
            gh run view $GITHUB_RUN_ID --json jobs --jq '.jobs[] | select(.status == "completed" and .conclusion != "success") | .name'
            exit 1
          fi

ko3n1g added 17 commits February 4, 2026 19:52
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
thomasdhc previously approved these changes Feb 4, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
thomasdhc previously approved these changes Feb 4, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Labels

r0.3.0 Cherry-pick label for r0.3.0 release branch

2 participants