cp: ci(fix): Wheel build (2192) into r0.3.0
#2238
    --- a/.github/workflows/build-test-publish-wheel.yml
    +++ b/.github/workflows/build-test-publish-wheel.yml
    @@ -11,7 +11,6 @@
     # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     # See the License for the specific language governing permissions and
     # limitations under the License.
    -
     name: Build, test, and publish a PyPi wheel (to testpypi).
     
     on:
    @@ -35,55 +34,62 @@ concurrency:
     
     jobs:
       pre-flight:
    -    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.64.2
    +    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.69.1
    +    with:
    +      default_runner_prefix: ${{ vars.DEFAULT_RUNNER_PREFIX }}
    +      non_nvidia_runner_prefix: ${{ vars.NON_NVIDIA_RUNNER_PREFIX }}
    +      default_test_data_path: ${{ vars.DEFAULT_TEST_DATA_PATH }}
    +      non_nvidia_test_data_path: ${{ vars.NON_NVIDIA_TEST_DATA_PATH }}
    +    secrets:
    +      NVIDIA_MANAGEMENT_ORG_PAT: ${{ secrets.NVIDIA_MANAGEMENT_ORG_PAT }}
     
    -  # build-test-publish-wheel:
    -  #   needs: [pre-flight]
    -  #   if: |
    -  #     !(needs.pre-flight.outputs.docs_only == 'true'
    -  #     || needs.pre-flight.outputs.is_deployment_workflow == 'true')
    -  #   uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_build_test_publish_wheel.yml@v0.65.1
    -  #   with:
    -  #     dry-run: true
    -  #     python-package: megatron.bridge
    -  #     python-version: "3.10"
    -  #     packaging: uv
    -  #     no-publish: ${{ !(github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/r')) }}
    -  #     has-src-dir: true
    -  #     skip-test-wheel: true
    -  #     custom-container: nvcr.io/nvidia/pytorch:25.05-py3
    -  #     runner: self-hosted-nemo
    -  #     no-build-isolation: true
    -  #     submodules: recursive
    -  #     container-options: "--gpus all --runtime=nvidia"
    -  #   secrets:
    -  #     TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }}
    -  #     TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
    -  #     SLACK_WEBHOOK: ${{ secrets.SLACK_RELEASE_ENDPOINT }}
    -  #     SLACK_WEBHOOK_ADMIN: ${{ secrets.SLACK_WEBHOOK_ADMIN }}
    -  #     GH_TOKEN: ${{ secrets.PAT }}
    +  build-test-publish-wheel:
    +    needs: [pre-flight]
    +    if: |
    +      !(needs.pre-flight.outputs.docs_only == 'true'
    +      || needs.pre-flight.outputs.is_deployment_workflow == 'true')
    +    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_build_test_publish_wheel.yml@v0.70.1
    +    with:
    +      dry-run: true
    +      python-package: megatron.bridge
    +      python-version: "3.10"
    +      packaging: uv
    +      no-publish: ${{ !(github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/r')) }}
    +      has-src-dir: true
    +      skip-test-wheel: true
    +      custom-container: nvcr.io/nvidia/pytorch:25.11-py3
    +      runner: ${{ needs.pre-flight.outputs.runner_prefix }}-gpu-x2-container
    +      no-build-isolation: true
    +      submodules: recursive
    +      container-options: "--gpus all --runtime=nvidia"
    +    secrets:
    +      TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }}
    +      TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
    +      SLACK_WEBHOOK: ${{ secrets.SLACK_RELEASE_ENDPOINT }}
    +      SLACK_WEBHOOK_ADMIN: ${{ secrets.SLACK_WEBHOOK_ADMIN }}
    +      GH_TOKEN: ${{ secrets.PAT }}
     
    -  # build-test-publish-wheel-summary:
    -  #   needs: [pre-flight, build-test-publish-wheel]
    -  #   if: |
    -  #     (
    -  #       needs.pre-flight.outputs.docs_only == 'true'
    -  #       || needs.pre-flight.outputs.is_deployment_workflow == 'true'
    -  #       || always()
    -  #     )
    -  #     && !cancelled()
    -  #   runs-on: ubuntu-latest
    -  #   steps:
    -  #     - name: Result
    -  #       run: |
    -  #         FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0
    +  build-test-publish-wheel-summary:
    +    needs: [pre-flight, build-test-publish-wheel]
    +    if: |
    +      (
    +        needs.pre-flight.outputs.docs_only == 'true'
    +        || needs.pre-flight.outputs.is_deployment_workflow == 'true'
    +        || always()
    +      )
    +      && !cancelled()
    +    runs-on: ubuntu-latest
    +    steps:
    +      - name: Result
    +        run: |
    +          FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0
     
    -  #         if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then
    -  #           echo "✅ All previous jobs completed successfully"
    -  #           exit 0
    -  #         else
    -  #           echo "❌ Found $FAILED_JOBS failed job(s)"
    -  #           # Show which jobs failed
    -  #           gh run view $GITHUB_RUN_ID --json jobs --jq '.jobs[] | select(.status == "completed" and .conclusion != "success") | .name'
    -  #           exit 1
    -  #         fi
    +          if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then
    +            echo "✅ All previous jobs completed successfully"
    +            exit 0
    +          else
    +            echo "❌ Found $FAILED_JOBS failed job(s)"
    +            # Show which jobs failed
    +            gh run view $GITHUB_RUN_ID --json jobs --jq '.jobs[] | select(.status == "completed" and .conclusion != "success") | .name'
    +            exit 1
    +          fi

Comment on lines +83 to +95

Contributor

🧩 Analysis chain

🏁 Script executed:

    cat -n .github/workflows/build-test-publish-wheel.yml | head -100

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 4295

🏁 Script executed:

    # Search for SKIPPING_IS_ALLOWED and GH_TOKEN/GITHUB_TOKEN usage throughout the workflow
    echo "=== Searching for SKIPPING_IS_ALLOWED ==="
    grep -n "SKIPPING_IS_ALLOWED" .github/workflows/build-test-publish-wheel.yml
    echo -e "\n=== Searching for GH_TOKEN or GITHUB_TOKEN ==="
    grep -n "GH_TOKEN\|GITHUB_TOKEN" .github/workflows/build-test-publish-wheel.yml
    echo -e "\n=== Checking if there's a global env block ==="
    head -40 .github/workflows/build-test-publish-wheel.yml | grep -A 5 "^env:"

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 338

🏁 Script executed:

    # Check if there are any other workflow files that might define SKIPPING_IS_ALLOWED
    find .github/workflows -name "*.yml" -type f -exec grep -l "SKIPPING_IS_ALLOWED" {} \;

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 255

🏁 Script executed:

    # Check other workflow files for SKIPPING_IS_ALLOWED usage
    echo "=== Searching all workflow files for SKIPPING_IS_ALLOWED ==="
    grep -n "SKIPPING_IS_ALLOWED" .github/workflows/*.yml

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1374

🏁 Script executed:

    # Check if we can find documentation or examples about FW-CI-templates outputs
    grep -rn "docs_only\|is_deployment_workflow" .github/workflows/

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 3200

🏁 Script executed:

    # Check the copyright-check.yml Result step
    sed -n '39,60p' .github/workflows/copyright-check.yml

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 935

🏁 Script executed:

    # Check if build-docs.yml has env block with GH_TOKEN for the Result step
    sed -n '50,65p' .github/workflows/build-docs.yml

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 940

🏁 Script executed:

    # Let me verify that github.token is the correct approach vs secrets.PAT
    # Check what build-docs uses and if there's any documentation
    echo "=== Checking build-docs.yml Result step ==="
    sed -n '48,65p' .github/workflows/build-docs.yml
    echo -e "\n=== Also checking if copyright-check.yml has the same issue ==="
    sed -n '39,60p' .github/workflows/copyright-check.yml | head -25

Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1978

Add missing environment variables to the Result step.

The Result step calls `gh run view` and reads `$SKIPPING_IS_ALLOWED`, but it has no `env` block that sets `GH_TOKEN` or `SKIPPING_IS_ALLOWED`. The pattern is correctly implemented in other workflows in this repository.

Fix: Add env block with GH_TOKEN and SKIPPING_IS_ALLOWED

           - name: Result
    +        env:
    +          GH_TOKEN: ${{ github.token }}
    +          SKIPPING_IS_ALLOWED: ${{ needs.pre-flight.outputs.docs_only == 'true' || needs.pre-flight.outputs.is_deployment_workflow == 'true' }}
             run: |
               FAILED_JOBS=$(gh run view $GITHUB_RUN_ID --json jobs --jq '[.jobs[] | select(.status == "completed" and .conclusion != "success")] | length') || echo 0
               if [ "${FAILED_JOBS:-0}" -eq 0 ] || [ "$SKIPPING_IS_ALLOWED" == "true" ]; then

Add fallback handling for `gh run view` failure.

The command uses `|| echo 0`, but this does not assign 0 to `FAILED_JOBS` on failure. Because the `|| echo 0` sits outside the command substitution, a failing `gh run view` prints "0" to stdout while `FAILED_JOBS` is left empty.

🐛 Proposed fix for proper fallback assignment
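
The proposed fix itself is collapsed on this page, so the following is only a minimal sketch of one common way to get the fallback the comment describes: move `|| echo 0` inside the command substitution so the echoed value is captured. `failing_cmd` is a hypothetical stand-in for a failing `gh run view` call, and the variable names `ORIGINAL` and `FIXED` are illustrative, not taken from the workflow.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a `gh run view ...` call that fails
# (for example, because GH_TOKEN is not set).
failing_cmd() { return 1; }

# Pattern from the workflow: `|| echo 0` runs outside the substitution,
# so the 0 goes to stdout (discarded here) and the variable stays empty.
ORIGINAL=$(failing_cmd) || echo 0 >/dev/null

# Fallback inside the substitution: the echoed 0 is captured.
FIXED=$(failing_cmd || echo 0)

echo "original='${ORIGINAL}' fixed='${FIXED}'"
```

Note that the workflow's later test uses `${FAILED_JOBS:-0}`, which already treats an empty value as 0, so the visible effect of the original pattern is mainly a stray `0` in the log and a failed lookup being counted as zero failed jobs.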