Skip to content

ci: Temp disable megatron lora grpo tests#2062

Open
chtruong814 wants to merge 1 commit intomainfrom
chtruong/disable-grpo-lora-test
Open

ci: Temp disable megatron lora grpo tests#2062
chtruong814 wants to merge 1 commit intomainfrom
chtruong/disable-grpo-lora-test

Conversation

@chtruong814
Copy link
Copy Markdown
Contributor

@chtruong814 chtruong814 commented Mar 4, 2026

What does this PR do ?

Temp disable megatron lora grpo tests

These tests were failing and a fix is being worked on. In the mean time, let's skip the tests.
#1889

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Tests
    • Disabled GPU functional tests for grpo_megatron_lora and grpo_megatron_lora_async configurations in the L1 functional test suite.

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814 chtruong814 requested a review from a team as a code owner March 4, 2026 17:51
@chtruong814 chtruong814 added CI:L1 Run doctests, unit tests, and functional tests CI:docs Run doctest and removed CI:L1 Run doctests, unit tests, and functional tests labels Mar 4, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

This PR disables two GPU functional tests by commenting out their execution commands in a test script. Changes are minimal and affect only test invocations with no functional logic modifications.

Changes

Cohort / File(s) Summary
Test Disabling
tests/functional/L1_Functional_Tests_GPU.sh
Commented out execution of grpo_megatron_lora.sh and grpo_megatron_lora_async.sh tests.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

Suggested labels

CI:Lfast

Suggested reviewers

  • yuki-97
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: temporarily disabling megatron lora grpo tests in the CI pipeline.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes ✅ Passed PR contains only minor CI configuration changes—temporarily disabling test invocations due to known failures being addressed separately. No feature additions, breaking changes, or algorithmic modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chtruong/disable-grpo-lora-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/functional/L1_Functional_Tests_GPU.sh`:
- Around line 55-56: Restore the two commented-out run_test lines for
./tests/functional/grpo_megatron_lora.sh and
./tests/functional/grpo_megatron_lora_async.sh and wrap them with an explicit
gate (e.g., check SKIP_L1_GRPO_MEGATRON_LORA environment variable or a
--skip-l1-grpo flag) so the tests remain in CI but can be skipped; when
skipping, emit a clear log message that includes the tracking PR/issue
identifier (e.g., "Skipping L1 GRPO+Megatron+LoRA tests per PR#1234/issue#5678")
so re-enable criteria are visible. Ensure the gate logic is placed where the
original run_test invocations were removed and that the skip variable defaults
to false so coverage runs unless explicitly set.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26743456-3b95-440f-878a-c2ee6df553d1

📥 Commits

Reviewing files that changed from the base of the PR and between bd7f236 and d0b5daf.

📒 Files selected for processing (1)
  • tests/functional/L1_Functional_Tests_GPU.sh

Comment on lines +55 to +56
# run_test uv run --no-sync bash ./tests/functional/grpo_megatron_lora.sh
# run_test uv run --no-sync bash ./tests/functional/grpo_megatron_lora_async.sh
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid silently removing unique CI coverage; gate these tests with an explicit temporary switch.

Line 55 and Line 56 currently remove the only L1 coverage for GRPO+Megatron+LoRA (including async), which means these paths won’t run in the main CI flow (.github/workflows/cicd-main.yml:310-327, plus main/scheduled L1 behavior in lines 100-120 of that workflow). Please keep these invocations in place behind an explicit skip flag and log the tracking PR/issue so re-enable criteria stays visible.

Proposed change
-# run_test      uv run --no-sync bash ./tests/functional/grpo_megatron_lora.sh
-# run_test      uv run --no-sync bash ./tests/functional/grpo_megatron_lora_async.sh
+if [[ "${DISABLE_GRPO_MEGATRON_LORA_TESTS:-1}" == "1" ]]; then
+    echo "Temporarily skipping GRPO Megatron LoRA tests (tracking fix: PR `#1889`)"
+else
+    run_test      uv run --no-sync bash ./tests/functional/grpo_megatron_lora.sh
+    run_test      uv run --no-sync bash ./tests/functional/grpo_megatron_lora_async.sh
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# run_test uv run --no-sync bash ./tests/functional/grpo_megatron_lora.sh
# run_test uv run --no-sync bash ./tests/functional/grpo_megatron_lora_async.sh
if [[ "${DISABLE_GRPO_MEGATRON_LORA_TESTS:-1}" == "1" ]]; then
echo "Temporarily skipping GRPO Megatron LoRA tests (tracking fix: PR `#1889`)"
else
run_test uv run --no-sync bash ./tests/functional/grpo_megatron_lora.sh
run_test uv run --no-sync bash ./tests/functional/grpo_megatron_lora_async.sh
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional/L1_Functional_Tests_GPU.sh` around lines 55 - 56, Restore
the two commented-out run_test lines for
./tests/functional/grpo_megatron_lora.sh and
./tests/functional/grpo_megatron_lora_async.sh and wrap them with an explicit
gate (e.g., check SKIP_L1_GRPO_MEGATRON_LORA environment variable or a
--skip-l1-grpo flag) so the tests remain in CI but can be skipped; when
skipping, emit a clear log message that includes the tracking PR/issue
identifier (e.g., "Skipping L1 GRPO+Megatron+LoRA tests per PR#1234/issue#5678")
so re-enable criteria are visible. Ensure the gate logic is placed where the
original run_test invocations were removed and that the skip variable defaults
to false so coverage runs unless explicitly set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:docs Run doctest

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant