Skip to content

Conversation

@terrykong
Copy link
Contributor

@terrykong terrykong commented Sep 30, 2025

The smolvlm test was never able to get 200 steps in 3 hours. This lowers the step count and makes the final accuracy check an average

image

https://wandb.ai/nvidia/nemo-rl?nw=x2dvezg4z1l

After #1115 merged the smolvlm test was disabled, I ran the config to check if the metrics would have caught that issue, and it does:

                                 Metric Checks                                  
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Status ┃ Check                  ┃ Value              ┃ Message               ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ FAIL   │ data["train/loss"]["1… │ 0.4045526683330536 │ data["train/loss"]["… │
│        │ < 0.1                  │                    │ < 0.1 (condition      │
│        │                        │                    │ evaluated to False)   │
│ FAIL   │ mean(data["train/rewa… │ 0.5276562213897705 │ mean(data["train/rew… │
│        │ -6, -1) > 0.6          │                    │ -6, -1) > 0.6         │
│        │                        │                    │ (condition evaluated  │
│        │                        │                    │ to False)             │

Summary by CodeRabbit

  • Tests
    • Upgraded the VLM nightly test to a newer version in the suite.
    • Reduced step counts to shorten test execution time, improving CI throughput.
    • Adjusted success criteria to use an averaged reward near the end of training for more stable evaluations.
    • Updated tracked checkpoints to match the new step limit.
    • No user-facing changes; impact is faster, more reliable test validation.

@terrykong terrykong changed the title https://wandb.ai/nvidia/nemo-rl?nw=x2dvezg4z1l fix: lower steps in smolvlm nightly test Sep 30, 2025
@terrykong terrykong requested review from a team as code owners September 30, 2025 20:47
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 30, 2025

📝 Walkthrough

Walkthrough

Nightly test suite updated to reference a new VLM test script version. The new script reduces step counts, updates metric checkpoints to step 130, and changes the reward validation from a single-point threshold to a trailing-window mean check.

Changes

Cohort / File(s) Summary
Nightly suite manifest update
tests/test_suites/nightly.txt
Updated test entry to use vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v2.sh instead of .v1.sh.
VLM CLEVR training script v2
tests/test_suites/vlm/vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v2.sh
Reduced STEPS_PER_RUN and MAX_STEPS from 200 to 130; adjusted metric checks to step 130; replaced single-point reward threshold with mean over trailing slice at step 130 (>0.6).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Nightly as Nightly Suite
  participant Script as VLM v2 Test Script
  participant Trainer as Training Job
  participant Metrics as Metrics Reader

  Nightly->>Script: Invoke test script
  Script->>Trainer: Launch training (MAX_STEPS=130)
  Trainer-->>Script: Emit logs/metrics
  Script->>Metrics: Parse metrics at step 130
  rect rgba(200,230,255,0.3)
    note right of Metrics: Changed check
    Metrics-->>Script: Compute mean(train/reward[130][-6:-1])
    Script-->>Nightly: Assert mean > 0.6 and other step-130 checks
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

CI:L0

Suggested reviewers

  • guyueh1
  • chtruong814

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly specifies the core change—reducing the number of steps in the smolvlm nightly test—and uses concise language reflecting the main purpose of the pull request without extraneous details. It directly relates to the modifications described in the test scripts and PR description, ensuring readability and clarity for maintainers.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Test Results For Major Changes ✅ Passed The PR modifies only nightly test assets by lowering step counts (200 to 130), switching a single-point reward check to a trailing-mean criterion, and updating the test suite reference; no model or training code is changed, so the impact is minor and confined to test configuration. The PR description provides rationale (3-hour ceiling) and includes an embedded chart plus a link to the related Weights & Biases project as evidence, which constitutes testing information. Given the limited scope and the presence of testing context, this satisfies the check’s requirement to document results for changes that could touch numerics in tests.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tk/smolvlm-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@terrykong terrykong requested a review from yfw October 1, 2025 05:54
@terrykong terrykong added the CI:L0 Run doctests and unit tests label Oct 1, 2025
fix

Signed-off-by: Terry Kong <[email protected]>

rename

Signed-off-by: Terry Kong <[email protected]>

fix

Signed-off-by: Terry Kong <[email protected]>

fix

Signed-off-by: Terry Kong <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
@terrykong terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Oct 1, 2025
@terrykong terrykong merged commit 0ad4722 into main Oct 2, 2025
43 of 44 checks passed
@terrykong terrykong deleted the tk/smolvlm-fix branch October 2, 2025 06:49
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L0 Run doctests and unit tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants