fix: lower steps in smolvlm nightly test #1239

terrykong · 2025-09-30T20:44:25Z

The smolvlm test was never able to get 200 steps in 3 hours. This lowers the step count and makes the final accuracy check an average

https://wandb.ai/nvidia/nemo-rl?nw=x2dvezg4z1l

After #1115 merged the smolvlm test was disabled, I ran the config to check if the metrics would have caught that issue, and it does:

                                 Metric Checks                                  
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Status ┃ Check                  ┃ Value              ┃ Message               ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ FAIL   │ data["train/loss"]["1… │ 0.4045526683330536 │ data["train/loss"]["… │
│        │ < 0.1                  │                    │ < 0.1 (condition      │
│        │                        │                    │ evaluated to False)   │
│ FAIL   │ mean(data["train/rewa… │ 0.5276562213897705 │ mean(data["train/rew… │
│        │ -6, -1) > 0.6          │                    │ -6, -1) > 0.6         │
│        │                        │                    │ (condition evaluated  │
│        │                        │                    │ to False)             │

Summary by CodeRabbit

Tests
- Upgraded the VLM nightly test to a newer version in the suite.
- Reduced step counts to shorten test execution time, improving CI throughput.
- Adjusted success criteria to use an averaged reward near the end of training for more stable evaluations.
- Updated tracked checkpoints to match the new step limit.
- No user-facing changes; impact is faster, more reliable test validation.

coderabbitai · 2025-09-30T20:50:33Z

📝 Walkthrough

Walkthrough

Nightly test suite updated to reference a new VLM test script version. The new script reduces step counts, updates metric checkpoints to step 130, and changes the reward validation from a single-point threshold to a trailing-window mean check.

Changes

Cohort / File(s)	Summary
Nightly suite manifest update `tests/test_suites/nightly.txt`	Updated test entry to use `vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v2.sh` instead of `.v1.sh`.
VLM CLEVR training script v2 `tests/test_suites/vlm/vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v2.sh`	Reduced STEPS_PER_RUN and MAX_STEPS from 200 to 130; adjusted metric checks to step 130; replaced single-point reward threshold with mean over trailing slice at step 130 (>0.6).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Nightly as Nightly Suite
  participant Script as VLM v2 Test Script
  participant Trainer as Training Job
  participant Metrics as Metrics Reader

  Nightly->>Script: Invoke test script
  Script->>Trainer: Launch training (MAX_STEPS=130)
  Trainer-->>Script: Emit logs/metrics
  Script->>Metrics: Parse metrics at step 130
  rect rgba(200,230,255,0.3)
    note right of Metrics: Changed check
    Metrics-->>Script: Compute mean(train/reward[130][-6:-1])
    Script-->>Nightly: Assert mean > 0.6 and other step-130 checks
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

CI:L0

Suggested reviewers

guyueh1
chtruong814

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly specifies the core change—reducing the number of steps in the smolvlm nightly test—and uses concise language reflecting the main purpose of the pull request without extraneous details. It directly relates to the modifications described in the test scripts and PR description, ensuring readability and clarity for maintainers.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.
Test Results For Major Changes	✅ Passed	The PR modifies only nightly test assets by lowering step counts (200 to 130), switching a single-point reward check to a trailing-mean criterion, and updating the test suite reference; no model or training code is changed, so the impact is minor and confined to test configuration. The PR description provides rationale (3-hour ceiling) and includes an embedded chart plus a link to the related Weights & Biases project as evidence, which constitutes testing information. Given the limited scope and the presence of testing context, this satisfies the check’s requirement to document results for changes that could touch numerics in tests.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch tk/smolvlm-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

fix Signed-off-by: Terry Kong <[email protected]> rename Signed-off-by: Terry Kong <[email protected]> fix Signed-off-by: Terry Kong <[email protected]> fix Signed-off-by: Terry Kong <[email protected]>

Signed-off-by: Terry Kong <[email protected]>

terrykong changed the title ~~https://wandb.ai/nvidia/nemo-rl?nw=x2dvezg4z1l~~ fix: lower steps in smolvlm nightly test Sep 30, 2025

terrykong requested review from a team as code owners September 30, 2025 20:47

terrykong force-pushed the tk/smolvlm-fix branch from 695fe82 to dedd010 Compare September 30, 2025 20:50

terrykong requested a review from yfw October 1, 2025 05:54

terrykong added the CI:L0 Run doctests and unit tests label Oct 1, 2025

terrykong force-pushed the tk/smolvlm-fix branch from 0111fbe to 2a16654 Compare October 1, 2025 05:55

https://wandb.ai/nvidia/nemo-rl?nw=x2dvezg4z1l

d7d41a1

fix Signed-off-by: Terry Kong <[email protected]> rename Signed-off-by: Terry Kong <[email protected]> fix Signed-off-by: Terry Kong <[email protected]> fix Signed-off-by: Terry Kong <[email protected]>

terrykong force-pushed the tk/smolvlm-fix branch from 2a16654 to d7d41a1 Compare October 1, 2025 05:59

disabled

6d2a93e

Signed-off-by: Terry Kong <[email protected]>

terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Oct 1, 2025

terrykong temporarily deployed to nemo-ci October 1, 2025 17:27 — with GitHub Actions Inactive

yfw approved these changes Oct 1, 2025

View reviewed changes

terrykong merged commit 0ad4722 into main Oct 2, 2025
43 of 44 checks passed

terrykong deleted the tk/smolvlm-fix branch October 2, 2025 06:49

This was referenced Oct 3, 2025

fix: make sft dynamic batch step time check more stable #1265

Merged

fix: deepscaler-24k test reduce to 10 steps to safely finish in 4 hr #1280

Merged

PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025

fix: lower steps in smolvlm nightly test (NVIDIA-NeMo#1239)

348b9f9

Signed-off-by: Terry Kong <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: lower steps in smolvlm nightly test #1239

fix: lower steps in smolvlm nightly test #1239

Uh oh!

terrykong commented Sep 30, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Sep 30, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: lower steps in smolvlm nightly test #1239

fix: lower steps in smolvlm nightly test #1239

Uh oh!

Conversation

terrykong commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

After #1115 merged the smolvlm test was disabled, I ran the config to check if the metrics would have caught that issue, and it does:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

terrykong commented Sep 30, 2025 •

edited

Loading

coderabbitai bot commented Sep 30, 2025 •

edited

Loading