Skip to content

fix flaky tests; should be using train loss from final step rather than final avg train loss#3532

Closed
winglian wants to merge 3 commits into
mainfrom
tensorboard-loss-check
Closed

fix flaky tests; should be using train loss from final step rather than final avg train loss#3532
winglian wants to merge 3 commits into
mainfrom
tensorboard-loss-check

Conversation

@winglian

@winglian winglian commented Mar 22, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

Tests

  • Updated training loss metric validation across comprehensive end-to-end tests spanning multi-GPU, single GPU, and specialized training configurations to reflect current monitoring formats.

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 35589696-1394-49c9-a751-aad120d2d064

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR updates TensorBoard metric assertions across 21 E2E test files, changing the expected scalar tag from train/train_loss to train/loss while preserving all other test logic, thresholds, and directory configurations.

Changes

Cohort / File(s) Summary
Multi-GPU FSDP and Distributed Tests
tests/e2e/multigpu/test_fsdp1.py, tests/e2e/multigpu/test_fsdp2.py, tests/e2e/multigpu/test_dist_muon_fsdp2.py, tests/e2e/multigpu/test_fp8_fsdp2.py
Updated TensorBoard scalar tag filtering in verify_training_success function from train/train_loss to train/loss for loss validation assertions.
Multi-GPU Framework-Specific Tests
tests/e2e/multigpu/test_llama.py, tests/e2e/multigpu/test_ray.py, tests/e2e/multigpu/test_gemma3.py, tests/e2e/multigpu/test_tp.py
Updated check_tensorboard calls to validate the metric key train/loss instead of train/train_loss across multiple test cases (LoRA DDP, FSDP variants, DeepSpeed, and FFT SFT tests).
Multi-GPU Patched and Solo Variants
tests/e2e/multigpu/patched/test_sp.py, tests/e2e/multigpu/solo/test_flex.py
Updated TensorBoard scalar tag assertions from train/train_loss to train/loss in E2E tests.
Single-GPU and Root E2E Tests
tests/e2e/patched/test_fa_xentropy.py, tests/e2e/patched/test_flattening.py, tests/e2e/patched/test_unsloth_qlora.py, tests/e2e/solo/test_flex.py, tests/e2e/test_embeddings_lr.py, tests/e2e/test_llama_pretrain.py, tests/e2e/test_packing_loss.py, tests/e2e/test_process_reward_model_smollm2.py, tests/e2e/test_qat.py, tests/e2e/test_reward_model_smollm2.py, tests/e2e/test_streaming.py
Updated check_tensorboard metric key from train/train_loss to train/loss in various training test scenarios including embeddings, pretraining, packing, QAT, reward models, and streaming datasets.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • SalmanMohammadi
  • NanoCode012
  • djsaunde
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately describes the main change: updating test assertions to check the TensorBoard metric 'train/loss' instead of 'train/train_loss', which aligns with the stated objective of using training loss from the final step rather than final average training loss.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch tensorboard-loss-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Mar 22, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant