fix flaky tests; should be using train loss from final step rather than final avg train loss by winglian · Pull Request #3532 · axolotl-ai-cloud/axolotl

winglian · 2026-03-22T14:39:05Z

Summary by CodeRabbit

Tests

Updated training loss metric validation across comprehensive end-to-end tests spanning multi-GPU, single GPU, and specialized training configurations to reflect current monitoring formats.

…an final avg train loss

coderabbitai · 2026-03-22T14:39:19Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 35589696-1394-49c9-a751-aad120d2d064

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR updates TensorBoard metric assertions across 21 E2E test files, changing the expected scalar tag from train/train_loss to train/loss while preserving all other test logic, thresholds, and directory configurations.

Changes

Cohort / File(s)	Summary
Multi-GPU FSDP and Distributed Tests `tests/e2e/multigpu/test_fsdp1.py`, `tests/e2e/multigpu/test_fsdp2.py`, `tests/e2e/multigpu/test_dist_muon_fsdp2.py`, `tests/e2e/multigpu/test_fp8_fsdp2.py`	Updated TensorBoard scalar tag filtering in `verify_training_success` function from `train/train_loss` to `train/loss` for loss validation assertions.
Multi-GPU Framework-Specific Tests `tests/e2e/multigpu/test_llama.py`, `tests/e2e/multigpu/test_ray.py`, `tests/e2e/multigpu/test_gemma3.py`, `tests/e2e/multigpu/test_tp.py`	Updated `check_tensorboard` calls to validate the metric key `train/loss` instead of `train/train_loss` across multiple test cases (LoRA DDP, FSDP variants, DeepSpeed, and FFT SFT tests).
Multi-GPU Patched and Solo Variants `tests/e2e/multigpu/patched/test_sp.py`, `tests/e2e/multigpu/solo/test_flex.py`	Updated TensorBoard scalar tag assertions from `train/train_loss` to `train/loss` in E2E tests.
Single-GPU and Root E2E Tests `tests/e2e/patched/test_fa_xentropy.py`, `tests/e2e/patched/test_flattening.py`, `tests/e2e/patched/test_unsloth_qlora.py`, `tests/e2e/solo/test_flex.py`, `tests/e2e/test_embeddings_lr.py`, `tests/e2e/test_llama_pretrain.py`, `tests/e2e/test_packing_loss.py`, `tests/e2e/test_process_reward_model_smollm2.py`, `tests/e2e/test_qat.py`, `tests/e2e/test_reward_model_smollm2.py`, `tests/e2e/test_streaming.py`	Updated `check_tensorboard` metric key from `train/train_loss` to `train/loss` in various training test scenarios including embeddings, pretraining, packing, QAT, reward models, and streaming datasets.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

update loss value for flakey e2e test #2786: Updates the same E2E Llama pretrain test's TensorBoard loss assertion alongside this PR's scalar tag change.
upgrade trl==0.19.1 #2892: Modifies the same tests/e2e/multigpu/test_llama.py file with related TensorBoard loss threshold adjustments.
basic torchao fp8 mixed precision training #2926: Introduces or updates tests/e2e/multigpu/test_fp8_fsdp2.py which receives the same TensorBoard scalar tag change in this PR.

Suggested reviewers

SalmanMohammadi
NanoCode012
djsaunde

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately describes the main change: updating test assertions to check the TensorBoard metric 'train/loss' instead of 'train/train_loss', which aligns with the stated objective of using training loss from the final step rather than final average training loss.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch tensorboard-loss-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-22T14:56:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

fix flaky tests; should be using train loss from final step rather th…

6130e40

…an final avg train loss

winglian added 2 commits March 22, 2026 11:09

retry and more info on download failure

a967339

use train_loss for sp test

598c965

winglian mentioned this pull request Mar 22, 2026

increase rtol, codecov informational only, don't silently fail errors w curl #3534

Merged

winglian closed this Mar 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix flaky tests; should be using train loss from final step rather than final avg train loss#3532

fix flaky tests; should be using train loss from final step rather than final avg train loss#3532
winglian wants to merge 3 commits into
mainfrom
tensorboard-loss-check

winglian commented Mar 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 22, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

winglian commented Mar 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Tests

Uh oh!

coderabbitai Bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Mar 22, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

winglian commented Mar 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 22, 2026 •

edited

Loading