Skip to content

Add Nemotron 3 to tests via tiny model#5278

Closed
sergiopaniego wants to merge 21 commits into
mainfrom
nemotron3-tiny-tests
Closed

Add Nemotron 3 to tests via tiny model#5278
sergiopaniego wants to merge 21 commits into
mainfrom
nemotron3-tiny-tests

Conversation

@sergiopaniego

@sergiopaniego sergiopaniego commented Mar 12, 2026

Copy link
Copy Markdown
Member

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@qgallouedec @albertvillanova


Note

Medium Risk
Primarily affects test coverage and internal tiny-model generation, but introduces a new architecture path (NemotronH) and tightens minimum transformers version expectations, which could cause CI/runtime mismatches if versions diverge.

Overview
Adds a new trl-internal-testing/tiny-NemotronHForCausalLM tiny model generator entry (hybrid Mamba+attention) so unit tests can exercise NemotronH/Nemotron 3–style models, including mirroring the model’s float32-only Mamba parameters.

Wires this tiny model into existing tokenizer/data-utils and trainer test parametrizations (SFTTrainer/DPOTrainer), with skipif(transformers<5.7.0) guards due to NemotronH gradient-checkpointing requirements. Also updates the Nemotron 3 SFT example to require transformers>=5.7.0 and removes the forced disabling of gradient checkpointing.

Reviewed by Cursor Bugbot for commit b12633b. Bugbot is set up for automated code reviews on this repo. Configure here.

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova albertvillanova left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

The CI is red:

  FAILED tests/test_dpo_trainer.py::TestDPOTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8

Comment thread tests/test_sft_trainer.py Outdated

@qgallouedec qgallouedec left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!! just a few comments

Comment thread scripts/generate_tiny_models.py
Comment thread tests/test_dpo_trainer.py Outdated
kwargs = {}
if "NemotronH" in model_id:
kwargs["gradient_checkpointing"] = False
kwargs["use_cpu"] = True

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really not sure about this. we don't train on cpu, so why testing it + we wouldn't know it a gpu-specific issue is introduced

@qgallouedec

Copy link
Copy Markdown
Member

Thanks!

The CI is red:

  FAILED tests/test_dpo_trainer.py::TestDPOTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8

is it possible that this error originates from what params are used to build the model?

Comment thread scripts/generate_tiny_models.py
Comment thread tests/test_sft_trainer.py Outdated
@sergiopaniego

Copy link
Copy Markdown
Member Author

Fixes applied. There is an issue with some dependencies that needs to be addressed in transformers.
I've created the PR there: huggingface/transformers#44853.

@albertvillanova

Copy link
Copy Markdown
Member

Hi @sergiopaniego, is there any update on this or the corresponding upstream PR?

@albertvillanova

Copy link
Copy Markdown
Member

Upstream PR has just been merged!

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0423cdb. Configure here.

Comment thread scripts/generate_tiny_models.py Outdated
@sergiopaniego

Copy link
Copy Markdown
Member Author

@albertvillanova Upstream PR has just been merged!

yes! It could be approved and merged if tests are green 😄

@sergiopaniego

Copy link
Copy Markdown
Member Author

failed test is unrelated

@sergiopaniego

Copy link
Copy Markdown
Member Author

Gradient checkpointing PR in transformers (huggingface/transformers#45625) is now merged so I've update the version here to the next transformers one (5.7.0). It should be now finally fixed once that version is released 😄 Tests are green

cc @albertvillanova

@qgallouedec

Copy link
Copy Markdown
Member

thanks! I think I'll wait for #5637 to be merged or closed before merging this one :)

@sergiopaniego

Copy link
Copy Markdown
Member Author

makes sense, ping me so I can update this PR once #5637 is merged

@sergiopaniego

Copy link
Copy Markdown
Member Author

Closed in favor of #5944

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants