
cp: [docs, model] Add Ministral 3 Examples (2139) into r0.3.0 #2204

Merged

ko3n1g merged 1 commit into r0.3.0 from cherry-pick-2139-r0.3.0 on Feb 5, 2026

Conversation

ko3n1g (Contributor) commented Feb 4, 2026

beep boop [🤖]: Hi @kamran-nvidia 👋,

we've cherry-picked #2139 into r0.3.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

Release Notes

  • Documentation

    • Added comprehensive Ministral 3 Vision Language Model documentation including architecture, setup, and usage guides.
  • New Features

    • Added Ministral 3 support with scripts for model conversion, inference, and fine-tuning workflows.
    • Improved model output handling for enhanced compatibility across different model output formats.

Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
ko3n1g (Contributor, Author) commented Feb 4, 2026

/ok to test 1b50e3c

copy-pr-bot bot commented Feb 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot (Contributor) commented Feb 4, 2026

📝 Walkthrough

This PR adds support for Ministral 3, a Vision Language Model: comprehensive documentation; example scripts for model conversion, inference, and fine-tuning; and minor code updates to handle pooler outputs in the model forward pass and to normalize VLM generation outputs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>`docs/models/vlm/README.md`, `examples/models/vlm/ministral3/README.md` | Adds a Ministral 3 entry to the VLM table and introduces comprehensive documentation covering model architecture, workspace setup, conversion workflows, inference, fine-tuning, evaluation, and HuggingFace model card references. |
| **Model Implementation**<br>`src/megatron/bridge/models/ministral3/modeling_ministral3.py` | Updates `Ministral3Model.forward` to access the `pooler_output` attribute when processing image features from `get_image_features()`. |
| **Recipe/Module Exports**<br>`src/megatron/bridge/recipes/__init__.py` | Exposes ministral3 recipes to the public module namespace via wildcard import. |
| **VLM Generation**<br>`examples/conversion/hf_to_megatron_generate_vlm.py` | Normalizes `vlm_forward_step` output handling to support both tensor and tuple `(tensor, loss)` return types from models. |
| **Example Scripts**<br>`examples/models/vlm/ministral3/{conversion.sh, inference.sh, peft.sh, sft.sh}` | Adds bash scripts demonstrating checkpoint conversion between HuggingFace and Megatron formats, multi-run distributed inference experiments, LoRA fine-tuning with varying parallelism configs (TP/PP), and structured fine-tuning recipes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

r0.3.0

Suggested reviewers

  • cuichenx
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 75.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes (⚠️ Warning): The PR contains major changes to core model code (VLM model support, image feature extraction, output normalization), but the PR description lacks test results or testing documentation. Add test results, performance benchmarks, and regression-testing information to the PR description, documenting numeric consistency and successful validation.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately describes the main change: adding Ministral 3 examples and documentation, which aligns with the changeset containing new files for the Ministral 3 model (README, conversion, inference, peft, and sft scripts) plus related code changes.


Important

Action Needed: IP Allowlist Update

If your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:

  • 136.113.208.247/32 (new)
  • 34.170.211.100/32
  • 35.222.179.152/32

Reviews will stop working after February 8, 2026 if the new IP is not added to your allowlist.



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/conversion/hf_to_megatron_generate_vlm.py (2)

85-85: ⚠️ Potential issue | 🟡 Minor

Incorrect return type annotation.

The function returns (output_tensor, loss_func) which is a tuple, but the return type is annotated as -> torch.Tensor. This should be corrected to reflect the actual return type.

🛠️ Suggested fix (with Callable imported from collections.abc)
-def vlm_forward_step(data_iterator, model, **kwargs) -> torch.Tensor:
+def vlm_forward_step(data_iterator, model, **kwargs) -> tuple[torch.Tensor, Callable]:

136-139: ⚠️ Potential issue | 🟡 Minor

Duplicate HTTP request is inefficient.

The code makes two separate HTTP requests to the same URL: one to check status (line 137) and another to actually load the image (line 139). This is wasteful and could cause issues with rate-limited endpoints or yield inconsistent results if the resource changes between requests.

🛠️ Suggested fix - single request
     if image_path.startswith(("http://", "https://")):
-        response = requests.get(image_path)
-        response.raise_for_status()
-        return Image.open(requests.get(image_path, stream=True).raw)
+        response = requests.get(image_path, stream=True)
+        response.raise_for_status()
+        return Image.open(response.raw)
     else:
🤖 Fix all issues with AI agents
In `@examples/conversion/hf_to_megatron_generate_vlm.py`:
- Around lines 118-124: The tuple unpacking assumes model(**forward_args)
returns exactly two elements. Make the handling of model_output defensive:
if isinstance(model_output, tuple), take output_tensor from model_output[0]
with a bounds check instead of unpacking into two variables; otherwise use
the value directly, leaving loss_func unchanged.
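
The defensive handling described above can be sketched as follows. This is a hypothetical helper, not code from the actual file; the names (`model_output`, `output_tensor`, `loss_func`) follow the review comment:

```python
def normalize_model_output(model_output):
    """Return (output_tensor, loss_func) whether the model returned a
    bare tensor or a (tensor, loss_func) tuple."""
    if isinstance(model_output, tuple):
        # Safe indexed access with a bounds check instead of unpacking
        # into exactly two variables.
        output_tensor = model_output[0]
        loss_func = model_output[1] if len(model_output) > 1 else None
    else:
        output_tensor = model_output
        loss_func = None
    return output_tensor, loss_func
```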

In `@examples/models/vlm/ministral3/README.md`:
- Around lines 104-127: The "Expected output" fenced code block is missing a
language tag, which triggers markdownlint rule MD040. In the Ministral 3
README example, locate the fenced block showing "Generation step ..." and
"======== GENERATED TEXT OUTPUT ========" and add a language identifier after
the opening backticks (e.g., change ``` to ```text).

In `@src/megatron/bridge/models/ministral3/modeling_ministral3.py`:
- Around lines 226-231: The code assumes get_image_features() returns an
object with a .pooler_output attribute (transformers>=5), but installed
transformers<5 returns a tensor directly. Update the image_features handling
at the get_image_features() call site to accept both shapes: if the result
has a pooler_output attribute, use it; otherwise treat the result as the
tensor itself. Then call torch.cat only when the result is actually a
sequence of tensors (detect with isinstance(..., (list, tuple))); for a
single tensor, just move it to inputs_embeds.device/dtype without
concatenation.
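
A minimal sketch of the version-tolerant attribute access, using stand-in objects rather than real transformers types (`ImageFeaturesOutput` is a hypothetical name for illustration):

```python
class ImageFeaturesOutput:
    """Stand-in for the transformers>=5 output object (hypothetical)."""
    def __init__(self, pooler_output):
        self.pooler_output = pooler_output

def extract_image_features(result):
    # transformers>=5 wraps the features in an output object; older
    # releases return the tensor directly, so fall back to the result.
    features = getattr(result, "pooler_output", result)
    # With real tensors, one would additionally concatenate only when
    # the API returned a sequence of per-image tensors (torch.cat) and
    # otherwise move the single tensor to inputs_embeds.device/dtype.
    return features
```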
🧹 Nitpick comments (3)
examples/conversion/hf_to_megatron_generate_vlm.py (1)

31-31: Consider using T | None syntax instead of Optional[T].

Per coding guidelines for Python 3.10+, prefer using str | None instead of Optional[str]. This import can be removed if the Optional usage in type hints is replaced.

♻️ Suggested change
-from typing import Optional

And update any type hints using Optional[X] to X | None (e.g., in process_image_inputs at line 144).

examples/models/vlm/ministral3/inference.sh (2)

1-17: Consider adding shell error handling options.

The script lacks set -e (or set -euo pipefail) which means it will continue executing even if a command fails. For an inference script where each run depends on successful execution, this could mask failures.

🛠️ Suggested addition after the copyright header
 # limitations under the License.

+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}

31-31: Quote variable expansions to handle paths with spaces.

The ${WORKSPACE} variable is used unquoted in paths. If the workspace path contains spaces, this will cause the script to fail.

🛠️ Suggested fix
-    --megatron_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000 \
+    --megatron_model_path "${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000" \
-    --hf_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export \
+    --hf_model_path "${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export" \

Also applies to: 40-40

ko3n1g (Contributor, Author) left a comment


@kamran-nvidia Do we have failing tests we're fixing with this PR? If not, I would defer this to the patch release. If we need the docs for the major release, please split them out into a separate PR.

ko3n1g marked this pull request as draft on February 5, 2026 10:01
kamran-nvidia (Contributor) commented

@ko3n1g codecov/patch is failing because we can't run tests on the ministral3 model; it requires transformers > 5.0

ko3n1g marked this pull request as ready for review on February 5, 2026 22:52
ko3n1g merged commit f91a086 into r0.3.0 on Feb 5, 2026
115 of 121 checks passed
ko3n1g deleted the cherry-pick-2139-r0.3.0 branch on February 5, 2026 22:52