cp: [docs, model] Add Ministral 3 Examples (2139) into r0.3.0 (#2204)
Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
/ok to test 1b50e3c
📝 Walkthrough

This PR introduces support for Ministral 3, a Vision Language Model, by adding comprehensive documentation and example scripts for model conversion, inference, and finetuning, alongside minor code updates to handle pooler outputs in the model forward pass and normalize VLM generation outputs.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/conversion/hf_to_megatron_generate_vlm.py (2)
85-85: ⚠️ Potential issue | 🟡 Minor: Incorrect return type annotation.

The function returns `(output_tensor, loss_func)`, which is a tuple, but the return type is annotated as `-> torch.Tensor`. The annotation should be corrected to reflect the actual return type.

🛠️ Suggested fix

```diff
-def vlm_forward_step(data_iterator, model, **kwargs) -> torch.Tensor:
+def vlm_forward_step(data_iterator, model, **kwargs) -> tuple[torch.Tensor, callable]:
```
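As an illustration of the corrected annotation, here is a dependency-free sketch in which `len` stands in for the model forward pass and the loss function is a hypothetical stub (neither is the real Megatron Bridge object):

```python
from typing import Any, Callable

def stub_loss_func(output: Any) -> float:
    # Hypothetical stand-in for the real loss function closure.
    return 0.0

def vlm_forward_step(data_iterator, model, **kwargs) -> tuple[Any, Callable[[Any], float]]:
    # The annotation now matches the (output, loss_func) pair actually returned.
    output_tensor = model(data_iterator)
    return output_tensor, stub_loss_func
```

Callers can then unpack `output, loss_func = vlm_forward_step(...)` and type checkers will agree with the runtime behavior.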
136-139: ⚠️ Potential issue | 🟡 Minor: Duplicate HTTP request is inefficient.

The code makes two separate HTTP requests to the same URL: one to check status (line 137) and another to actually load the image (line 139). This is wasteful and could cause issues with rate-limited endpoints, or yield inconsistent results if the resource changes between requests.

🛠️ Suggested fix (single request)

```diff
 if image_path.startswith(("http://", "https://")):
-    response = requests.get(image_path)
-    response.raise_for_status()
-    return Image.open(requests.get(image_path, stream=True).raw)
+    response = requests.get(image_path, stream=True)
+    response.raise_for_status()
+    return Image.open(response.raw)
 else:
```
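The single-request pattern can be sketched independently of PIL; `load_image_bytes` is a hypothetical name, and the HTTP branch assumes `requests` is installed, as in the script under review:

```python
def load_image_bytes(image_path: str) -> bytes:
    """Load raw image bytes from a URL or a local path with at most one HTTP request."""
    if image_path.startswith(("http://", "https://")):
        import requests  # assumed available, as in the script under review
        response = requests.get(image_path, stream=True)  # the only request made
        response.raise_for_status()
        return response.raw.read()
    # Local path: no network involved at all.
    with open(image_path, "rb") as f:
        return f.read()
```

Reusing the one `response` object both for the status check and for the payload avoids the double fetch entirely.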
🤖 Fix all issues with AI agents
In `@examples/conversion/hf_to_megatron_generate_vlm.py`:
- Around line 118-124: The tuple unpacking assumes model(**forward_args) returns
exactly two elements; change the handling of model_output so it defensively
extracts the first element when model_output is a tuple (e.g., use indexed
access with a bounds check) and otherwise uses the value directly, ensuring
output_tensor is assigned from model_output[0] when present and leaving
loss_func unchanged; update the block around model_output, output_tensor, and
loss_func to perform isinstance(model_output, tuple) and safe indexing rather
than unpacking into two variables.
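The defensive extraction the prompt describes can be sketched as a small helper (the name `unwrap_model_output` is made up for illustration):

```python
def unwrap_model_output(model_output):
    """Return the primary output whether the model yielded a bare value
    or an (output, extras, ...) tuple, instead of unpacking into exactly two names."""
    if isinstance(model_output, tuple) and len(model_output) > 0:
        return model_output[0]
    return model_output
```

This keeps the call site working even if a future model version returns more (or fewer) elements alongside the output tensor.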
In `@examples/models/vlm/ministral3/README.md`:
- Around line 104-127: The Markdown "Expected output" fenced code block is
missing a language tag which triggers MD040; update the fence in the "Expected
output:" example so it uses a language tag (e.g., change ``` to ```text) to
satisfy markdownlint. Locate the "Expected output:" block in the README example
for Ministral3 (the fenced block showing "Generation step ..." and "========
GENERATED TEXT OUTPUT ========") and add the language identifier after the
opening backticks.
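To make the MD040 requirement concrete, here is a rough, dependency-free approximation of the lint check (the function name is invented; this is not markdownlint's actual implementation):

```python
def fences_missing_language(markdown: str) -> list[int]:
    """Return line numbers of opening code fences that lack a language tag."""
    missing: list[int] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # An opening fence that is exactly "```" has no language tag (MD040).
            if not in_fence and stripped == "```":
                missing.append(lineno)
            in_fence = not in_fence
    return missing
```

Changing the README's bare ``` to ```text makes such a check pass.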
In `@src/megatron/bridge/models/ministral3/modeling_ministral3.py`:
- Around line 226-231: The code assumes get_image_features() returns an object
with .pooler_output (transformers>=5), but installed transformers<5 returns a
tensor; update the handling in modeling_ministral3.py where image_features is
assigned (the get_image_features call) to accept both shapes: if the result has
attribute pooler_output use it, otherwise treat the result as the tensor
directly; then only call torch.cat when the tensor is actually a sequence/list
of tensors (detect with isinstance(..., (list, tuple)) or similar) and otherwise
move the single tensor to inputs_embeds.device/dtype without concatenation; this
touches the get_image_features call site and the image_features variable
handling.
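Since tensors are not needed to show the compatibility logic, the sketch below uses plain strings as stand-ins for tensors and a string join for `torch.cat`; the helper name and signature are illustrative only:

```python
from types import SimpleNamespace

def normalize_image_features(result, cat="+".join):
    """Accept both return shapes of get_image_features():
    transformers>=5 returns an object exposing .pooler_output,
    transformers<5 returns the value (or a list/tuple of values) directly."""
    features = getattr(result, "pooler_output", result)
    if isinstance(features, (list, tuple)):
        # Only concatenate when we actually have a sequence (torch.cat in the real code).
        features = cat(features)
    return features
```

The `getattr(..., result)` fallback is what lets one call site serve both library versions without an explicit version check.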
🧹 Nitpick comments (3)
examples/conversion/hf_to_megatron_generate_vlm.py (1)
31-31: Consider using `T | None` syntax instead of `Optional[T]`.

Per coding guidelines for Python 3.10+, prefer `str | None` over `Optional[str]`. This import can be removed if the `Optional` usage in type hints is replaced.

♻️ Suggested change

```diff
-from typing import Optional
```

And update any type hints using
`Optional[X]` to `X | None` (e.g., in `process_image_inputs` at line 144).

examples/models/vlm/ministral3/inference.sh (2)
1-17: Consider adding shell error handling options.

The script lacks `set -e` (or `set -euo pipefail`), which means it will continue executing even if a command fails. For an inference script where each run depends on successful execution, this could mask failures.

🛠️ Suggested addition after the copyright header

```diff
 # limitations under the License.
+set -euo pipefail
+
 # Workspace directory for checkpoints and results
 WORKSPACE=${WORKSPACE:-/workspace}
```
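A minimal sketch of such a preamble (the workspace default mirrors the script; everything else is illustrative):

```shell
#!/usr/bin/env bash
# Exit on any error (-e), on use of unset variables (-u),
# and on a failure anywhere in a pipeline (pipefail).
set -euo pipefail

# Workspace directory for checkpoints and results, with a default if unset.
WORKSPACE=${WORKSPACE:-/workspace}
echo "using workspace: ${WORKSPACE}"
```

With this preamble, a failed download or conversion step aborts the run instead of silently feeding garbage into the next command.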
31-31: Quote variable expansions to handle paths with spaces.

The `${WORKSPACE}` variable is used unquoted in paths. If the workspace path contains spaces, the script will fail.

🛠️ Suggested fix

```diff
-    --megatron_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000 \
+    --megatron_model_path "${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16/iter_0000000" \
```

```diff
-    --hf_model_path ${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export \
+    --hf_model_path "${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16-hf-export" \
```

Also applies to: 40-40
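Why the quoting matters can be shown with a throwaway workspace path containing a space (the path below is invented for the demo):

```shell
#!/usr/bin/env bash
set -euo pipefail

# An invented workspace path with a space, to demonstrate the failure mode.
WORKSPACE="/tmp/bridge demo"
MODEL_DIR="${WORKSPACE}/models/Ministral-3-3B-Instruct-2512-BF16"

# Quoted: the whole path is passed as one argument.
mkdir -p "${MODEL_DIR}"
ls -d "${MODEL_DIR}"
```

Without the quotes, `mkdir -p ${MODEL_DIR}` would split the path into two words at the space and create the wrong directories.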
ko3n1g left a comment
@kamran-nvidia Do we have failing tests we're fixing with this PR? If not, I would defer this to the patch release. If we need the docs for the major release, please split them out into a separate PR.
@ko3n1g
beep boop [🤖]: Hi @kamran-nvidia 👋,
Summary by CodeRabbit
Release Notes
Documentation
New Features