1 change: 1 addition & 0 deletions .github/workflows/cicd-main.yml
@@ -363,6 +363,7 @@ jobs:
- script: L2_Launch_models_qwen
- script: L2_Launch_models_qwen_quantization
- script: L2_Launch_models_qwen_vl
- script: L2_Launch_models_qwen35_vl
- script: L2_Launch_recipes_gemma_vl
- script: L2_Launch_recipes_gpt_oss
- script: L2_Launch_models_qwen_vl_quantization
29 changes: 23 additions & 6 deletions examples/conversion/compare_hf_and_megatron/compare.py
@@ -91,6 +91,7 @@
"""

import argparse
import gc
import importlib
import os
import sys
@@ -318,7 +319,13 @@ def vlm_forward_step(data_iterator, model, **kwargs) -> torch.Tensor:
def loss_func(x, **kwargs):
return x

return model(**forward_args), loss_func
model_output = model(**forward_args)
if isinstance(model_output, tuple):
output_tensor, _ = model_output
else:
output_tensor = model_output

return output_tensor, loss_func


def load_image(image_path: str) -> Image.Image:
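The tuple handling added above can be exercised in isolation. A minimal sketch, where `unpack_forward_output` is a hypothetical helper (not a Megatron-Bridge API): some model variants return `(output, bias)`-style tuples from `forward()`, while others return a bare tensor.

```python
def unpack_forward_output(model_output):
    """Return the output whether forward() gave a tuple or a single value."""
    if isinstance(model_output, tuple):
        # e.g. (logits, bias)-style returns: keep only the first element
        output_tensor, _ = model_output
    else:
        output_tensor = model_output
    return output_tensor


# Both call shapes yield the same result:
print(unpack_forward_output(("logits", None)))  # → logits
print(unpack_forward_output("logits"))          # → logits
```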
@@ -609,6 +616,11 @@ def _load_megatron_model(args):
model_provider.finalize()
megatron_model = model_provider.provide_distributed_model(wrap_with_ddp=False)

# Workaround: disable MTP for inference (causes hangs on NCCL collectives)
for m in megatron_model:
m.config.mtp_num_layers = None
m.config.grad_scale_func = None
Comment on lines +619 to +622
@coderabbitai (bot, Contributor), Feb 27, 2026
Repository: NVIDIA-NeMo/Megatron-Bridge

⚠️ Potential issue | 🟠 Major

Guard the MTP workaround with attribute checks.

The code unconditionally assigns to `m.config.mtp_num_layers` and `m.config.grad_scale_func` without verifying that they exist. These fields are optional across config types (note the `Optional[int]` definitions in provider configs and the defensive `getattr()` patterns elsewhere in the codebase), so configs that lack them may fail during this workaround.

Proposed fix
    for m in megatron_model:
-        m.config.mtp_num_layers = None
-        m.config.grad_scale_func = None
+        if hasattr(m.config, "mtp_num_layers"):
+            m.config.mtp_num_layers = None
+        if hasattr(m.config, "grad_scale_func"):
+            m.config.grad_scale_func = None

@cuichenx (Contributor, Author) replied:

Setting them to None has no issues.

@coderabbitai (bot) replied:

@cuichenx, understood! I've noted that setting these config attributes to None works without issues in this context. Thanks for clarifying.
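The exchange above turns on plain Python attribute semantics: assigning to an attribute on a regular (non-`__slots__`) object succeeds whether or not the attribute was previously defined. A minimal illustration with a hypothetical stand-in class (not the actual Megatron config):

```python
class FakeConfig:
    """Hypothetical stand-in for a Megatron model config (no __slots__)."""
    pass


cfg = FakeConfig()
# Neither attribute was ever defined, yet plain assignment does not raise:
cfg.mtp_num_layers = None
cfg.grad_scale_func = None

assert cfg.mtp_num_layers is None
assert hasattr(cfg, "grad_scale_func")
```

A config class declared with `__slots__` (or a dataclass with `slots=True`) would instead raise `AttributeError` here, which is the failure mode the proposed `hasattr` guard protects against.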



A Contributor replied:

Hmm, this breaks MTP in the Mamba model. It looks like GPT and Mamba use different logic to decide whether MTP is needed: for Mamba, the MTP layers are still called even when `mtp_num_layers=None` (here), and the MTP forward pass then crashes (here).



model_components = [m.eval() for m in megatron_model]

# Register debug hooks if enabled
@@ -715,11 +727,10 @@ def compare_models_one_step(args) -> None:
)

del hf_model
# Reload Megatron model to ensure a fresh instance before comparison
megatron_model, _ = _load_megatron_model(args)
gc.collect()
torch.cuda.empty_cache()

# Broadcast HF results to all ranks after Megatron initialization
# (following the pattern from generate_from_hf.py)
# Broadcast HF results to all ranks
if torch.distributed.is_initialized():
# Create tensors for broadcasting if they don't exist on non-rank-0
if hf_next_token is None:
@@ -731,6 +742,9 @@
)
hf_logits = torch.zeros(vocab_size, device=input_ids.device, dtype=torch.float32)

# Ensure consistent dtype across ranks before broadcast
hf_logits = hf_logits.float()

# Broadcast from rank 0 to all ranks
torch.distributed.broadcast(hf_next_token, 0)
torch.distributed.broadcast(hf_logits, 0)
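The dtype normalization above matters because the receive buffers on non-source ranks must match the source tensor's shape and dtype for the broadcast to be well defined: rank 0 may hold logits in the model's compute dtype (often bfloat16) while the other ranks allocated float32 zeros. A minimal single-process sketch of just the normalization step (the collective itself is omitted, since it requires an initialized process group):

```python
import torch

# Rank 0 computes logits in the model's dtype (often bfloat16)...
hf_logits = torch.randn(8, dtype=torch.bfloat16)
# ...while non-source ranks allocate float32 receive buffers, so the
# source tensor is cast before the broadcast to keep dtypes consistent.
hf_logits = hf_logits.float()

assert hf_logits.dtype == torch.float32
```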
@@ -778,7 +792,10 @@ def compare_models_one_step(args) -> None:
megatron_logits = megatron_output[0, -1, :]
megatron_next_token = torch.argmax(megatron_logits, dim=-1)

if not torch.distributed.is_initialized() or parallel_state.get_tensor_model_parallel_rank() == 0:
if not torch.distributed.is_initialized() or (
parallel_state.get_tensor_model_parallel_rank() == 0
and parallel_state.get_expert_model_parallel_rank() == 0
):
print(f"Megatron output shape: {megatron_output.shape}")
print(f"Megatron logits stats - mean: {megatron_logits.mean():.4f}, std: {megatron_logits.std():.4f}")
print(
2 changes: 2 additions & 0 deletions examples/conversion/hf_megatron_roundtrip_multi_gpu.py
@@ -62,6 +62,8 @@
# These are compared in float32 to avoid false mismatches.
IGNORE_PRECISION_PARAMS = [
"e_score_correction_bias",
"A_log",
"linear_attn.norm.weight",
]


52 changes: 52 additions & 0 deletions examples/models/vlm/qwen35_vl/conversion.sh
@@ -0,0 +1,52 @@
#!/usr/bin/env bash
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}
MODEL_NAME=Qwen3.5-35B-A3B # Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen3.5-27B

if [ "${MODEL_NAME}" = "Qwen3.5-27B" ]; then
HF_MODEL_CLASS="Qwen3_5ForConditionalGeneration"
else
HF_MODEL_CLASS="Qwen3_5MoeForConditionalGeneration"
fi

# Make sure to upgrade to transformers >= 5.2.0
# uv add "transformers>=5.2.0"

# Import HF → Megatron
uv run python examples/conversion/convert_checkpoints.py import \
--hf-model Qwen/${MODEL_NAME} \
--megatron-path ${WORKSPACE}/${MODEL_NAME} \
--torch-dtype bfloat16

# HF and Megatron models logits comparison validation
uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/compare_hf_and_megatron/compare.py \
--hf_model_path Qwen/${MODEL_NAME} \
--megatron_model_path ${WORKSPACE}/${MODEL_NAME} \
--model_class "${HF_MODEL_CLASS}" \
--image_path "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg" \
--prompt "Describe this image." \
--tp 1 --pp 1 --ep 8

# Export Megatron → HF
uv run python examples/conversion/convert_checkpoints.py export \
--hf-model Qwen/${MODEL_NAME} \
--megatron-path ${WORKSPACE}/${MODEL_NAME}/iter_0000000 \
--hf-path ${WORKSPACE}/${MODEL_NAME}-hf-export

# Round-trip validation
uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
--hf-model-id Qwen/${MODEL_NAME} --tp 1 --pp 2 --ep 4 --trust-remote-code
43 changes: 43 additions & 0 deletions examples/models/vlm/qwen35_vl/inference.sh
@@ -0,0 +1,43 @@
#!/usr/bin/env bash
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}
MODEL_NAME=Qwen3.5-35B-A3B # Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, Qwen3.5-27B

# Inference with Hugging Face checkpoints
uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path Qwen/${MODEL_NAME} \
--image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
--prompt "Describe this image." \
--max_new_tokens 50 \
--tp 2 --pp 2 --ep 4

# Inference with imported Megatron checkpoints
uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path Qwen/${MODEL_NAME} \
--megatron_model_path ${WORKSPACE}/${MODEL_NAME}/iter_0000000 \
--image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
--prompt "Describe this image." \
--max_new_tokens 50 \
--tp 2 --pp 2 --ep 4

# Inference with exported HF checkpoints
uv run python -m torch.distributed.run --nproc_per_node=8 examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path ${WORKSPACE}/${MODEL_NAME}-hf-export \
--image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
--prompt "Describe this image." \
--max_new_tokens 50 \
--tp 2 --pp 2 --ep 4
180 changes: 180 additions & 0 deletions examples/models/vlm/qwen35_vl/slurm_inference.sh
@@ -0,0 +1,180 @@
#!/bin/bash
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# ==============================================================================
# Qwen3.5-VL Multi-Node Distributed Inference for Qwen3.5-397B-A17B
# Recommended: TP=2, PP=4, EP=8 for full model (32 GPUs, 4 nodes)
#
# Usage:
# 1. Modify the #SBATCH directives below for your cluster
# 2. Set MODEL_PATH and CHECKPOINT_PATH as needed
# 3. Set CONTAINER_IMAGE or use --no-container-image for bare metal
# 4. Submit: sbatch slurm_inference.sh
# ==============================================================================

#SBATCH --job-name=qwen35v-inference
#SBATCH --nodes=4 # Number of nodes (32 GPUs = 4 nodes × 8 GPUs)
#SBATCH --ntasks-per-node=8 # Tasks per node (1 per GPU)
#SBATCH --gpus-per-node=8 # GPUs per node
#SBATCH --time=02:00:00 # Max run time (2 hours)
#SBATCH --partition=gpu # Partition name
#SBATCH --account=my_account # Account name
#SBATCH --output=logs/qwen35v_inference_%j.out
#SBATCH --error=logs/qwen35v_inference_%j.err
#SBATCH --exclusive # Exclusive node access

# ==============================================================================
# CONFIGURATION
# ==============================================================================

# Workspace directory
WORKSPACE=${WORKSPACE:-/workspace}

# Model configuration
MODEL_NAME=Qwen3.5-397B-A17B

# Option 1: Use HuggingFace model path (will load and convert on-the-fly)
MODEL_PATH=${WORKSPACE}/${MODEL_NAME}
# MODEL_PATH=Qwen/${MODEL_NAME} # Or use HF Hub path

# Option 2: Use pre-converted Megatron checkpoint (faster)
MEGATRON_CHECKPOINT=${WORKSPACE}/${MODEL_NAME}/iter_0000000
# Comment out to use HF model directly

# Inference configuration
IMAGE_PATH="https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png"
PROMPT="Describe this image."
MAX_NEW_TOKENS=1000

# Parallelism configuration for 32 GPUs (4 nodes × 8 GPUs)
TP=2 # Tensor Parallelism
PP=4 # Pipeline Parallelism
EP=8 # Expert Parallelism (MoE)

# Container configuration (required for SLURM pyxis)
CONTAINER_IMAGE=""
# CONTAINER_IMAGE="/path/to/nemo-framework.sqsh"

# Container mounts (optional, space-separated)
CONTAINER_MOUNTS=""
# CONTAINER_MOUNTS="/data:/data /workspace:/workspace"

# Set to true to run without container (bare metal)
NO_CONTAINER=false

# ==============================================================================
# Environment Setup
# ==============================================================================

# NCCL optimizations
export TORCH_NCCL_AVOID_RECORD_STREAMS=1
export NCCL_NVLS_ENABLE=0

# UV cache on shared filesystem (recommended for multi-node setups)
# Pre-sync once before submitting jobs: UV_CACHE_DIR=/path/to/cache uv sync
# export UV_CACHE_DIR="/path/to/shared/uv_cache"

# HuggingFace cache directory (recommended for shared filesystem)
# export HF_HOME="/path/to/shared/HF_HOME"

# Authentication tokens
# export HF_TOKEN="hf_your_token_here"

# Make sure to upgrade container image to transformers >= 5.2.0 (required for Qwen3.5)
# Run once: uv add "transformers>=5.2.0"

# ==============================================================================
# Job Execution
# ==============================================================================

echo "======================================"
echo "Qwen3.5-VL Multi-Node Inference"
echo "======================================"
echo "Job ID: $SLURM_JOB_ID"
echo "Nodes: $SLURM_JOB_NUM_NODES"
echo "GPUs per node: $SLURM_GPUS_PER_NODE"
echo "Total GPUs: $((SLURM_JOB_NUM_NODES * SLURM_GPUS_PER_NODE))"
echo "Model: $MODEL_NAME"
echo "Parallelism: TP=$TP, PP=$PP, EP=$EP"
echo "======================================"

# Create logs directory
mkdir -p logs

# Calculate total processes
TOTAL_GPUS=$((SLURM_JOB_NUM_NODES * SLURM_GPUS_PER_NODE))
REQUIRED_GPUS=$(( (TP > EP ? TP : EP) * PP ))

# Validate parallelism configuration
if [ $REQUIRED_GPUS -ne $TOTAL_GPUS ]; then
echo "ERROR: Parallelism mismatch!"
echo " max(TP, EP) × PP = max($TP, $EP) × $PP = $REQUIRED_GPUS"
echo " Total allocated GPUs = $TOTAL_GPUS"
echo " These must be equal!"
exit 1
fi
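The check above encodes the assumption that the model-parallel world size is max(TP, EP) × PP: tensor-parallel and expert-parallel ranks share the same GPUs within a pipeline stage, so the larger of the two determines the per-stage GPU count. The same arithmetic can be sketched standalone with this script's default values:

```shell
# Standalone sketch of the GPU-count validation above, using this
# script's defaults for the 397B model (TP=2, PP=4, EP=8 on 32 GPUs).
TP=2; PP=4; EP=8
TOTAL_GPUS=32

# Bash arithmetic supports the C-style ternary used to take max(TP, EP).
REQUIRED_GPUS=$(( (TP > EP ? TP : EP) * PP ))

if [ "$REQUIRED_GPUS" -ne "$TOTAL_GPUS" ]; then
    echo "ERROR: need $REQUIRED_GPUS GPUs, allocated $TOTAL_GPUS" >&2
    exit 1
fi
echo "OK: $REQUIRED_GPUS GPUs"   # prints "OK: 32 GPUs"
```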

MEGATRON_CKPT_ARG=""
if [ -n "$MEGATRON_CHECKPOINT" ]; then
MEGATRON_CKPT_ARG="--megatron_model_path $MEGATRON_CHECKPOINT"
fi

CMD="uv run --no-sync python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path $MODEL_PATH \
$MEGATRON_CKPT_ARG \
--image_path \"$IMAGE_PATH\" \
--prompt \"$PROMPT\" \
--max_new_tokens $MAX_NEW_TOKENS \
--tp $TP \
--pp $PP \
--ep $EP"

# Only rank 0 on each node runs uv sync
SYNC_CMD="if [ \"\$SLURM_LOCALID\" -eq 0 ]; then uv sync; else sleep 5; fi"
FULL_CMD="$SYNC_CMD && $CMD"

echo "Executing inference..."
echo "Command: $CMD"
echo "======================================"

# Execute based on container configuration
if [ "$NO_CONTAINER" = true ]; then
echo "Running without container (bare metal)"
srun --mpi=pmix bash -c "$FULL_CMD"
else
# Require container image
if [ -z "$CONTAINER_IMAGE" ]; then
echo "ERROR: CONTAINER_IMAGE must be set, or use NO_CONTAINER=true for bare metal."
exit 1
fi

echo "Running with container: $CONTAINER_IMAGE"

# Build srun command with container
SRUN_CMD="srun --mpi=pmix --container-image=$CONTAINER_IMAGE"

# Add container mounts
if [ -n "$CONTAINER_MOUNTS" ]; then
for mount in $CONTAINER_MOUNTS; do
SRUN_CMD="$SRUN_CMD --container-mounts=$mount"
done
fi

$SRUN_CMD bash -c "$FULL_CMD"
fi

echo "======================================"
echo "Inference completed"
echo "======================================"
8 changes: 8 additions & 0 deletions src/megatron/bridge/models/__init__.py
@@ -182,6 +182,10 @@
Qwen25VLBridge,
Qwen25VLModel,
Qwen25VLModelProvider,
Qwen35VLBridge,
Qwen35VLModelProvider,
Qwen35VLMoEBridge,
Qwen35VLMoEModelProvider,
)
from megatron.bridge.models.qwen_vl.modelling_qwen3_vl import (
Qwen3VLBridge,
@@ -331,6 +335,10 @@
"Qwen3VLMoEModelProvider",
"Qwen3VLBridge",
"Qwen3VLMoEBridge",
"Qwen35VLBridge",
"Qwen35VLModelProvider",
"Qwen35VLMoEBridge",
"Qwen35VLMoEModelProvider",
"Gemma3VLBridge",
"Gemma3VLModel",
"Gemma3VLModelProvider",