update vllm multimodal for api calls convenience #1213
Conversation
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Note: Reviews paused. It looks like this branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds external-API support and audio-format handling to VLLMMultimodalModel: API key resolution, external-vs-local request branching, audio content block preprocessing and chunking, updated generation flow, server enum additions, unit/integration tests, and a small .gitignore change.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Model as VLLMMultimodalModel
    participant URLCheck as URL Detector
    participant KeyRes as API Key Resolver
    participant ReqBuild as Request Builder
    participant Backend as Backend (External API / Local vLLM)
    Client->>Model: generate_async(messages, audio, ...)
    Model->>URLCheck: is base_url local?
    alt External API
        URLCheck-->>Model: external
        Model->>KeyRes: _get_api_key(base_url, env)
        KeyRes-->>Model: api_key
        Model->>ReqBuild: _build_request_body(messages, external_api_mode=True)
        ReqBuild-->>Model: request_body (skip vLLM-only params)
        Model->>Backend: HTTP call to external API (with api_key)
    else Local vLLM
        URLCheck-->>Model: local
        Model->>ReqBuild: _build_request_body(messages, external_api_mode=False)
        ReqBuild-->>Model: request_body (include vLLM params)
        Model->>Backend: local vLLM invocation
    end
    Backend-->>Model: response (generation, debug_info, audio metadata)
    Model-->>Client: aggregated result (generation, metadata, saved audio paths if configured)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@nemo_skills/inference/model/vllm_multimodal.py`:
- Around line 75-107: The __init__ currently hard-fails if audio_format !=
"input_audio"; change it to accept both "audio_url" and "input_audio" and to
pick a mode-aware default when the caller doesn't override: use
self._external_api_mode (computed before super call) to default to "input_audio"
for external APIs and "audio_url" for local vLLM servers, but still allow
callers to pass either "audio_url" or "input_audio"; validate audio_format
against that allowed set and raise only for invalid values; refer to __init__,
self._external_api_mode, audio_format and the
audio_utils.make_audio_content_block() behavior when implementing.
- Around line 183-200: In _build_request_body, detect when
self._external_api_mode is True and any of the vLLM-only params differ from
their defaults (top_k != -1, min_p != 0.0, repetition_penalty != 1.0) and raise
a clear ValueError indicating these parameters are not supported in external API
mode; otherwise return extra_body or {} as before. Keep the existing behavior
for non-external mode by delegating to super()._build_request_body(top_k, min_p,
repetition_penalty, extra_body=extra_body).
…o vllm_multimodal_api_support
The test failures seem unrelated to this PR
```python
if msg["role"] == "user":
    if "audio" in msg:
        audio_info = msg["audio"]
    elif "audios" in msg:
        audios = msg["audios"]
        audio_info = audios[0] if audios else {}
    else:
        continue
```
KeyError on missing role
`_needs_audio_chunking()` now uses `msg["role"] == "user"` instead of `msg.get("role")`. Any message dict without a `role` key (e.g., malformed input or upstream preprocessing differences) will raise a `KeyError` and crash generation. Using `msg.get("role") == "user"` preserves the previous behavior (skip messages without a role) and avoids hard failures in normal data-processing pipelines.
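A tolerant version of the check could look like this, sketched under the assumption that `_needs_audio_chunking` only needs to detect audio-bearing user messages (the function name and fields follow the snippet above, not the repository's exact code):

```python
def needs_audio_chunking(messages: list[dict]) -> bool:
    """Detect audio-bearing user messages; .get() skips role-less dicts
    instead of raising KeyError on malformed input."""
    for msg in messages:
        if msg.get("role") != "user":
            continue
        if "audio" in msg or msg.get("audios"):
            return True
    return False
```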
```python
if "content" not in msg_copy:
    raise KeyError("Missing required 'content' in message")
content = msg_copy["content"]
```
Chunking rejects valid messages
In `_generate_with_chunking()`, chunking now raises `KeyError` if the user message doesn't already contain a `content` field. Previously, audio-only messages (using `audio`/`audios` fields without `content`) could still be handled by constructing content items. With the current logic, audio chunking will crash for such inputs; consider defaulting missing `content` to `""` (or `[]`) instead of raising.
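One way to default the missing field, as the comment suggests. This is a sketch under stated assumptions: the real `_generate_with_chunking` also splices in audio chunk blocks, and `ensure_content` is a hypothetical helper name.

```python
def ensure_content(msg: dict) -> dict:
    """Return a copy whose 'content' is always a block list; audio-only
    messages (no 'content' key) get an empty list instead of a KeyError."""
    msg_copy = dict(msg)
    content = msg_copy.get("content", [])
    if isinstance(content, str):
        # Normalize plain-text content into the OpenAI block format.
        content = [{"type": "text", "text": content}] if content else []
    msg_copy["content"] = content
    return msg_copy
```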
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Force-pushed 629c921 to 0363027
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_skills/inference/model/vllm_multimodal.py (1)
449-452: ⚠️ Potential issue | 🟠 Major
Fix key name mismatch in statistics aggregation: the parent returns `num_generated_tokens`, not `generated_tokens`.
`super().generate_async()` returns `num_generated_tokens` (via `BaseModel._parse_chat_completion_response`), yet lines 450-452 attempt to access `input_tokens`, `generated_tokens`, and `time_elapsed`, keys that do not exist in the parent's result dict. Since these keys won't be found, `.get()` falls back to the default values (0 or 0.0), causing `total_input_tokens` and `total_generated_tokens` to always be 0. The aggregated statistics are then written to non-standard keys (`input_tokens`, `generated_tokens`, `time_elapsed`) at lines 468-470, rather than the standard keys from the parent. This also violates the coding guideline against using `.get()` for keys expected to be present.
Use `num_generated_tokens` instead of `generated_tokens`, and remove the attempts to access `input_tokens` and `time_elapsed`, which the parent does not provide. If these statistics are needed, they must be computed separately or added by the parent implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/vllm_multimodal.py` around lines 449 - 452, The aggregation loop in generate_async (summing into total_input_tokens, total_generated_tokens, total_time) is using wrong/missing keys from the parent response; super().generate_async() and BaseModel._parse_chat_completion_response provide num_generated_tokens (not generated_tokens) and do not provide input_tokens or time_elapsed. Update the loop in generate_async to read num_generated_tokens from each result (use direct key access, not .get(), since the key is expected) and remove attempts to read input_tokens and time_elapsed (or compute them elsewhere if required), and ensure the final aggregated dict uses the standard parent key names (e.g., num_generated_tokens) to match BaseModel._parse_chat_completion_response.
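The corrected aggregation could look like this, a sketch of the key fix only (other per-chunk fields are omitted, and the function name is illustrative):

```python
def aggregate_chunk_stats(chunk_results: list[dict]) -> dict:
    """Sum generated-token counts across audio chunks using the parent's
    standard key; direct indexing fails fast if a chunk is missing it."""
    total_generated = 0
    for result in chunk_results:
        # Direct access, not .get(): the key is expected to be present.
        total_generated += result["num_generated_tokens"]
    return {"num_generated_tokens": total_generated}
```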
🧹 Nitpick comments (3)
nemo_skills/inference/model/vllm_multimodal.py (3)
502-504: Redundant `copy.deepcopy` before `content_text_to_list`.
`_preprocess_messages_for_model` (line 289) already deep-copies every message unconditionally, and `content_text_to_list` also deep-copies internally when audio keys are present. The outer `copy.deepcopy(msg)` in the comprehension on line 503 produces an extra copy that is discarded immediately. The same redundancy exists in `_generate_with_chunking` (line 419 + line 442).
♻️ Proposed fix

```diff
- messages = [self.content_text_to_list(copy.deepcopy(msg)) for msg in messages]
- messages = self._preprocess_messages_for_model(messages)
+ messages = [self.content_text_to_list(msg) for msg in messages]
+ messages = self._preprocess_messages_for_model(messages)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/vllm_multimodal.py` around lines 502 - 504, Remove the redundant outer copy.deepcopy in the list comprehensions that call content_text_to_list before passing messages to _preprocess_messages_for_model and in _generate_with_chunking; content_text_to_list and _preprocess_messages_for_model already deep-copy messages as needed, so drop the extra copy.deepcopy(msg) wrappers and pass msg (or the original list items) directly to content_text_to_list to avoid an unnecessary copy.
181-181: Implicit `Optional` in two method signatures (PEP 484 violation).
`extra_body: dict = None` (line 181) and `task_type: str = None` (line 478) both use implicit `Optional`, which PEP 484 (and Ruff RUF013) disallows.
♻️ Proposed fix

```diff
- def _build_request_body(self, top_k, min_p, repetition_penalty, extra_body: dict = None):
+ def _build_request_body(self, top_k, min_p, repetition_penalty, extra_body: dict | None = None):
```

```diff
- task_type: str = None,
+ task_type: str | None = None,
```

Also applies to: 478-478
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/vllm_multimodal.py` at line 181, The signatures _build_request_body(self, top_k, min_p, repetition_penalty, extra_body: dict = None) and the method with task_type: str = None should use explicit Optional annotations to satisfy PEP 484; update them to extra_body: Optional[dict] = None and task_type: Optional[str] = None, add "from typing import Optional" to the imports (or include Optional in the existing typing import), and run type-check/lint to ensure no other implicit Optional uses remain in vllm_multimodal.py.
460-461: Unreachable guard: dead code.
`result` is guaranteed to be a non-`None` dict after the loop because `if not chunks: raise RuntimeError(...)` at line 404 ensures at least one iteration runs and assigns `result`. The `if not result` check at line 460 can never be `True` in normal operation.
♻️ Proposed fix

```diff
- if not result:
-     raise RuntimeError("Audio chunk generation returned no result")
  final_result = result.copy()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/vllm_multimodal.py` around lines 460 - 461, Remove the unreachable runtime guard that checks "if not result: raise RuntimeError(...)" because earlier code already raises if chunks is empty and the loop always assigns result; instead delete this dead branch (or replace it with an assert like "assert result" to document the invariant) in the function where "result" is assigned after iterating "chunks" (referencing the "chunks" guard and the "result" variable in vllm_multimodal.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_nvidia_inference_api.py`:
- Line 21: The import of litellm in tests/test_nvidia_inference_api.py is
unused; either remove the import statement or, if intended for configuring
verbosity, call the appropriate API (e.g., set the verbose flag via
litellm.set_verbose or equivalent) so the symbol is referenced; update the file
by deleting the bare "import litellm" or adding the intended litellm
configuration call to avoid an unused-import lint/error.
---
Outside diff comments:
In `@nemo_skills/inference/model/vllm_multimodal.py`:
- Around line 449-452: The aggregation loop in generate_async (summing into
total_input_tokens, total_generated_tokens, total_time) is using wrong/missing
keys from the parent response; super().generate_async() and
BaseModel._parse_chat_completion_response provide num_generated_tokens (not
generated_tokens) and do not provide input_tokens or time_elapsed. Update the
loop in generate_async to read num_generated_tokens from each result (use direct
key access, not .get(), since the key is expected) and remove attempts to read
input_tokens and time_elapsed (or compute them elsewhere if required), and
ensure the final aggregated dict uses the standard parent key names (e.g.,
num_generated_tokens) to match BaseModel._parse_chat_completion_response.
---
Duplicate comments:
In `@nemo_skills/inference/model/vllm_multimodal.py`:
- Around line 72-105: The constructor sets audio_format default to "input_audio"
which is wrong for local vLLM; move the line that computes
self._external_api_mode (the call to self._is_local_url(self.base_url)) before
the audio_format logic and make the default audio_format mode-aware in __init__:
if self._external_api_mode is False default to "audio_url", otherwise default to
"input_audio"; keep the validation that audio_format must be one of
("audio_url","input_audio") and assign self.audio_format after this mode-aware
default is determined.
---
Nitpick comments:
In `@nemo_skills/inference/model/vllm_multimodal.py`:
- Around line 502-504: Remove the redundant outer copy.deepcopy in the list
comprehensions that call content_text_to_list before passing messages to
_preprocess_messages_for_model and in _generate_with_chunking;
content_text_to_list and _preprocess_messages_for_model already deep-copy
messages as needed, so drop the extra copy.deepcopy(msg) wrappers and pass msg
(or the original list items) directly to content_text_to_list to avoid an
unnecessary copy.
- Line 181: The signatures _build_request_body(self, top_k, min_p,
repetition_penalty, extra_body: dict = None) and the method with task_type: str
= None should use explicit Optional annotations to satisfy PEP 484; update them
to extra_body: Optional[dict] = None and task_type: Optional[str] = None, add
"from typing import Optional" to the imports (or include Optional in the
existing typing import), and run type-check/lint to ensure no other implicit
Optional uses remain in vllm_multimodal.py.
- Around line 460-461: Remove the unreachable runtime guard that checks "if not
result: raise RuntimeError(...)" because earlier code already raises if chunks
is empty and the loop always assigns result; instead delete this dead branch (or
replace it with an assert like "assert result" to document the invariant) in the
function where "result" is assigned after iterating "chunks" (referencing the
"chunks" guard and the "result" variable in vllm_multimodal.py).
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_nvidia_inference_api.py (1)
70-99: Add a docstring to `test_nvidia_api_audio_input`. The other two test functions have descriptive docstrings, but this one is missing one. Minor nit for consistency.
✏️ Proposed fix

```diff
  def test_nvidia_api_audio_input():
+     """Integration test: audio-input generation using a local test audio file."""
      model = VLLMMultimodalModel(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_nvidia_inference_api.py` around lines 70 - 99, Add a descriptive docstring to the test_nvidia_api_audio_input function explaining what the test verifies (e.g., that the VLLMMultimodalModel correctly handles audio input via the NVIDIA API and returns a non-empty "generation" field); place the docstring as the first statement inside the test_nvidia_api_audio_input function (before model = VLLMMultimodalModel(...)) to match the style of the other tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_nvidia_inference_api.py`:
- Around line 130-134: The test uses result.get("num_generated_tokens", 0) which
hides missing-key errors; replace the .get() calls with direct dict access
result["num_generated_tokens"] in both the assertion (currently asserting > 0)
and the print statement so a missing key raises KeyError and the failure is
informative—update the occurrences around the assertions/prints that reference
result and "num_generated_tokens" in tests/test_nvidia_inference_api.py.
---
Nitpick comments:
In `@tests/test_nvidia_inference_api.py`:
- Around line 70-99: Add a descriptive docstring to the
test_nvidia_api_audio_input function explaining what the test verifies (e.g.,
that the VLLMMultimodalModel correctly handles audio input via the NVIDIA API
and returns a non-empty "generation" field); place the docstring as the first
statement inside the test_nvidia_api_audio_input function (before model =
VLLMMultimodalModel(...)) to match the style of the other tests.
🧹 Nitpick comments (1)
nemo_skills/inference/generate.py (1)
634-636: `content.get("type")` should use direct key access per coding guidelines. Content blocks in the OpenAI messages format always carry a `"type"` key. Using `.get()` silently returns `None` for any malformed block, which then passes through the filter instead of surfacing a clear error. The expanded `drop_content_types` default makes this path more active.
♻️ Proposed fix

```diff
- message["content"] = [
-     content for content in message["content"] if content.get("type") not in self.cfg.drop_content_types
- ]
+ message["content"] = [
+     content for content in message["content"] if content["type"] not in self.cfg.drop_content_types
+ ]
```

As per coding guidelines: "Do not use `.get()` for accessing dictionary keys if the code expects them to be present; use direct dictionary access `dict[key]` instead to allow proper error handling and fail fast with clear errors."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/generate.py` around lines 634 - 636, The filtering of message["content"] currently uses content.get("type") which hides malformed content blocks; update the filter in the generate flow to access the required key directly (use content["type"]) so missing "type" keys will raise immediately and fail fast, and keep the comparison against self.cfg.drop_content_types unchanged; if you want a clearer error, wrap the list comprehension in a brief validation step that asserts each content has "type" before filtering, referencing message["content"] and self.cfg.drop_content_types.
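The fail-fast variant the comment describes might look like this. Illustrative only: `drop_content_types` here is a plain set rather than the config attribute, and the function name is hypothetical.

```python
def filter_content_blocks(content: list[dict], drop_content_types: set[str]) -> list[dict]:
    """Drop unwanted block types, raising immediately on a block without 'type'."""
    for block in content:
        if "type" not in block:
            raise KeyError(f"content block missing required 'type': {block!r}")
    return [block for block in content if block["type"] not in drop_content_types]
```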
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@nemo_skills/inference/generate.py`:
- Around line 634-636: The filtering of message["content"] currently uses
content.get("type") which hides malformed content blocks; update the filter in
the generate flow to access the required key directly (use content["type"]) so
missing "type" keys will raise immediately and fail fast, and keep the
comparison against self.cfg.drop_content_types unchanged; if you want a clearer
error, wrap the list comprehension in a brief validation step that asserts each
content has "type" before filtering, referencing message["content"] and
self.cfg.drop_content_types.
…Skills into vllm_multimodal_api_support
Actionable comments posted: 3
🧹 Nitpick comments (2)
nemo_skills/inference/model/vllm_multimodal.py (2)
442-442: Use iterable unpacking instead of list concatenation (Ruff RUF005).
🔧 Proposed fix

```diff
- msg_copy["content"] = [make_audio_content_block(chunk_base64, self.audio_format)] + text_content
+ msg_copy["content"] = [make_audio_content_block(chunk_base64, self.audio_format), *text_content]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/vllm_multimodal.py` at line 442, Replace the list concatenation when building msg_copy["content"] with iterable unpacking: instead of combining [make_audio_content_block(chunk_base64, self.audio_format)] + text_content, construct the list using the audio block followed by unpacking the existing iterable (e.g., [make_audio_content_block(chunk_base64, self.audio_format), *text_content]) so that make_audio_content_block, chunk_base64, self.audio_format and text_content are used in the new expression.
469-470: `if not result:` guard is unreachable dead code.
`chunks` is verified non-empty at line 413, so the `for` loop always executes at least once and `result` is always assigned. The guard can never be `True`.
🔧 Proposed fix

```diff
- if not result:
-     raise RuntimeError("Audio chunk generation returned no result")
  final_result = result.copy()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/vllm_multimodal.py` around lines 469 - 470, Remove the unreachable guard that checks "if not result:" after the audio-chunk processing loop (since "chunks" is already validated non-empty and "result" is assigned inside the for loop), i.e., delete the raise RuntimeError("Audio chunk generation returned no result") block; if you want a defensive check instead, move a single assert or explicit check for a non-empty "chunks" before the loop (or assert result is set after loop) rather than keeping the unreachable post-loop guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/inference/model/vllm_multimodal.py`:
- Around line 287-298: The current _preprocess_messages_for_model only
deep-copies messages and is being invoked after large base64 audio is already
placed into messages, doubling memory; remove the redundant calls to
_preprocess_messages_for_model at the call sites in generate_async and
_generate_with_chunking (where content_text_to_list already returns a fresh copy
and where copy.deepcopy(msg) is used) or, if you want to preserve the hook, move
the single call to _preprocess_messages_for_model to run before audio
encoding/injection (i.e., before base64 audio is added) so audio blobs are not
copied unnecessarily.
- Line 183: Update the type annotation for the extra_body parameter in the
_build_request_body function to explicitly allow None (PEP 484 compliance):
change extra_body: dict = None to extra_body: dict | None = None so the
signature reads _build_request_body(self, top_k, min_p, repetition_penalty,
extra_body: dict | None = None); keep the default value None and ensure any
usage inside _build_request_body handles the None case as before.
- Around line 300-315: The function content_text_to_list currently returns the
original message dict when no "audio"/"audios" keys are present, breaking its
contract to return a new message dict; change it to return a shallow or deep
copy (e.g., copy.deepcopy(message) or message.copy() depending on nested
mutability needs) so callers always receive a new dict, and then remove the
now-redundant outer copy.deepcopy call in generate_async to avoid
double-copying; update references to content_text_to_list in generate_async to
use the returned copy.
---
Nitpick comments:
In `@nemo_skills/inference/model/vllm_multimodal.py`:
- Line 442: Replace the list concatenation when building msg_copy["content"]
with iterable unpacking: instead of combining
[make_audio_content_block(chunk_base64, self.audio_format)] + text_content,
construct the list using the audio block followed by unpacking the existing
iterable (e.g., [make_audio_content_block(chunk_base64, self.audio_format),
*text_content]) so that make_audio_content_block, chunk_base64,
self.audio_format and text_content are used in the new expression.
- Around line 469-470: Remove the unreachable guard that checks "if not result:"
after the audio-chunk processing loop (since "chunks" is already validated
non-empty and "result" is assigned inside the for loop), i.e., delete the raise
RuntimeError("Audio chunk generation returned no result") block; if you want a
defensive check instead, move a single assert or explicit check for a non-empty
"chunks" before the loop (or assert result is set after loop) rather than
keeping the unreachable post-loop guard.
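The copy contract asked for in the `content_text_to_list` comment could be sketched like this. It is a simplified stand-in under stated assumptions; the real method also builds audio content blocks rather than only text blocks.

```python
import copy


def content_text_to_list(message: dict) -> dict:
    """Always return a new message dict so callers can mutate it freely,
    even when no audio keys are present."""
    msg = copy.deepcopy(message)  # copy unconditionally to honor the contract
    if "audio" in msg or "audios" in msg:
        text = msg.get("content", "")  # 'content' may be absent in audio-only messages
        msg["content"] = [{"type": "text", "text": text}] if text else []
    return msg
```

With this, callers such as `generate_async` no longer need their own outer `copy.deepcopy`.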
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com> Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com>
reproducible sandbox builds (#1233) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5a0a84d Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Wed Feb 11 13:30:03 2026 -0800 removing datasets version restriction for LCB eval (#1230) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit ef0a890 Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Date: Wed Feb 11 12:03:16 2026 +0400 Gnalbandyan/add physics (#1214) Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> commit bd9d30c Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Tue Feb 10 15:13:27 2026 -0800 LCB generic prompting (#1215) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit 7d6c49a Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Sat Feb 7 08:45:46 2026 -0800 Add support for different variations of nemo-rl (#1220) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit b19ba96 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 6 21:40:56 2026 -0800 Add multi-node sandbox support for SLURM clusters (#1218) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 8950bb0 Author: anowaczynski-nvidia <anowaczynski@nvidia.com> Date: Sat Feb 7 01:38:00 2026 +0100 support structured outputs in hle judge for optional AA compatibility (#1186) Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b84f7a2 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 6 14:51:02 2026 -0800 A small update on running tests docs (#1219) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 8e838e1 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 5 18:01:35 2026 -0800 feat: add flag to disable sandbox replay (#1217) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5fd9085 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 5 
15:57:01 2026 -0800 Add an option to limit number of tool calls (#1216) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit d820200 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 3 10:43:55 2026 -0800 Add arena-hard v2 (#1205) Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: bzantium <ryumin93@gmail.com> commit a30920e Author: Igor Gitman <igitman@nvidia.com> Date: Mon Feb 2 10:53:55 2026 -0800 Fix mkdocs warnings (#1204) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 19d7788 Author: Ivan <imoshkov@nvidia.com> Date: Mon Feb 2 23:25:13 2026 +0500 Fix infinite wait in sandbox.wait_for_sandbox (#1206) Signed-off-by: i-vainn <imoshkov@nvidia.com> commit 3e65fbf Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Fri Jan 30 19:38:38 2026 -0800 Improve tts (#1203) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 250c862 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Fri Jan 30 22:12:29 2026 +0400 SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> commit 7ded756 Author: Ivan <imoshkov@nvidia.com> Date: Fri Jan 30 09:57:41 2026 +0500 Add proper token counting to code execution model (#1184) Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b986304 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Jan 29 17:57:07 2026 -0800 Upgrade containers (#1198) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 3b44f02 Author: Dan Lord <blahblahasdf@gmail.com> Date: Thu Jan 29 16:40:47 2026 -0800 Fix incorrect string format (#1199) Signed-off-by: dlord <dlord@nvidia.com> commit c4854b8 Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Thu Jan 29 13:43:36 2026 -0800 Update nemo-rl to latest (#1087) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor 
Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Summary by CodeRabbit