
Add backend-agnostic unified inference server#1250

Merged
vmendelev merged 6 commits into main from pr2/unified-server-refactor
Mar 2, 2026

Conversation

@vmendelev
Collaborator

@vmendelev vmendelev commented Feb 18, 2026

Summary

This PR adds a backend-agnostic unified inference server and integrates speech backends used in NeMo-Skills generation workflows.

Included:

  • Unified server entrypoint: nemo_skills.inference.server.serve_unified
  • Unified server core: recipes/multimodal/server/unified_server.py
  • Backend framework/registry: recipes/multimodal/server/backends/*
  • New backends:
    • magpie_tts
    • nemo_asr
  • Native Slurm tests:
    • tests/slurm-tests/unified_asr/*
    • tests/slurm-tests/unified_tts/*
  • Slurm matrix registration in tests/slurm-tests/run_all.sh
  • Unit tests:
    • tests/test_nemo_asr_backend.py
    • tests/test_unified_server_audio_parser.py

Scope and constraints

  • No changes to nemo_skills/inference/generate.py public config surface.
  • No cluster config files are part of this PR.

Notable follow-up-compatible behavior

  • tests/slurm-tests/unified_tts/run_test.py keeps the default server image as nvcr.io/nvidia/nemo:25.11.
  • In current environments, TTS may require passing --code_path pointing to a NeMo tree that contains magpietts_inference.
  • Usage guidance is documented in:
    • tests/slurm-tests/unified_tts/README.md

Validation

Unit tests

  • Added/updated tests for ASR backend behavior and unified audio parser.

Slurm validation (rerun on Feb 25, 2026)

Ran ASR and TTS on both clusters with a local image override and the correct TTS --code_path.

DFW:

  • ASR generation job 9662732, checker 9662734 -> ALL TESTS PASSED
  • TTS generation job 9662733, checker 9662735 -> ALL TESTS PASSED

IAD (draco_oci):

  • ASR generation job 8611354, checker 8611357 -> ALL TESTS PASSED
  • TTS generation job 8611353, checker 8611356 -> ALL TESTS PASSED

Commit notes

Latest incremental updates in this PR branch include:

  • Add unified TTS Slurm test README
  • Add copyright header to NeMo ASR backend test

Summary by CodeRabbit

  • New Features

    • Unified NeMo Inference Server with OpenAI-compatible endpoints for text and audio (chat completions, health, model listing)
    • Pluggable, backend-agnostic architecture with ASR and TTS backends and batched request processing
    • YAML-based configuration with CLI overrides and runtime environment controls
  • Documentation

    • New server README describing architecture and backend integration
  • Tests

    • Added unit and integration tests for ASR, TTS, batching, audio parsing, and error handling

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a unified, YAML-configurable NeMo inference server: a CLI entrypoint, backend registry and abstract backend API, two backends (NeMo ASR, Magpie TTS), an OpenAI-compatible FastAPI app with batched inference and audio handling, and Slurm test runners/checkers.

Changes

Cohort / File(s) Summary

  • CLI / Entrypoint: nemo_skills/inference/server/serve_unified.py
    New CLI entrypoint implementing YAML config loading, CLI overrides, extra-args parsing into backend config, PYTHONPATH setup, optional safetensors patch, environment setup (CUDA/DEBUG), and a Uvicorn-based FastAPI app launcher with error handling and main().
  • Unified FastAPI Server: recipes/multimodal/server/unified_server.py
    New FastAPI app exposing OpenAI-style /v1/chat/completions, message parsing (text/audio/system), RequestBatcher for batched inference, backend init via get_backend, extra-route registration, health/models endpoints, audio saving, and mapping GenerationRequest/Result to OpenAI-like responses.
  • Backend package surface: recipes/multimodal/__init__.py, recipes/multimodal/server/__init__.py
    New package initializers re-exporting core backend API types and get_backend to form the public multimodal interface.
  • Backend registry & loader: recipes/multimodal/server/backends/__init__.py
    Adds BACKEND_REGISTRY, lazy import-based get_backend() and list_backends() with error handling for unknown backends and import failures.
  • Backend abstractions / base types: recipes/multimodal/server/backends/base.py
    Introduces Modality enum, BackendConfig.from_dict, GenerationRequest, GenerationResult, and abstract InferenceBackend with lifecycle methods, validation, health_check, and param merging.
  • Magpie TTS backend: recipes/multimodal/server/backends/magpie_tts_backend.py
    New MagpieTTSConfig and MagpieTTSBackend implementing model loading (fsspec/HF handling), optional normalization, batched TTS generation, temporary batch artifact management, metrics/debug_info, and health info.
  • NeMo ASR backend: recipes/multimodal/server/backends/nemo_asr_backend.py
    New NeMoASRConfig and NeMoASRBackend implementing model loading, optional warmup, robust transcription across NeMo variants, hypothesis/word parsing, audio acquisition, batching, and detailed debug metadata in results.
  • Backends export updates: recipes/multimodal/server/backends/...
    __all__ updated to expose new types and helper functions (InferenceBackend, GenerationRequest, GenerationResult, BackendConfig, get_backend, list_backends, etc.).
  • Slurm tests & runners: tests/slurm-tests/unified_asr/*, tests/slurm-tests/unified_tts/*, tests/slurm-tests/run_all.sh
    Adds Slurm test runners, checkers, test data and README; updates run_all.sh to include unified ASR/TTS runs and orchestration entries.
  • Unit tests: tests/test_nemo_asr_backend.py, tests/test_unified_server_audio_parser.py, tests/test_unified_server_batcher.py, tests/test_unified_server_error_handling.py, tests/test_magpie_tts_backend.py
    Adds unit tests covering ASR backend behavior, audio extraction/parsing, batcher error handling, error sanitization, and Magpie TTS context-audio path validation.
  • Test data: tests/slurm-tests/unified_asr/asr_openai.test, tests/slurm-tests/unified_tts/tts_openai.test
    Adds sample JSON test entries for ASR and TTS Slurm tests.
  • Test checkers: tests/slurm-tests/unified_asr/check_results.py, tests/slurm-tests/unified_tts/check_results.py
    New validation scripts to assert ASR/TTS outputs: transcript normalization and presence, audio file resolution/existence, and expected sample counts.
  • Slurm run helpers: tests/slurm-tests/unified_asr/run_test.py, tests/slurm-tests/unified_tts/run_test.py
    New end-to-end test runners preparing remote workspaces, launching generation jobs, handling code_path mounts, and optionally running post-checks.

Sequence Diagram(s)

sequenceDiagram
    actor User as User/CLI
    participant CLI as serve_unified.py
    participant Server as unified_server.create_app
    participant Registry as backends.get_backend
    participant Backend as InferenceBackend
    participant Model as Model Loader

    User->>CLI: Run CLI (args / --config)
    CLI->>CLI: parse args, load YAML, setup PYTHONPATH/patches
    CLI->>Server: create_app(backend_type, config_dict)
    Server->>Registry: get_backend(backend_type)
    Registry-->>Server: Backend class
    Server->>Backend: instantiate(config) & load_model()
    Backend->>Model: load model artifacts
    Model-->>Backend: model ready
    Server->>Server: register extra routes, init RequestBatcher
    CLI->>CLI: start Uvicorn with app
sequenceDiagram
    actor Client as Client
    participant API as /v1/chat/completions
    participant Parser as message parsing
    participant Batcher as RequestBatcher
    participant Backend as InferenceBackend
    participant Model as Model

    Client->>API: POST chat request (text/audio)
    API->>Parser: extract text/audio/system prompt
    Parser-->>API: GenerationRequest
    API->>API: validate request
    API->>Batcher: add_request(GenerationRequest)
    Batcher->>Batcher: accumulate until batch_size or timeout
    alt Batch ready or timeout
        Batcher->>Backend: generate([requests])
        Backend->>Model: run inference
        Model-->>Backend: GenerationResult(s)
        Backend-->>Batcher: return results
        Batcher->>API: resolve futures with results
    end
    API->>Client: OpenAI-style JSON response (incl. audio if present)
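The accumulate-until-batch_size-or-timeout flow in the second diagram can be sketched with asyncio futures. This is a simplified model, assuming a synchronous generate function; the real RequestBatcher in unified_server.py differs in details such as its timeout-task bookkeeping:

```python
import asyncio

class RequestBatcher:
    """Accumulate requests; flush when batch_size is reached or the timeout expires."""

    def __init__(self, generate_fn, batch_size=4, batch_timeout=0.05):
        self.generate_fn = generate_fn
        self.batch_size = batch_size
        self.batch_timeout = batch_timeout
        self.pending = []  # list of (request, future) pairs
        self.lock = asyncio.Lock()

    async def add_request(self, request):
        future = asyncio.get_running_loop().create_future()
        async with self.lock:
            self.pending.append((request, future))
            if len(self.pending) >= self.batch_size:
                asyncio.create_task(self._flush())          # batch full: flush now
            elif len(self.pending) == 1:
                asyncio.create_task(self._flush_after_timeout())  # first item: arm timer
        return await future

    async def _flush_after_timeout(self):
        await asyncio.sleep(self.batch_timeout)
        await self._flush()

    async def _flush(self):
        async with self.lock:
            batch, self.pending = self.pending, []
        if not batch:
            return  # another flush already drained the queue
        results = self.generate_fn([req for req, _ in batch])
        for (_, future), result in zip(batch, results):
            if not future.done():
                future.set_result(result)
```

Each client coroutine awaits its own future, so a single backend call can resolve many concurrent HTTP requests at once.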

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

enhancement, run GPU tests

Suggested reviewers

  • melllinia
  • gwarmstrong
  • Kipok
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 33.03%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled).
  • Title Check — ✅ Passed: the title 'Add backend-agnostic unified inference server' clearly and concisely summarizes the main change, introducing a new unified inference server with pluggable backend support.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (6)
recipes/multimodal/server/unified_server.py (5)

38-38: Unused import uvicorn.

uvicorn is imported at the top of this module but never referenced. It's only used in serve_unified.py. Remove to avoid confusion.

Proposed fix
-import uvicorn
 from fastapi import FastAPI, HTTPException
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` at line 38, Remove the unused
top-level import "uvicorn" from recipes/multimodal/server/unified_server.py:
locate the import statement (import uvicorn) and delete it so the module no
longer references an unused symbol; leave imports used by functions/classes in
this file intact and ensure serve_unified.py continues to import uvicorn where
required.

355-389: Optional file-save logic catches broad Exception and silently continues.

Both the JSON save (line 380) and audio save (line 388) catch Exception broadly and only print a warning. If the user has configured AUDIO_SAVE_DIR, they likely expect artifacts to be written. Per coding guidelines, "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving." At minimum, consider logging at a warning level or making the save operation best-effort with a more specific exception type (e.g., OSError).

Proposed fix — narrow the exception type
-                except Exception as e:
+                except OSError as e:
                     print(f"[Server] Warning: Failed to save JSON: {e}")
-                except Exception as e:
+                except OSError as e:
                     print(f"[Server] Warning: Failed to save audio: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 355 - 389, The save
block currently swallows all Exceptions with prints; change it to handle only
expected filesystem errors and surface unexpected failures: replace the broad
except blocks around the JSON save (saved_json_path / json.dump) and audio save
(saved_audio_path / f.write when result.audio_bytes) to catch OSError (and
json.JSONEncodeError if needed) and log via the server logger at warning level,
including the error and file path, while letting other exceptions propagate (or
re-raise) so failures are visible; ensure you reference save_dir,
saved_json_path, saved_audio_path, result.audio_bytes, and the same
timestamp/base_filename variables when updating the handlers.

440-446: Re-raise with from to preserve exception chain.

Per best practice (and flagged by static analysis B904), use raise ... from e so the original traceback is chained properly.

Proposed fix
         except Exception as e:
             import traceback
 
             traceback.print_exc()
-            raise HTTPException(status_code=500, detail=str(e))
+            raise HTTPException(status_code=500, detail=str(e)) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 440 - 446, In the
generic exception handler in unified_server.py (the except Exception as e
block), change the re-raise to preserve the original exception chain by using
"raise HTTPException(status_code=500, detail=str(e)) from e" instead of the
current bare raise; keep the traceback.print_exc() call if you want the local
trace printed but ensure the HTTPException is raised with "from e" so the
original exception (e) is chained and its traceback is preserved.

84-89: Fire-and-forget asyncio.create_task calls risk silently lost exceptions and client hangs.

If a _process_batch task fails during lock acquisition or at any point before setting results on futures, the exception is silently lost and clients await indefinitely. This pattern recurs at lines 84, 86, 99, and 136. Store task references and add a done-callback for logging.

Example fix — add a done callback for error logging
+    def _log_task_exception(self, task: asyncio.Task):
+        if task.cancelled():
+            return
+        exc = task.exception()
+        if exc:
+            print(f"[Batcher] Background task failed: {exc}")
+
     async def add_request(self, request: GenerationRequest) -> GenerationResult:
         """Add a request and wait for result."""
         future = asyncio.Future()
         pending = PendingRequest(request=request, future=future, timestamp=time.time())
 
         async with self.lock:
             self.pending_requests.append(pending)
 
             if len(self.pending_requests) >= self.batch_size:
                 if DEBUG:
                     print(f"[Batcher] Batch full ({self.batch_size}), processing immediately")
-                asyncio.create_task(self._process_batch())
+                task = asyncio.create_task(self._process_batch())
+                task.add_done_callback(self._log_task_exception)
             elif self.batch_timeout == 0:
-                asyncio.create_task(self._process_batch())
+                task = asyncio.create_task(self._process_batch())
+                task.add_done_callback(self._log_task_exception)
             elif self.timeout_task is None or self.timeout_task.done():
                 self.timeout_task = asyncio.create_task(self._timeout_handler())
+                self.timeout_task.add_done_callback(self._log_task_exception)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 84 - 89, The
create_task calls that fire-and-forget _process_batch (and _timeout_handler) can
swallow exceptions and leave client futures pending; change code to capture the
returned Task (e.g., assign to a variable or a list field) and attach a
done-callback that logs exceptions: for places creating tasks for
self._process_batch() and for self._timeout_handler() (referenced as
_process_batch and _timeout_handler and the timeout_task attribute), store the
task reference (e.g., task = asyncio.create_task(...)), add
task.add_done_callback(lambda t: log.error(...) or inspect and log t.exception()
if t.cancelled() is False), and keep or dispose the reference appropriately so
exceptions are surfaced and logged instead of being silently lost.

230-230: Migrate from @app.on_event("startup") to the lifespan context manager pattern.

@app.on_event("startup") is deprecated in FastAPI (and its underlying Starlette framework) in favor of the ASGI lifespan context manager. Use contextlib.asynccontextmanager with the lifespan= parameter on FastAPI() instead, placing startup logic before yield and shutdown logic after.
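A minimal, stdlib-only sketch of the lifespan pattern (DummyBackend and its attributes are illustrative stand-ins; with FastAPI the same function is passed as FastAPI(lifespan=lifespan)):

```python
import asyncio
from contextlib import asynccontextmanager

class DummyBackend:
    """Stand-in for a real InferenceBackend; attribute names here are assumptions."""
    def __init__(self):
        self.loaded = False

backend = DummyBackend()

@asynccontextmanager
async def lifespan(app):
    backend.loaded = True    # startup logic: runs before the yield
    yield
    backend.loaded = False   # shutdown logic: runs after the yield

# With FastAPI this would be wired as: app = FastAPI(lifespan=lifespan)

async def demo():
    async with lifespan(None):
        assert backend.loaded  # server is "up" inside the context
    return backend.loaded
```

Unlike @app.on_event, the lifespan form keeps startup and shutdown in one function, so shared state (like the loaded backend) stays in scope for both phases.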

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` at line 230, Replace the
deprecated `@app.on_event`("startup") handler by creating an async lifespan
context manager and passing it into FastAPI(lifespan=...); move the current
startup logic implemented where the decorator is applied (look for the function
decorated with `@app.on_event`("startup") and any code it runs) into the block
before the yield of an asynccontextmanager, and move any teardown/shutdown code
(if present) after the yield; then instantiate the app with
FastAPI(lifespan=your_lifespan) and remove the `@app.on_event` decorator and its
function.
nemo_skills/inference/server/serve_unified.py (1)

114-158: build_config_from_args hardcodes backend-specific parameter names, coupling the entrypoint to specific backends.

Every new backend that adds a CLI parameter requires updating this list. Consider having each backend declare its own argparse arguments or accepting arbitrary --key=value pairs to pass through, keeping the CLI entrypoint truly backend-agnostic.

This is acceptable for now as backward-compatible scaffolding, but worth tracking for cleanup.
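A hedged sketch of the suggested pass-through (the CORE_KEYS list and special-case mappings are taken from the prompt above; exact names in serve_unified.py may differ):

```python
import argparse

# Keys handled by the core CLI rather than forwarded to the backend (assumed list).
CORE_KEYS = {"model", "device", "dtype", "max_new_tokens", "temperature", "top_p", "codec_model"}

def build_config_from_args(args: argparse.Namespace) -> dict:
    config = {}
    for key, value in vars(args).items():
        if key in CORE_KEYS or value is None:
            continue  # skip core CLI keys and unset options
        config[key] = value
    # Preserve the existing special cases from the current implementation.
    if getattr(args, "codec_model", None):
        config["codec_model_path"] = args.codec_model
    if config.pop("no_decode_audio", False):
        config["decode_audio"] = False
    return config
```

With this shape, a new backend flag like --custom_flag reaches the backend config without any entrypoint change.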

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 114 - 158, The
build_config_from_args function currently hardcodes backend-specific keys;
replace that with dynamic pass-through by iterating over vars(args).items(),
skipping core CLI keys
("model","device","dtype","max_new_tokens","temperature","top_p","codec_model")
and any keys with value None, and adding every other arg to config_dict; keep
the special-case handling for codec_model (map to "codec_model_path") and the
store_true flag no_decode_audio -> set "decode_audio": False as before. This
removes the long maintained list and lets new backend args be forwarded
automatically while preserving existing special mappings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 106-111: load_yaml_config may return None for empty YAML files
which later causes config_dict.pop(...) to fail; update load_yaml_config to
ensure it always returns a dict (e.g., call yaml.safe_load and if result is None
return {} or wrap with cast to dict), so that downstream code using config_dict
(including the config_dict.pop(...) usage) does not raise AttributeError; change
only inside load_yaml_config to default to an empty dict when safe_load yields
None and preserve type hints.
- Around line 90-103: The apply_safetensors_patch function currently catches all
exceptions and only prints a warning which hides failures; update
apply_safetensors_patch to not silently swallow errors by either removing the
broad try/except or re-raising the caught exception after logging so the caller
sees the failure (keep the print/log line for context). Specifically, modify
apply_safetensors_patch to let exceptions from importing safetensors, getting
inspect.getfile, os.makedirs, or shutil.copyfile propagate (or if you keep the
except block, re-raise the exception with raise after the print) so callers
loading NeMo models will receive a clear failure instead of a later confusing
error.
- Line 220: The code currently calls parser.parse_known_args() and assigns to
args, extra_args which silently swallows unknown CLI flags; change this to
parser.parse_args() and remove the unused extra_args variable so unsupported
arguments cause a fast failure. Update the assignment from "args, extra_args =
parser.parse_known_args()" to just "args = parser.parse_args()" (and remove any
handling of extra_args), ensuring the parser variable and subsequent uses of
args (in serve_unified.py) continue to work unchanged.
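The empty-YAML guard from the first point above can be sketched as follows (assumes PyYAML; the real load_yaml_config may carry extra behavior):

```python
import yaml  # PyYAML

def load_yaml_config(path: str) -> dict:
    """Load a YAML config, returning {} for empty files (safe_load yields None)."""
    with open(path) as f:
        data = yaml.safe_load(f)
    return data if data is not None else {}
```

Without the None guard, a blank config file would make later config_dict.pop(...) calls raise AttributeError on None.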

In `@recipes/multimodal/server/backends/base.py`:
- Around line 220-227: The get_generation_params method is using truthy "or"
which treats explicit zeros (e.g., temperature=0.0, top_p=0.0, top_k=0) as falsy
and incorrectly falls back to self.config; update get_generation_params in the
class that defines it to use explicit None checks for each field (e.g., use
request.temperature if request.temperature is not None else
self.config.temperature, and similarly for max_new_tokens, top_p, top_k) so that
valid zero values are preserved while still falling back to config defaults when
the request field is None.
- Around line 237-244: In validate_request, the audio presence check only
inspects request.audio_bytes and request.audio_path but ignores multi-turn
fields audio_bytes_list and audio_paths; update the has_audio_input computation
in the validate_request method to also consider non-empty
request.audio_bytes_list and request.audio_paths (e.g., list not None and len>0)
so that requests with only multi-turn audio are treated as audio input for the
Modality.AUDIO_IN support check and the "must have either text or audio input"
validation; ensure references remain to Modality.AUDIO_IN and self.name for the
error message.
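The explicit-None-check fix for get_generation_params can be sketched with a toy dataclass (field names mirror those mentioned above; the real GenerationRequest/BackendConfig differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Params:
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[int] = None
    max_new_tokens: Optional[int] = None

def get_generation_params(request: Params, config: Params) -> dict:
    # Explicit None checks: a request value of 0 / 0.0 is respected,
    # unlike the truthy form `request.temperature or config.temperature`.
    def pick(req_val, cfg_val):
        return req_val if req_val is not None else cfg_val
    return {
        "temperature": pick(request.temperature, config.temperature),
        "top_p": pick(request.top_p, config.top_p),
        "top_k": pick(request.top_k, config.top_k),
        "max_new_tokens": pick(request.max_new_tokens, config.max_new_tokens),
    }
```

This matters for greedy decoding, where temperature=0.0 is a deliberate, valid request value.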

In `@recipes/multimodal/server/unified_server.py`:
- Around line 223-228: server_config is built with fixed keys, so replace all
uses of server_config.get("key", ...) with direct indexing server_config["key"]
to avoid masking bugs; update every access sites that read model_path,
batch_size, batch_timeout, backend_type (notably the usages around the root
handler and other handlers referenced in the diff: the occurrences you flagged
near lines ~265-274, ~271, ~291 and ~411) to use server_config["model_path"],
server_config["batch_size"], server_config["batch_timeout"], and
server_config["backend_type"] respectively, and remove any fallback defaults
since the keys are guaranteed present.
- Around line 167-198: extract_text_from_messages is including messages with
role "system", causing duplicate system text alongside extract_system_prompt;
modify extract_text_from_messages to skip any message where message.get("role")
== "system" (i.e., only collect content from non-system roles), leaving
extract_system_prompt as-is so the system prompt only appears in the
system_prompt field of GenerationRequest; update the function that iterates
messages (extract_text_from_messages) to check message.get("role") before
processing content.
- Around line 101-138: The batch-result association in _process_batch is unsafe
because zip(batch, results) will silently drop pairs if backend.generate returns
fewer results than requests; update the code to validate that results length
matches len(requests) after calling self.backend.generate and raise or set
exceptions for unmatched pending items if mismatched, and then iterate with an
indexed loop (e.g., for i, pending in enumerate(batch): result = results[i]) or
use explicit range checking to avoid silent truncation; reference
_process_batch, self.backend.generate, batch, results, and the zip(batch,
results) usage when making the change.
- Around line 304-338: The handler chat_completions currently uses
request.get("messages", []) which silently masks a missing required field;
change to use direct access request["messages"] (and catch KeyError to raise
HTTPException 400 with a clear message) so malformed requests fail fast, while
keeping .get(...) for optional params (max_tokens, temperature, top_p, seed).
Locate the chat_completions function and replace request.get("messages", [])
with request["messages"], add a try/except KeyError around that access (or
explicit if "messages" not in request) to raise HTTPException(status_code=400,
detail="Missing required field: messages"), and ensure subsequent uses
(extract_audio_from_messages, extract_text_from_messages, GenerationRequest)
operate on the validated messages list.
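For the batch-result mismatch point above, the defensive tail of _process_batch could look like this sketch (FakeFuture is a simplified stand-in for asyncio.Future; the real code resolves actual futures):

```python
class FakeFuture:
    """Minimal stand-in for asyncio.Future, enough to show the pattern."""
    def __init__(self):
        self.result = None
        self.error = None
    def set_result(self, value):
        self.result = value
    def set_exception(self, exc):
        self.error = exc

def distribute_results(batch, results):
    # batch: list of (request, future) pairs. Fail loudly on a length mismatch
    # instead of letting zip() silently drop unmatched pending requests.
    if len(results) != len(batch):
        err = RuntimeError(f"backend returned {len(results)} results for {len(batch)} requests")
        for _, future in batch:
            future.set_exception(err)
        return
    for (_, future), result in zip(batch, results):
        future.set_result(result)
```

Setting the exception on every future means clients get an immediate 500-style error rather than hanging on a result that will never arrive.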


Comment on lines +223 to +228
server_config = {
    "backend_type": backend_type,
    "model_path": config_dict.get("model_path", ""),
    "batch_size": batch_size,
    "batch_timeout": batch_timeout,
}

⚠️ Potential issue | 🟡 Minor

Use direct dictionary access instead of .get() on server_config where keys are always present.

server_config is populated at lines 223-228 with known keys (backend_type, model_path, batch_size, batch_timeout). Accessing these via .get() (lines 271, 291, 411) masks potential bugs. As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present; use direct dictionary access dict[key] instead."

Proposed fix (example for root endpoint)
     @app.get("/")
     async def root():
         """Root endpoint with server info."""
         return {
             "service": "Unified NeMo Inference Server",
             "version": "1.0.0",
-            "backend": server_config.get("backend_type"),
-            "model": server_config.get("model_path"),
+            "backend": server_config["backend_type"],
+            "model": server_config["model_path"],
             "endpoints": ["/v1/chat/completions", "/health", "/v1/models"],
         }

Apply the same change at lines 291 and 411.

Also applies to: 265-274

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 223 - 228,
server_config is built with fixed keys, so replace all uses of
server_config.get("key", ...) with direct indexing server_config["key"] to avoid
masking bugs; update every access sites that read model_path, batch_size,
batch_timeout, backend_type (notably the usages around the root handler and
other handlers referenced in the diff: the occurrences you flagged near lines
~265-274, ~271, ~291 and ~411) to use server_config["model_path"],
server_config["batch_size"], server_config["batch_timeout"], and
server_config["backend_type"] respectively, and remove any fallback defaults
since the keys are guaranteed present.

Comment on lines +304 to +338
@app.post("/v1/chat/completions")
async def chat_completions(request: Dict[str, Any]):
    """OpenAI-compatible chat completions endpoint with audio support."""
    if backend_instance is None or not backend_instance.is_loaded:
        raise HTTPException(status_code=503, detail="Model not loaded")

    try:
        messages = request.get("messages", [])
        if not messages:
            raise HTTPException(status_code=400, detail="No messages provided")

        # Extract components from messages
        audio_bytes_list = extract_audio_from_messages(messages)
        text = extract_text_from_messages(messages)
        system_prompt = extract_system_prompt(messages)

        # Get generation parameters
        max_tokens = request.get("max_tokens", 512)
        temperature = request.get("temperature", 1.0)
        top_p = request.get("top_p", 1.0)
        seed = request.get("seed")

        # Create generation request
        gen_request = GenerationRequest(
            text=text if text else None,
            system_prompt=system_prompt,
            audio_bytes=audio_bytes_list[0] if len(audio_bytes_list) == 1 else None,
            audio_bytes_list=audio_bytes_list if len(audio_bytes_list) > 1 else None,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            seed=seed,
            request_id=hashlib.md5(f"{time.time()}".encode()).hexdigest()[:8],
            extra_params=request.get("extra_body", {}),
        )

⚠️ Potential issue | 🟡 Minor

request.get("messages", []) — per coding guidelines, use direct access for expected keys.

messages is a required field for the /v1/chat/completions endpoint. Using .get("messages", []) silently returns an empty list instead of failing with a clear KeyError. Similarly, request.get("max_tokens", 512) at line 321 and others silently apply defaults for what may be an improperly formed request. Use request["messages"] for required fields. As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present; use direct dictionary access dict[key] instead."

Note that max_tokens, temperature, top_p, and seed are legitimately optional per the OpenAI API spec, so .get() with defaults is appropriate for those.

Proposed fix for required field
-            messages = request.get("messages", [])
-            if not messages:
-                raise HTTPException(status_code=400, detail="No messages provided")
+            messages = request.get("messages")
+            if not messages:
+                raise HTTPException(status_code=400, detail="'messages' field is required and must not be empty")
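The fail-fast pattern for required fields can be factored into a small helper; a hypothetical sketch (`require_nonempty` and its error text are illustrative, not from the PR):

```python
# Hypothetical helper: fail fast when a required request field is missing
# or empty, instead of silently substituting a default.
def require_nonempty(request: dict, key: str):
    if key not in request:
        raise ValueError(f"Missing required field: {key}")
    value = request[key]
    if not value:
        raise ValueError(f"'{key}' must not be empty")
    return value

msgs = require_nonempty({"messages": [{"role": "user", "content": "hi"}]}, "messages")
assert len(msgs) == 1
```

Optional parameters like `max_tokens` would still use `.get()` with a default, as the comment above notes.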
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 313-313: Abstract raise to an inner function

(TRY301)


[error] 336-336: Probable use of insecure hash functions in hashlib: md5

(S324)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 304 - 338, The
handler chat_completions currently uses request.get("messages", []) which
silently masks a missing required field; change to use direct access
request["messages"] (and catch KeyError to raise HTTPException 400 with a clear
message) so malformed requests fail fast, while keeping .get(...) for optional
params (max_tokens, temperature, top_p, seed). Locate the chat_completions
function and replace request.get("messages", []) with request["messages"], add a
try/except KeyError around that access (or explicit if "messages" not in
request) to raise HTTPException(status_code=400, detail="Missing required field:
messages"), and ensure subsequent uses (extract_audio_from_messages,
extract_text_from_messages, GenerationRequest) operate on the validated messages
list.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 155-159: The parse_extra_args loop silently skips tokens that
don't start with "--", which hides user mistakes; update the parse_extra_args
function to raise a clear error instead of continuing when encountering a token
that does not start with "--" (e.g., treat arg in the while loop as invalid and
raise ValueError or a custom exception mentioning the offending token), ensuring
any non-`--` prefixed token is reported as unrecognized so callers fail fast
rather than being ignored.
- Around line 301-306: Remove the redundant broad exception handler in the main
flow: delete the "except Exception as e:" block that prints the error, defers
"import traceback", calls traceback.print_exc(), and sys.exit(1); let unhandled
exceptions from create_app and uvicorn.run propagate so Python's default handler
provides the traceback and non-zero exit. Specifically, modify the main function
(the code surrounding create_app(...) and uvicorn.run(...)) to not catch
Exception, and remove the deferred import of traceback and the sys.exit call
tied to that handler.

---

Duplicate comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 103-116: The try/except in apply_safetensors_patch currently
swallows all exceptions with a print; replace that silent handling by importing
logging, obtaining a logger (e.g. logging.getLogger(__name__)), and in the
except block call logger.exception or logger.error(..., exc_info=True) to record
the full stack trace and context (include the hack_path and dest_path), then
re-raise the exception so failures are visible to callers; update references to
safetensors.torch and inspect.getfile within apply_safetensors_patch only (do
not change other functions).
- Around line 119-124: load_yaml_config currently returns whatever
yaml.safe_load yields (which can be None for empty/comment-only files) causing
downstream code like the config_dict.pop(...) call to fail; update
load_yaml_config to coerce the result into an empty dict when safe_load returns
None (e.g., assign result = yaml.safe_load(f) and if not isinstance(result,
dict) set result = {} or raise a clear error) so callers always receive a dict
and config_dict.pop will not throw AttributeError; reference function
load_yaml_config and the downstream usage config_dict.pop to locate the change.

Comment on lines +155 to +159
    while i < len(extra_args):
        arg = extra_args[i]
        if not arg.startswith("--"):
            i += 1
            continue

⚠️ Potential issue | 🟡 Minor

parse_extra_args silently drops tokens without a `--` prefix.

Any token in extra_args that does not start with `--` (e.g., a single-dash typo like -model /path, or a stray positional argument) is silently skipped. Per coding guidelines, unused user-passed parameters should not be silently ignored — the code should fail fast.

🛡️ Proposed fix — raise on unrecognized tokens
         if not arg.startswith("--"):
-            i += 1
-            continue
+            raise ValueError(f"Unexpected argument: {arg!r}. Backend-specific flags must use '--' prefix.")
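A minimal sketch of a strict `--flag value` parser along these lines (this is an illustrative reconstruction, not the PR's actual parse_extra_args):

```python
# Sketch: parse ["--key", "value", ...] into a dict, rejecting any token
# that is not consumed as a flag or a flag's value.
def parse_extra_args(extra_args):
    parsed, i = {}, 0
    while i < len(extra_args):
        arg = extra_args[i]
        if not arg.startswith("--"):
            raise ValueError(
                f"Unexpected argument: {arg!r}. Backend-specific flags must use '--' prefix."
            )
        key = arg[2:]
        # A following non-flag token is this flag's value; bare flags become True.
        if i + 1 < len(extra_args) and not extra_args[i + 1].startswith("--"):
            parsed[key] = extra_args[i + 1]
            i += 2
        else:
            parsed[key] = True
            i += 1
    return parsed
```

With this shape, a single-dash typo like `-model /path` raises immediately instead of being dropped.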
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 155 - 159, The
parse_extra_args loop silently skips tokens that don't start with "--", which
hides user mistakes; update the parse_extra_args function to raise a clear error
instead of continuing when encountering a token that does not start with "--"
(e.g., treat arg in the while loop as invalid and raise ValueError or a custom
exception mentioning the offending token), ensuring any non-`--` prefixed token
is reported as unrecognized so callers fail fast rather than being ignored.

Comment on lines +301 to +306
    except Exception as e:
        print(f"[serve_unified] Error: {e}")
        import traceback

        traceback.print_exc()
        sys.exit(1)

⚠️ Potential issue | 🟡 Minor

Broad except Exception in main is redundant and violates coding guidelines.

uvicorn.run and create_app failures are not silently suppressed here (the traceback is printed), but Python's own unhandled-exception handler provides the same traceback and non-zero exit by default. Wrapping with a broad except Exception violates the guideline "Do not catch exceptions when they are not normally expected to be raised". The deferred import traceback inside the except block is also an unnecessary pattern.

🐛 Proposed fix — remove the redundant handler
-    except ImportError as e:
-        print(f"[serve_unified] Error: Failed to import unified server: {e}")
-        print("[serve_unified] Make sure the recipes.multimodal.server package is in PYTHONPATH")
-        sys.exit(1)
-    except Exception as e:
-        print(f"[serve_unified] Error: {e}")
-        import traceback
-
-        traceback.print_exc()
-        sys.exit(1)
+    except ImportError as e:
+        print(f"[serve_unified] Error: Failed to import unified server: {e}")
+        print("[serve_unified] Make sure the recipes.multimodal.server package is in PYTHONPATH")
+        sys.exit(1)
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 301-301: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 301 - 306, Remove
the redundant broad exception handler in the main flow: delete the "except
Exception as e:" block that prints the error, defers "import traceback", calls
traceback.print_exc(), and sys.exit(1); let unhandled exceptions from create_app
and uvicorn.run propagate so Python's default handler provides the traceback and
non-zero exit. Specifically, modify the main function (the code surrounding
create_app(...) and uvicorn.run(...)) to not catch Exception, and remove the
deferred import of traceback and the sys.exit call tied to that handler.

@vmendelev vmendelev force-pushed the pr2/unified-server-refactor branch from 3bb29cb to 424705c Compare February 25, 2026 17:29
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (8)
tests/slurm-tests/unified_asr/run_test.py (1)

110-110: Same unquoted workspace issue as in unified_tts/run_test.py line 127.

The fix is symmetric — see the comment raised there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_asr/run_test.py` at line 110, The checker_cmd
f-string uses unquoted args.workspace which breaks when the workspace path
contains spaces; update the construction of checker_cmd to safely quote or
shell-escape args.workspace (e.g. use shlex.quote on args.workspace) and ensure
shlex is imported where checker_cmd is defined so the command becomes safe for
shell execution; refer to the checker_cmd variable and args.workspace in
run_test.py when making the change.
recipes/multimodal/server/backends/base.py (2)

237-244: ⚠️ Potential issue | 🟠 Major

validate_request ignores multi-turn audio fields (audio_bytes_list, audio_paths).

A request with only audio_bytes_list set passes neither the audio check nor the text check, so it's rejected as "must have either text or audio input" even though audio was provided.

Proposed fix
         has_text_input = request.text is not None
-        has_audio_input = request.audio_bytes is not None or request.audio_path is not None
+        has_audio_input = (
+            request.audio_bytes is not None
+            or request.audio_path is not None
+            or (request.audio_bytes_list is not None and len(request.audio_bytes_list) > 0)
+            or (request.audio_paths is not None and len(request.audio_paths) > 0)
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/base.py` around lines 237 - 244, The
validate_request logic incorrectly computes has_audio_input, ignoring multi-turn
fields audio_bytes_list and audio_paths; update the has_audio_input check in
validate_request to also consider non-empty request.audio_bytes_list and
request.audio_paths (e.g., not None and len(...)>0) so requests with multi-turn
audio are treated as audio input, and keep the existing Modality.AUDIO_IN check
and error message (refer to validate_request, has_text_input, has_audio_input,
Modality.AUDIO_IN, and self.name).

220-227: ⚠️ Potential issue | 🔴 Critical

Bug: or operator silently discards explicit 0 / 0.0 values.

request.temperature or self.config.temperature evaluates to the config default when the user explicitly sets temperature=0.0 (greedy decoding), because 0.0 is falsy. Same for top_k=0 and max_new_tokens=0.

Proposed fix
     def get_generation_params(self, request: GenerationRequest) -> Dict[str, Any]:
         """Get effective generation parameters, merging request with config defaults."""
         return {
-            "max_new_tokens": request.max_new_tokens or self.config.max_new_tokens,
-            "temperature": request.temperature or self.config.temperature,
-            "top_p": request.top_p or self.config.top_p,
-            "top_k": request.top_k or self.config.top_k,
+            "max_new_tokens": request.max_new_tokens if request.max_new_tokens is not None else self.config.max_new_tokens,
+            "temperature": request.temperature if request.temperature is not None else self.config.temperature,
+            "top_p": request.top_p if request.top_p is not None else self.config.top_p,
+            "top_k": request.top_k if request.top_k is not None else self.config.top_k,
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/base.py` around lines 220 - 227, The
get_generation_params method is incorrectly using "or" which treats explicit
zero values (e.g., request.temperature == 0.0, request.top_k == 0,
request.max_new_tokens == 0) as falsy and falls back to self.config defaults;
update get_generation_params to use explicit None checks so each field uses the
request value when it is not None (e.g., use "request.max_new_tokens if
request.max_new_tokens is not None else self.config.max_new_tokens", same for
temperature, top_p, top_k) while keeping the return keys unchanged and the type
signature (GenerationRequest and self.config) intact.
recipes/multimodal/server/unified_server.py (2)

182-198: ⚠️ Potential issue | 🟠 Major

extract_text_from_messages includes system-role text, duplicating the system prompt.

System messages are extracted separately by extract_system_prompt (line 333), but extract_text_from_messages also collects them. Both values are passed to GenerationRequest, causing the system prompt to appear in both text and system_prompt fields.

Proposed fix
 def extract_text_from_messages(messages: List[Dict[str, Any]]) -> str:
     """Extract text content from OpenAI-format messages."""
     texts = []
     for message in messages:
+        if message.get("role") == "system":
+            continue
         content = message.get("content")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 182 - 198,
extract_text_from_messages is currently including messages with role "system",
causing the system prompt to be duplicated because extract_system_prompt also
extracts it and both are passed into GenerationRequest; update
extract_text_from_messages to skip any message where message.get("role") ==
"system" before processing content so system messages are not appended to the
returned text. Keep all existing handling for string and list content, only add
the role check at the top of the loop in extract_text_from_messages to prevent
duplication with extract_system_prompt/GenerationRequest.

118-122: ⚠️ Potential issue | 🟠 Major

zip(batch, results) silently truncates if backend returns fewer results than requests.

If self.backend.generate returns a shorter list, some futures will never resolve. Add a length check or use strict=True.

Proposed fix
+            if len(results) != len(batch):
+                raise RuntimeError(
+                    f"Backend returned {len(results)} results for {len(batch)} requests"
+                )
-            for pending, result in zip(batch, results):
+            for pending, result in zip(batch, results, strict=True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 118 - 122, The loop
that pairs batch with results uses zip(batch, results) which silently truncates
if self.backend.generate returns fewer items, leaving some pending.future
unresolved; change the pairing to validate lengths (e.g., compare len(results)
to len(batch)) or use zip(..., strict=True) and, if a mismatch occurs, set an
exception on any remaining pending.future entries (via
pending.future.set_exception(...)) and/or raise an explicit error so callers
aren't left hanging; update the block around loop.run_in_executor and uses of
results, batch, pending.future.set_result to perform this length check and
handle the short/long result cases deterministically.
nemo_skills/inference/server/serve_unified.py (3)

119-124: ⚠️ Potential issue | 🟡 Minor

yaml.safe_load returns None for empty files — will crash at line 244.

If the YAML file is empty, safe_load returns None, and config_dict.pop("backend", ...) will raise AttributeError.

Proposed fix
 def load_yaml_config(config_path: str) -> dict:
     """Load YAML config file."""
     import yaml
 
     with open(config_path) as f:
-        return yaml.safe_load(f)
+        config = yaml.safe_load(f)
+        if not isinstance(config, dict):
+            raise ValueError(f"Config file '{config_path}' must contain a YAML mapping, got {type(config)}")
+        return config
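The validation step can be factored out; a generic sketch (helper name and error wording are illustrative) of coercing a loader result that may be None:

```python
# yaml.safe_load returns None for empty or comment-only files, so callers
# should validate the result before treating it as a mapping.
def ensure_mapping(loaded, source="config"):
    if loaded is None:
        raise ValueError(f"{source} is empty; expected a YAML mapping")
    if not isinstance(loaded, dict):
        raise TypeError(f"{source} must be a mapping, got {type(loaded).__name__}")
    return loaded

assert ensure_mapping({"backend": "nemo_asr"})["backend"] == "nemo_asr"
```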
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 119 - 124, The
YAML loader can return None for empty files causing later code (e.g.,
config_dict.pop("backend", ...)) to fail; update load_yaml_config to ensure it
always returns a dict by calling yaml.safe_load(f) into a variable, then return
that variable if it's a dict or return an empty dict ({}) when safe_load returns
None or a non-dict value so callers (like code that uses config_dict.pop) never
receive None.

155-159: parse_extra_args silently drops tokens without a `--` prefix.

Stray positional arguments or single-dash typos (e.g., -model) are silently ignored. Consider raising on unrecognized tokens.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 155 - 159, The
loop in parse_extra_args silently skips tokens that don't start with "--"
(variable extra_args, loop variable arg), causing stray positional args or
single-dash typos to be dropped; update parse_extra_args to treat non-"--"
prefixed tokens as invalid by raising a clear exception (e.g., ValueError) or
returning an error with context (include the offending token and suggestion to
use double-dash flags), instead of incrementing i and continuing, so callers are
immediately notified of unrecognized/typoed arguments.

103-116: ⚠️ Potential issue | 🟡 Minor

apply_safetensors_patch silently swallows failures.

If the file copy fails, the server continues but model loading may later fail with a confusing error. Consider separating ImportError (safetensors not installed — skip) from other exceptions (propagate).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 103 - 116, The
apply_safetensors_patch function currently catches all exceptions and hides real
errors; change its error handling so that ImportError (or ModuleNotFoundError)
when importing safetensors.torch is handled by returning/skip, but any other
exceptions (e.g., failures in inspect.getfile, os.makedirs, shutil.copyfile) are
not swallowed—log a clear error including hack_path and dest_path and re-raise
the exception so the caller sees the failure; update the try/except around the
import to only catch import-related errors and move file operations into their
own try/except that logs and re-raises on unexpected errors, referencing
apply_safetensors_patch, hack_path, st_torch, and dest_path to locate the code.
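The "skip on missing dependency, propagate everything else" split can be sketched generically (a minimal illustration, not the PR's apply_safetensors_patch):

```python
import importlib

# Only ImportError means "optional dependency missing: skip"; any other
# failure during the patch work should propagate to the caller.
def optional_module(name):
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

assert optional_module("json") is not None
assert optional_module("definitely_not_a_real_module_xyz") is None
```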
🧹 Nitpick comments (11)
tests/slurm-tests/unified_asr/run_test.py (1)

52-64: Clarify the intent of ++server.server_type=vllm_multimodal alongside server_type="generic".

Line 52 passes a Hydra override ++server.server_type=vllm_multimodal while line 64 passes server_type="generic" as a keyword argument to generate(). These appear to configure the same concept through two different mechanisms. If vllm_multimodal is intentional for the inference client side (controlling how requests are formatted) and generic is the server-launcher type, a brief comment explaining the distinction would prevent future confusion. The same pattern exists in unified_tts/run_test.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_asr/run_test.py` around lines 52 - 64, Clarify the
dual configuration by documenting which setting controls the client request
format vs the server launcher: in the call to generate(...) (symbol: generate)
add a brief comment explaining that the Hydra override
++server.server_type=vllm_multimodal configures the inference client/request
formatting while the keyword argument server_type="generic" selects the
server-launcher implementation, and do the same explanatory comment in
unified_tts/run_test.py to avoid future confusion.
tests/slurm-tests/unified_asr/check_results.py (1)

64-73: load_outputs is duplicated verbatim from unified_tts/check_results.py.

Identical implementation — consider hoisting it into tests/slurm-tests/utils.py alongside soft_assert/assert_all.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_asr/check_results.py` around lines 64 - 73, The
load_outputs implementation is duplicated; move the function into a shared
module (e.g., tests/slurm-tests/utils.py) alongside existing helpers
soft_assert/assert_all, export it, and replace the local definition in
tests/slurm-tests/unified_asr/check_results.py with an import of load_outputs
from that utils module so both unified_asr/check_results.py and
unified_tts/check_results.py reuse the single shared function.
tests/slurm-tests/unified_tts/run_test.py (1)

30-32: ensure_workspace_exists is duplicated verbatim from unified_asr/run_test.py.

Both files define the exact same three-line function. Consider extracting it into the shared tests/slurm-tests/utils.py (which already hosts soft_assert / assert_all) and importing it from there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_tts/run_test.py` around lines 30 - 32, The function
ensure_workspace_exists is duplicated; move that implementation into the shared
utils module (utils.py) so both tests reuse it: add
ensure_workspace_exists(workspace: str, cluster: str, config_dir: str | None =
None) to utils.py (it should call get_cluster_config and
create_remote_directory) and then replace the local definitions in both
run_test.py files with an import from utils (e.g., from utils import
ensure_workspace_exists); update any imports if
get_cluster_config/create_remote_directory are in different modules so utils has
the needed imports, and remove the duplicated function bodies.
recipes/multimodal/server/backends/nemo_asr_backend.py (2)

39-61: Hardcoded known field set in from_dict can drift from actual dataclass fields.

Both NeMoASRConfig and MagpieTTSConfig maintain a manually-listed known set instead of using cls.__dataclass_fields__ like the parent BackendConfig.from_dict() does. If a field is added to the dataclass but forgotten in known, it will silently land in extra_config.

Suggested approach — reuse parent pattern
     @classmethod
     def from_dict(cls, d: Dict[str, Any]) -> "NeMoASRConfig":
         if d.get("model_name") and not d.get("model_path"):
             d = {**d, "model_path": d["model_name"]}
-        known = {
-            "model_path",
-            "model_name",
-            ...
-        }
+        known = {f.name for f in cls.__dataclass_fields__.values()} - {"extra_config"}
         return cls(
             **{k: v for k, v in d.items() if k in known},
             extra_config={k: v for k, v in d.items() if k not in known},
         )
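The dataclass-derived split can be shown end to end; a self-contained sketch (DemoConfig is illustrative, not the PR's NeMoASRConfig):

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict

@dataclass
class DemoConfig:
    model_path: str = ""
    batch_size: int = 1
    extra_config: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "DemoConfig":
        # Derive known keys from the dataclass itself, so a newly added
        # field can never silently fall through into extra_config.
        known = {f.name for f in fields(cls)} - {"extra_config"}
        return cls(
            **{k: v for k, v in d.items() if k in known},
            extra_config={k: v for k, v in d.items() if k not in known},
        )

cfg = DemoConfig.from_dict({"model_path": "/m", "custom_flag": True})
assert cfg.model_path == "/m"
assert cfg.extra_config == {"custom_flag": True}
```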
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/nemo_asr_backend.py` around lines 39 - 61,
The from_dict method in NeMoASRConfig currently uses a hardcoded known set which
can drift from the dataclass fields; update NeMoASRConfig.from_dict to mirror
the parent BackendConfig.from_dict pattern by deriving the known keys from
cls.__dataclass_fields__ (e.g., known = set(cls.__dataclass_fields__.keys()))
and then use that set to split d into constructor kwargs and extra_config,
keeping the existing model_name->model_path override logic intact; apply the
same change for MagpieTTSConfig.from_dict if present.

100-104: Silenced .to(device) exception — consider logging the swallowed error.

The broad except Exception: pass when moving the model to a device can mask real failures (e.g., OOM). At minimum, log the exception at debug level so it's diagnosable.

Proposed fix
         try:
             self._model.to(self.config.device)
-        except Exception:
-            # Some NeMo wrappers may already be on the correct device.
-            pass
+        except Exception as e:
+            print(f"[NeMoASRBackend] Note: .to({self.config.device}) skipped: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/nemo_asr_backend.py` around lines 100 -
104, Replace the bare "except Exception: pass" around
self._model.to(self.config.device) with an exception handler that captures the
exception (e.g., except Exception as e) and logs the full error/traceback at
debug level so failures (including OOM) are diagnosable; use an existing logger
(self.logger if available) or logging.getLogger(__name__) and call
logger.debug("Failed to move model to %s", self.config.device, exc_info=True)
before continuing.
recipes/multimodal/server/backends/magpie_tts_backend.py (3)

413-415: __del__ for temp directory cleanup is fragile.

__del__ is not guaranteed to run (especially during interpreter shutdown). Since each generate() call already cleans up its batch_dir, the _temp_dir parent would only contain stale data on unclean exits. Consider implementing a context manager or explicit shutdown() method that the server can call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/magpie_tts_backend.py` around lines 413 -
415, The finalizer __del__ used to remove self._temp_dir is fragile; replace it
with an explicit cleanup API and/or context manager so the server can
deterministically remove stale files. Add a public method (e.g., shutdown() or
close()) on the MagpieTTS backend that checks getattr(self, "_temp_dir", None)
and removes it (shutil.rmtree(...)) and document that the server must call this
on shutdown; alternatively implement __enter__/__exit__ on the same class so
callers can use a with-statement to ensure cleanup; keep generate() behavior of
removing each batch_dir but remove the reliance on __del__ for parent _temp_dir
cleanup.
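A deterministic-cleanup sketch along the suggested lines (class name is illustrative; the real backend would wrap its own _temp_dir):

```python
import os
import shutil
import tempfile

# Explicit close() plus context-manager support replaces reliance on __del__,
# which is not guaranteed to run at interpreter shutdown.
class TempDirHolder:
    def __init__(self):
        self._temp_dir = tempfile.mkdtemp()

    def close(self):
        if self._temp_dir is not None:
            shutil.rmtree(self._temp_dir, ignore_errors=True)
            self._temp_dir = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

holder = TempDirHolder()
created = holder._temp_dir
with holder:
    assert os.path.isdir(created)
assert not os.path.exists(created)
```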

283-292: Silently swallowed cache reset and normalization failures could mask issues.

Lines 290-291 (except Exception: pass on cache reset) and 321-322 (same for normalization) both swallow errors silently. While defensive, at least logging at debug level would help diagnose unexpected failures during generation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/magpie_tts_backend.py` around lines 283 -
292, The code currently swallows exceptions during the cache reset (around
decoder.reset_cache on self._model) and the subsequent normalization step;
change each bare "except Exception: pass" to log the caught exception at debug
level (e.g., "except Exception as e: self._logger.debug('Failed to reset decoder
cache', exc_info=e)" or similar) so failures are recorded without crashing;
ensure you reference the same symbols (self._model, decoder.reset_cache and the
normalization call) and preserve existing behavior (do not re-raise) while
including contextual messages and exception info in the debug logs.

45-76: Same hardcoded known field set pattern as NeMoASRConfig.from_dict.

See the comment on nemo_asr_backend.py — using cls.__dataclass_fields__ avoids maintaining a duplicate list of field names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/magpie_tts_backend.py` around lines 45 -
76, The from_dict method in MagpieTTSConfig currently builds a hardcoded known
set; replace that by deriving allowed field names from the dataclass instead
(similar to NeMoASRConfig): use cls.__dataclass_fields__ to compute the set of
known keys, then construct the kwargs and extra_config exactly as before (i.e.,
pass {k:v for k,v in d.items() if k in dataclass_fields} into cls(...) and put
the rest into extra_config), and keep the existing CLI alias handling for
"codec_model" intact; update references to the local variable "known" to the
computed dataclass field set so you don't maintain a duplicate list.
recipes/multimodal/__init__.py (1)

23-37: Consider re-exporting Modality for external consumers.

Modality is part of the public API (used in supported_modalities property), yet it's not re-exported here. External consumers would need to reach into recipes.multimodal.server.backends.base to reference it. If the intent is to keep the surface minimal, this is fine — just flagging the asymmetry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/__init__.py` around lines 23 - 37, The package is missing
a public re-export of Modality which external consumers need for things like
InferenceBackend.supported_modalities; import Modality from its defining module
(base) and add it to the top-level exports alongside InferenceBackend,
GenerationRequest, GenerationResult, BackendConfig, and get_backend so callers
can reference Modality without reaching into
recipes.multimodal.server.backends.base.
recipes/multimodal/server/unified_server.py (2)

78-90: Fire-and-forget asyncio.create_task — store task references to avoid silent exception loss.

Tasks created at lines 84, 86, 99, and 136 are not stored. If they raise unexpectedly, the exception is silently dropped ("Task exception was never retrieved" warning). Store references in a set and discard them on completion.

Proposed approach
 class RequestBatcher:
     def __init__(self, backend, batch_size: int, batch_timeout: float):
         ...
+        self._background_tasks: set = set()
+
+    def _track_task(self, task: asyncio.Task):
+        self._background_tasks.add(task)
+        task.add_done_callback(self._background_tasks.discard)

     async def add_request(self, request: GenerationRequest) -> GenerationResult:
         ...
             if len(self.pending_requests) >= self.batch_size:
-                asyncio.create_task(self._process_batch())
+                self._track_task(asyncio.create_task(self._process_batch()))

Apply the same pattern to lines 86, 99, and 136.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 78 - 90, The
fire-and-forget asyncio.create_task calls that launch _process_batch and
_timeout_handler must be tracked to avoid lost exceptions: add a self._tasks:
Set[asyncio.Task] (initialized in the class __init__), replace each
asyncio.create_task(...) in the batcher (the places that reference
_process_batch and _timeout_handler when managing pending_requests, batch_size,
and timeout_task) with creating the task, adding it to self._tasks, and
attaching a done-callback that removes it from the set and retrieves/logs
task.exception() (or calls task.exception() inside the callback to suppress
"Task exception was never retrieved"). Ensure you apply this pattern for the
create_task calls that currently live alongside pending_requests, batch_size,
timeout_task and any other create_task usages in the same class (e.g., the ones
around lines referencing _process_batch and _timeout_handler).

245-278: Replace @app.on_event("startup") with FastAPI's lifespan context manager pattern. The on_event decorator is deprecated in FastAPI's API reference in favor of the ASGI lifespan pattern, which consolidates startup and shutdown logic in a single async context manager with yield. FastAPI will stop supporting on_event in a future version.

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # startup code
    yield
    # shutdown code

app = FastAPI(lifespan=lifespan)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 245 - 278, Replace
the deprecated `@app.on_event`("startup") startup() function with a FastAPI
lifespan asynccontextmanager: move all startup logic (lookup via
get_backend(backend_type), ConfigClass.from_dict(config_dict), instantiating
BackendClass and calling backend_instance.load_model(), creating RequestBatcher
as request_batcher, registering extra routes via BackendClass.get_extra_routes()
and app.add_api_route(), and the print-ready logs) into the pre-yield section of
the lifespan, yield control, then put any shutdown/cleanup (teardown of
request_batcher/backend_instance if needed) after the yield; register the
lifespan with FastAPI via app = FastAPI(lifespan=lifespan) so the same symbols
(BackendClass, ConfigClass, backend_instance, request_batcher, RequestBatcher,
get_backend) are used but following the ASGI lifespan pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/multimodal/server/unified_server.py`:
- Around line 115-122: Replace asyncio.get_event_loop() with
asyncio.get_running_loop() in the batch processing code so the coroutine uses
the currently running loop; specifically update the call that assigns loop =
asyncio.get_event_loop() (the loop used to call loop.run_in_executor(None,
self.backend.generate, requests)) to loop = asyncio.get_running_loop() to avoid
the deprecation warning and ensure it runs in the active FastAPI async context.
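A minimal sketch of the recommended pattern (the helper name is illustrative, not a repo symbol): `get_running_loop()` must be called from inside a coroutine and, unlike the deprecated `get_event_loop()`, never implicitly creates a new loop.

```python
import asyncio

async def run_blocking(fn, *args):
    # get_running_loop() returns the loop driving this coroutine;
    # run_in_executor offloads the blocking call to the default thread pool.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, fn, *args)

result = asyncio.run(run_blocking(sum, [1, 2, 3]))
print(result)  # 6
```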

In `@tests/slurm-tests/unified_asr/check_results.py`:
- Around line 38-47: normalize_text only maps single-character tokens via
_DIGIT_TO_WORD so multi-digit numeric tokens like "42" or "100" remain
unchanged; update normalize_text to detect token.isdigit() (or a cleaned numeric
token) and expand multi-digit numbers by mapping each digit to its word form
(e.g., tokens.append(" ".join(_DIGIT_TO_WORD[d] for d in token))) or
alternatively use a numeric-to-words utility (num2words) to convert the whole
token, keeping the current single-digit logic for token in _DIGIT_TO_WORD and
preserving non-numeric tokens as-is.
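A self-contained sketch of the digit-by-digit expansion suggested above (the mapping mirrors the `_DIGIT_TO_WORD` table described in the review; the function name is illustrative):

```python
_DIGIT_TO_WORD = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def normalize_token(token: str) -> str:
    # Expand any all-digit token digit by digit, so "42" -> "four two";
    # single digits take the same path, and non-numeric tokens pass through.
    if token.isdigit():
        return " ".join(_DIGIT_TO_WORD[d] for d in token)
    return token

print(normalize_token("42"))   # four two
print(normalize_token("100"))  # one zero zero
print(normalize_token("cat"))  # cat
```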
- Around line 55-61: Accessing row["_reference"] can raise KeyError and crash
the checker; change the code that reads reference_path (the loop that parses
each line into row and assigns refs[row["id"]] = row["_reference"]) to handle
missing keys gracefully: use row.get("_reference") (or wrap in try/except
KeyError) and when the key is absent, record a soft failure (e.g., append a
message to the existing failures/alerts collection or call the test failure
logger) and skip adding that entry to refs, then continue processing. Ensure you
reference variables used in the diff (reference_path.open, row, refs) so the fix
is applied in the same block that builds refs.

In `@tests/slurm-tests/unified_tts/run_test.py`:
- Line 127: The checker_cmd string uses an unquoted args.workspace which will
break when the workspace path contains spaces; change its construction to safely
quote the workspace value (e.g., use shlex.quote(args.workspace) or otherwise
wrap the workspace in quotes) so that wrap_arguments(checker_cmd) receives a
single argument for the workspace; apply the same fix for the similar
checker_cmd in unified_asr run_test.py (the variable name checker_cmd and the
call to wrap_arguments remain the targets to edit).
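To illustrate why `shlex.quote` fixes the whitespace splitting (the command string here is a stand-in for the real `checker_cmd`):

```python
import shlex

workspace = "/tmp/my workspace"  # a path containing a space
checker_cmd = f"python check_results.py --workspace {shlex.quote(workspace)}"

# After quoting, tokenization recovers the path as a single argument.
print(shlex.split(checker_cmd))
```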

In `@tests/test_nemo_asr_backend.py`:
- Around line 1-2: Add the standard NVIDIA copyright header to the very top of
tests/test_nemo_asr_backend.py (before any imports) to satisfy CI; ensure the
header block is the exact company copyright text used across the repo and leave
a blank line after the header before the first import (which currently imports
GenerationRequest and NeMoASRBackend/NeMoASRConfig).

---

Duplicate comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 119-124: The YAML loader can return None for empty files causing
later code (e.g., config_dict.pop("backend", ...)) to fail; update
load_yaml_config to ensure it always returns a dict by calling yaml.safe_load(f)
into a variable, then return that variable if it's a dict or return an empty
dict ({}) when safe_load returns None or a non-dict value so callers (like code
that uses config_dict.pop) never receive None.
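The normalization step can be isolated in a small dependency-free helper (shown without the actual `yaml.safe_load` call, which does return `None` for an empty file and may return a scalar or list for non-mapping documents):

```python
def ensure_config_dict(loaded) -> dict:
    # Coerce None (empty YAML file) and non-mapping documents to {},
    # so callers can safely do config_dict.pop("backend", ...).
    return loaded if isinstance(loaded, dict) else {}

print(ensure_config_dict(None))                # {}
print(ensure_config_dict("just a string"))     # {}
print(ensure_config_dict({"backend": "asr"}))  # {'backend': 'asr'}
```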
- Around line 155-159: The loop in parse_extra_args silently skips tokens that
don't start with "--" (variable extra_args, loop variable arg), causing stray
positional args or single-dash typos to be dropped; update parse_extra_args to
treat non-"--" prefixed tokens as invalid by raising a clear exception (e.g.,
ValueError) or returning an error with context (include the offending token and
suggestion to use double-dash flags), instead of incrementing i and continuing,
so callers are immediately notified of unrecognized/typoed arguments.
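A sketch of the stricter loop (a hypothetical reimplementation assuming `--key value` pairs, not the repo's exact parser):

```python
def parse_extra_args(extra_args):
    # Flags must come in "--key value" pairs; anything that doesn't start
    # with "--" is rejected loudly instead of being silently skipped.
    parsed = {}
    i = 0
    while i < len(extra_args):
        arg = extra_args[i]
        if not arg.startswith("--"):
            raise ValueError(
                f"Unrecognized token {arg!r}: expected a '--flag' "
                "(did you type a single dash or a stray positional arg?)"
            )
        if i + 1 >= len(extra_args):
            raise ValueError(f"Flag {arg!r} is missing a value")
        parsed[arg[2:]] = extra_args[i + 1]
        i += 2
    return parsed

print(parse_extra_args(["--model", "asr", "--batch", "8"]))
```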
- Around line 103-116: The apply_safetensors_patch function currently catches
all exceptions and hides real errors; change its error handling so that
ImportError (or ModuleNotFoundError) when importing safetensors.torch is handled
by returning/skip, but any other exceptions (e.g., failures in inspect.getfile,
os.makedirs, shutil.copyfile) are not swallowed—log a clear error including
hack_path and dest_path and re-raise the exception so the caller sees the
failure; update the try/except around the import to only catch import-related
errors and move file operations into their own try/except that logs and
re-raises on unexpected errors, referencing apply_safetensors_patch, hack_path,
st_torch, and dest_path to locate the code.

In `@recipes/multimodal/server/backends/base.py`:
- Around line 237-244: The validate_request logic incorrectly computes
has_audio_input, ignoring multi-turn fields audio_bytes_list and audio_paths;
update the has_audio_input check in validate_request to also consider non-empty
request.audio_bytes_list and request.audio_paths (e.g., not None and len(...)>0)
so requests with multi-turn audio are treated as audio input, and keep the
existing Modality.AUDIO_IN check and error message (refer to validate_request,
has_text_input, has_audio_input, Modality.AUDIO_IN, and self.name).
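A sketch of the widened check (the multi-turn field names `audio_bytes_list`/`audio_paths` come from the review; the single-turn names here are assumptions, not the repo's exact `GenerationRequest` fields):

```python
from types import SimpleNamespace

def has_audio_input(request) -> bool:
    # Treat single-turn and multi-turn audio fields uniformly: any
    # non-None, non-empty value counts as audio input.
    fields = ("audio_bytes", "audio_path", "audio_bytes_list", "audio_paths")
    return any(bool(getattr(request, name, None)) for name in fields)

multi_turn_req = SimpleNamespace(audio_bytes=None, audio_path=None,
                                 audio_bytes_list=[b"chunk1", b"chunk2"],
                                 audio_paths=None)
text_only_req = SimpleNamespace(audio_bytes=None, audio_path=None,
                                audio_bytes_list=None, audio_paths=None)
print(has_audio_input(multi_turn_req))  # True
print(has_audio_input(text_only_req))   # False
```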
- Around line 220-227: The get_generation_params method is incorrectly using
"or" which treats explicit zero values (e.g., request.temperature == 0.0,
request.top_k == 0, request.max_new_tokens == 0) as falsy and falls back to
self.config defaults; update get_generation_params to use explicit None checks
so each field uses the request value when it is not None (e.g., use
"request.max_new_tokens if request.max_new_tokens is not None else
self.config.max_new_tokens", same for temperature, top_p, top_k) while keeping
the return keys unchanged and the type signature (GenerationRequest and
self.config) intact.
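The zero-value pitfall in one line (dataclass names here are illustrative stand-ins for `GenerationRequest` and the backend config):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class _ConfigDefaults:
    temperature: float = 0.7
    top_k: int = 50

@dataclass
class _Request:
    temperature: Optional[float] = None
    top_k: Optional[int] = None

def pick(request_value, default):
    # "request_value or default" would treat an explicit 0 / 0.0 as unset;
    # an explicit None check keeps zero as a valid client override.
    return request_value if request_value is not None else default

cfg = _ConfigDefaults()
req = _Request(temperature=0.0)  # client explicitly asks for greedy decoding
print(pick(req.temperature, cfg.temperature))  # 0.0 (with "or" it would be 0.7)
print(pick(req.top_k, cfg.top_k))              # 50 (unset, falls back)
```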

In `@recipes/multimodal/server/unified_server.py`:
- Around line 182-198: extract_text_from_messages is currently including
messages with role "system", causing the system prompt to be duplicated because
extract_system_prompt also extracts it and both are passed into
GenerationRequest; update extract_text_from_messages to skip any message where
message.get("role") == "system" before processing content so system messages are
not appended to the returned text. Keep all existing handling for string and
list content, only add the role check at the top of the loop in
extract_text_from_messages to prevent duplication with
extract_system_prompt/GenerationRequest.
- Around line 118-122: The loop that pairs batch with results uses zip(batch,
results) which silently truncates if self.backend.generate returns fewer items,
leaving some pending.future unresolved; change the pairing to validate lengths
(e.g., compare len(results) to len(batch)) or use zip(..., strict=True) and, if
a mismatch occurs, set an exception on any remaining pending.future entries (via
pending.future.set_exception(...)) and/or raise an explicit error so callers
aren't left hanging; update the block around loop.run_in_executor and uses of
results, batch, pending.future.set_result to perform this length check and
handle the short/long result cases deterministically.

In `@tests/slurm-tests/unified_asr/run_test.py`:
- Line 110: The checker_cmd f-string uses unquoted args.workspace which breaks
when the workspace path contains spaces; update the construction of checker_cmd
to safely quote or shell-escape args.workspace (e.g. use shlex.quote on
args.workspace) and ensure shlex is imported where checker_cmd is defined so the
command becomes safe for shell execution; refer to the checker_cmd variable and
args.workspace in run_test.py when making the change.

---

Nitpick comments:
In `@recipes/multimodal/__init__.py`:
- Around line 23-37: The package is missing a public re-export of Modality which
external consumers need for things like InferenceBackend.supported_modalities;
import Modality from its defining module (base) and add it to the top-level
exports alongside InferenceBackend, GenerationRequest, GenerationResult,
BackendConfig, and get_backend so callers can reference Modality without
reaching into recipes.multimodal.server.backends.base.

In `@recipes/multimodal/server/backends/magpie_tts_backend.py`:
- Around line 413-415: The finalizer __del__ used to remove self._temp_dir is
fragile; replace it with an explicit cleanup API and/or context manager so the
server can deterministically remove stale files. Add a public method (e.g.,
shutdown() or close()) on the MagpieTTS backend that checks getattr(self,
"_temp_dir", None) and removes it (shutil.rmtree(...)) and document that the
server must call this on shutdown; alternatively implement __enter__/__exit__ on
the same class so callers can use a with-statement to ensure cleanup; keep
generate() behavior of removing each batch_dir but remove the reliance on
__del__ for parent _temp_dir cleanup.
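A minimal sketch of the explicit-cleanup shape (class and method names are illustrative, not the MagpieTTS backend's actual API):

```python
import os
import shutil
import tempfile

class TempDirOwner:
    """Owns a temp dir; cleanup is deterministic, not left to __del__."""

    def __init__(self):
        self._temp_dir = tempfile.mkdtemp(prefix="magpie_tts_")

    def close(self):
        # getattr guards against close() running before __init__ finished.
        temp_dir = getattr(self, "_temp_dir", None)
        if temp_dir is not None:
            shutil.rmtree(temp_dir, ignore_errors=True)
            self._temp_dir = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

with TempDirOwner() as owner:
    kept_path = owner._temp_dir
    assert os.path.isdir(kept_path)  # exists inside the with-block
print(os.path.isdir(kept_path))  # False: removed deterministically on exit
```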
- Around line 283-292: The code currently swallows exceptions during the cache
reset (around decoder.reset_cache on self._model) and the subsequent
normalization step; change each bare "except Exception: pass" to log the caught
exception at debug level (e.g., "except Exception as e:
self._logger.debug('Failed to reset decoder cache', exc_info=e)" or similar) so
failures are recorded without crashing; ensure you reference the same symbols
(self._model, decoder.reset_cache and the normalization call) and preserve
existing behavior (do not re-raise) while including contextual messages and
exception info in the debug logs.
- Around line 45-76: The from_dict method in MagpieTTSConfig currently builds a
hardcoded known set; replace that by deriving allowed field names from the
dataclass instead (similar to NeMoASRConfig): use cls.__dataclass_fields__ to
compute the set of known keys, then construct the kwargs and extra_config
exactly as before (i.e., pass {k:v for k,v in d.items() if k in
dataclass_fields} into cls(...) and put the rest into extra_config), and keep
the existing CLI alias handling for "codec_model" intact; update references to
the local variable "known" to the computed dataclass field set so you don't
maintain a duplicate list.

In `@recipes/multimodal/server/backends/nemo_asr_backend.py`:
- Around line 39-61: The from_dict method in NeMoASRConfig currently uses a
hardcoded known set which can drift from the dataclass fields; update
NeMoASRConfig.from_dict to mirror the parent BackendConfig.from_dict pattern by
deriving the known keys from cls.__dataclass_fields__ (e.g., known =
set(cls.__dataclass_fields__.keys())) and then use that set to split d into
constructor kwargs and extra_config, keeping the existing model_name->model_path
override logic intact; apply the same change for MagpieTTSConfig.from_dict if
present.
- Around line 100-104: Replace the bare "except Exception: pass" around
self._model.to(self.config.device) with an exception handler that captures the
exception (e.g., except Exception as e) and logs the full error/traceback at
debug level so failures (including OOM) are diagnosable; use an existing logger
(self.logger if available) or logging.getLogger(__name__) and call
logger.debug("Failed to move model to %s", self.config.device, exc_info=True)
before continuing.

In `@recipes/multimodal/server/unified_server.py`:
- Around line 78-90: The fire-and-forget asyncio.create_task calls that launch
_process_batch and _timeout_handler must be tracked to avoid lost exceptions:
add a self._tasks: Set[asyncio.Task] (initialized in the class __init__),
replace each asyncio.create_task(...) in the batcher (the places that reference
_process_batch and _timeout_handler when managing pending_requests, batch_size,
and timeout_task) with creating the task, adding it to self._tasks, and
attaching a done-callback that removes it from the set and retrieves/logs
task.exception() (or calls task.exception() inside the callback to suppress
"Task exception was never retrieved"). Ensure you apply this pattern for the
create_task calls that currently live alongside pending_requests, batch_size,
timeout_task and any other create_task usages in the same class (e.g., the ones
around lines referencing _process_batch and _timeout_handler).
- Around line 245-278: Replace the deprecated `@app.on_event`("startup") startup()
function with a FastAPI lifespan asynccontextmanager: move all startup logic
(lookup via get_backend(backend_type), ConfigClass.from_dict(config_dict),
instantiating BackendClass and calling backend_instance.load_model(), creating
RequestBatcher as request_batcher, registering extra routes via
BackendClass.get_extra_routes() and app.add_api_route(), and the print-ready
logs) into the pre-yield section of the lifespan, yield control, then put any
shutdown/cleanup (teardown of request_batcher/backend_instance if needed) after
the yield; register the lifespan with FastAPI via app =
FastAPI(lifespan=lifespan) so the same symbols (BackendClass, ConfigClass,
backend_instance, request_batcher, RequestBatcher, get_backend) are used but
following the ASGI lifespan pattern.

In `@tests/slurm-tests/unified_asr/check_results.py`:
- Around line 64-73: The load_outputs implementation is duplicated; move the
function into a shared module (e.g., tests/slurm-tests/utils.py) alongside
existing helpers soft_assert/assert_all, export it, and replace the local
definition in tests/slurm-tests/unified_asr/check_results.py with an import of
load_outputs from that utils module so both unified_asr/check_results.py and
unified_tts/check_results.py reuse the single shared function.

In `@tests/slurm-tests/unified_asr/run_test.py`:
- Around line 52-64: Clarify the dual configuration by documenting which setting
controls the client request format vs the server launcher: in the call to
generate(...) (symbol: generate) add a brief comment explaining that the Hydra
override ++server.server_type=vllm_multimodal configures the inference
client/request formatting while the keyword argument server_type="generic"
selects the server-launcher implementation, and do the same explanatory comment
in unified_tts/run_test.py to avoid future confusion.

In `@tests/slurm-tests/unified_tts/run_test.py`:
- Around line 30-32: The function ensure_workspace_exists is duplicated; move
that implementation into the shared utils module (utils.py) so both tests reuse
it: add ensure_workspace_exists(workspace: str, cluster: str, config_dir: str |
None = None) to utils.py (it should call get_cluster_config and
create_remote_directory) and then replace the local definitions in both
run_test.py files with an import from utils (e.g., from utils import
ensure_workspace_exists); update any imports if
get_cluster_config/create_remote_directory are in different modules so utils has
the needed imports, and remove the duplicated function bodies.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3bb29cb and 424705c.

📒 Files selected for processing (18)
  • nemo_skills/inference/server/serve_unified.py
  • recipes/multimodal/__init__.py
  • recipes/multimodal/server/__init__.py
  • recipes/multimodal/server/backends/__init__.py
  • recipes/multimodal/server/backends/base.py
  • recipes/multimodal/server/backends/magpie_tts_backend.py
  • recipes/multimodal/server/backends/nemo_asr_backend.py
  • recipes/multimodal/server/unified_server.py
  • tests/slurm-tests/run_all.sh
  • tests/slurm-tests/unified_asr/asr_openai.test
  • tests/slurm-tests/unified_asr/check_results.py
  • tests/slurm-tests/unified_asr/run_test.py
  • tests/slurm-tests/unified_tts/README.md
  • tests/slurm-tests/unified_tts/check_results.py
  • tests/slurm-tests/unified_tts/run_test.py
  • tests/slurm-tests/unified_tts/tts_openai.test
  • tests/test_nemo_asr_backend.py
  • tests/test_unified_server_audio_parser.py
✅ Files skipped from review due to trivial changes (1)
  • tests/slurm-tests/unified_asr/asr_openai.test

Comment on lines +55 to +61
with reference_path.open("rt", encoding="utf-8") as fin:
for line in fin:
if not line.strip():
continue
row = json.loads(line)
refs[row["id"]] = row["_reference"]
return refs

⚠️ Potential issue | 🟡 Minor

row["_reference"] raises an unhandled KeyError if the field is absent.

If any row in asr_openai.test is missing the "_reference" key (e.g., a row added for a different purpose or a malformed test file), this will hard-crash the script rather than recording a soft assertion failure like the rest of the checker.

🛡️ Proposed fix
-            refs[row["id"]] = row["_reference"]
+            ref = row.get("_reference")
+            if ref is None:
+                soft_assert(False, f"Row id={row.get('id')} missing '_reference' field in test file")
+            else:
+                refs[row["id"]] = ref
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_asr/check_results.py` around lines 55 - 61,
Accessing row["_reference"] can raise KeyError and crash the checker; change the
code that reads reference_path (the loop that parses each line into row and
assigns refs[row["id"]] = row["_reference"]) to handle missing keys gracefully:
use row.get("_reference") (or wrap in try/except KeyError) and when the key is
absent, record a soft failure (e.g., append a message to the existing
failures/alerts collection or call the test failure logger) and skip adding that
entry to refs, then continue processing. Ensure you reference variables used in
the diff (reference_path.open, row, refs) so the fix is applied in the same
block that builds refs.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (3)
recipes/multimodal/server/unified_server.py (2)

184-197: ⚠️ Potential issue | 🟠 Major

Exclude role=="system" from extract_text_from_messages to prevent prompt duplication.

System text is currently included in both text (via this function) and system_prompt (via extract_system_prompt), which can double-apply instructions.

🔧 Proposed fix
 def extract_text_from_messages(messages: List[Dict[str, Any]]) -> str:
     """Extract text content from OpenAI-format messages."""
     texts = []
     for message in messages:
+        if message.get("role") == "system":
+            continue
         content = message.get("content")
         if isinstance(content, str):
             if content:
                 texts.append(content)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 184 - 197, The
function extract_text_from_messages should skip messages where
message.get("role") == "system" to avoid duplicating system prompt text (which
is already handled by extract_system_prompt); update the loop in
extract_text_from_messages to check message.get("role") and continue when it
equals "system", keeping the existing logic for string, list, dict items (look
for "content", list items with type=="text", and string items) so only
non-system content is collected and returned.

117-121: ⚠️ Potential issue | 🔴 Critical

Guard against backend/result length mismatch before completing futures.

Line 119 still uses zip(batch, results) without validating result count. If backend returns fewer items, some futures are never resolved and requests can hang.

🔧 Proposed fix
             loop = asyncio.get_event_loop()
             results = await loop.run_in_executor(None, self.backend.generate, requests)

-            for pending, result in zip(batch, results):
+            if len(results) != len(batch):
+                err = RuntimeError(
+                    f"Backend returned {len(results)} results for {len(batch)} requests"
+                )
+                for pending in batch:
+                    if not pending.future.done():
+                        pending.future.set_exception(err)
+                return
+
+            for pending, result in zip(batch, results, strict=True):
                 if not pending.future.done():
                     pending.future.set_result(result)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 117 - 121, Validate
the number of items returned from self.backend.generate against the batch size
before using zip: after results = await loop.run_in_executor(None,
self.backend.generate, requests) check len(results) vs len(batch) and for each
index i in range(len(batch)) if i < len(results) set
pending.future.set_result(results[i]) (skipping if future.done()), otherwise set
pending.future.set_exception(RuntimeError(...)) describing the backend returned
too few results; also handle the inverse case (len(results) > len(batch)) by
logging/ignoring extra results so no futures are left unresolved; reference
batch, results, self.backend.generate, loop.run_in_executor, and
pending.future.set_result when making the change.
tests/slurm-tests/unified_tts/run_test.py (1)

126-129: ⚠️ Potential issue | 🟡 Minor

checker_cmd still breaks on workspace paths containing spaces.

Line 126 and Line 128 still use the split-prone wrap_arguments(checker_cmd) flow, so --workspace can be fragmented into multiple args.

🔧 Proposed fix
-    checker_cmd = f"python tests/slurm-tests/unified_tts/check_results.py --workspace {args.workspace}"
+    checker_cmd = "python tests/slurm-tests/unified_tts/check_results.py --workspace"
+    checker_ctx = wrap_arguments(checker_cmd)
+    checker_ctx.args.append(args.workspace)
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=checker_ctx,
         cluster=args.cluster,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_tts/run_test.py` around lines 126 - 129, The
checker_cmd string can still be split incorrectly when workspace contains spaces
because wrap_arguments(checker_cmd) tokenizes the full string; change the call
to pass arguments safely by not wrapping the entire formatted string—construct
ctx as a list/sequence of command and separate args so "--workspace" and
args.workspace are distinct elements (or alternatively use a proper
shell-quoting function like shlex.quote on args.workspace before building
checker_cmd) and call run_cmd(ctx=...) with that safe argument list; update
references to checker_cmd, wrap_arguments, run_cmd, and args.workspace
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/multimodal/server/backends/__init__.py`:
- Around line 70-76: The lazy loader currently catches ImportError but not
AttributeError, so a typo in the registry's class name (the getattr(module,
class_name) call) will raise AttributeError and skip your friendly message;
modify the try/except around importlib.import_module and getattr to also catch
AttributeError (e.g., except (ImportError, AttributeError) as e) and re-raise an
ImportError that includes backend_name, module_name and class_name with clear
guidance that the backend class is missing or misnamed, preserving the original
exception as the cause (using "from e") so startup errors show the custom
message for both missing modules and missing classes.

In `@recipes/multimodal/server/backends/magpie_tts_backend.py`:
- Around line 304-310: The code in the block handling
p.get("context_audio_filepath") allows client-supplied filesystem paths (in
magpie_tts_backend.py) which can access arbitrary local files; update the
handler (the code around p.get("context_audio_filepath", "") and link creation)
to only accept paths under a specific allowlist root (e.g., a CONTEXT_ROOT
constant) by resolving the candidate with realpath/Path.resolve and verifying it
is a descendant (or use Path.is_relative_to) before creating any symlink; if the
resolved path is outside the allowlist (or if the request is remote), reject or
ignore the context path and log a warning instead of creating the link. Ensure
you also handle symlink/to-parent attacks by resolving both the candidate and
the allowlist root before comparison and avoid following untrusted symlinks when
creating links.
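The containment check can be sketched as follows (`CONTEXT_ROOT` is a hypothetical allowlist root, not a path from the repo; resolving both sides first defeats `../` segments and symlink escapes):

```python
from pathlib import Path

CONTEXT_ROOT = Path("/data/context_audio")  # hypothetical allowlist root

def is_allowed_context_path(candidate: str) -> bool:
    # Resolve both the root and the candidate before comparing, so
    # "../" traversal and symlinks cannot escape the allowlist.
    root = CONTEXT_ROOT.resolve()
    cand = Path(candidate)
    resolved = (cand if cand.is_absolute() else root / cand).resolve()
    try:
        resolved.relative_to(root)  # Path.is_relative_to() on Python >= 3.9
        return True
    except ValueError:
        return False

print(is_allowed_context_path("speaker1/prompt.wav"))  # True
print(is_allowed_context_path("/etc/passwd"))          # False
print(is_allowed_context_path("../../etc/passwd"))     # False
```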

In `@recipes/multimodal/server/unified_server.py`:
- Around line 456-460: The except block that currently does
traceback.print_exc() and raises HTTPException(status_code=500, detail=str(e))
should not return raw exception text to clients; instead log/print the full
traceback for server-side debugging and raise HTTPException(status_code=500,
detail="Internal server error") while preserving the original exception chain
(raise ... from e). Update the except block surrounding the handler where
Exception as e is caught (the block that calls traceback.print_exc()) to replace
str(e) with a generic message and use exception chaining to keep the original
exception available for logs.

In `@tests/slurm-tests/unified_asr/run_test.py`:
- Around line 109-112: The checker invocation breaks when args.workspace
contains spaces because wrap_arguments(checker_cmd) splits on whitespace; fix by
preserving the workspace as a single argument: build the ctx by passing the
command prefix through wrap_arguments for "python
tests/slurm-tests/unified_asr/check_results.py --workspace" (or simply the
script and flag) and then append args.workspace as one element, and use that ctx
in run_cmd (update the checker_cmd/ctx construction used in the run_cmd call so
that --workspace and its path are separate args but the path is a single
element).

---

Duplicate comments:
In `@recipes/multimodal/server/unified_server.py`:
- Around line 184-197: The function extract_text_from_messages should skip
messages where message.get("role") == "system" to avoid duplicating system
prompt text (which is already handled by extract_system_prompt); update the loop
in extract_text_from_messages to check message.get("role") and continue when it
equals "system", keeping the existing logic for string, list, dict items (look
for "content", list items with type=="text", and string items) so only
non-system content is collected and returned.
- Around line 117-121: Validate the number of items returned from
self.backend.generate against the batch size before using zip: after results =
await loop.run_in_executor(None, self.backend.generate, requests) check
len(results) vs len(batch) and for each index i in range(len(batch)) if i <
len(results) set pending.future.set_result(results[i]) (skipping if
future.done()), otherwise set pending.future.set_exception(RuntimeError(...))
describing the backend returned too few results; also handle the inverse case
(len(results) > len(batch)) by logging/ignoring extra results so no futures are
left unresolved; reference batch, results, self.backend.generate,
loop.run_in_executor, and pending.future.set_result when making the change.

In `@tests/slurm-tests/unified_tts/run_test.py`:
- Around line 126-129: The checker_cmd string can still be split incorrectly
when workspace contains spaces because wrap_arguments(checker_cmd) tokenizes the
full string; change the call to pass arguments safely by not wrapping the entire
formatted string—construct ctx as a list/sequence of command and separate args
so "--workspace" and args.workspace are distinct elements (or alternatively use
a proper shell-quoting function like shlex.quote on args.workspace before
building checker_cmd) and call run_cmd(ctx=...) with that safe argument list;
update references to checker_cmd, wrap_arguments, run_cmd, and args.workspace
accordingly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52f1248 and 8473b00.

📒 Files selected for processing (6)
  • recipes/multimodal/server/backends/__init__.py
  • recipes/multimodal/server/backends/magpie_tts_backend.py
  • recipes/multimodal/server/unified_server.py
  • tests/slurm-tests/unified_asr/run_test.py
  • tests/slurm-tests/unified_tts/run_test.py
  • tests/test_nemo_asr_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_nemo_asr_backend.py

Comment on lines +70 to +76
try:
module = importlib.import_module(f".{module_name}", package=__name__)
return getattr(module, class_name)
except ImportError as e:
raise ImportError(
f"Failed to import backend '{backend_name}'. Make sure required dependencies are installed. Error: {e}"
) from e

⚠️ Potential issue | 🟡 Minor

Handle missing backend class names explicitly in lazy loader.

If the registry has a class-name typo, getattr(...) on Line 72 raises AttributeError, which currently bypasses your custom import error message. Catching it explicitly makes startup failures much clearer.

🔧 Proposed fix
     try:
         module = importlib.import_module(f".{module_name}", package=__name__)
-        return getattr(module, class_name)
+        try:
+            return getattr(module, class_name)
+        except AttributeError as e:
+            raise ImportError(
+                f"Backend '{backend_name}' is registered with class '{class_name}', "
+                f"but that class was not found in module '{module_name}'."
+            ) from e
     except ImportError as e:
         raise ImportError(
             f"Failed to import backend '{backend_name}'. Make sure required dependencies are installed. Error: {e}"
         ) from e
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 74-76: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/__init__.py` around lines 70 - 76, The
lazy loader currently catches ImportError but not AttributeError, so a typo in
the registry's class name (the getattr(module, class_name) call) will raise
AttributeError and skip your friendly message; modify the try/except around
importlib.import_module and getattr to also catch AttributeError (e.g., except
(ImportError, AttributeError) as e) and re-raise an ImportError that includes
backend_name, module_name and class_name with clear guidance that the backend
class is missing or misnamed, preserving the original exception as the cause
(using "from e") so startup errors show the custom message for both missing
modules and missing classes.
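The pattern the fix describes can be sketched outside the repo. In this sketch `math`/`sqrt` and the misspelled entries stand in for real backend modules and classes, and the registry shape is illustrative rather than the actual `backends/__init__.py` layout (which uses a relative import with `package=__name__`):

```python
import importlib

# Illustrative registry: backend name -> (module name, class/attr name)
BACKENDS = {
    "good": ("math", "sqrt"),              # resolves cleanly
    "bad_class": ("math", "NoSuchClass"),  # typo in the attribute name
    "bad_module": ("no_such_module", "X"), # module is not installed
}

def load_backend(backend_name):
    module_name, class_name = BACKENDS[backend_name]
    try:
        module = importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"Failed to import backend '{backend_name}'. "
            f"Make sure required dependencies are installed. Error: {e}"
        ) from e
    try:
        return getattr(module, class_name)
    except AttributeError as e:
        # Without this clause, a registry typo surfaces as a bare
        # AttributeError and bypasses the friendly message above.
        raise ImportError(
            f"Backend '{backend_name}' is registered with class "
            f"'{class_name}', but '{module_name}' has no such attribute."
        ) from e
```

Both failure modes now surface as `ImportError` with the backend name in the message, which is what the review asks for.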

Comment on lines +109 to +112
checker_cmd = f"python tests/slurm-tests/unified_asr/check_results.py --workspace {args.workspace}"
run_cmd(
ctx=wrap_arguments(checker_cmd),
cluster=args.cluster,

⚠️ Potential issue | 🟡 Minor

Preserve --workspace as a single argument when building checker context.

Line 109 + Line 111 currently rely on wrap_arguments(checker_cmd), but wrap_arguments does plain whitespace splitting. A workspace path with spaces will break the checker invocation.

🔧 Proposed fix
-    checker_cmd = f"python tests/slurm-tests/unified_asr/check_results.py --workspace {args.workspace}"
+    checker_cmd = "python tests/slurm-tests/unified_asr/check_results.py --workspace"
+    checker_ctx = wrap_arguments(checker_cmd)
+    checker_ctx.args.append(args.workspace)
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=checker_ctx,
         cluster=args.cluster,
         expname=f"{args.expname_prefix}-check-results",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_asr/run_test.py` around lines 109 - 112, The
checker invocation breaks when args.workspace contains spaces because
wrap_arguments(checker_cmd) splits on whitespace; fix by preserving the
workspace as a single argument: build the ctx by passing the command prefix
through wrap_arguments for "python
tests/slurm-tests/unified_asr/check_results.py --workspace" (or simply the
script and flag) and then append args.workspace as one element, and use that ctx
in run_cmd (update the checker_cmd/ctx construction used in the run_cmd call so
that --workspace and its path are separate args but the path is a single
element).
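The failure mode is easy to reproduce with stdlib tools. Note the caveat: `shlex.quote` only helps if `wrap_arguments` performs shell-style (`shlex`) splitting, which is an assumption here; if it splits on plain whitespace, passing the path as a separate list element is the safer route. The path below is illustrative:

```python
import shlex

workspace = "/scratch/my workspace/run 1"  # path containing spaces

# Naive interpolation plus whitespace splitting yields four tokens, not two
naive = f"--workspace {workspace}".split()
assert len(naive) == 4

# Quoting first keeps the path as a single token after shlex-style splitting
cmd = f"--workspace {shlex.quote(workspace)}"
tokens = shlex.split(cmd)
assert tokens == ["--workspace", workspace]
```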

@vmendelev vmendelev force-pushed the pr2/unified-server-refactor branch 2 times, most recently from 05f1a0f to 754aedc on February 26, 2026 09:26
@Jorjeous
Member

Will look tmr

@Jorjeous
Member

Comment about temperature seems legit?

@vmendelev
Collaborator Author

Comment about temperature seems legit?

Which one?

Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
@vmendelev vmendelev force-pushed the pr2/unified-server-refactor branch from 6e7b703 to ac7771f on March 2, 2026 10:18
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
@Jorjeous Jorjeous self-requested a review March 2, 2026 14:08
@Jorjeous
Member

Jorjeous commented Mar 2, 2026

@coderabbitai help

@coderabbitai
Contributor

coderabbitai bot commented Mar 2, 2026

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@Jorjeous
Member

Jorjeous commented Mar 2, 2026

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Mar 2, 2026

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (8)
recipes/multimodal/server/backends/__init__.py (1)

70-76: ⚠️ Potential issue | 🟡 Minor

Handle AttributeError from getattr() to provide clearer error messages.

If the registry has a class-name typo, getattr(module, class_name) raises AttributeError, which bypasses the custom import error message. This was flagged in a prior review and remains unaddressed.

🔧 Proposed fix
     try:
         module = importlib.import_module(f".{module_name}", package=__name__)
-        return getattr(module, class_name)
+        try:
+            return getattr(module, class_name)
+        except AttributeError as e:
+            raise ImportError(
+                f"Backend '{backend_name}' registered class '{class_name}' not found in '{module_name}'"
+            ) from e
     except ImportError as e:
         raise ImportError(
             f"Failed to import backend '{backend_name}'. Make sure required dependencies are installed. Error: {e}"
         ) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/__init__.py` around lines 70 - 76, The
current import block catches ImportError but not AttributeError from
getattr(module, class_name), so typos in class names bypass your friendly
message; update the try/except around importlib.import_module and getattr (the
code that returns getattr(module, class_name)) to also catch AttributeError and
re-raise an ImportError that includes backend_name and class_name and the
original error to provide a clear message about a missing class in the imported
module.
nemo_skills/inference/server/serve_unified.py (3)

119-124: ⚠️ Potential issue | 🟡 Minor

yaml.safe_load returns None for empty files, causing downstream crash.

If the YAML file is empty, safe_load returns None, which will cause config_dict.pop(...) at line 244 to raise AttributeError. This was flagged in prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 119 - 124, The
YAML loader can return None for empty files causing downstream calls like
config_dict.pop(...) to fail; update load_yaml_config to ensure it never returns
None by checking the result of yaml.safe_load and returning an empty dict when
safe_load yields None (i.e., in function load_yaml_config, after calling
yaml.safe_load, coerce a None to {} and return that instead).

282-306: ⚠️ Potential issue | 🟡 Minor

Broad except Exception block remains from prior review.

The ImportError catch with helpful guidance is appropriate, but the generic Exception catch at line 301 violates the guideline "Don't catch exceptions when they are not expected to be normally raised." Python's default handler provides the same traceback. This was flagged in prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 282 - 306, The
broad "except Exception as e" block in serve_unified.py (surrounding the
uvicorn.run and create_app invocation) should be removed so unexpected errors
are not swallowed; keep the specific ImportError handler that logs guidance, but
delete the generic exception handler (the block that prints the error, imports
traceback, calls traceback.print_exc(), and exits) so Python's default exception
handling provides the traceback and proper failure behavior for
create_app/uvicorn.run.

103-116: ⚠️ Potential issue | 🟡 Minor

Broad exception handling may hide patch failures.

Per coding guidelines, exceptions should not be silently suppressed. If the file copy fails (e.g., permission error), the server continues but model loading may fail later with an unrelated error. This was flagged in prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_unified.py` around lines 103 - 116, The
function apply_safetensors_patch currently catches Exception broadly and only
prints a warning, which can hide real failures; update apply_safetensors_patch
to catch specific exceptions (e.g., ImportError, FileNotFoundError,
PermissionError, OSError) or re-raise after logging, include the full exception
traceback and context (hack_path, dest_path, and the import target st_torch) in
the log, and ensure critical failures either raise a clear exception or return a
distinct error result so callers can handle the failure instead of proceeding
silently.
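A simplified stand-in for the fix (`apply_patch` here is not the real `apply_safetensors_patch`; it only shows the narrow-catch-log-reraise shape). `FileNotFoundError` and `PermissionError` are subclasses of `OSError`, so one clause covers them all:

```python
import logging
import shutil

logger = logging.getLogger(__name__)

def apply_patch(hack_path, dest_path):
    """Copy a patch file into place; fail loudly instead of warning and continuing."""
    try:
        shutil.copy(hack_path, dest_path)
    except OSError as e:  # covers FileNotFoundError, PermissionError, etc.
        # Log full context, then re-raise so callers see the real failure
        # instead of an unrelated model-loading error later.
        logger.error("Patch copy failed: %s -> %s (%s)", hack_path, dest_path, e)
        raise
```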
tests/slurm-tests/unified_asr/run_test.py (1)

109-118: ⚠️ Potential issue | 🟡 Minor

Workspace paths with spaces will break the checker invocation.

wrap_arguments(checker_cmd) splits on whitespace, so a workspace path like /path/with spaces/ becomes multiple arguments. This was flagged in prior review.

🔧 Proposed fix
-    checker_cmd = f"python tests/slurm-tests/unified_asr/check_results.py --workspace {args.workspace}"
+    checker_ctx = wrap_arguments("python tests/slurm-tests/unified_asr/check_results.py --workspace")
+    checker_ctx.args.append(args.workspace)
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=checker_ctx,
         cluster=args.cluster,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_asr/run_test.py` around lines 109 - 118,
Escape/quote the workspace before splitting so spaces don't produce multiple
args: import shlex and build checker_cmd using shlex.quote(args.workspace) (e.g.
checker_cmd = f"python ... --workspace {shlex.quote(args.workspace)}"), then
continue to pass that checker_cmd into wrap_arguments and run_cmd; this ensures
wrap_arguments(checker_cmd) yields a single argument token for paths with
spaces.
recipes/multimodal/server/unified_server.py (2)

332-334: ⚠️ Potential issue | 🟡 Minor

Require messages explicitly instead of defaulting to [].

At Line 332, defaulting a required field to empty list can hide malformed requests and blur error semantics.

🔧 Proposed fix
-            messages = request.get("messages", [])
-            if not messages:
-                raise HTTPException(status_code=400, detail="No messages provided")
+            if "messages" not in request:
+                raise HTTPException(status_code=400, detail="Missing required field: messages")
+            messages = request["messages"]
+            if not messages:
+                raise HTTPException(status_code=400, detail="No messages provided")

Based on learnings: "Applies to **/*.py : Don't use .get() for accessing dictionary keys if the code expects them to be present; use direct access data[key_name] to fail with a clear error instead of silently corrupting data."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 332 - 334, Replace
the permissive access request.get("messages", []) with a required lookup so
malformed requests fail fast: use messages = request["messages"] (or
request.pop("messages") if you need to consume it) and retain the existing
validation that raises HTTPException(status_code=400, detail="No messages
provided") when messages is empty or falsy; this ensures clear KeyError behavior
for missing keys and preserves the current check for empty messages in the
handler that reads request/messages.

292-293: ⚠️ Potential issue | 🟡 Minor

Use direct indexing for fixed server_config keys.

server_config is created with these keys, so .get() at Line 292, Line 293, Line 312, and Line 459 masks bugs.

🔧 Proposed fix
-            "backend": server_config.get("backend_type"),
-            "model": server_config.get("model_path"),
+            "backend": server_config["backend_type"],
+            "model": server_config["model_path"],
@@
-        model_id = server_config.get("model_path", "unknown")
+        model_id = server_config["model_path"]
@@
-                "model": server_config.get("model_path"),
+                "model": server_config["model_path"],

Based on learnings: "Applies to **/*.py : Don't use .get() for accessing dictionary keys if the code expects them to be present; use direct access data[key_name] to fail with a clear error instead of silently corrupting data."

Also applies to: 312-312, 459-459

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 292 - 293,
server_config is constructed with fixed keys, so replace all
server_config.get(...) uses with direct indexing to surface missing-key errors:
change server_config.get("backend_type") to server_config["backend_type"],
server_config.get("model_path") to server_config["model_path"], and likewise
update the other occurrences noted (the server_config.get(...) at the other two
locations) to server_config["<key_name>"] so the code fails fast with a clear
KeyError instead of silently returning None.
tests/slurm-tests/unified_tts/run_test.py (1)

126-129: ⚠️ Potential issue | 🟡 Minor

Preserve --workspace as a single argument for checker invocation.

At Line 126 and Line 128, checker_cmd is still passed through wrap_arguments(...), which uses whitespace splitting and breaks paths with spaces. This is still unresolved.

🔧 Proposed fix
 import argparse
+import shlex
 from pathlib import Path
@@
-    checker_cmd = f"python tests/slurm-tests/unified_tts/check_results.py --workspace {args.workspace}"
+    checker_cmd = (
+        "python tests/slurm-tests/unified_tts/check_results.py "
+        f"--workspace {shlex.quote(args.workspace)}"
+    )
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=wrap_arguments(""),
+        command=checker_cmd,
         cluster=args.cluster,
         expname=f"{args.expname_prefix}-check-results",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_tts/run_test.py` around lines 126 - 129, The
checker command is being split by wrap_arguments causing paths with spaces to
break; update the run invocation to preserve the entire --workspace argument by
not using wrap_arguments for checker_cmd and instead pass a single string or an
explicit list where the "--workspace" and its value are one element; locate
checker_cmd and the run_cmd call (references: checker_cmd, wrap_arguments,
run_cmd) and replace ctx=wrap_arguments(checker_cmd) with a call that supplies
the un-split command (e.g., ctx=checker_cmd or ctx=[...explicit args...] so
"--workspace" and args.workspace remain a single argument).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/multimodal/server/backends/magpie_tts_backend.py`:
- Around line 248-259: The code silently falls back to .nemo mode when only one
of self.tts_config.hparams_file or self.tts_config.checkpoint_file is provided;
change the logic around has_ckpt_mode to validate these fields and fail fast: if
exactly one of hparams_file or checkpoint_file is set, raise a ValueError (or
similar) describing the missing counterpart rather than proceeding, otherwise
continue to build _ModelLoadConfig as before; update the block that computes
has_ckpt_mode and constructs _ModelLoadConfig to enforce this check so
user-supplied partial checkpoint arguments are not ignored.
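The fail-fast check could look like the sketch below; `resolve_load_mode` is a hypothetical helper, not the backend's actual API, and it only illustrates the exactly-one-of-two validation:

```python
def resolve_load_mode(hparams_file, checkpoint_file):
    """Return 'ckpt' or 'nemo'; reject a half-specified checkpoint pair."""
    if (hparams_file is None) != (checkpoint_file is None):
        missing = "checkpoint_file" if checkpoint_file is None else "hparams_file"
        raise ValueError(
            "Both hparams_file and checkpoint_file are required for "
            f"checkpoint loading; missing {missing}."
        )
    # Both set -> checkpoint mode; both unset -> .nemo mode
    return "ckpt" if hparams_file is not None else "nemo"
```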

In `@recipes/multimodal/server/backends/nemo_asr_backend.py`:
- Around line 82-84: The current __init__ in nemo_asr_backend.py discards
top-level BackendConfig fields when config is not a NeMoASRConfig by only using
config.extra_config; update initialization to merge/propagate base fields (e.g.,
model_path, device, any generation defaults) into the NeMoASRConfig constructed
from config.extra_config or validate/raise if unsupported: call
NeMoASRConfig.from_dict(config.extra_config) to get a baseline, then override
its attributes with explicit values from the original BackendConfig
(config.model_path, config.device, etc.) before assigning self.asr_config, or
alternatively validate and error if those base fields are present but cannot be
applied.

In `@recipes/multimodal/server/unified_server.py`:
- Around line 168-183: The try/except around decoding audio blocks swallows all
exceptions causing silent drops of client audio; update the block that handles
audio_url/input_audio (variables: data_uri_pattern, audio_list, input_audio,
audio_url, data) to stop catching Exception broadly—either remove the broad
try/except or catch only expected decode errors (e.g., binascii.Error or
ValueError), and when such an error occurs log it unconditionally (use the
logger instead of print/DEBUG) and re-raise or propagate an error so the failure
is visible to callers instead of being ignored. Ensure the change targets the
decoding section that appends to audio_list so malformed base64 is not silently
dropped.
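A sketch of the strict-decode path, assuming the error is meant to propagate to the HTTP layer as a 400 (the `ValueError` here is a stand-in for whatever the handler raises):

```python
import base64
import binascii

def decode_audio_block(data):
    """Strictly decode a base64 audio payload; surface bad input instead of dropping it."""
    try:
        # validate=True rejects non-alphabet characters instead of discarding them
        return base64.b64decode(data, validate=True)
    except binascii.Error as e:
        raise ValueError(f"Invalid base64 audio payload: {e}") from e
```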

In `@tests/slurm-tests/unified_tts/check_results.py`:
- Around line 38-43: resolve_audio_path currently uses path.name which strips
any subdirectory structure and can cause collisions; update resolve_audio_path
to append the full relative Path (not path.name) under Path(workspace) /
"tts_outputs" / "audio" so nested relative paths are preserved (i.e., when
path.is_absolute() is False, return Path(workspace) / "tts_outputs" / "audio" /
path), and optionally call path = path.relative_to(path.anchor) or .resolve()
only for absolute cases to avoid path traversal surprises in resolve_audio_path.
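The collision-free resolution could be sketched as follows; the function mirrors `resolve_audio_path` by name, but the body is illustrative:

```python
from pathlib import Path

def resolve_audio_path(workspace, audio_path):
    """Resolve a manifest audio path against the workspace output directory.

    Keeps the full relative structure: path.name would collapse
    'spk1/utt.wav' and 'spk2/utt.wav' onto the same file.
    """
    path = Path(audio_path)
    if path.is_absolute():
        return path
    return Path(workspace) / "tts_outputs" / "audio" / path
```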
- Around line 20-21: The current sys.path.append + from utils import assert_all,
soft_assert can pick up an unrelated installed utils; replace this with a
deterministic local import: either prepend the test directory using
sys.path.insert(0, str(Path(__file__).resolve().parent.parent)) before the
import, or (preferable) load the local utils.py explicitly with
importlib.util.spec_from_file_location / importlib.util.module_from_spec and
bind assert_all and soft_assert from that loaded module; update the code around
sys.path.append and from utils import ... to use one of these deterministic
approaches so the local utils.py is always imported.
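The `importlib.util` route can be sketched generically; the helper name and the throwaway `utils.py` are illustrative, but the `spec_from_file_location` / `module_from_spec` / `exec_module` sequence is the standard way to load a module from an explicit path without any `sys.path` lookup:

```python
import importlib.util
import tempfile
from pathlib import Path

def load_local_module(path, name="local_utils"):
    """Load a module from an explicit file path, bypassing sys.path lookup."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Demonstration with a throwaway utils.py
with tempfile.TemporaryDirectory() as d:
    utils_path = Path(d) / "utils.py"
    utils_path.write_text("def soft_assert(cond, msg=''):\n    return bool(cond)\n")
    utils = load_local_module(utils_path)
    assert utils.soft_assert(True)
```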

---

Duplicate comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 119-124: The YAML loader can return None for empty files causing
downstream calls like config_dict.pop(...) to fail; update load_yaml_config to
ensure it never returns None by checking the result of yaml.safe_load and
returning an empty dict when safe_load yields None (i.e., in function
load_yaml_config, after calling yaml.safe_load, coerce a None to {} and return
that instead).
- Around line 282-306: The broad "except Exception as e" block in
serve_unified.py (surrounding the uvicorn.run and create_app invocation) should
be removed so unexpected errors are not swallowed; keep the specific ImportError
handler that logs guidance, but delete the generic exception handler (the block
that prints the error, imports traceback, calls traceback.print_exc(), and
exits) so Python's default exception handling provides the traceback and proper
failure behavior for create_app/uvicorn.run.
- Around line 103-116: The function apply_safetensors_patch currently catches
Exception broadly and only prints a warning, which can hide real failures;
update apply_safetensors_patch to catch specific exceptions (e.g., ImportError,
FileNotFoundError, PermissionError, OSError) or re-raise after logging, include
the full exception traceback and context (hack_path, dest_path, and the import
target st_torch) in the log, and ensure critical failures either raise a clear
exception or return a distinct error result so callers can handle the failure
instead of proceeding silently.

In `@recipes/multimodal/server/backends/__init__.py`:
- Around line 70-76: The current import block catches ImportError but not
AttributeError from getattr(module, class_name), so typos in class names bypass
your friendly message; update the try/except around importlib.import_module and
getattr (the code that returns getattr(module, class_name)) to also catch
AttributeError and re-raise an ImportError that includes backend_name and
class_name and the original error to provide a clear message about a missing
class in the imported module.

In `@recipes/multimodal/server/unified_server.py`:
- Around line 332-334: Replace the permissive access request.get("messages", [])
with a required lookup so malformed requests fail fast: use messages =
request["messages"] (or request.pop("messages") if you need to consume it) and
retain the existing validation that raises HTTPException(status_code=400,
detail="No messages provided") when messages is empty or falsy; this ensures
clear KeyError behavior for missing keys and preserves the current check for
empty messages in the handler that reads request/messages.
- Around line 292-293: server_config is constructed with fixed keys, so replace
all server_config.get(...) uses with direct indexing to surface missing-key
errors: change server_config.get("backend_type") to
server_config["backend_type"], server_config.get("model_path") to
server_config["model_path"], and likewise update the other occurrences noted
(the server_config.get(...) at the other two locations) to
server_config["<key_name>"] so the code fails fast with a clear KeyError instead
of silently returning None.

In `@tests/slurm-tests/unified_asr/run_test.py`:
- Around line 109-118: Escape/quote the workspace before splitting so spaces
don't produce multiple args: import shlex and build checker_cmd using
shlex.quote(args.workspace) (e.g. checker_cmd = f"python ... --workspace
{shlex.quote(args.workspace)}"), then continue to pass that checker_cmd into
wrap_arguments and run_cmd; this ensures wrap_arguments(checker_cmd) yields a
single argument token for paths with spaces.

In `@tests/slurm-tests/unified_tts/run_test.py`:
- Around line 126-129: The checker command is being split by wrap_arguments
causing paths with spaces to break; update the run invocation to preserve the
entire --workspace argument by not using wrap_arguments for checker_cmd and
instead pass a single string or an explicit list where the "--workspace" and its
value are one element; locate checker_cmd and the run_cmd call (references:
checker_cmd, wrap_arguments, run_cmd) and replace
ctx=wrap_arguments(checker_cmd) with a call that supplies the un-split command
(e.g., ctx=checker_cmd or ctx=[...explicit args...] so "--workspace" and
args.workspace remain a single argument).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4ef8c0 and 5a1449f.

📒 Files selected for processing (22)
  • nemo_skills/inference/server/serve_unified.py
  • recipes/multimodal/__init__.py
  • recipes/multimodal/server/README.md
  • recipes/multimodal/server/__init__.py
  • recipes/multimodal/server/backends/__init__.py
  • recipes/multimodal/server/backends/base.py
  • recipes/multimodal/server/backends/magpie_tts_backend.py
  • recipes/multimodal/server/backends/nemo_asr_backend.py
  • recipes/multimodal/server/unified_server.py
  • tests/slurm-tests/run_all.sh
  • tests/slurm-tests/unified_asr/asr_openai.test
  • tests/slurm-tests/unified_asr/check_results.py
  • tests/slurm-tests/unified_asr/run_test.py
  • tests/slurm-tests/unified_tts/README.md
  • tests/slurm-tests/unified_tts/check_results.py
  • tests/slurm-tests/unified_tts/run_test.py
  • tests/slurm-tests/unified_tts/tts_openai.test
  • tests/test_magpie_tts_backend.py
  • tests/test_nemo_asr_backend.py
  • tests/test_unified_server_audio_parser.py
  • tests/test_unified_server_batcher.py
  • tests/test_unified_server_error_handling.py

Comment on lines +82 to +84
def __init__(self, config: BackendConfig):
self.asr_config = config if isinstance(config, NeMoASRConfig) else NeMoASRConfig.from_dict(config.extra_config)
super().__init__(self.asr_config)

⚠️ Potential issue | 🟠 Major

Don’t drop base BackendConfig fields in fallback config conversion.

At Line 83, converting only config.extra_config can ignore explicit model_path, device, and generation defaults when config is not already NeMoASRConfig.

🔧 Proposed fix
     def __init__(self, config: BackendConfig):
-        self.asr_config = config if isinstance(config, NeMoASRConfig) else NeMoASRConfig.from_dict(config.extra_config)
+        if isinstance(config, NeMoASRConfig):
+            self.asr_config = config
+        else:
+            self.asr_config = NeMoASRConfig.from_dict(
+                {
+                    "model_path": config.model_path,
+                    "device": config.device,
+                    "dtype": config.dtype,
+                    "max_new_tokens": config.max_new_tokens,
+                    "temperature": config.temperature,
+                    "top_p": config.top_p,
+                    "top_k": config.top_k,
+                    **config.extra_config,
+                }
+            )
         super().__init__(self.asr_config)

As per coding guidelines: "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, config: BackendConfig):
self.asr_config = config if isinstance(config, NeMoASRConfig) else NeMoASRConfig.from_dict(config.extra_config)
super().__init__(self.asr_config)
def __init__(self, config: BackendConfig):
if isinstance(config, NeMoASRConfig):
self.asr_config = config
else:
self.asr_config = NeMoASRConfig.from_dict(
{
"model_path": config.model_path,
"device": config.device,
"dtype": config.dtype,
"max_new_tokens": config.max_new_tokens,
"temperature": config.temperature,
"top_p": config.top_p,
"top_k": config.top_k,
**config.extra_config,
}
)
super().__init__(self.asr_config)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/backends/nemo_asr_backend.py` around lines 82 - 84,
The current __init__ in nemo_asr_backend.py discards top-level BackendConfig
fields when config is not a NeMoASRConfig by only using config.extra_config;
update initialization to merge/propagate base fields (e.g., model_path, device,
any generation defaults) into the NeMoASRConfig constructed from
config.extra_config or validate/raise if unsupported: call
NeMoASRConfig.from_dict(config.extra_config) to get a baseline, then override
its attributes with explicit values from the original BackendConfig
(config.model_path, config.device, etc.) before assigning self.asr_config, or
alternatively validate and error if those base fields are present but cannot be
applied.

Comment on lines +168 to +183
try:
if item.get("type") == "audio_url":
audio_url = item.get("audio_url", {})
url = audio_url.get("url", "")
match = data_uri_pattern.match(url)
if match:
audio_list.append(base64.b64decode(match.group(1)))
elif item.get("type") == "input_audio":
input_audio = item.get("input_audio", {})
data = input_audio.get("data", "")
if data:
audio_list.append(base64.b64decode(data))
except Exception as e:
if DEBUG:
print(f"[Server] Warning: Failed to decode audio block: {e}")
return audio_list

⚠️ Potential issue | 🟠 Major

Don’t silently ignore malformed audio blocks.

At lines 180-183, decode failures are swallowed and processing continues. This can silently drop client audio and produce misleading downstream behavior.

🔧 Proposed fix
+import binascii
@@
-                try:
-                    if item.get("type") == "audio_url":
-                        audio_url = item.get("audio_url", {})
-                        url = audio_url.get("url", "")
-                        match = data_uri_pattern.match(url)
-                        if match:
-                            audio_list.append(base64.b64decode(match.group(1)))
-                    elif item.get("type") == "input_audio":
-                        input_audio = item.get("input_audio", {})
-                        data = input_audio.get("data", "")
-                        if data:
-                            audio_list.append(base64.b64decode(data))
-                except Exception as e:
-                    if DEBUG:
-                        print(f"[Server] Warning: Failed to decode audio block: {e}")
+                if item.get("type") == "audio_url":
+                    audio_url = item.get("audio_url", {})
+                    url = audio_url.get("url", "")
+                    match = data_uri_pattern.match(url)
+                    if not match:
+                        raise ValueError("Invalid audio_url data URI format")
+                    try:
+                        audio_list.append(base64.b64decode(match.group(1), validate=True))
+                    except binascii.Error as e:
+                        raise ValueError("Invalid base64 payload in audio_url block") from e
+                elif item.get("type") == "input_audio":
+                    input_audio = item.get("input_audio", {})
+                    data = input_audio.get("data", "")
+                    if data:
+                        try:
+                            audio_list.append(base64.b64decode(data, validate=True))
+                        except binascii.Error as e:
+                            raise ValueError("Invalid base64 payload in input_audio block") from e

As per coding guidelines: "Don't catch exceptions when they are not expected to be normally raised; let the code fail when something unexpected happens so users notice it instead of silently misbehaving."

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 180-180: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/multimodal/server/unified_server.py` around lines 168 - 183, The
try/except around decoding audio blocks swallows all exceptions causing silent
drops of client audio; update the block that handles audio_url/input_audio
(variables: data_uri_pattern, audio_list, input_audio, audio_url, data) to stop
catching Exception broadly—either remove the broad try/except or catch only
expected decode errors (e.g., binascii.Error or ValueError), and when such an
error occurs log it unconditionally (use the logger instead of print/DEBUG) and
re-raise or propagate an error so the failure is visible to callers instead of
being ignored. Ensure the change targets the decoding section that appends to
audio_list so malformed base64 is not silently dropped.
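The narrow-catch-and-propagate shape the prompt asks for can be sketched as follows; the regex and function name are illustrative stand-ins, not the server's actual identifiers:

```python
import base64
import binascii
import logging
import re

logger = logging.getLogger(__name__)

# Illustrative data-URI pattern; the server's data_uri_pattern may differ.
DATA_URI = re.compile(r"^data:audio/[\w.+-]+;base64,(.+)$", re.DOTALL)


def decode_audio_url(url: str) -> bytes:
    # Catch only the expected decode error, log it unconditionally,
    # and re-raise so malformed client audio fails loudly instead of
    # being silently dropped from audio_list.
    match = DATA_URI.match(url)
    if not match:
        raise ValueError("Invalid audio_url data URI format")
    try:
        return base64.b64decode(match.group(1), validate=True)
    except binascii.Error as e:
        logger.error("Invalid base64 payload in audio_url block: %s", e)
        raise ValueError("Invalid base64 payload in audio_url block") from e


payload = base64.b64encode(b"\x00\x01fake-audio").decode()
assert decode_audio_url(f"data:audio/wav;base64,{payload}") == b"\x00\x01fake-audio"
```

Note that `validate=True` matters: without it, `b64decode` skips non-alphabet characters instead of raising `binascii.Error`.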

Comment on lines +20 to +21
sys.path.append(str(Path(__file__).resolve().parent.parent)) # for utils.py
from utils import assert_all, soft_assert # noqa: E402

⚠️ Potential issue | 🟠 Major

Import utils deterministically from the local Slurm test directory.

Using sys.path.append(...) can let from utils import ... resolve to an unrelated installed utils module before the intended local helper.

🔧 Proposed fix
-sys.path.append(str(Path(__file__).resolve().parent.parent))  # for utils.py
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))  # prioritize local slurm-tests utils.py
 from utils import assert_all, soft_assert  # noqa: E402
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_tts/check_results.py` around lines 20 - 21, The
current sys.path.append + from utils import assert_all, soft_assert can pick up
an unrelated installed utils; replace this with a deterministic local import:
either prepend the test directory using sys.path.insert(0,
str(Path(__file__).resolve().parent.parent)) before the import, or (preferable)
load the local utils.py explicitly with importlib.util.spec_from_file_location /
importlib.util.module_from_spec and bind assert_all and soft_assert from that
loaded module; update the code around sys.path.append and from utils import ...
to use one of these deterministic approaches so the local utils.py is always
imported.
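The importlib variant the prompt prefers can be sketched like this, with a throwaway module standing in for tests/slurm-tests/utils.py (the helper name and `soft_assert` body here are illustrative):

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(name: str, path: Path):
    # Bind the module to an explicit file path so an installed package
    # named 'utils' can never shadow the local test helper.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a temporary file standing in for the real utils.py.
with tempfile.TemporaryDirectory() as d:
    helper = Path(d) / "utils.py"
    helper.write_text("def soft_assert(cond, msg=''):\n    return bool(cond)\n")
    utils = load_module_from_path("slurm_test_utils", helper)
    print(utils.soft_assert(1 == 1))  # True
```

In check_results.py this would replace the `sys.path.append` line, with `assert_all` and `soft_assert` bound as attributes of the loaded module.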

Comment on lines +38 to +43
def resolve_audio_path(audio_path: str, workspace: str) -> Path:
path = Path(audio_path)
if path.is_absolute():
return path
return Path(workspace) / "tts_outputs" / "audio" / path.name


⚠️ Potential issue | 🟠 Major

Preserve relative path structure when resolving audio artifacts.

path.name drops subdirectories. If outputs include nested relative paths (or duplicate basenames), this can validate the wrong file.

🔧 Proposed fix
 def resolve_audio_path(audio_path: str, workspace: str) -> Path:
     path = Path(audio_path)
     if path.is_absolute():
         return path
-    return Path(workspace) / "tts_outputs" / "audio" / path.name
+    workspace_path = Path(workspace)
+    candidates = (
+        workspace_path / path,
+        workspace_path / "tts_outputs" / path,
+        workspace_path / "tts_outputs" / "audio" / path,
+    )
+    for candidate in candidates:
+        if candidate.exists():
+            return candidate
+    return workspace_path / "tts_outputs" / "audio" / path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/slurm-tests/unified_tts/check_results.py` around lines 38 - 43,
resolve_audio_path currently uses path.name which strips any subdirectory
structure and can cause collisions; update resolve_audio_path to append the full
relative Path (not path.name) under Path(workspace) / "tts_outputs" / "audio" so
nested relative paths are preserved (i.e., when path.is_absolute() is False,
return Path(workspace) / "tts_outputs" / "audio" / path), and optionally call
path = path.relative_to(path.anchor) or .resolve() only for absolute cases to
avoid path traversal surprises in resolve_audio_path.
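The prompt's minimal fix (join the full relative path rather than its basename) in runnable form, with an illustrative nested path:

```python
from pathlib import Path


def resolve_audio_path(audio_path: str, workspace: str) -> Path:
    # Append the whole relative path, not path.name, so nested
    # directories and duplicate basenames resolve to distinct files.
    path = Path(audio_path)
    if path.is_absolute():
        return path
    return Path(workspace) / "tts_outputs" / "audio" / path


resolved = resolve_audio_path("speaker_1/sample_0.wav", "/workspace")
print(resolved)  # /workspace/tts_outputs/audio/speaker_1/sample_0.wav
# For comparison, what the original code would have collapsed to:
print(Path("speaker_1/sample_0.wav").name)  # sample_0.wav
```

Two outputs named `speaker_1/sample_0.wav` and `speaker_2/sample_0.wav` now resolve to different files instead of colliding on the basename.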

# sleep 10
python tests/slurm-tests/unified_asr/run_test.py --cluster $CLUSTER --workspace /workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_asr --expname_prefix unified_asr_$RUN_NAME
# sleep 10
python tests/slurm-tests/unified_tts/run_test.py --cluster $CLUSTER --workspace /workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_tts --expname_prefix unified_tts_$RUN_NAME

--code_path?
For now, pass a NeMo code tree explicitly via `--code_path`. When a newer NeMo image includes the required Magpie TTS modules, this manual `--code_path` override should no longer be necessary.

inference_ctor_params = inspect.signature(_InferenceConfig).parameters
inference_kwargs = {}
if "batch_size" in inference_ctor_params:
inference_kwargs["batch_size"] = 16
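The snippet above probes the constructor's signature before passing `batch_size`, which lets one server work across NeMo versions whose `_InferenceConfig` signatures differ. A self-contained sketch of the pattern; both config classes here are hypothetical stand-ins:

```python
import inspect


class OldInferenceConfig:
    # Stand-in for an older inference config without batch_size.
    def __init__(self, model_path: str):
        self.model_path = model_path


class NewInferenceConfig:
    # Stand-in for a newer inference config that accepts batch_size.
    def __init__(self, model_path: str, batch_size: int = 1):
        self.model_path = model_path
        self.batch_size = batch_size


def build_config(ctor, model_path: str):
    # Forward batch_size only when the constructor actually declares it.
    params = inspect.signature(ctor).parameters
    kwargs = {"batch_size": 16} if "batch_size" in params else {}
    return ctor(model_path, **kwargs)


print(getattr(build_config(OldInferenceConfig, "m.nemo"), "batch_size", None))  # None
print(build_config(NewInferenceConfig, "m.nemo").batch_size)  # 16
```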

Is this intentional?

@Jorjeous

Jorjeous commented Mar 2, 2026

I still see some tricky places, but it looks aligned with the "fail fast" approach, and the failures probably all come from user mistakes.

overall LGTM

@karpnv karpnv left a comment

LGTM

@gwarmstrong gwarmstrong left a comment

lgtm

@vmendelev vmendelev merged commit 31735f9 into main Mar 2, 2026
5 checks passed
@vmendelev vmendelev deleted the pr2/unified-server-refactor branch March 2, 2026 22:11
gwarmstrong pushed a commit that referenced this pull request Mar 4, 2026
…ackends (#1250)

Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
sgunasekar added a commit that referenced this pull request Mar 11, 2026
commit a5da597
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Mar 6 12:13:36 2026 -0800

    Revert "Eval kit support  (#1239)" (#1294)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit b237e33
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Fri Mar 6 20:25:37 2026 +0400

    Eval kit support  (#1239)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

commit dc28bbf
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Mar 5 10:17:44 2026 -0800

    Python direct tool calling without MCP (#1286)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 12454dd
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Wed Mar 4 13:06:21 2026 -0800

    Allow het servers for nemo-rl jobs (#1223)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 8884a68
Author: Prasoon Varshney <prasoon1995@gmail.com>
Date:   Wed Mar 4 10:24:02 2026 -0800

    Support source_lang param for translation recipe (#1290)

    Signed-off-by: Prasoon Varshney <prasoonv@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 4618b19
Author: Meriem B. <113170426+ka00ri@users.noreply.github.com>
Date:   Wed Mar 4 18:59:28 2026 +0100

    Add MMLU-Pro 10% optimized subset for checkpoint selection (#1285)

    Signed-off-by: Meriem Boubdir <mboubdir@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 5ac8609
Author: Talor Abramovich <talor19@gmail.com>
Date:   Wed Mar 4 02:30:06 2026 +0200

    Add SPEED-Bench (within repo) (#1279)

    Signed-off-by: Talor Abramovich <talora@nvidia.com>
    Signed-off-by: talora <talora@nvidia.com>
    Signed-off-by: Talor Abramovich <talor19@gmail.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>

commit c31eec5
Author: George Armstrong <georgea@nvidia.com>
Date:   Tue Mar 3 12:18:15 2026 -0800

    Fix os.getlogin() crash in ns setup (#1289)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit c228e66
Author: George Armstrong <georgea@nvidia.com>
Date:   Tue Mar 3 11:04:54 2026 -0800

    Fix streaming TypeError when delta.content is None (#1267) (#1288)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit aa47923
Author: Matvei Novikov <mnovikov@nvidia.com>
Date:   Mon Mar 2 16:28:41 2026 -0800

    Add LibTrace recipe for generating domain-specific reasoning data (#1224)

    Signed-off-by: jubick1337 <mnovikov@nvidia.com>
    Signed-off-by: mnovikov <mnovikov@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 313cad7
Author: Stephen Ge <stepheng@nvidia.com>
Date:   Mon Mar 2 18:28:49 2026 -0500

    fix: clean parse-failure retries in prover (#1284)

    Signed-off-by: Stephen Ge <stepheng@nvidia.com>

commit 813cfa3
Author: George Armstrong <georgea@nvidia.com>
Date:   Mon Mar 2 15:10:08 2026 -0800

    tst: rollback inference-api to integrate (#1287)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 31735f9
Author: Valentin Mendelev <vmendelev@nvidia.com>
Date:   Mon Mar 2 23:11:25 2026 +0100

    Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250)

    Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>

commit d4ef8c0
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Fri Feb 27 23:58:54 2026 +0400

    Update promt_config to working with openai format + inline setup (#1210)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit e879cbc
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 27 10:41:23 2026 -0800

    Update noc tutorial (#1282)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit f6e3505
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 27 10:17:33 2026 -0800

    Add noc reasoning tutorial (#1278)

    Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
    Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
    Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com>
    Co-authored-by: Cursor <cursoragent@cursor.com>
    Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
    Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>

commit fc2072a
Author: Jiacheng Xu <jcxu@utexas.edu>
Date:   Fri Feb 27 10:10:25 2026 -0800

    CritPt generation add prompt_format=None (#1280)

    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit c8abe5d
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 27 09:31:26 2026 -0800

    New slurm customization parameters (account, containers) (#1209)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 2b38cce
Author: George Armstrong <georgea@nvidia.com>
Date:   Wed Feb 25 17:59:52 2026 -0800

    Add nemo-skills-core subpackage for lightweight installs (#1229)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 9fa8e83
Author: Dheeraj Peri <peri.dheeraj@gmail.com>
Date:   Wed Feb 25 12:56:35 2026 -0800

    feat: add custom judge type support for external repo integration (#1274)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Dheeraj Peri <dperi@nvidia.com>
    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Minho Ryu <ryumin93@gmail.com>
    Co-authored-by: Yongqiang Wang <yongqiang.seagull@gmail.com>
    Co-authored-by: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    Co-authored-by: Jiacheng Xu <jcxu@utexas.edu>
    Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>

commit 8a32b13
Author: Igor Gitman <igitman@nvidia.com>
Date:   Tue Feb 24 15:24:42 2026 -0800

    Exclude numb3rs form test_eval.py (#1275)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6da2219
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Mon Feb 23 18:37:46 2026 +0400

    Numb3rs ds addition (#1174)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

commit ad034b5
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Sun Feb 22 11:55:24 2026 -0800

    Add DSBench-DA evaluation (#1254)

    Squash merge of changes during code-review.
    Signed-off-by: suriya <sgunasekar@nvidia.com>

commit 7593ab3
Author: Jiacheng Xu <jcxu@utexas.edu>
Date:   Fri Feb 20 16:42:01 2026 -0800

    Add CritPt benchmark (#1200)

    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 58c31b2
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Fri Feb 20 16:19:22 2026 -0800

    Fix no_answer metric overcounting in _compute_pass_at_k (#1245)

    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 1f1a2e7
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 20 15:58:40 2026 -0800

    Fix incorrect prompt tokens count due to HF api update (#1264)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 8ebc6f5
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 20 09:05:33 2026 -0800

    Remove deprecated dataset group (#1263)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit ea4177f
Author: Yongqiang Wang <yongqiang.seagull@gmail.com>
Date:   Thu Feb 19 19:57:25 2026 -0500

    fix deps (#1258)

commit 60905a7
Author: Minho Ryu <ryumin93@gmail.com>
Date:   Fri Feb 20 09:39:39 2026 +0900

    Add aime26 (#1256)

    Signed-off-by: bzantium <ryumin93@gmail.com>

commit b28afc5
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:18:25 2026 -0800

    Rename custom -> external benchmarks (#1262)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6cc9c45
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:10:33 2026 -0800

    Add reference to internal benchmarks repo (#1261)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 5202af6
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:08:05 2026 -0800

    Remove incorrect presence-penalty setting (#1259)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 144c70b
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 15:26:33 2026 -0800

    Adding an option to store benchmarks in external repo (#1240)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 10e6e39
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Thu Feb 19 19:57:21 2026 +0400

    update vllm miltimodal for api calls convenience (#1213)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
    Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com>

commit 1ba4219
Author: Nick Ludwig <nliudvig@nvidia.com>
Date:   Wed Feb 18 03:28:23 2026 +0400

    Fix --server_container not being applied to dependent jobs (#1244)

    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 9517614
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Mon Feb 16 11:13:24 2026 -0800

    Support mini-swe-agent as agent harness (#1212)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
    Signed-off-by: i-vainn <imoshkov@nvidia.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
    Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com>
    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Stephen Ge <stepheng@nvidia.com>
    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com>
    Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
    Signed-off-by: Wei Du <wedu@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
    Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com>
    Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
    Co-authored-by: Ivan <imoshkov@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Nick Ludwig <nliudvig@nvidia.com>
    Co-authored-by: Wojciech Prazuch <wojciechprazuch3@gmail.com>
    Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>
    Co-authored-by: Minho Ryu <ryumin93@gmail.com>
    Co-authored-by: Stephen Ge <stepheng@nvidia.com>
    Co-authored-by: Jiacheng Xu <jcxu@utexas.edu>
    Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
    Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
    Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com>
    Co-authored-by: Wei Du <wedu@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
    Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com>
    Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com>
    Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

commit a3d44dc
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Fri Feb 13 22:32:15 2026 -0800

    Add --installation_command support to prepare_data (#1243)

    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

commit e80d524
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Feb 12 17:26:00 2026 -0800

    Fix CI disk space for Docker image builds (#1241)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit d22236c
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Wed Feb 11 17:55:00 2026 -0800

    Fix answerbench prompt parsing (#1235)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 2401628
Author: George Armstrong <georgea@nvidia.com>
Date:   Wed Feb 11 14:56:43 2026 -0800

    feat: add lockfiles for reproducible sandbox builds (#1233)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 5a0a84d
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Wed Feb 11 13:30:03 2026 -0800

    removing datasets version restriction for LCB eval (#1230)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

commit ef0a890
Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>
Date:   Wed Feb 11 12:03:16 2026 +0400

    Gnalbandyan/add physics (#1214)

    Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com>
    Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>

commit bd9d30c
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Tue Feb 10 15:13:27 2026 -0800

    LCB generic prompting (#1215)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

commit 7d6c49a
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Sat Feb 7 08:45:46 2026 -0800

    Add support for different variations of nemo-rl (#1220)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit b19ba96
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 6 21:40:56 2026 -0800

    Add multi-node sandbox support for SLURM clusters (#1218)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 8950bb0
Author: anowaczynski-nvidia <anowaczynski@nvidia.com>
Date:   Sat Feb 7 01:38:00 2026 +0100

    support structured outputs in hle judge for optional AA compatibility (#1186)

    Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit b84f7a2
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 6 14:51:02 2026 -0800

    A small update on running tests docs (#1219)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 8e838e1
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Feb 5 18:01:35 2026 -0800

    feat: add flag to disable sandbox replay (#1217)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 5fd9085
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 5 15:57:01 2026 -0800

    Add an option to limit number of tool calls (#1216)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit d820200
Author: Igor Gitman <igitman@nvidia.com>
Date:   Tue Feb 3 10:43:55 2026 -0800

    Add arena-hard v2 (#1205)

    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: bzantium <ryumin93@gmail.com>

commit a30920e
Author: Igor Gitman <igitman@nvidia.com>
Date:   Mon Feb 2 10:53:55 2026 -0800

    Fix mkdocs warnings (#1204)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 19d7788
Author: Ivan <imoshkov@nvidia.com>
Date:   Mon Feb 2 23:25:13 2026 +0500

    Fix infinite wait in sandbox.wait_for_sandbox (#1206)

    Signed-off-by: i-vainn <imoshkov@nvidia.com>

commit 3e65fbf
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Fri Jan 30 19:38:38 2026 -0800

    Improve tts (#1203)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 250c862
Author: Nick Ludwig <nliudvig@nvidia.com>
Date:   Fri Jan 30 22:12:29 2026 +0400

    SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202)

    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

commit 7ded756
Author: Ivan <imoshkov@nvidia.com>
Date:   Fri Jan 30 09:57:41 2026 +0500

     Add proper token counting to code execution model (#1184)

    Signed-off-by: i-vainn <imoshkov@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit b986304
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Jan 29 17:57:07 2026 -0800

    Upgrade containers (#1198)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 3b44f02
Author: Dan Lord <blahblahasdf@gmail.com>
Date:   Thu Jan 29 16:40:47 2026 -0800

    Fix incorrect string format (#1199)

    Signed-off-by: dlord <dlord@nvidia.com>

commit c4854b8
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Thu Jan 29 13:43:36 2026 -0800

    Update nemo-rl to latest (#1087)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
…ackends (#1250)

Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
…ackends (#1250)

Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>