Add backend-agnostic unified inference server#1250
Note: Reviews paused. This branch appears to be under active development, so CodeRabbit has automatically paused its review to avoid overwhelming you with comments on each new commit. You can configure this behavior in the review settings and use the review commands to manage reviews.
📝 Walkthrough

Adds a unified, YAML-configurable NeMo inference server: a CLI entrypoint, a backend registry and abstract backend API, two backends (NeMo ASR, Magpie TTS), an OpenAI-compatible FastAPI app with batched inference and audio handling, and Slurm test runners/checkers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User as User/CLI
    participant CLI as serve_unified.py
    participant Server as unified_server.create_app
    participant Registry as backends.get_backend
    participant Backend as InferenceBackend
    participant Model as Model Loader
    User->>CLI: Run CLI (args / --config)
    CLI->>CLI: parse args, load YAML, setup PYTHONPATH/patches
    CLI->>Server: create_app(backend_type, config_dict)
    Server->>Registry: get_backend(backend_type)
    Registry-->>Server: Backend class
    Server->>Backend: instantiate(config) & load_model()
    Backend->>Model: load model artifacts
    Model-->>Backend: model ready
    Server->>Server: register extra routes, init RequestBatcher
    CLI->>CLI: start Uvicorn with app
```

```mermaid
sequenceDiagram
    actor Client as Client
    participant API as /v1/chat/completions
    participant Parser as message parsing
    participant Batcher as RequestBatcher
    participant Backend as InferenceBackend
    participant Model as Model
    Client->>API: POST chat request (text/audio)
    API->>Parser: extract text/audio/system prompt
    Parser-->>API: GenerationRequest
    API->>API: validate request
    API->>Batcher: add_request(GenerationRequest)
    Batcher->>Batcher: accumulate until batch_size or timeout
    alt Batch ready or timeout
        Batcher->>Backend: generate([requests])
        Backend->>Model: run inference
        Model-->>Backend: GenerationResult(s)
        Backend-->>Batcher: return results
        Batcher->>API: resolve futures with results
    end
    API->>Client: OpenAI-style JSON response (incl. audio if present)
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 9
🧹 Nitpick comments (6)
recipes/multimodal/server/unified_server.py (5)
38-38: Unused `import uvicorn`.

`uvicorn` is imported at the top of this module but never referenced. It's only used in serve_unified.py. Remove it to avoid confusion.

Proposed fix

```diff
-import uvicorn
 from fastapi import FastAPI, HTTPException
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` at line 38, Remove the unused top-level import "uvicorn" from recipes/multimodal/server/unified_server.py: locate the import statement (import uvicorn) and delete it so the module no longer references an unused symbol; leave imports used by functions/classes in this file intact and ensure serve_unified.py continues to import uvicorn where required.
355-389: Optional file-save logic catches broad `Exception` and silently continues.

Both the JSON save (line 380) and audio save (line 388) catch `Exception` broadly and only print a warning. If the user has configured `AUDIO_SAVE_DIR`, they likely expect artifacts to be written. Per coding guidelines, "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving." At minimum, consider logging at warning level or making the save operation best-effort with a more specific exception type (e.g., `OSError`).

Proposed fix — narrow the exception type

```diff
-        except Exception as e:
+        except OSError as e:
             print(f"[Server] Warning: Failed to save JSON: {e}")
-        except Exception as e:
+        except OSError as e:
             print(f"[Server] Warning: Failed to save audio: {e}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 355 - 389, The save block currently swallows all Exceptions with prints; change it to handle only expected filesystem errors and surface unexpected failures: replace the broad except blocks around the JSON save (saved_json_path / json.dump) and audio save (saved_audio_path / f.write when result.audio_bytes) to catch OSError (and json.JSONEncodeError if needed) and log via the server logger at warning level, including the error and file path, while letting other exceptions propagate (or re-raise) so failures are visible; ensure you reference save_dir, saved_json_path, saved_audio_path, result.audio_bytes, and the same timestamp/base_filename variables when updating the handlers.
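A minimal sketch of the narrowed pattern this comment asks for. The helper name `save_result_json` and the paths are illustrative, not the server's actual code; only filesystem errors are treated as expected, so anything else still propagates:

```python
import json
import logging
import os
import tempfile
from typing import Optional

logger = logging.getLogger("server")

def save_result_json(save_dir: str, base_filename: str, payload: dict) -> Optional[str]:
    """Best-effort artifact save: filesystem errors are logged, everything else propagates."""
    path = os.path.join(save_dir, f"{base_filename}.json")
    try:
        with open(path, "w") as f:
            json.dump(payload, f)
        return path
    except OSError as e:  # disk full, permissions, missing dir — the expected failure modes
        logger.warning("Failed to save JSON to %s: %s", path, e)
        return None

# A writable directory succeeds; a missing one degrades gracefully instead of crashing
with tempfile.TemporaryDirectory() as d:
    ok = save_result_json(d, "result", {"text": "hello"})
bad = save_result_json("/nonexistent-dir-for-demo", "result", {"text": "hello"})
print(ok is not None, bad)  # True None
```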
440-446: Re-raise with `from` to preserve exception chain.

Per best practice (and flagged by static analysis B904), use `raise ... from e` so the original traceback is chained properly.

Proposed fix

```diff
     except Exception as e:
         import traceback
         traceback.print_exc()
-        raise HTTPException(status_code=500, detail=str(e))
+        raise HTTPException(status_code=500, detail=str(e)) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 440 - 446, In the generic exception handler in unified_server.py (the except Exception as e block), change the re-raise to preserve the original exception chain by using "raise HTTPException(status_code=500, detail=str(e)) from e" instead of the current bare raise; keep the traceback.print_exc() call if you want the local trace printed but ensure the HTTPException is raised with "from e" so the original exception (e) is chained and its traceback is preserved.
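The chaining behavior in isolation; `ServiceError` here is a stand-in for `HTTPException`:

```python
class ServiceError(Exception):
    pass

def handler():
    try:
        {}["missing"]  # the original failure
    except KeyError as e:
        # `from e` records the KeyError as __cause__, so the traceback shows both errors
        raise ServiceError("request failed") from e

caught = None
try:
    handler()
except ServiceError as err:
    caught = err

print(type(caught.__cause__).__name__)  # KeyError
```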
84-89: Fire-and-forget `asyncio.create_task` calls risk silently lost exceptions and client hangs.

If a `_process_batch` task fails during lock acquisition or at any point before setting results on futures, the exception is silently lost and clients await indefinitely. This pattern recurs at lines 84, 86, 99, and 136. Store task references and add a done-callback for logging.

Example fix — add a done callback for error logging

```diff
+    def _log_task_exception(self, task: asyncio.Task):
+        if task.cancelled():
+            return
+        exc = task.exception()
+        if exc:
+            print(f"[Batcher] Background task failed: {exc}")
+
     async def add_request(self, request: GenerationRequest) -> GenerationResult:
         """Add a request and wait for result."""
         future = asyncio.Future()
         pending = PendingRequest(request=request, future=future, timestamp=time.time())
         async with self.lock:
             self.pending_requests.append(pending)
             if len(self.pending_requests) >= self.batch_size:
                 if DEBUG:
                     print(f"[Batcher] Batch full ({self.batch_size}), processing immediately")
-                asyncio.create_task(self._process_batch())
+                task = asyncio.create_task(self._process_batch())
+                task.add_done_callback(self._log_task_exception)
             elif self.batch_timeout == 0:
-                asyncio.create_task(self._process_batch())
+                task = asyncio.create_task(self._process_batch())
+                task.add_done_callback(self._log_task_exception)
             elif self.timeout_task is None or self.timeout_task.done():
                 self.timeout_task = asyncio.create_task(self._timeout_handler())
+                self.timeout_task.add_done_callback(self._log_task_exception)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 84 - 89, The create_task calls that fire-and-forget _process_batch (and _timeout_handler) can swallow exceptions and leave client futures pending; change code to capture the returned Task (e.g., assign to a variable or a list field) and attach a done-callback that logs exceptions: for places creating tasks for self._process_batch() and for self._timeout_handler() (referenced as _process_batch and _timeout_handler and the timeout_task attribute), store the task reference (e.g., task = asyncio.create_task(...)), add task.add_done_callback(lambda t: log.error(...) or inspect and log t.exception() if t.cancelled() is False), and keep or dispose the reference appropriately so exceptions are surfaced and logged instead of being silently lost.
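A runnable sketch of the done-callback pattern, detached from the server code (the names are illustrative). Calling `task.exception()` in the callback retrieves the exception, so it is surfaced instead of silently dropped:

```python
import asyncio

def log_task_exception(task: asyncio.Task) -> None:
    # Retrieve the exception so it is logged instead of lost
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        print(f"[Batcher] Background task failed: {exc!r}")

async def failing_job():
    raise RuntimeError("batch processing blew up")

async def main():
    task = asyncio.create_task(failing_job())
    task.add_done_callback(log_task_exception)
    await asyncio.sleep(0)  # let failing_job run to completion
    await asyncio.sleep(0)  # let the done-callback fire
    return task

task = asyncio.run(main())
```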
230-230: Migrate from `@app.on_event("startup")` to the `lifespan` context manager pattern.

`@app.on_event("startup")` is deprecated in FastAPI (and its underlying Starlette framework) in favor of the ASGI `lifespan` context manager. Use `contextlib.asynccontextmanager` with the `lifespan=` parameter on `FastAPI()` instead, placing startup logic before `yield` and shutdown logic after.
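The pattern can be sketched without FastAPI as a dependency; FastAPI simply drives such an async context manager around its serving loop when one is passed via `lifespan=`:

```python
import asyncio
from contextlib import asynccontextmanager

events = []

@asynccontextmanager
async def lifespan(app):
    # startup logic (previously in @app.on_event("startup")) runs before the yield
    events.append("startup: load model")
    yield
    # shutdown logic runs after the yield
    events.append("shutdown: release model")

async def simulate_server():
    # With FastAPI this would be: app = FastAPI(lifespan=lifespan)
    async with lifespan(app=None):
        events.append("serving requests")

asyncio.run(simulate_server())
print(events)  # ['startup: load model', 'serving requests', 'shutdown: release model']
```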
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` at line 230, Replace the deprecated `@app.on_event`("startup") handler by creating an async lifespan context manager and passing it into FastAPI(lifespan=...); move the current startup logic implemented where the decorator is applied (look for the function decorated with `@app.on_event`("startup") and any code it runs) into the block before the yield of an asynccontextmanager, and move any teardown/shutdown code (if present) after the yield; then instantiate the app with FastAPI(lifespan=your_lifespan) and remove the `@app.on_event` decorator and its function.

nemo_skills/inference/server/serve_unified.py (1)
114-158: `build_config_from_args` hardcodes backend-specific parameter names, coupling the entrypoint to specific backends.

Every new backend that adds a CLI parameter requires updating this list. Consider having each backend declare its own argparse arguments or accepting arbitrary `--key=value` pairs to pass through, keeping the CLI entrypoint truly backend-agnostic.

This is acceptable for now as backward-compatible scaffolding, but worth tracking for cleanup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_unified.py` around lines 114 - 158, The build_config_from_args function currently hardcodes backend-specific keys; replace that with dynamic pass-through by iterating over vars(args).items(), skipping core CLI keys ("model","device","dtype","max_new_tokens","temperature","top_p","codec_model") and any keys with value None, and adding every other arg to config_dict; keep the special-case handling for codec_model (map to "codec_model_path") and the store_true flag no_decode_audio -> set "decode_audio": False as before. This removes the long maintained list and lets new backend args be forwarded automatically while preserving existing special mappings.
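A sketch of the dynamic pass-through the prompt describes. The key names mirror the prompt, but the helper body is an assumption for illustration, not the file's current code:

```python
import argparse

# Core CLI keys handled explicitly; everything else is forwarded untouched
CORE_KEYS = {"model", "device", "dtype", "max_new_tokens", "temperature", "top_p", "codec_model"}

def build_config_from_args(args: argparse.Namespace) -> dict:
    config = {
        "model_path": args.model,
        "device": args.device,
    }
    for key, value in vars(args).items():
        if key in CORE_KEYS or value is None:
            continue
        if key == "no_decode_audio":      # store_true flag maps to an inverted setting
            config["decode_audio"] = not value
        else:
            config[key] = value           # new backend flags need no changes here
    if args.codec_model is not None:      # special-case mapping kept from the original
        config["codec_model_path"] = args.codec_model
    return config

ns = argparse.Namespace(
    model="/models/asr", device="cuda", dtype=None, max_new_tokens=None,
    temperature=None, top_p=None, codec_model=None,
    beam_size=4, no_decode_audio=True,
)
cfg = build_config_from_args(ns)
print(cfg)  # {'model_path': '/models/asr', 'device': 'cuda', 'beam_size': 4, 'decode_audio': False}
```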
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 106-111: load_yaml_config may return None for empty YAML files
which later causes config_dict.pop(...) to fail; update load_yaml_config to
ensure it always returns a dict (e.g., call yaml.safe_load and if result is None
return {} or wrap with cast to dict), so that downstream code using config_dict
(including the config_dict.pop(...) usage) does not raise AttributeError; change
only inside load_yaml_config to default to an empty dict when safe_load yields
None and preserve type hints.
- Around line 90-103: The apply_safetensors_patch function currently catches all
exceptions and only prints a warning which hides failures; update
apply_safetensors_patch to not silently swallow errors by either removing the
broad try/except or re-raising the caught exception after logging so the caller
sees the failure (keep the print/log line for context). Specifically, modify
apply_safetensors_patch to let exceptions from importing safetensors, getting
inspect.getfile, os.makedirs, or shutil.copyfile propagate (or if you keep the
except block, re-raise the exception with raise after the print) so callers
loading NeMo models will receive a clear failure instead of a later confusing
error.
- Line 220: The code currently calls parser.parse_known_args() and assigns to
args, extra_args which silently swallows unknown CLI flags; change this to
parser.parse_args() and remove the unused extra_args variable so unsupported
arguments cause a fast failure. Update the assignment from "args, extra_args =
parser.parse_known_args()" to just "args = parser.parse_args()" (and remove any
handling of extra_args), ensuring the parser variable and subsequent uses of
args (in serve_unified.py) continue to work unchanged.
In `@recipes/multimodal/server/backends/base.py`:
- Around line 220-227: The get_generation_params method is using truthy "or"
which treats explicit zeros (e.g., temperature=0.0, top_p=0.0, top_k=0) as falsy
and incorrectly falls back to self.config; update get_generation_params in the
class that defines it to use explicit None checks for each field (e.g., use
request.temperature if request.temperature is not None else
self.config.temperature, and similarly for max_new_tokens, top_p, top_k) so that
valid zero values are preserved while still falling back to config defaults when
the request field is None.
- Around line 237-244: In validate_request, the audio presence check only
inspects request.audio_bytes and request.audio_path but ignores multi-turn
fields audio_bytes_list and audio_paths; update the has_audio_input computation
in the validate_request method to also consider non-empty
request.audio_bytes_list and request.audio_paths (e.g., list not None and len>0)
so that requests with only multi-turn audio are treated as audio input for the
Modality.AUDIO_IN support check and the "must have either text or audio input"
validation; ensure references remain to Modality.AUDIO_IN and self.name for the
error message.
In `@recipes/multimodal/server/unified_server.py`:
- Around line 223-228: server_config is built with fixed keys, so replace all
uses of server_config.get("key", ...) with direct indexing server_config["key"]
to avoid masking bugs; update every access sites that read model_path,
batch_size, batch_timeout, backend_type (notably the usages around the root
handler and other handlers referenced in the diff: the occurrences you flagged
near lines ~265-274, ~271, ~291 and ~411) to use server_config["model_path"],
server_config["batch_size"], server_config["batch_timeout"], and
server_config["backend_type"] respectively, and remove any fallback defaults
since the keys are guaranteed present.
- Around line 167-198: extract_text_from_messages is including messages with
role "system", causing duplicate system text alongside extract_system_prompt;
modify extract_text_from_messages to skip any message where message.get("role")
== "system" (i.e., only collect content from non-system roles), leaving
extract_system_prompt as-is so the system prompt only appears in the
system_prompt field of GenerationRequest; update the function that iterates
messages (extract_text_from_messages) to check message.get("role") before
processing content.
- Around line 101-138: The batch-result association in _process_batch is unsafe
because zip(batch, results) will silently drop pairs if backend.generate returns
fewer results than requests; update the code to validate that results length
matches len(requests) after calling self.backend.generate and raise or set
exceptions for unmatched pending items if mismatched, and then iterate with an
indexed loop (e.g., for i, pending in enumerate(batch): result = results[i]) or
use explicit range checking to avoid silent truncation; reference
_process_batch, self.backend.generate, batch, results, and the zip(batch,
results) usage when making the change.
- Around line 304-338: The handler chat_completions currently uses
request.get("messages", []) which silently masks a missing required field;
change to use direct access request["messages"] (and catch KeyError to raise
HTTPException 400 with a clear message) so malformed requests fail fast, while
keeping .get(...) for optional params (max_tokens, temperature, top_p, seed).
Locate the chat_completions function and replace request.get("messages", [])
with request["messages"], add a try/except KeyError around that access (or
explicit if "messages" not in request) to raise HTTPException(status_code=400,
detail="Missing required field: messages"), and ensure subsequent uses
(extract_audio_from_messages, extract_text_from_messages, GenerationRequest)
operate on the validated messages list.
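The truthy-`or` pitfall flagged above for `get_generation_params`, reduced to a standalone example (field names are simplified stand-ins for the request/config attributes):

```python
config_temperature = 0.7  # illustrative config default

def resolve_truthy(request_temperature):
    # Buggy: `or` treats a valid 0.0 as "unset" and falls back to the config
    return request_temperature or config_temperature

def resolve_none_check(request_temperature):
    # Correct: fall back only when the field is actually missing
    return request_temperature if request_temperature is not None else config_temperature

print(resolve_truthy(0.0))       # 0.7 — greedy decoding silently overridden
print(resolve_none_check(0.0))   # 0.0
print(resolve_none_check(None))  # 0.7
```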
```python
        server_config = {
            "backend_type": backend_type,
            "model_path": config_dict.get("model_path", ""),
            "batch_size": batch_size,
            "batch_timeout": batch_timeout,
        }
```
Use direct dictionary access instead of .get() on server_config where keys are always present.
server_config is populated at lines 223-228 with known keys (backend_type, model_path, batch_size, batch_timeout). Accessing these via .get() (lines 271, 291, 411) masks potential bugs. As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present; use direct dictionary access dict[key] instead."
Proposed fix (example for root endpoint)

```diff
 @app.get("/")
 async def root():
     """Root endpoint with server info."""
     return {
         "service": "Unified NeMo Inference Server",
         "version": "1.0.0",
-        "backend": server_config.get("backend_type"),
-        "model": server_config.get("model_path"),
+        "backend": server_config["backend_type"],
+        "model": server_config["model_path"],
         "endpoints": ["/v1/chat/completions", "/health", "/v1/models"],
     }
```

Apply the same change at lines 291 and 411.
Also applies to: 265-274
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@recipes/multimodal/server/unified_server.py` around lines 223 - 228,
server_config is built with fixed keys, so replace all uses of
server_config.get("key", ...) with direct indexing server_config["key"] to avoid
masking bugs; update every access sites that read model_path, batch_size,
batch_timeout, backend_type (notably the usages around the root handler and
other handlers referenced in the diff: the occurrences you flagged near lines
~265-274, ~271, ~291 and ~411) to use server_config["model_path"],
server_config["batch_size"], server_config["batch_timeout"], and
server_config["backend_type"] respectively, and remove any fallback defaults
since the keys are guaranteed present.
```python
@app.post("/v1/chat/completions")
async def chat_completions(request: Dict[str, Any]):
    """OpenAI-compatible chat completions endpoint with audio support."""
    if backend_instance is None or not backend_instance.is_loaded:
        raise HTTPException(status_code=503, detail="Model not loaded")

    try:
        messages = request.get("messages", [])
        if not messages:
            raise HTTPException(status_code=400, detail="No messages provided")

        # Extract components from messages
        audio_bytes_list = extract_audio_from_messages(messages)
        text = extract_text_from_messages(messages)
        system_prompt = extract_system_prompt(messages)

        # Get generation parameters
        max_tokens = request.get("max_tokens", 512)
        temperature = request.get("temperature", 1.0)
        top_p = request.get("top_p", 1.0)
        seed = request.get("seed")

        # Create generation request
        gen_request = GenerationRequest(
            text=text if text else None,
            system_prompt=system_prompt,
            audio_bytes=audio_bytes_list[0] if len(audio_bytes_list) == 1 else None,
            audio_bytes_list=audio_bytes_list if len(audio_bytes_list) > 1 else None,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            seed=seed,
            request_id=hashlib.md5(f"{time.time()}".encode()).hexdigest()[:8],
            extra_params=request.get("extra_body", {}),
        )
```
request.get("messages", []) — per coding guidelines, use direct access for expected keys.
messages is a required field for the /v1/chat/completions endpoint. Using .get("messages", []) silently returns an empty list instead of failing with a clear KeyError. Similarly, request.get("max_tokens", 512) at line 321 and others silently apply defaults for what may be an improperly formed request. Use request["messages"] for required fields. As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present; use direct dictionary access dict[key] instead."
Note that max_tokens, temperature, top_p, and seed are legitimately optional per the OpenAI API spec, so .get() with defaults is appropriate for those.
Proposed fix for required field

```diff
-        messages = request.get("messages", [])
-        if not messages:
-            raise HTTPException(status_code=400, detail="No messages provided")
+        messages = request.get("messages")
+        if not messages:
+            raise HTTPException(status_code=400, detail="'messages' field is required and must not be empty")
```

🧰 Tools
+ raise HTTPException(status_code=400, detail="'messages' field is required and must not be empty")🧰 Tools
🪛 Ruff (0.15.1)
[warning] 313-313: Abstract raise to an inner function
(TRY301)
[error] 336-336: Probable use of insecure hash functions in hashlib: md5
(S324)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@recipes/multimodal/server/unified_server.py` around lines 304 - 338, The
handler chat_completions currently uses request.get("messages", []) which
silently masks a missing required field; change to use direct access
request["messages"] (and catch KeyError to raise HTTPException 400 with a clear
message) so malformed requests fail fast, while keeping .get(...) for optional
params (max_tokens, temperature, top_p, seed). Locate the chat_completions
function and replace request.get("messages", []) with request["messages"], add a
try/except KeyError around that access (or explicit if "messages" not in
request) to raise HTTPException(status_code=400, detail="Missing required field:
messages"), and ensure subsequent uses (extract_audio_from_messages,
extract_text_from_messages, GenerationRequest) operate on the validated messages
list.
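Since the Ruff S324 finding above flags the `hashlib.md5` request ID, one stdlib alternative is a random ID rather than a time-based hash. The helper name is hypothetical, not what the PR currently does:

```python
import uuid

def make_request_id() -> str:
    # Request IDs don't need a cryptographic hash; uuid4 avoids both the S324
    # warning and the collision risk of hashing time.time()
    return uuid.uuid4().hex[:8]

rid = make_request_id()
print(len(rid))  # 8
```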
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 155-159: The parse_extra_args loop silently skips tokens that
don't start with "--", which hides user mistakes; update the parse_extra_args
function to raise a clear error instead of continuing when encountering a token
that does not start with "--" (e.g., treat arg in the while loop as invalid and
raise ValueError or a custom exception mentioning the offending token), ensuring
any non-`--` prefixed token is reported as unrecognized so callers fail fast
rather than being ignored.
- Around line 301-306: Remove the redundant broad exception handler in the main
flow: delete the "except Exception as e:" block that prints the error, defers
"import traceback", calls traceback.print_exc(), and sys.exit(1); let unhandled
exceptions from create_app and uvicorn.run propagate so Python's default handler
provides the traceback and non-zero exit. Specifically, modify the main function
(the code surrounding create_app(...) and uvicorn.run(...)) to not catch
Exception, and remove the deferred import of traceback and the sys.exit call
tied to that handler.
---
Duplicate comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 103-116: The try/except in apply_safetensors_patch currently
swallows all exceptions with a print; replace that silent handling by importing
logging, obtaining a logger (e.g. logging.getLogger(__name__)), and in the
except block call logger.exception or logger.error(..., exc_info=True) to record
the full stack trace and context (include the hack_path and dest_path), then
re-raise the exception so failures are visible to callers; update references to
safetensors.torch and inspect.getfile within apply_safetensors_patch only (do
not change other functions).
- Around line 119-124: load_yaml_config currently returns whatever
yaml.safe_load yields (which can be None for empty/comment-only files) causing
downstream code like the config_dict.pop(...) call to fail; update
load_yaml_config to coerce the result into an empty dict when safe_load returns
None (e.g., assign result = yaml.safe_load(f) and if not isinstance(result,
dict) set result = {} or raise a clear error) so callers always receive a dict
and config_dict.pop will not throw AttributeError; reference function
load_yaml_config and the downstream usage config_dict.pop to locate the change.
```python
        while i < len(extra_args):
            arg = extra_args[i]
            if not arg.startswith("--"):
                i += 1
                continue
```
parse_extra_args silently drops non-`--`-prefixed tokens.
Any token in extra_args that does not start with -- (e.g., -model /path single-dash typo, or a stray positional argument) is silently skipped. Per coding guidelines, unused user-passed parameters should not be silently ignored — the code should fail fast.
🛡️ Proposed fix — raise on unrecognized tokens

```diff
         if not arg.startswith("--"):
-            i += 1
-            continue
+            raise ValueError(f"Unexpected argument: {arg!r}. Backend-specific flags must use '--' prefix.")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/server/serve_unified.py` around lines 155 - 159, The
parse_extra_args loop silently skips tokens that don't start with "--", which
hides user mistakes; update the parse_extra_args function to raise a clear error
instead of continuing when encountering a token that does not start with "--"
(e.g., treat arg in the while loop as invalid and raise ValueError or a custom
exception mentioning the offending token), ensuring any non-`--` prefixed token
is reported as unrecognized so callers fail fast rather than being ignored.
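A self-contained sketch of a fail-fast `parse_extra_args`. The parsing rules (a value-taking flag vs. a bare boolean flag) are assumptions about the helper for illustration, not taken from the PR:

```python
def parse_extra_args(extra_args):
    """Parse ['--key', 'value', '--flag'] style tokens into a dict, failing fast
    on anything that is not a '--'-prefixed flag."""
    result = {}
    i = 0
    while i < len(extra_args):
        arg = extra_args[i]
        if not arg.startswith("--"):
            raise ValueError(
                f"Unexpected argument: {arg!r}. Backend-specific flags must use '--' prefix."
            )
        key = arg[2:]
        # --key value if a value follows; otherwise treat it as a boolean flag
        if i + 1 < len(extra_args) and not extra_args[i + 1].startswith("--"):
            result[key] = extra_args[i + 1]
            i += 2
        else:
            result[key] = True
            i += 1
    return result

print(parse_extra_args(["--beam_size", "4", "--verbose"]))
# {'beam_size': '4', 'verbose': True}
```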
```python
    except Exception as e:
        print(f"[serve_unified] Error: {e}")
        import traceback

        traceback.print_exc()
        sys.exit(1)
```
Broad except Exception in main is redundant and violates coding guidelines.
uvicorn.run and create_app failures are not silently suppressed here (the traceback is printed), but Python's own unhandled-exception handler provides the same traceback and non-zero exit by default. Wrapping with a broad except Exception violates the guideline "Do not catch exceptions when they are not normally expected to be raised". The deferred import traceback inside the except block is also an unnecessary pattern.
🐛 Proposed fix — remove the redundant handler

```diff
-    except ImportError as e:
-        print(f"[serve_unified] Error: Failed to import unified server: {e}")
-        print("[serve_unified] Make sure the recipes.multimodal.server package is in PYTHONPATH")
-        sys.exit(1)
-    except Exception as e:
-        print(f"[serve_unified] Error: {e}")
-        import traceback
-
-        traceback.print_exc()
-        sys.exit(1)
+    except ImportError as e:
+        print(f"[serve_unified] Error: Failed to import unified server: {e}")
+        print("[serve_unified] Make sure the recipes.multimodal.server package is in PYTHONPATH")
+        sys.exit(1)
```
+ sys.exit(1)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| print(f"[serve_unified] Error: {e}") | |
| import traceback | |
| traceback.print_exc() | |
| sys.exit(1) |
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 301-301: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/server/serve_unified.py` around lines 301 - 306, Remove
the redundant broad exception handler in the main flow: delete the "except
Exception as e:" block that prints the error, defers "import traceback", calls
traceback.print_exc(), and sys.exit(1); let unhandled exceptions from create_app
and uvicorn.run propagate so Python's default handler provides the traceback and
non-zero exit. Specifically, modify the main function (the code surrounding
create_app(...) and uvicorn.run(...)) to not catch Exception, and remove the
deferred import of traceback and the sys.exit call tied to that handler.
3bb29cb to 424705c
Actionable comments posted: 5
♻️ Duplicate comments (8)
tests/slurm-tests/unified_asr/run_test.py (1)
110-110: Same unquoted `workspace` issue as in `unified_tts/run_test.py` line 127. The fix is symmetric — see the comment raised there.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_asr/run_test.py` at line 110, The checker_cmd f-string uses unquoted args.workspace which breaks when the workspace path contains spaces; update the construction of checker_cmd to safely quote or shell-escape args.workspace (e.g. use shlex.quote on args.workspace) and ensure shlex is imported where checker_cmd is defined so the command becomes safe for shell execution; refer to the checker_cmd variable and args.workspace in run_test.py when making the change.
recipes/multimodal/server/backends/base.py (2)
237-244: ⚠️ Potential issue | 🟠 Major
`validate_request` ignores multi-turn audio fields (`audio_bytes_list`, `audio_paths`). A request with only `audio_bytes_list` set passes neither the audio check nor the text check, so it's rejected as "must have either text or audio input" even though audio was provided.
Proposed fix
  has_text_input = request.text is not None
- has_audio_input = request.audio_bytes is not None or request.audio_path is not None
+ has_audio_input = (
+     request.audio_bytes is not None
+     or request.audio_path is not None
+     or (request.audio_bytes_list is not None and len(request.audio_bytes_list) > 0)
+     or (request.audio_paths is not None and len(request.audio_paths) > 0)
+ )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/base.py` around lines 237 - 244, The validate_request logic incorrectly computes has_audio_input, ignoring multi-turn fields audio_bytes_list and audio_paths; update the has_audio_input check in validate_request to also consider non-empty request.audio_bytes_list and request.audio_paths (e.g., not None and len(...)>0) so requests with multi-turn audio are treated as audio input, and keep the existing Modality.AUDIO_IN check and error message (refer to validate_request, has_text_input, has_audio_input, Modality.AUDIO_IN, and self.name).
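The widened check can be exercised in isolation. The dataclass below is a stand-in mirroring the field names from the review, not the real `GenerationRequest`:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenerationRequest:
    text: Optional[str] = None
    audio_bytes: Optional[bytes] = None
    audio_path: Optional[str] = None
    audio_bytes_list: Optional[List[bytes]] = None
    audio_paths: Optional[List[str]] = None


def has_audio_input(req: GenerationRequest) -> bool:
    # bool(...) covers both None and the empty-list case in one expression.
    return (
        req.audio_bytes is not None
        or req.audio_path is not None
        or bool(req.audio_bytes_list)
        or bool(req.audio_paths)
    )


print(has_audio_input(GenerationRequest(audio_bytes_list=[b"\x00"])))  # True
print(has_audio_input(GenerationRequest(text="hi")))                   # False
```

With the original single-field check, the first call would have returned False and the request would have been rejected despite carrying audio.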
220-227: ⚠️ Potential issue | 🔴 Critical
Bug: `or` operator silently discards explicit `0`/`0.0` values.
`request.temperature or self.config.temperature` evaluates to the config default when the user explicitly sets `temperature=0.0` (greedy decoding), because `0.0` is falsy. Same for `top_k=0` and `max_new_tokens=0`.
Proposed fix
  def get_generation_params(self, request: GenerationRequest) -> Dict[str, Any]:
      """Get effective generation parameters, merging request with config defaults."""
      return {
-         "max_new_tokens": request.max_new_tokens or self.config.max_new_tokens,
-         "temperature": request.temperature or self.config.temperature,
-         "top_p": request.top_p or self.config.top_p,
-         "top_k": request.top_k or self.config.top_k,
+         "max_new_tokens": request.max_new_tokens if request.max_new_tokens is not None else self.config.max_new_tokens,
+         "temperature": request.temperature if request.temperature is not None else self.config.temperature,
+         "top_p": request.top_p if request.top_p is not None else self.config.top_p,
+         "top_k": request.top_k if request.top_k is not None else self.config.top_k,
      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/base.py` around lines 220 - 227, The get_generation_params method is incorrectly using "or" which treats explicit zero values (e.g., request.temperature == 0.0, request.top_k == 0, request.max_new_tokens == 0) as falsy and falls back to self.config defaults; update get_generation_params to use explicit None checks so each field uses the request value when it is not None (e.g., use "request.max_new_tokens if request.max_new_tokens is not None else self.config.max_new_tokens", same for temperature, top_p, top_k) while keeping the return keys unchanged and the type signature (GenerationRequest and self.config) intact.
recipes/multimodal/server/unified_server.py (2)
182-198: ⚠️ Potential issue | 🟠 Major
`extract_text_from_messages` includes system-role text, duplicating the system prompt.
System messages are extracted separately by `extract_system_prompt` (line 333), but `extract_text_from_messages` also collects them. Both values are passed to `GenerationRequest`, causing the system prompt to appear in both `text` and `system_prompt` fields.
Proposed fix
  def extract_text_from_messages(messages: List[Dict[str, Any]]) -> str:
      """Extract text content from OpenAI-format messages."""
      texts = []
      for message in messages:
+         if message.get("role") == "system":
+             continue
          content = message.get("content")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 182 - 198, extract_text_from_messages is currently including messages with role "system", causing the system prompt to be duplicated because extract_system_prompt also extracts it and both are passed into GenerationRequest; update extract_text_from_messages to skip any message where message.get("role") == "system" before processing content so system messages are not appended to the returned text. Keep all existing handling for string and list content, only add the role check at the top of the loop in extract_text_from_messages to prevent duplication with extract_system_prompt/GenerationRequest.
118-122: ⚠️ Potential issue | 🟠 Major
`zip(batch, results)` silently truncates if backend returns fewer results than requests.
If `self.backend.generate` returns a shorter list, some futures will never resolve. Add a length check or use `strict=True`.
Proposed fix
+ if len(results) != len(batch):
+     raise RuntimeError(
+         f"Backend returned {len(results)} results for {len(batch)} requests"
+     )
- for pending, result in zip(batch, results):
+ for pending, result in zip(batch, results, strict=True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 118 - 122, The loop that pairs batch with results uses zip(batch, results) which silently truncates if self.backend.generate returns fewer items, leaving some pending.future unresolved; change the pairing to validate lengths (e.g., compare len(results) to len(batch)) or use zip(..., strict=True) and, if a mismatch occurs, set an exception on any remaining pending.future entries (via pending.future.set_exception(...)) and/or raise an explicit error so callers aren't left hanging; update the block around loop.run_in_executor and uses of results, batch, pending.future.set_result to perform this length check and handle the short/long result cases deterministically.
nemo_skills/inference/server/serve_unified.py (3)
119-124: ⚠️ Potential issue | 🟡 Minor
`yaml.safe_load` returns `None` for empty files — will crash at line 244.
If the YAML file is empty, `safe_load` returns `None`, and `config_dict.pop("backend", ...)` will raise `AttributeError`.
Proposed fix
  def load_yaml_config(config_path: str) -> dict:
      """Load YAML config file."""
      import yaml

      with open(config_path) as f:
-         return yaml.safe_load(f)
+         config = yaml.safe_load(f)
+         if not isinstance(config, dict):
+             raise ValueError(f"Config file '{config_path}' must contain a YAML mapping, got {type(config)}")
+         return config
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_unified.py` around lines 119 - 124, The YAML loader can return None for empty files causing later code (e.g., config_dict.pop("backend", ...)) to fail; update load_yaml_config to ensure it always returns a dict by calling yaml.safe_load(f) into a variable, then return that variable if it's a dict or return an empty dict ({}) when safe_load returns None or a non-dict value so callers (like code that uses config_dict.pop) never receive None.
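The guard can be exercised without touching the filesystem by feeding `yaml.safe_load` strings directly. A minimal sketch, assuming PyYAML is installed (the function name here is illustrative):

```python
import yaml


def load_config_text(text: str) -> dict:
    """Parse YAML config text, always returning a dict."""
    config = yaml.safe_load(text)
    # safe_load yields None for empty input; normalize to an empty mapping.
    if config is None:
        return {}
    if not isinstance(config, dict):
        raise ValueError(f"Config must be a YAML mapping, got {type(config).__name__}")
    return config


print(load_config_text(""))              # {} — safe to call config_dict.pop(...) on
print(load_config_text("backend: asr"))  # {'backend': 'asr'}
```

A YAML document that parses to a list (e.g. `- a`) raises `ValueError` instead of surfacing later as a confusing `AttributeError`.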
155-159: `parse_extra_args` silently drops non-`--`-prefixed tokens.
Stray positional arguments or single-dash typos (e.g., `-model`) are silently ignored. Consider raising on unrecognized tokens.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_unified.py` around lines 155 - 159, The loop in parse_extra_args silently skips tokens that don't start with "--" (variable extra_args, loop variable arg), causing stray positional args or single-dash typos to be dropped; update parse_extra_args to treat non-"--" prefixed tokens as invalid by raising a clear exception (e.g., ValueError) or returning an error with context (include the offending token and suggestion to use double-dash flags), instead of incrementing i and continuing, so callers are immediately notified of unrecognized/typoed arguments.
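One way to make the parser fail loudly. This is a sketch of the suggested behavior, not the PR's current implementation:

```python
def parse_extra_args(extra_args: list) -> dict:
    """Parse --key value / --flag pairs, rejecting anything else."""
    parsed = {}
    i = 0
    while i < len(extra_args):
        arg = extra_args[i]
        if not arg.startswith("--"):
            raise ValueError(f"Unrecognized token {arg!r}; use double-dash flags like --model")
        key = arg[2:]
        # A following token that isn't another flag is this flag's value.
        if i + 1 < len(extra_args) and not extra_args[i + 1].startswith("--"):
            parsed[key] = extra_args[i + 1]
            i += 2
        else:
            parsed[key] = True  # bare flag
            i += 1
    return parsed


print(parse_extra_args(["--model", "asr", "--verbose"]))  # {'model': 'asr', 'verbose': True}
```

A single-dash typo such as `-model` now raises immediately instead of being silently dropped.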
103-116: ⚠️ Potential issue | 🟡 Minor
`apply_safetensors_patch` silently swallows failures.
If the file copy fails, the server continues but model loading may later fail with a confusing error. Consider separating `ImportError` (safetensors not installed — skip) from other exceptions (propagate).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_unified.py` around lines 103 - 116, The apply_safetensors_patch function currently catches all exceptions and hides real errors; change its error handling so that ImportError (or ModuleNotFoundError) when importing safetensors.torch is handled by returning/skip, but any other exceptions (e.g., failures in inspect.getfile, os.makedirs, shutil.copyfile) are not swallowed—log a clear error including hack_path and dest_path and re-raise the exception so the caller sees the failure; update the try/except around the import to only catch import-related errors and move file operations into their own try/except that logs and re-raises on unexpected errors, referencing apply_safetensors_patch, hack_path, st_torch, and dest_path to locate the code.
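The error-separation pattern can be shown generically. This sketch uses a generic module name and file paths rather than the actual safetensors patch logic, so all names are illustrative:

```python
import importlib
import shutil


def apply_optional_patch(module_name: str, src: str, dest: str) -> bool:
    """Copy a patch file only when an optional dependency is present.

    ImportError (dependency absent) is an expected no-op; real file-copy
    failures propagate instead of being swallowed.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False  # dependency not installed — patching is a no-op
    try:
        shutil.copyfile(src, dest)
    except OSError:
        print(f"patch copy failed: {src} -> {dest}")
        raise  # surface the failure now, not later as a confusing load error
    return True
```

The caller sees three distinct outcomes: skipped (False), patched (True), or a propagated `OSError` with the paths logged.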
🧹 Nitpick comments (11)
tests/slurm-tests/unified_asr/run_test.py (1)
52-64: Clarify the intent of `++server.server_type=vllm_multimodal` alongside `server_type="generic"`.
Line 52 passes a Hydra override `++server.server_type=vllm_multimodal` while line 64 passes `server_type="generic"` as a keyword argument to `generate()`. These appear to configure the same concept through two different mechanisms. If `vllm_multimodal` is intentional for the inference client side (controlling how requests are formatted) and `generic` is the server-launcher type, a brief comment explaining the distinction would prevent future confusion. The same pattern exists in `unified_tts/run_test.py`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_asr/run_test.py` around lines 52 - 64, Clarify the dual configuration by documenting which setting controls the client request format vs the server launcher: in the call to generate(...) (symbol: generate) add a brief comment explaining that the Hydra override ++server.server_type=vllm_multimodal configures the inference client/request formatting while the keyword argument server_type="generic" selects the server-launcher implementation, and do the same explanatory comment in unified_tts/run_test.py to avoid future confusion.
tests/slurm-tests/unified_asr/check_results.py (1)
64-73: `load_outputs` is duplicated verbatim from `unified_tts/check_results.py`.
Identical implementation — consider hoisting it into `tests/slurm-tests/utils.py` alongside `soft_assert`/`assert_all`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_asr/check_results.py` around lines 64 - 73, The load_outputs implementation is duplicated; move the function into a shared module (e.g., tests/slurm-tests/utils.py) alongside existing helpers soft_assert/assert_all, export it, and replace the local definition in tests/slurm-tests/unified_asr/check_results.py with an import of load_outputs from that utils module so both unified_asr/check_results.py and unified_tts/check_results.py reuse the single shared function.
tests/slurm-tests/unified_tts/run_test.py (1)
30-32: `ensure_workspace_exists` is duplicated verbatim from `unified_asr/run_test.py`.
Both files define the exact same three-line function. Consider extracting it into the shared `tests/slurm-tests/utils.py` (which already hosts `soft_assert`/`assert_all`) and importing it from there.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_tts/run_test.py` around lines 30 - 32, The function ensure_workspace_exists is duplicated; move that implementation into the shared utils module (utils.py) so both tests reuse it: add ensure_workspace_exists(workspace: str, cluster: str, config_dir: str | None = None) to utils.py (it should call get_cluster_config and create_remote_directory) and then replace the local definitions in both run_test.py files with an import from utils (e.g., from utils import ensure_workspace_exists); update any imports if get_cluster_config/create_remote_directory are in different modules so utils has the needed imports, and remove the duplicated function bodies.
recipes/multimodal/server/backends/nemo_asr_backend.py (2)
39-61: Hardcoded `known` field set in `from_dict` can drift from actual dataclass fields.
Both `NeMoASRConfig` and `MagpieTTSConfig` maintain a manually-listed `known` set instead of using `cls.__dataclass_fields__` like the parent `BackendConfig.from_dict()` does. If a field is added to the dataclass but forgotten in `known`, it will silently land in `extra_config`.
Suggested approach — reuse parent pattern
  @classmethod
  def from_dict(cls, d: Dict[str, Any]) -> "NeMoASRConfig":
      if d.get("model_name") and not d.get("model_path"):
          d = {**d, "model_path": d["model_name"]}
-     known = {
-         "model_path",
-         "model_name",
-         ...
-     }
+     known = {f.name for f in cls.__dataclass_fields__.values()} - {"extra_config"}
      return cls(
          **{k: v for k, v in d.items() if k in known},
          extra_config={k: v for k, v in d.items() if k not in known},
      )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/nemo_asr_backend.py` around lines 39 - 61, The from_dict method in NeMoASRConfig currently uses a hardcoded known set which can drift from the dataclass fields; update NeMoASRConfig.from_dict to mirror the parent BackendConfig.from_dict pattern by deriving the known keys from cls.__dataclass_fields__ (e.g., known = set(cls.__dataclass_fields__.keys())) and then use that set to split d into constructor kwargs and extra_config, keeping the existing model_name->model_path override logic intact; apply the same change for MagpieTTSConfig.from_dict if present.
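The derived-fields pattern in a minimal, standalone form. `DemoConfig` is illustrative, not the real config class:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DemoConfig:
    model_path: str = ""
    device: str = "cuda"
    extra_config: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "DemoConfig":
        # Known keys come from the dataclass itself, so they can never drift
        # when a new field is added.
        known = set(cls.__dataclass_fields__) - {"extra_config"}
        return cls(
            **{k: v for k, v in d.items() if k in known},
            extra_config={k: v for k, v in d.items() if k not in known},
        )


cfg = DemoConfig.from_dict({"model_path": "/models/asr", "beam_size": 4})
print(cfg.model_path)    # /models/asr
print(cfg.extra_config)  # {'beam_size': 4}
```

Adding a field to `DemoConfig` automatically moves it from `extra_config` to a proper attribute, with no separate list to maintain.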
100-104: Silenced `.to(device)` exception — consider logging the swallowed error.
The broad `except Exception: pass` when moving the model to a device can mask real failures (e.g., OOM). At minimum, log the exception at debug level so it's diagnosable.
Proposed fix
  try:
      self._model.to(self.config.device)
- except Exception:
-     # Some NeMo wrappers may already be on the correct device.
-     pass
+ except Exception as e:
+     print(f"[NeMoASRBackend] Note: .to({self.config.device}) skipped: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/nemo_asr_backend.py` around lines 100 - 104, Replace the bare "except Exception: pass" around self._model.to(self.config.device) with an exception handler that captures the exception (e.g., except Exception as e) and logs the full error/traceback at debug level so failures (including OOM) are diagnosable; use an existing logger (self.logger if available) or logging.getLogger(__name__) and call logger.debug("Failed to move model to %s", self.config.device, exc_info=True) before continuing.
recipes/multimodal/server/backends/magpie_tts_backend.py (3)
413-415: `__del__` for temp directory cleanup is fragile.
`__del__` is not guaranteed to run (especially during interpreter shutdown). Since each `generate()` call already cleans up its `batch_dir`, the `_temp_dir` parent would only contain stale data on unclean exits. Consider implementing a context manager or explicit `shutdown()` method that the server can call.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/magpie_tts_backend.py` around lines 413 - 415, The finalizer __del__ used to remove self._temp_dir is fragile; replace it with an explicit cleanup API and/or context manager so the server can deterministically remove stale files. Add a public method (e.g., shutdown() or close()) on the MagpieTTS backend that checks getattr(self, "_temp_dir", None) and removes it (shutil.rmtree(...)) and document that the server must call this on shutdown; alternatively implement __enter__/__exit__ on the same class so callers can use a with-statement to ensure cleanup; keep generate() behavior of removing each batch_dir but remove the reliance on __del__ for parent _temp_dir cleanup.
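A deterministic-cleanup sketch along the lines suggested. Class and method names are illustrative, not the backend's actual API:

```python
import shutil
import tempfile
from pathlib import Path


class TempDirOwner:
    """Owns a temp directory; cleans it up explicitly instead of relying on __del__."""

    def __init__(self):
        self._temp_dir = tempfile.mkdtemp(prefix="tts_batches_")

    def shutdown(self):
        # Idempotent: safe to call twice or on a partially-initialized object.
        temp_dir = getattr(self, "_temp_dir", None)
        if temp_dir and Path(temp_dir).exists():
            shutil.rmtree(temp_dir)
        self._temp_dir = None

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.shutdown()


with TempDirOwner() as owner:
    created = owner._temp_dir
    print(Path(created).exists())  # True while in use
print(Path(created).exists())      # False — removed deterministically on exit
```

The server can call `shutdown()` from its own teardown hook, or wrap the backend in a `with` block, so stale directories never depend on finalizer timing.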
283-292: Silently swallowed cache reset and normalization failures could mask issues.
Lines 290-291 (`except Exception: pass` on cache reset) and 321-322 (same for normalization) both swallow errors silently. While defensive, at least logging at debug level would help diagnose unexpected failures during generation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/magpie_tts_backend.py` around lines 283 - 292, The code currently swallows exceptions during the cache reset (around decoder.reset_cache on self._model) and the subsequent normalization step; change each bare "except Exception: pass" to log the caught exception at debug level (e.g., "except Exception as e: self._logger.debug('Failed to reset decoder cache', exc_info=e)" or similar) so failures are recorded without crashing; ensure you reference the same symbols (self._model, decoder.reset_cache and the normalization call) and preserve existing behavior (do not re-raise) while including contextual messages and exception info in the debug logs.
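The suggested debug-level logging in a standalone form (the wrapper function and stand-in class are hypothetical; `decoder.reset_cache` is the symbol named in the review):

```python
import logging

logger = logging.getLogger(__name__)


def reset_cache_safely(model):
    # Defensive reset: failures are still tolerated, but no longer invisible.
    try:
        model.decoder.reset_cache()
    except Exception:
        logger.debug("Failed to reset decoder cache", exc_info=True)


class NoDecoder:
    """Stand-in model with no decoder attribute, to trigger the failure path."""


reset_cache_safely(NoDecoder())  # AttributeError is logged at DEBUG, not raised
```

Running with `logging.basicConfig(level=logging.DEBUG)` would show the full traceback while leaving generation unaffected.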
45-76: Same hardcoded `known` field set pattern as `NeMoASRConfig.from_dict`.
See the comment on `nemo_asr_backend.py` — using `cls.__dataclass_fields__` avoids maintaining a duplicate list of field names.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/magpie_tts_backend.py` around lines 45 - 76, The from_dict method in MagpieTTSConfig currently builds a hardcoded known set; replace that by deriving allowed field names from the dataclass instead (similar to NeMoASRConfig): use cls.__dataclass_fields__ to compute the set of known keys, then construct the kwargs and extra_config exactly as before (i.e., pass {k:v for k,v in d.items() if k in dataclass_fields} into cls(...) and put the rest into extra_config), and keep the existing CLI alias handling for "codec_model" intact; update references to the local variable "known" to the computed dataclass field set so you don't maintain a duplicate list.
recipes/multimodal/__init__.py (1)
23-37: Consider re-exporting `Modality` for external consumers.
`Modality` is part of the public API (used in `supported_modalities` property), yet it's not re-exported here. External consumers would need to reach into `recipes.multimodal.server.backends.base` to reference it. If the intent is to keep the surface minimal, this is fine — just flagging the asymmetry.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/__init__.py` around lines 23 - 37, The package is missing a public re-export of Modality which external consumers need for things like InferenceBackend.supported_modalities; import Modality from its defining module (base) and add it to the top-level exports alongside InferenceBackend, GenerationRequest, GenerationResult, BackendConfig, and get_backend so callers can reference Modality without reaching into recipes.multimodal.server.backends.base.
recipes/multimodal/server/unified_server.py (2)
78-90: Fire-and-forget `asyncio.create_task` — store task references to avoid silent exception loss.
Tasks created at lines 84, 86, 99, and 136 are not stored. If they raise unexpectedly, the exception is silently dropped ("Task exception was never retrieved" warning). Store references in a set and discard them on completion.
Proposed approach
  class RequestBatcher:
      def __init__(self, backend, batch_size: int, batch_timeout: float):
          ...
+         self._background_tasks: set = set()
+
+     def _track_task(self, task: asyncio.Task):
+         self._background_tasks.add(task)
+         task.add_done_callback(self._background_tasks.discard)

      async def add_request(self, request: GenerationRequest) -> GenerationResult:
          ...
          if len(self.pending_requests) >= self.batch_size:
-             asyncio.create_task(self._process_batch())
+             self._track_task(asyncio.create_task(self._process_batch()))

Apply the same pattern to lines 86, 99, and 136.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 78 - 90, The fire-and-forget asyncio.create_task calls that launch _process_batch and _timeout_handler must be tracked to avoid lost exceptions: add a self._tasks: Set[asyncio.Task] (initialized in the class __init__), replace each asyncio.create_task(...) in the batcher (the places that reference _process_batch and _timeout_handler when managing pending_requests, batch_size, and timeout_task) with creating the task, adding it to self._tasks, and attaching a done-callback that removes it from the set and retrieves/logs task.exception() (or calls task.exception() inside the callback to suppress "Task exception was never retrieved"). Ensure you apply this pattern for the create_task calls that currently live alongside pending_requests, batch_size, timeout_task and any other create_task usages in the same class (e.g., the ones around lines referencing _process_batch and _timeout_handler).
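The tracking pattern, runnable on its own (`track` and `worker` are illustrative names, not part of the PR):

```python
import asyncio

background_tasks = set()


def track(task):
    # Keep a strong reference so the task isn't garbage-collected mid-flight,
    # and drop the reference once the task completes.
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task


async def worker(n):
    await asyncio.sleep(0)
    return n * 2


async def main():
    task = track(asyncio.create_task(worker(21)))
    result = await task
    await asyncio.sleep(0)  # give done-callbacks a chance to run
    print(result, len(background_tasks))  # 42 0


asyncio.run(main())
```

Because the set holds the only extra reference, completed tasks are released promptly, while in-flight tasks can't be collected and their exceptions remain retrievable.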
245-278: Replace `@app.on_event("startup")` with FastAPI's `lifespan` context manager pattern. The `on_event` decorator is deprecated in FastAPI's API reference in favor of the ASGI lifespan pattern, which consolidates startup and shutdown logic in a single async context manager with `yield`. FastAPI will stop supporting `on_event` in a future version.

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # startup code
    yield
    # shutdown code

app = FastAPI(lifespan=lifespan)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 245 - 278, Replace the deprecated `@app.on_event`("startup") startup() function with a FastAPI lifespan asynccontextmanager: move all startup logic (lookup via get_backend(backend_type), ConfigClass.from_dict(config_dict), instantiating BackendClass and calling backend_instance.load_model(), creating RequestBatcher as request_batcher, registering extra routes via BackendClass.get_extra_routes() and app.add_api_route(), and the print-ready logs) into the pre-yield section of the lifespan, yield control, then put any shutdown/cleanup (teardown of request_batcher/backend_instance if needed) after the yield; register the lifespan with FastAPI via app = FastAPI(lifespan=lifespan) so the same symbols (BackendClass, ConfigClass, backend_instance, request_batcher, RequestBatcher, get_backend) are used but following the ASGI lifespan pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/multimodal/server/unified_server.py`:
- Around line 115-122: Replace asyncio.get_event_loop() with
asyncio.get_running_loop() in the batch processing code so the coroutine uses
the currently running loop; specifically update the call that assigns loop =
asyncio.get_event_loop() (the loop used to call loop.run_in_executor(None,
self.backend.generate, requests)) to loop = asyncio.get_running_loop() to avoid
the deprecation warning and ensure it runs in the active FastAPI async context.
In `@tests/slurm-tests/unified_asr/check_results.py`:
- Around line 38-47: normalize_text only maps single-character tokens via
_DIGIT_TO_WORD so multi-digit numeric tokens like "42" or "100" remain
unchanged; update normalize_text to detect token.isdigit() (or a cleaned numeric
token) and expand multi-digit numbers by mapping each digit to its word form
(e.g., tokens.append(" ".join(_DIGIT_TO_WORD[d] for d in token))) or
alternatively use a numeric-to-words utility (num2words) to convert the whole
token, keeping the current single-digit logic for token in _DIGIT_TO_WORD and
preserving non-numeric tokens as-is.
- Around line 55-61: Accessing row["_reference"] can raise KeyError and crash
the checker; change the code that reads reference_path (the loop that parses
each line into row and assigns refs[row["id"]] = row["_reference"]) to handle
missing keys gracefully: use row.get("_reference") (or wrap in try/except
KeyError) and when the key is absent, record a soft failure (e.g., append a
message to the existing failures/alerts collection or call the test failure
logger) and skip adding that entry to refs, then continue processing. Ensure you
reference variables used in the diff (reference_path.open, row, refs) so the fix
is applied in the same block that builds refs.
In `@tests/slurm-tests/unified_tts/run_test.py`:
- Line 127: The checker_cmd string uses an unquoted args.workspace which will
break when the workspace path contains spaces; change its construction to safely
quote the workspace value (e.g., use shlex.quote(args.workspace) or otherwise
wrap the workspace in quotes) so that wrap_arguments(checker_cmd) receives a
single argument for the workspace; apply the same fix for the similar
checker_cmd in unified_asr run_test.py (the variable name checker_cmd and the
call to wrap_arguments remain the targets to edit).
In `@tests/test_nemo_asr_backend.py`:
- Around line 1-2: Add the standard NVIDIA copyright header to the very top of
tests/test_nemo_asr_backend.py (before any imports) to satisfy CI; ensure the
header block is the exact company copyright text used across the repo and leave
a blank line after the header before the first import (which currently imports
GenerationRequest and NeMoASRBackend/NeMoASRConfig).
---
Duplicate comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 119-124: The YAML loader can return None for empty files causing
later code (e.g., config_dict.pop("backend", ...)) to fail; update
load_yaml_config to ensure it always returns a dict by calling yaml.safe_load(f)
into a variable, then return that variable if it's a dict or return an empty
dict ({}) when safe_load returns None or a non-dict value so callers (like code
that uses config_dict.pop) never receive None.
- Around line 155-159: The loop in parse_extra_args silently skips tokens that
don't start with "--" (variable extra_args, loop variable arg), causing stray
positional args or single-dash typos to be dropped; update parse_extra_args to
treat non-"--" prefixed tokens as invalid by raising a clear exception (e.g.,
ValueError) or returning an error with context (include the offending token and
suggestion to use double-dash flags), instead of incrementing i and continuing,
so callers are immediately notified of unrecognized/typoed arguments.
- Around line 103-116: The apply_safetensors_patch function currently catches
all exceptions and hides real errors; change its error handling so that
ImportError (or ModuleNotFoundError) when importing safetensors.torch is handled
by returning/skip, but any other exceptions (e.g., failures in inspect.getfile,
os.makedirs, shutil.copyfile) are not swallowed—log a clear error including
hack_path and dest_path and re-raise the exception so the caller sees the
failure; update the try/except around the import to only catch import-related
errors and move file operations into their own try/except that logs and
re-raises on unexpected errors, referencing apply_safetensors_patch, hack_path,
st_torch, and dest_path to locate the code.
In `@recipes/multimodal/server/backends/base.py`:
- Around line 237-244: The validate_request logic incorrectly computes
has_audio_input, ignoring multi-turn fields audio_bytes_list and audio_paths;
update the has_audio_input check in validate_request to also consider non-empty
request.audio_bytes_list and request.audio_paths (e.g., not None and len(...)>0)
so requests with multi-turn audio are treated as audio input, and keep the
existing Modality.AUDIO_IN check and error message (refer to validate_request,
has_text_input, has_audio_input, Modality.AUDIO_IN, and self.name).
- Around line 220-227: The get_generation_params method is incorrectly using
"or" which treats explicit zero values (e.g., request.temperature == 0.0,
request.top_k == 0, request.max_new_tokens == 0) as falsy and falls back to
self.config defaults; update get_generation_params to use explicit None checks
so each field uses the request value when it is not None (e.g., use
"request.max_new_tokens if request.max_new_tokens is not None else
self.config.max_new_tokens", same for temperature, top_p, top_k) while keeping
the return keys unchanged and the type signature (GenerationRequest and
self.config) intact.
In `@recipes/multimodal/server/unified_server.py`:
- Around line 182-198: extract_text_from_messages is currently including
messages with role "system", causing the system prompt to be duplicated because
extract_system_prompt also extracts it and both are passed into
GenerationRequest; update extract_text_from_messages to skip any message where
message.get("role") == "system" before processing content so system messages are
not appended to the returned text. Keep all existing handling for string and
list content, only add the role check at the top of the loop in
extract_text_from_messages to prevent duplication with
extract_system_prompt/GenerationRequest.
- Around line 118-122: The loop that pairs batch with results uses zip(batch,
results) which silently truncates if self.backend.generate returns fewer items,
leaving some pending.future unresolved; change the pairing to validate lengths
(e.g., compare len(results) to len(batch)) or use zip(..., strict=True) and, if
a mismatch occurs, set an exception on any remaining pending.future entries (via
pending.future.set_exception(...)) and/or raise an explicit error so callers
aren't left hanging; update the block around loop.run_in_executor and uses of
results, batch, pending.future.set_result to perform this length check and
handle the short/long result cases deterministically.
In `@tests/slurm-tests/unified_asr/run_test.py`:
- Line 110: The checker_cmd f-string uses unquoted args.workspace which breaks
when the workspace path contains spaces; update the construction of checker_cmd
to safely quote or shell-escape args.workspace (e.g. use shlex.quote on
args.workspace) and ensure shlex is imported where checker_cmd is defined so the
command becomes safe for shell execution; refer to the checker_cmd variable and
args.workspace in run_test.py when making the change.
---
Nitpick comments:
In `@recipes/multimodal/__init__.py`:
- Around line 23-37: The package is missing a public re-export of Modality which
external consumers need for things like InferenceBackend.supported_modalities;
import Modality from its defining module (base) and add it to the top-level
exports alongside InferenceBackend, GenerationRequest, GenerationResult,
BackendConfig, and get_backend so callers can reference Modality without
reaching into recipes.multimodal.server.backends.base.
In `@recipes/multimodal/server/backends/magpie_tts_backend.py`:
- Around line 413-415: The finalizer __del__ used to remove self._temp_dir is
fragile; replace it with an explicit cleanup API and/or context manager so the
server can deterministically remove stale files. Add a public method (e.g.,
shutdown() or close()) on the MagpieTTS backend that checks getattr(self,
"_temp_dir", None) and removes it (shutil.rmtree(...)) and document that the
server must call this on shutdown; alternatively implement __enter__/__exit__ on
the same class so callers can use a with-statement to ensure cleanup; keep
generate() behavior of removing each batch_dir but remove the reliance on
__del__ for parent _temp_dir cleanup.
- Around line 283-292: The code currently swallows exceptions during the cache
reset (around decoder.reset_cache on self._model) and the subsequent
normalization step; change each bare "except Exception: pass" to log the caught
exception at debug level (e.g., "except Exception as e:
self._logger.debug('Failed to reset decoder cache', exc_info=e)" or similar) so
failures are recorded without crashing; ensure you reference the same symbols
(self._model, decoder.reset_cache and the normalization call) and preserve
existing behavior (do not re-raise) while including contextual messages and
exception info in the debug logs.
- Around line 45-76: The from_dict method in MagpieTTSConfig currently builds a
hardcoded known set; replace that by deriving allowed field names from the
dataclass instead (similar to NeMoASRConfig): use cls.__dataclass_fields__ to
compute the set of known keys, then construct the kwargs and extra_config
exactly as before (i.e., pass {k:v for k,v in d.items() if k in
dataclass_fields} into cls(...) and put the rest into extra_config), and keep
the existing CLI alias handling for "codec_model" intact; update references to
the local variable "known" to the computed dataclass field set so you don't
maintain a duplicate list.
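The explicit-cleanup shape suggested above can be sketched independently of the backend. `TempDirOwner` is a hypothetical stand-in for the MagpieTTS backend class; only the cleanup plumbing is shown:

```python
import shutil
import tempfile


class TempDirOwner:
    """Sketch: deterministic temp-dir cleanup instead of relying on __del__."""

    def __init__(self) -> None:
        self._temp_dir = tempfile.mkdtemp(prefix="magpie_tts_")

    def shutdown(self) -> None:
        # Safe to call more than once; getattr guards a partially-built instance.
        temp_dir = getattr(self, "_temp_dir", None)
        if temp_dir is not None:
            shutil.rmtree(temp_dir, ignore_errors=True)
            self._temp_dir = None

    # Context-manager support so callers can use a with-statement.
    def __enter__(self) -> "TempDirOwner":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.shutdown()
```

The server would call `shutdown()` from its shutdown hook (or wrap the backend in `with`), so cleanup happens at a known point rather than whenever the garbage collector runs the finalizer.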
In `@recipes/multimodal/server/backends/nemo_asr_backend.py`:
- Around line 39-61: The from_dict method in NeMoASRConfig currently uses a
hardcoded known set which can drift from the dataclass fields; update
NeMoASRConfig.from_dict to mirror the parent BackendConfig.from_dict pattern by
deriving the known keys from cls.__dataclass_fields__ (e.g., known =
set(cls.__dataclass_fields__.keys())) and then use that set to split d into
constructor kwargs and extra_config, keeping the existing model_name->model_path
override logic intact; apply the same change for MagpieTTSConfig.from_dict if
present.
- Around line 100-104: Replace the bare "except Exception: pass" around
self._model.to(self.config.device) with an exception handler that captures the
exception (e.g., except Exception as e) and logs the full error/traceback at
debug level so failures (including OOM) are diagnosable; use an existing logger
(self.logger if available) or logging.getLogger(__name__) and call
logger.debug("Failed to move model to %s", self.config.device, exc_info=True)
before continuing.
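The `__dataclass_fields__` pattern recommended for both config classes looks roughly like this. `ExampleConfig` and its fields are hypothetical stand-ins, not the real `NeMoASRConfig`:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExampleConfig:
    model_path: str = ""
    device: str = "cuda"
    extra_config: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "ExampleConfig":
        # Derive the known keys from the dataclass itself so the set can
        # never drift out of sync with the field definitions.
        known = set(cls.__dataclass_fields__) - {"extra_config"}
        cfg = cls(**{k: v for k, v in d.items() if k in known})
        cfg.extra_config = {k: v for k, v in d.items() if k not in known}
        return cfg
```

Adding a new dataclass field automatically makes it a "known" constructor key, which is the whole point of dropping the hand-maintained set.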
In `@recipes/multimodal/server/unified_server.py`:
- Around line 78-90: The fire-and-forget asyncio.create_task calls that launch
_process_batch and _timeout_handler must be tracked to avoid lost exceptions:
add a self._tasks: Set[asyncio.Task] (initialized in the class __init__),
replace each asyncio.create_task(...) in the batcher (the places that reference
_process_batch and _timeout_handler when managing pending_requests, batch_size,
and timeout_task) with creating the task, adding it to self._tasks, and
attaching a done-callback that removes it from the set and retrieves/logs
task.exception() (or calls task.exception() inside the callback to suppress
"Task exception was never retrieved"). Ensure you apply this pattern for the
create_task calls that currently live alongside pending_requests, batch_size,
timeout_task and any other create_task usages in the same class (e.g., the ones
around lines referencing _process_batch and _timeout_handler).
- Around line 245-278: Replace the deprecated `@app.on_event`("startup") startup()
function with a FastAPI lifespan asynccontextmanager: move all startup logic
(lookup via get_backend(backend_type), ConfigClass.from_dict(config_dict),
instantiating BackendClass and calling backend_instance.load_model(), creating
RequestBatcher as request_batcher, registering extra routes via
BackendClass.get_extra_routes() and app.add_api_route(), and the print-ready
logs) into the pre-yield section of the lifespan, yield control, then put any
shutdown/cleanup (teardown of request_batcher/backend_instance if needed) after
the yield; register the lifespan with FastAPI via app =
FastAPI(lifespan=lifespan) so the same symbols (BackendClass, ConfigClass,
backend_instance, request_batcher, RequestBatcher, get_backend) are used but
following the ASGI lifespan pattern.
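The startup/shutdown ordering of the lifespan pattern can be sketched without FastAPI itself; in the real server you would pass this function as `FastAPI(lifespan=lifespan)` and do the backend/batcher setup and teardown where the comments sit:

```python
from contextlib import asynccontextmanager

events = []  # records ordering purely for illustration


@asynccontextmanager
async def lifespan(app):
    # Startup section: load the backend, build the RequestBatcher,
    # register extra routes (all before yielding control to the server).
    events.append("startup")
    yield
    # Shutdown section: tear down the batcher / backend.
    events.append("shutdown")
```

Everything before `yield` runs once when the ASGI server starts, everything after it runs once on shutdown — the same two phases the deprecated `@app.on_event("startup")` / `"shutdown"` hooks provided, but in one place.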
In `@tests/slurm-tests/unified_asr/check_results.py`:
- Around line 64-73: The load_outputs implementation is duplicated; move the
function into a shared module (e.g., tests/slurm-tests/utils.py) alongside
existing helpers soft_assert/assert_all, export it, and replace the local
definition in tests/slurm-tests/unified_asr/check_results.py with an import of
load_outputs from that utils module so both unified_asr/check_results.py and
unified_tts/check_results.py reuse the single shared function.
In `@tests/slurm-tests/unified_asr/run_test.py`:
- Around line 52-64: Clarify the dual configuration by documenting which setting
controls the client request format vs the server launcher: in the call to
generate(...) (symbol: generate) add a brief comment explaining that the Hydra
override ++server.server_type=vllm_multimodal configures the inference
client/request formatting while the keyword argument server_type="generic"
selects the server-launcher implementation, and do the same explanatory comment
in unified_tts/run_test.py to avoid future confusion.
In `@tests/slurm-tests/unified_tts/run_test.py`:
- Around line 30-32: The function ensure_workspace_exists is duplicated; move
that implementation into the shared utils module (utils.py) so both tests reuse
it: add ensure_workspace_exists(workspace: str, cluster: str, config_dir: str |
None = None) to utils.py (it should call get_cluster_config and
create_remote_directory) and then replace the local definitions in both
run_test.py files with an import from utils (e.g., from utils import
ensure_workspace_exists); update any imports if
get_cluster_config/create_remote_directory are in different modules so utils has
the needed imports, and remove the duplicated function bodies.

ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- nemo_skills/inference/server/serve_unified.py
- recipes/multimodal/__init__.py
- recipes/multimodal/server/__init__.py
- recipes/multimodal/server/backends/__init__.py
- recipes/multimodal/server/backends/base.py
- recipes/multimodal/server/backends/magpie_tts_backend.py
- recipes/multimodal/server/backends/nemo_asr_backend.py
- recipes/multimodal/server/unified_server.py
- tests/slurm-tests/run_all.sh
- tests/slurm-tests/unified_asr/asr_openai.test
- tests/slurm-tests/unified_asr/check_results.py
- tests/slurm-tests/unified_asr/run_test.py
- tests/slurm-tests/unified_tts/README.md
- tests/slurm-tests/unified_tts/check_results.py
- tests/slurm-tests/unified_tts/run_test.py
- tests/slurm-tests/unified_tts/tts_openai.test
- tests/test_nemo_asr_backend.py
- tests/test_unified_server_audio_parser.py
✅ Files skipped from review due to trivial changes (1)
- tests/slurm-tests/unified_asr/asr_openai.test
```python
with reference_path.open("rt", encoding="utf-8") as fin:
    for line in fin:
        if not line.strip():
            continue
        row = json.loads(line)
        refs[row["id"]] = row["_reference"]
return refs
```
row["_reference"] raises an unhandled KeyError if the field is absent.
If any row in asr_openai.test is missing the "_reference" key (e.g., a row added for a different purpose or a malformed test file), this will hard-crash the script rather than recording a soft assertion failure like the rest of the checker.
🛡️ Proposed fix
```diff
-        refs[row["id"]] = row["_reference"]
+        ref = row.get("_reference")
+        if ref is None:
+            soft_assert(False, f"Row id={row.get('id')} missing '_reference' field in test file")
+        else:
+            refs[row["id"]] = ref
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/slurm-tests/unified_asr/check_results.py` around lines 55 - 61,
Accessing row["_reference"] can raise KeyError and crash the checker; change the
code that reads reference_path (the loop that parses each line into row and
assigns refs[row["id"]] = row["_reference"]) to handle missing keys gracefully:
use row.get("_reference") (or wrap in try/except KeyError) and when the key is
absent, record a soft failure (e.g., append a message to the existing
failures/alerts collection or call the test failure logger) and skip adding that
entry to refs, then continue processing. Ensure you reference variables used in
the diff (reference_path.open, row, refs) so the fix is applied in the same
block that builds refs.
Actionable comments posted: 4
♻️ Duplicate comments (3)
recipes/multimodal/server/unified_server.py (2)
184-197: ⚠️ Potential issue | 🟠 Major

Exclude `role == "system"` from `extract_text_from_messages` to prevent prompt duplication.

System text is currently included in both `text` (via this function) and `system_prompt` (via `extract_system_prompt`), which can double-apply instructions.

🔧 Proposed fix

```diff
 def extract_text_from_messages(messages: List[Dict[str, Any]]) -> str:
     """Extract text content from OpenAI-format messages."""
     texts = []
     for message in messages:
+        if message.get("role") == "system":
+            continue
         content = message.get("content")
         if isinstance(content, str):
             if content:
                 texts.append(content)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 184 - 197, The function extract_text_from_messages should skip messages where message.get("role") == "system" to avoid duplicating system prompt text (which is already handled by extract_system_prompt); update the loop in extract_text_from_messages to check message.get("role") and continue when it equals "system", keeping the existing logic for string, list, dict items (look for "content", list items with type=="text", and string items) so only non-system content is collected and returned.
117-121: ⚠️ Potential issue | 🔴 Critical

Guard against backend/result length mismatch before completing futures.

Line 119 still uses `zip(batch, results)` without validating result count. If the backend returns fewer items, some futures are never resolved and requests can hang.

🔧 Proposed fix

```diff
     loop = asyncio.get_event_loop()
     results = await loop.run_in_executor(None, self.backend.generate, requests)
-    for pending, result in zip(batch, results):
+    if len(results) != len(batch):
+        err = RuntimeError(
+            f"Backend returned {len(results)} results for {len(batch)} requests"
+        )
+        for pending in batch:
+            if not pending.future.done():
+                pending.future.set_exception(err)
+        return
+
+    for pending, result in zip(batch, results, strict=True):
         if not pending.future.done():
             pending.future.set_result(result)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 117 - 121, Validate the number of items returned from self.backend.generate against the batch size before using zip: after results = await loop.run_in_executor(None, self.backend.generate, requests) check len(results) vs len(batch) and for each index i in range(len(batch)) if i < len(results) set pending.future.set_result(results[i]) (skipping if future.done()), otherwise set pending.future.set_exception(RuntimeError(...)) describing the backend returned too few results; also handle the inverse case (len(results) > len(batch)) by logging/ignoring extra results so no futures are left unresolved; reference batch, results, self.backend.generate, loop.run_in_executor, and pending.future.set_result when making the change.

tests/slurm-tests/unified_tts/run_test.py (1)
126-129: ⚠️ Potential issue | 🟡 Minor

`checker_cmd` still breaks on workspace paths containing spaces.

Line 126 and Line 128 still use the split-prone `wrap_arguments(checker_cmd)` flow, so `--workspace` can be fragmented into multiple args.

🔧 Proposed fix

```diff
-    checker_cmd = f"python tests/slurm-tests/unified_tts/check_results.py --workspace {args.workspace}"
+    checker_cmd = "python tests/slurm-tests/unified_tts/check_results.py --workspace"
+    checker_ctx = wrap_arguments(checker_cmd)
+    checker_ctx.args.append(args.workspace)
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=checker_ctx,
         cluster=args.cluster,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_tts/run_test.py` around lines 126 - 129, The checker_cmd string can still be split incorrectly when workspace contains spaces because wrap_arguments(checker_cmd) tokenizes the full string; change the call to pass arguments safely by not wrapping the entire formatted string—construct ctx as a list/sequence of command and separate args so "--workspace" and args.workspace are distinct elements (or alternatively use a proper shell-quoting function like shlex.quote on args.workspace before building checker_cmd) and call run_cmd(ctx=...) with that safe argument list; update references to checker_cmd, wrap_arguments, run_cmd, and args.workspace accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/multimodal/server/backends/__init__.py`:
- Around line 70-76: The lazy loader currently catches ImportError but not
AttributeError, so a typo in the registry's class name (the getattr(module,
class_name) call) will raise AttributeError and skip your friendly message;
modify the try/except around importlib.import_module and getattr to also catch
AttributeError (e.g., except (ImportError, AttributeError) as e) and re-raise an
ImportError that includes backend_name, module_name and class_name with clear
guidance that the backend class is missing or misnamed, preserving the original
exception as the cause (using "from e") so startup errors show the custom
message for both missing modules and missing classes.
In `@recipes/multimodal/server/backends/magpie_tts_backend.py`:
- Around line 304-310: The code in the block handling
p.get("context_audio_filepath") allows client-supplied filesystem paths (in
magpie_tts_backend.py) which can access arbitrary local files; update the
handler (the code around p.get("context_audio_filepath", "") and link creation)
to only accept paths under a specific allowlist root (e.g., a CONTEXT_ROOT
constant) by resolving the candidate with realpath/Path.resolve and verifying it
is a descendant (or use Path.is_relative_to) before creating any symlink; if the
resolved path is outside the allowlist (or if the request is remote), reject or
ignore the context path and log a warning instead of creating the link. Ensure
you also handle symlink/to-parent attacks by resolving both the candidate and
the allowlist root before comparison and avoid following untrusted symlinks when
creating links.
In `@recipes/multimodal/server/unified_server.py`:
- Around line 456-460: The except block that currently does
traceback.print_exc() and raises HTTPException(status_code=500, detail=str(e))
should not return raw exception text to clients; instead log/print the full
traceback for server-side debugging and raise HTTPException(status_code=500,
detail="Internal server error") while preserving the original exception chain
(raise ... from e). Update the except block surrounding the handler where
Exception as e is caught (the block that calls traceback.print_exc()) to replace
str(e) with a generic message and use exception chaining to keep the original
exception available for logs.
In `@tests/slurm-tests/unified_asr/run_test.py`:
- Around line 109-112: The checker invocation breaks when args.workspace
contains spaces because wrap_arguments(checker_cmd) splits on whitespace; fix by
preserving the workspace as a single argument: build the ctx by passing the
command prefix through wrap_arguments for "python
tests/slurm-tests/unified_asr/check_results.py --workspace" (or simply the
script and flag) and then append args.workspace as one element, and use that ctx
in run_cmd (update the checker_cmd/ctx construction used in the run_cmd call so
that --workspace and its path are separate args but the path is a single
element).
---
Duplicate comments:
In `@recipes/multimodal/server/unified_server.py`:
- Around line 184-197: The function extract_text_from_messages should skip
messages where message.get("role") == "system" to avoid duplicating system
prompt text (which is already handled by extract_system_prompt); update the loop
in extract_text_from_messages to check message.get("role") and continue when it
equals "system", keeping the existing logic for string, list, dict items (look
for "content", list items with type=="text", and string items) so only
non-system content is collected and returned.
- Around line 117-121: Validate the number of items returned from
self.backend.generate against the batch size before using zip: after results =
await loop.run_in_executor(None, self.backend.generate, requests) check
len(results) vs len(batch) and for each index i in range(len(batch)) if i <
len(results) set pending.future.set_result(results[i]) (skipping if
future.done()), otherwise set pending.future.set_exception(RuntimeError(...))
describing the backend returned too few results; also handle the inverse case
(len(results) > len(batch)) by logging/ignoring extra results so no futures are
left unresolved; reference batch, results, self.backend.generate,
loop.run_in_executor, and pending.future.set_result when making the change.
In `@tests/slurm-tests/unified_tts/run_test.py`:
- Around line 126-129: The checker_cmd string can still be split incorrectly
when workspace contains spaces because wrap_arguments(checker_cmd) tokenizes the
full string; change the call to pass arguments safely by not wrapping the entire
formatted string—construct ctx as a list/sequence of command and separate args
so "--workspace" and args.workspace are distinct elements (or alternatively use
a proper shell-quoting function like shlex.quote on args.workspace before
building checker_cmd) and call run_cmd(ctx=...) with that safe argument list;
update references to checker_cmd, wrap_arguments, run_cmd, and args.workspace
accordingly.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- recipes/multimodal/server/backends/__init__.py
- recipes/multimodal/server/backends/magpie_tts_backend.py
- recipes/multimodal/server/unified_server.py
- tests/slurm-tests/unified_asr/run_test.py
- tests/slurm-tests/unified_tts/run_test.py
- tests/test_nemo_asr_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_nemo_asr_backend.py
```python
try:
    module = importlib.import_module(f".{module_name}", package=__name__)
    return getattr(module, class_name)
except ImportError as e:
    raise ImportError(
        f"Failed to import backend '{backend_name}'. Make sure required dependencies are installed. Error: {e}"
    ) from e
```
Handle missing backend class names explicitly in lazy loader.
If the registry has a class-name typo, getattr(...) on Line 72 raises AttributeError, which currently bypasses your custom import error message. Catching it explicitly makes startup failures much clearer.
🔧 Proposed fix
```diff
 try:
     module = importlib.import_module(f".{module_name}", package=__name__)
-    return getattr(module, class_name)
+    try:
+        return getattr(module, class_name)
+    except AttributeError as e:
+        raise ImportError(
+            f"Backend '{backend_name}' is registered with class '{class_name}', "
+            f"but that class was not found in module '{module_name}'."
+        ) from e
 except ImportError as e:
     raise ImportError(
         f"Failed to import backend '{backend_name}'. Make sure required dependencies are installed. Error: {e}"
     ) from e
```

🧰 Tools
🪛 Ruff (0.15.2)
[warning] 74-76: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@recipes/multimodal/server/backends/__init__.py` around lines 70 - 76, The
lazy loader currently catches ImportError but not AttributeError, so a typo in
the registry's class name (the getattr(module, class_name) call) will raise
AttributeError and skip your friendly message; modify the try/except around
importlib.import_module and getattr to also catch AttributeError (e.g., except
(ImportError, AttributeError) as e) and re-raise an ImportError that includes
backend_name, module_name and class_name with clear guidance that the backend
class is missing or misnamed, preserving the original exception as the cause
(using "from e") so startup errors show the custom message for both missing
modules and missing classes.
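A minimal sketch of the two-failure-mode loader described above. `load_backend_class` and the error wording are hypothetical (the real registry resolves relative module names inside its own package), but the `(ImportError, AttributeError)` handling is the technique being recommended:

```python
import importlib


def load_backend_class(module_name: str, class_name: str, backend_name: str):
    """Resolve a backend class, turning both missing modules and misnamed
    classes into one clear ImportError that names all three identifiers."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError) as e:
        raise ImportError(
            f"Failed to load backend '{backend_name}' "
            f"(module '{module_name}', class '{class_name}'): {e}"
        ) from e
```

With this shape, a registry typo in the class name surfaces at startup with the same friendly message as a missing dependency, instead of a bare `AttributeError`.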
```python
checker_cmd = f"python tests/slurm-tests/unified_asr/check_results.py --workspace {args.workspace}"
run_cmd(
    ctx=wrap_arguments(checker_cmd),
    cluster=args.cluster,
```
Preserve --workspace as a single argument when building checker context.
Line 109 + Line 111 currently rely on wrap_arguments(checker_cmd), but wrap_arguments does plain whitespace splitting. A workspace path with spaces will break the checker invocation.
🔧 Proposed fix
```diff
-    checker_cmd = f"python tests/slurm-tests/unified_asr/check_results.py --workspace {args.workspace}"
+    checker_cmd = "python tests/slurm-tests/unified_asr/check_results.py --workspace"
+    checker_ctx = wrap_arguments(checker_cmd)
+    checker_ctx.args.append(args.workspace)
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=checker_ctx,
         cluster=args.cluster,
         expname=f"{args.expname_prefix}-check-results",
```
Verify each finding against the current code and only fix it if needed.
In `@tests/slurm-tests/unified_asr/run_test.py` around lines 109 - 112, The
checker invocation breaks when args.workspace contains spaces because
wrap_arguments(checker_cmd) splits on whitespace; fix by preserving the
workspace as a single argument: build the ctx by passing the command prefix
through wrap_arguments for "python
tests/slurm-tests/unified_asr/check_results.py --workspace" (or simply the
script and flag) and then append args.workspace as one element, and use that ctx
in run_cmd (update the checker_cmd/ctx construction used in the run_cmd call so
that --workspace and its path are separate args but the path is a single
element).
Force-pushed from 05f1a0f to 754aedc
Will look tmr

Comment about temperature seem legit?

Which one?
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Force-pushed from 6e7b703 to ac7771f
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
@coderabbitai help
@coderabbitai full review

✅ Actions performed

Full review triggered.
Actionable comments posted: 5
♻️ Duplicate comments (8)
recipes/multimodal/server/backends/__init__.py (1)
70-76: ⚠️ Potential issue | 🟡 Minor

Handle `AttributeError` from `getattr()` to provide clearer error messages.

If the registry has a class-name typo, `getattr(module, class_name)` raises `AttributeError`, which bypasses the custom import error message. This was flagged in a prior review and remains unaddressed.

🔧 Proposed fix

```diff
 try:
     module = importlib.import_module(f".{module_name}", package=__name__)
-    return getattr(module, class_name)
+    try:
+        return getattr(module, class_name)
+    except AttributeError as e:
+        raise ImportError(
+            f"Backend '{backend_name}' registered class '{class_name}' not found in '{module_name}'"
+        ) from e
 except ImportError as e:
     raise ImportError(
         f"Failed to import backend '{backend_name}'. Make sure required dependencies are installed. Error: {e}"
     ) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/__init__.py` around lines 70 - 76, The current import block catches ImportError but not AttributeError from getattr(module, class_name), so typos in class names bypass your friendly message; update the try/except around importlib.import_module and getattr (the code that returns getattr(module, class_name)) to also catch AttributeError and re-raise an ImportError that includes backend_name and class_name and the original error to provide a clear message about a missing class in the imported module.

nemo_skills/inference/server/serve_unified.py (3)
119-124: ⚠️ Potential issue | 🟡 Minor

`yaml.safe_load` returns `None` for empty files, causing downstream crash.

If the YAML file is empty, `safe_load` returns `None`, which will cause `config_dict.pop(...)` at line 244 to raise `AttributeError`. This was flagged in prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_unified.py` around lines 119 - 124, The YAML loader can return None for empty files causing downstream calls like config_dict.pop(...) to fail; update load_yaml_config to ensure it never returns None by checking the result of yaml.safe_load and returning an empty dict when safe_load yields None (i.e., in function load_yaml_config, after calling yaml.safe_load, coerce a None to {} and return that instead).
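The `or {}` coercion is the one-line fix here. A sketch assuming PyYAML is installed; this `load_yaml_config` mirrors the function name from the prompt but takes text instead of a path for brevity:

```python
import yaml


def load_yaml_config(text: str) -> dict:
    # safe_load returns None for an empty document; coerce to {} so
    # callers can use dict methods like .pop() unconditionally.
    return yaml.safe_load(text) or {}
```

Empty files, whitespace-only files, and comment-only files all parse to `None`, so the coercion covers every variant of "nothing configured".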
282-306: ⚠️ Potential issue | 🟡 Minor

Broad `except Exception` block remains from prior review.

The `ImportError` catch with helpful guidance is appropriate, but the generic `Exception` catch at line 301 violates the guideline "Don't catch exceptions when they are not expected to be normally raised." Python's default handler provides the same traceback. This was flagged in prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_unified.py` around lines 282 - 306, The broad "except Exception as e" block in serve_unified.py (surrounding the uvicorn.run and create_app invocation) should be removed so unexpected errors are not swallowed; keep the specific ImportError handler that logs guidance, but delete the generic exception handler (the block that prints the error, imports traceback, calls traceback.print_exc(), and exits) so Python's default exception handling provides the traceback and proper failure behavior for create_app/uvicorn.run.
103-116: ⚠️ Potential issue | 🟡 Minor

Broad exception handling may hide patch failures.
Per coding guidelines, exceptions should not be silently suppressed. If the file copy fails (e.g., permission error), the server continues but model loading may fail later with an unrelated error. This was flagged in prior review.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_unified.py` around lines 103 - 116, The function apply_safetensors_patch currently catches Exception broadly and only prints a warning, which can hide real failures; update apply_safetensors_patch to catch specific exceptions (e.g., ImportError, FileNotFoundError, PermissionError, OSError) or re-raise after logging, include the full exception traceback and context (hack_path, dest_path, and the import target st_torch) in the log, and ensure critical failures either raise a clear exception or return a distinct error result so callers can handle the failure instead of proceeding silently.

tests/slurm-tests/unified_asr/run_test.py (1)
109-118: ⚠️ Potential issue | 🟡 Minor

Workspace paths with spaces will break the checker invocation.

`wrap_arguments(checker_cmd)` splits on whitespace, so a workspace path like `/path/with spaces/` becomes multiple arguments. This was flagged in prior review.

🔧 Proposed fix

```diff
-    checker_cmd = f"python tests/slurm-tests/unified_asr/check_results.py --workspace {args.workspace}"
+    checker_ctx = wrap_arguments("python tests/slurm-tests/unified_asr/check_results.py --workspace")
+    checker_ctx.args.append(args.workspace)
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=checker_ctx,
         cluster=args.cluster,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_asr/run_test.py` around lines 109 - 118, Escape/quote the workspace before splitting so spaces don't produce multiple args: import shlex and build checker_cmd using shlex.quote(args.workspace) (e.g. checker_cmd = f"python ... --workspace {shlex.quote(args.workspace)}"), then continue to pass that checker_cmd into wrap_arguments and run_cmd; this ensures wrap_arguments(checker_cmd) yields a single argument token for paths with spaces.

recipes/multimodal/server/unified_server.py (2)
332-334: ⚠️ Potential issue | 🟡 Minor

Require `messages` explicitly instead of defaulting to `[]`.

At Line 332, defaulting a required field to an empty list can hide malformed requests and blur error semantics.

🔧 Proposed fix

```diff
-    messages = request.get("messages", [])
-    if not messages:
-        raise HTTPException(status_code=400, detail="No messages provided")
+    if "messages" not in request:
+        raise HTTPException(status_code=400, detail="Missing required field: messages")
+    messages = request["messages"]
+    if not messages:
+        raise HTTPException(status_code=400, detail="No messages provided")
```

Based on learnings: "Applies to **/*.py : Don't use `.get()` for accessing dictionary keys if the code expects them to be present; use direct access `data[key_name]` to fail with a clear error instead of silently corrupting data."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 332 - 334, Replace the permissive access request.get("messages", []) with a required lookup so malformed requests fail fast: use messages = request["messages"] (or request.pop("messages") if you need to consume it) and retain the existing validation that raises HTTPException(status_code=400, detail="No messages provided") when messages is empty or falsy; this ensures clear KeyError behavior for missing keys and preserves the current check for empty messages in the handler that reads request/messages.
292-293: ⚠️ Potential issue | 🟡 Minor

Use direct indexing for fixed `server_config` keys.

`server_config` is created with these keys, so `.get()` at Line 292, Line 293, Line 312, and Line 459 masks bugs.

🔧 Proposed fix

```diff
-        "backend": server_config.get("backend_type"),
-        "model": server_config.get("model_path"),
+        "backend": server_config["backend_type"],
+        "model": server_config["model_path"],
@@
-    model_id = server_config.get("model_path", "unknown")
+    model_id = server_config["model_path"]
@@
-        "model": server_config.get("model_path"),
+        "model": server_config["model_path"],
```

Based on learnings: "Applies to **/*.py : Don't use `.get()` for accessing dictionary keys if the code expects them to be present; use direct access `data[key_name]` to fail with a clear error instead of silently corrupting data."

Also applies to: 312-312, 459-459
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/unified_server.py` around lines 292 - 293, server_config is constructed with fixed keys, so replace all server_config.get(...) uses with direct indexing to surface missing-key errors: change server_config.get("backend_type") to server_config["backend_type"], server_config.get("model_path") to server_config["model_path"], and likewise update the other occurrences noted (the server_config.get(...) at the other two locations) to server_config["<key_name>"] so the code fails fast with a clear KeyError instead of silently returning None.

tests/slurm-tests/unified_tts/run_test.py (1)
126-129: ⚠️ Potential issue | 🟡 Minor
Preserve `--workspace` as a single argument for checker invocation.
At Line 126 and Line 128, `checker_cmd` is still passed through `wrap_arguments(...)`, which uses whitespace splitting and breaks paths with spaces. This is still unresolved.
🔧 Proposed fix
 import argparse
+import shlex
 from pathlib import Path
@@
-    checker_cmd = f"python tests/slurm-tests/unified_tts/check_results.py --workspace {args.workspace}"
+    checker_cmd = (
+        "python tests/slurm-tests/unified_tts/check_results.py "
+        f"--workspace {shlex.quote(args.workspace)}"
+    )
     run_cmd(
-        ctx=wrap_arguments(checker_cmd),
+        ctx=wrap_arguments(""),
+        command=checker_cmd,
         cluster=args.cluster,
         expname=f"{args.expname_prefix}-check-results",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_tts/run_test.py` around lines 126 - 129, The checker command is being split by wrap_arguments causing paths with spaces to break; update the run invocation to preserve the entire --workspace argument by not using wrap_arguments for checker_cmd and instead pass a single string or an explicit list where the "--workspace" and its value are one element; locate checker_cmd and the run_cmd call (references: checker_cmd, wrap_arguments, run_cmd) and replace ctx=wrap_arguments(checker_cmd) with a call that supplies the un-split command (e.g., ctx=checker_cmd or ctx=[...explicit args...] so "--workspace" and args.workspace remain a single argument).
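To illustrate why the quoting matters, a small standalone sketch (the workspace path here is hypothetical):

```python
import shlex

workspace = "/workspace/run with spaces"  # hypothetical path containing spaces
cmd = f"python check_results.py --workspace {shlex.quote(workspace)}"

# A splitter that respects shell quoting (as shlex.split does) now keeps
# the quoted path together as one token instead of three.
tokens = shlex.split(cmd)
```

Without `shlex.quote`, the same split would yield three separate tokens for the path, which is exactly the failure mode described above.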
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/multimodal/server/backends/magpie_tts_backend.py`:
- Around line 248-259: The code silently falls back to .nemo mode when only one
of self.tts_config.hparams_file or self.tts_config.checkpoint_file is provided;
change the logic around has_ckpt_mode to validate these fields and fail fast: if
exactly one of hparams_file or checkpoint_file is set, raise a ValueError (or
similar) describing the missing counterpart rather than proceeding, otherwise
continue to build _ModelLoadConfig as before; update the block that computes
has_ckpt_mode and constructs _ModelLoadConfig to enforce this check so
user-supplied partial checkpoint arguments are not ignored.
In `@recipes/multimodal/server/backends/nemo_asr_backend.py`:
- Around line 82-84: The current __init__ in nemo_asr_backend.py discards
top-level BackendConfig fields when config is not a NeMoASRConfig by only using
config.extra_config; update initialization to merge/propagate base fields (e.g.,
model_path, device, any generation defaults) into the NeMoASRConfig constructed
from config.extra_config or validate/raise if unsupported: call
NeMoASRConfig.from_dict(config.extra_config) to get a baseline, then override
its attributes with explicit values from the original BackendConfig
(config.model_path, config.device, etc.) before assigning self.asr_config, or
alternatively validate and error if those base fields are present but cannot be
applied.
In `@recipes/multimodal/server/unified_server.py`:
- Around line 168-183: The try/except around decoding audio blocks swallows all
exceptions causing silent drops of client audio; update the block that handles
audio_url/input_audio (variables: data_uri_pattern, audio_list, input_audio,
audio_url, data) to stop catching Exception broadly—either remove the broad
try/except or catch only expected decode errors (e.g., binascii.Error or
ValueError), and when such an error occurs log it unconditionally (use the
logger instead of print/DEBUG) and re-raise or propagate an error so the failure
is visible to callers instead of being ignored. Ensure the change targets the
decoding section that appends to audio_list so malformed base64 is not silently
dropped.
In `@tests/slurm-tests/unified_tts/check_results.py`:
- Around line 38-43: resolve_audio_path currently uses path.name which strips
any subdirectory structure and can cause collisions; update resolve_audio_path
to append the full relative Path (not path.name) under Path(workspace) /
"tts_outputs" / "audio" so nested relative paths are preserved (i.e., when
path.is_absolute() is False, return Path(workspace) / "tts_outputs" / "audio" /
path), and optionally call path = path.relative_to(path.anchor) or .resolve()
only for absolute cases to avoid path traversal surprises in resolve_audio_path.
- Around line 20-21: The current sys.path.append + from utils import assert_all,
soft_assert can pick up an unrelated installed utils; replace this with a
deterministic local import: either prepend the test directory using
sys.path.insert(0, str(Path(__file__).resolve().parent.parent)) before the
import, or (preferable) load the local utils.py explicitly with
importlib.util.spec_from_file_location / importlib.util.module_from_spec and
bind assert_all and soft_assert from that loaded module; update the code around
sys.path.append and from utils import ... to use one of these deterministic
approaches so the local utils.py is always imported.
---
Duplicate comments:
In `@nemo_skills/inference/server/serve_unified.py`:
- Around line 119-124: The YAML loader can return None for empty files causing
downstream calls like config_dict.pop(...) to fail; update load_yaml_config to
ensure it never returns None by checking the result of yaml.safe_load and
returning an empty dict when safe_load yields None (i.e., in function
load_yaml_config, after calling yaml.safe_load, coerce a None to {} and return
that instead).
- Around line 282-306: The broad "except Exception as e" block in
serve_unified.py (surrounding the uvicorn.run and create_app invocation) should
be removed so unexpected errors are not swallowed; keep the specific ImportError
handler that logs guidance, but delete the generic exception handler (the block
that prints the error, imports traceback, calls traceback.print_exc(), and
exits) so Python's default exception handling provides the traceback and proper
failure behavior for create_app/uvicorn.run.
- Around line 103-116: The function apply_safetensors_patch currently catches
Exception broadly and only prints a warning, which can hide real failures;
update apply_safetensors_patch to catch specific exceptions (e.g., ImportError,
FileNotFoundError, PermissionError, OSError) or re-raise after logging, include
the full exception traceback and context (hack_path, dest_path, and the import
target st_torch) in the log, and ensure critical failures either raise a clear
exception or return a distinct error result so callers can handle the failure
instead of proceeding silently.
In `@recipes/multimodal/server/backends/__init__.py`:
- Around line 70-76: The current import block catches ImportError but not
AttributeError from getattr(module, class_name), so typos in class names bypass
your friendly message; update the try/except around importlib.import_module and
getattr (the code that returns getattr(module, class_name)) to also catch
AttributeError and re-raise an ImportError that includes backend_name and
class_name and the original error to provide a clear message about a missing
class in the imported module.
In `@recipes/multimodal/server/unified_server.py`:
- Around line 332-334: Replace the permissive access request.get("messages", [])
with a required lookup so malformed requests fail fast: use messages =
request["messages"] (or request.pop("messages") if you need to consume it) and
retain the existing validation that raises HTTPException(status_code=400,
detail="No messages provided") when messages is empty or falsy; this ensures
clear KeyError behavior for missing keys and preserves the current check for
empty messages in the handler that reads request/messages.
- Around line 292-293: server_config is constructed with fixed keys, so replace
all server_config.get(...) uses with direct indexing to surface missing-key
errors: change server_config.get("backend_type") to
server_config["backend_type"], server_config.get("model_path") to
server_config["model_path"], and likewise update the other occurrences noted
(the server_config.get(...) at the other two locations) to
server_config["<key_name>"] so the code fails fast with a clear KeyError instead
of silently returning None.
In `@tests/slurm-tests/unified_asr/run_test.py`:
- Around line 109-118: Escape/quote the workspace before splitting so spaces
don't produce multiple args: import shlex and build checker_cmd using
shlex.quote(args.workspace) (e.g. checker_cmd = f"python ... --workspace
{shlex.quote(args.workspace)}"), then continue to pass that checker_cmd into
wrap_arguments and run_cmd; this ensures wrap_arguments(checker_cmd) yields a
single argument token for paths with spaces.
In `@tests/slurm-tests/unified_tts/run_test.py`:
- Around line 126-129: The checker command is being split by wrap_arguments
causing paths with spaces to break; update the run invocation to preserve the
entire --workspace argument by not using wrap_arguments for checker_cmd and
instead pass a single string or an explicit list where the "--workspace" and its
value are one element; locate checker_cmd and the run_cmd call (references:
checker_cmd, wrap_arguments, run_cmd) and replace
ctx=wrap_arguments(checker_cmd) with a call that supplies the un-split command
(e.g., ctx=checker_cmd or ctx=[...explicit args...] so "--workspace" and
args.workspace remain a single argument).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- nemo_skills/inference/server/serve_unified.py
- recipes/multimodal/__init__.py
- recipes/multimodal/server/README.md
- recipes/multimodal/server/__init__.py
- recipes/multimodal/server/backends/__init__.py
- recipes/multimodal/server/backends/base.py
- recipes/multimodal/server/backends/magpie_tts_backend.py
- recipes/multimodal/server/backends/nemo_asr_backend.py
- recipes/multimodal/server/unified_server.py
- tests/slurm-tests/run_all.sh
- tests/slurm-tests/unified_asr/asr_openai.test
- tests/slurm-tests/unified_asr/check_results.py
- tests/slurm-tests/unified_asr/run_test.py
- tests/slurm-tests/unified_tts/README.md
- tests/slurm-tests/unified_tts/check_results.py
- tests/slurm-tests/unified_tts/run_test.py
- tests/slurm-tests/unified_tts/tts_openai.test
- tests/test_magpie_tts_backend.py
- tests/test_nemo_asr_backend.py
- tests/test_unified_server_audio_parser.py
- tests/test_unified_server_batcher.py
- tests/test_unified_server_error_handling.py
def __init__(self, config: BackendConfig):
    self.asr_config = config if isinstance(config, NeMoASRConfig) else NeMoASRConfig.from_dict(config.extra_config)
    super().__init__(self.asr_config)
Don’t drop base BackendConfig fields in fallback config conversion.
At Line 83, converting only config.extra_config can ignore explicit model_path, device, and generation defaults when config is not already NeMoASRConfig.
🔧 Proposed fix
 def __init__(self, config: BackendConfig):
-    self.asr_config = config if isinstance(config, NeMoASRConfig) else NeMoASRConfig.from_dict(config.extra_config)
+    if isinstance(config, NeMoASRConfig):
+        self.asr_config = config
+    else:
+        self.asr_config = NeMoASRConfig.from_dict(
+            {
+                "model_path": config.model_path,
+                "device": config.device,
+                "dtype": config.dtype,
+                "max_new_tokens": config.max_new_tokens,
+                "temperature": config.temperature,
+                "top_p": config.top_p,
+                "top_k": config.top_k,
+                **config.extra_config,
+            }
+        )
     super().__init__(self.asr_config)

As per coding guidelines: "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def __init__(self, config: BackendConfig):
    if isinstance(config, NeMoASRConfig):
        self.asr_config = config
    else:
        self.asr_config = NeMoASRConfig.from_dict(
            {
                "model_path": config.model_path,
                "device": config.device,
                "dtype": config.dtype,
                "max_new_tokens": config.max_new_tokens,
                "temperature": config.temperature,
                "top_p": config.top_p,
                "top_k": config.top_k,
                **config.extra_config,
            }
        )
    super().__init__(self.asr_config)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@recipes/multimodal/server/backends/nemo_asr_backend.py` around lines 82 - 84,
The current __init__ in nemo_asr_backend.py discards top-level BackendConfig
fields when config is not a NeMoASRConfig by only using config.extra_config;
update initialization to merge/propagate base fields (e.g., model_path, device,
any generation defaults) into the NeMoASRConfig constructed from
config.extra_config or validate/raise if unsupported: call
NeMoASRConfig.from_dict(config.extra_config) to get a baseline, then override
its attributes with explicit values from the original BackendConfig
(config.model_path, config.device, etc.) before assigning self.asr_config, or
alternatively validate and error if those base fields are present but cannot be
applied.
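The merge ordering in the proposed fix (base fields listed first, `extra_config` spread last so it wins on conflict) can be shown with a simplified stand-in; the classes below are illustrative, not the PR's actual definitions:

```python
from dataclasses import dataclass, field

@dataclass
class BackendConfig:  # simplified stand-in, not the PR's real class
    model_path: str = ""
    device: str = "cuda"
    extra_config: dict = field(default_factory=dict)

def merged_config_dict(config: BackendConfig) -> dict:
    # Base fields come first, so keys in extra_config override them
    # on conflict — matching the dict-literal spread in the diff above.
    return {
        "model_path": config.model_path,
        "device": config.device,
        **config.extra_config,
    }
```

Explicit top-level fields are carried through, while backend-specific overrides in `extra_config` still take precedence.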
try:
    if item.get("type") == "audio_url":
        audio_url = item.get("audio_url", {})
        url = audio_url.get("url", "")
        match = data_uri_pattern.match(url)
        if match:
            audio_list.append(base64.b64decode(match.group(1)))
    elif item.get("type") == "input_audio":
        input_audio = item.get("input_audio", {})
        data = input_audio.get("data", "")
        if data:
            audio_list.append(base64.b64decode(data))
except Exception as e:
    if DEBUG:
        print(f"[Server] Warning: Failed to decode audio block: {e}")
return audio_list
Don’t silently ignore malformed audio blocks.
At Line 180-183, decode failures are swallowed and processing continues. This can silently drop client audio and produce misleading downstream behavior.
🔧 Proposed fix
+import binascii
@@
- try:
- if item.get("type") == "audio_url":
- audio_url = item.get("audio_url", {})
- url = audio_url.get("url", "")
- match = data_uri_pattern.match(url)
- if match:
- audio_list.append(base64.b64decode(match.group(1)))
- elif item.get("type") == "input_audio":
- input_audio = item.get("input_audio", {})
- data = input_audio.get("data", "")
- if data:
- audio_list.append(base64.b64decode(data))
- except Exception as e:
- if DEBUG:
- print(f"[Server] Warning: Failed to decode audio block: {e}")
+ if item.get("type") == "audio_url":
+ audio_url = item.get("audio_url", {})
+ url = audio_url.get("url", "")
+ match = data_uri_pattern.match(url)
+ if not match:
+ raise ValueError("Invalid audio_url data URI format")
+ try:
+ audio_list.append(base64.b64decode(match.group(1), validate=True))
+ except binascii.Error as e:
+ raise ValueError("Invalid base64 payload in audio_url block") from e
+ elif item.get("type") == "input_audio":
+ input_audio = item.get("input_audio", {})
+ data = input_audio.get("data", "")
+ if data:
+ try:
+ audio_list.append(base64.b64decode(data, validate=True))
+ except binascii.Error as e:
+                raise ValueError("Invalid base64 payload in input_audio block") from e

As per coding guidelines: "Don't catch exceptions when they are not expected to be normally raised; let the code fail when something unexpected happens so users notice it instead of silently misbehaving."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if item.get("type") == "audio_url":
    audio_url = item.get("audio_url", {})
    url = audio_url.get("url", "")
    match = data_uri_pattern.match(url)
    if not match:
        raise ValueError("Invalid audio_url data URI format")
    try:
        audio_list.append(base64.b64decode(match.group(1), validate=True))
    except binascii.Error as e:
        raise ValueError("Invalid base64 payload in audio_url block") from e
elif item.get("type") == "input_audio":
    input_audio = item.get("input_audio", {})
    data = input_audio.get("data", "")
    if data:
        try:
            audio_list.append(base64.b64decode(data, validate=True))
        except binascii.Error as e:
            raise ValueError("Invalid base64 payload in input_audio block") from e
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 180-180: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@recipes/multimodal/server/unified_server.py` around lines 168 - 183, The
try/except around decoding audio blocks swallows all exceptions causing silent
drops of client audio; update the block that handles audio_url/input_audio
(variables: data_uri_pattern, audio_list, input_audio, audio_url, data) to stop
catching Exception broadly—either remove the broad try/except or catch only
expected decode errors (e.g., binascii.Error or ValueError), and when such an
error occurs log it unconditionally (use the logger instead of print/DEBUG) and
re-raise or propagate an error so the failure is visible to callers instead of
being ignored. Ensure the change targets the decoding section that appends to
audio_list so malformed base64 is not silently dropped.
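The `validate=True` behavior the fix relies on can be checked in isolation; the `decode_strict` wrapper below is illustrative, not the PR's code:

```python
import base64
import binascii

def decode_strict(data: str) -> bytes:
    # validate=True makes b64decode reject non-alphabet characters
    # instead of silently discarding them.
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error as e:
        raise ValueError("Invalid base64 payload") from e
```

Malformed payloads surface as a `ValueError` the request handler can turn into a 400, rather than being dropped.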
sys.path.append(str(Path(__file__).resolve().parent.parent))  # for utils.py
from utils import assert_all, soft_assert  # noqa: E402
Import utils deterministically from the local Slurm test directory.
Using sys.path.append(...) can let from utils import ... resolve to an unrelated installed utils module before the intended local helper.
🔧 Proposed fix
-sys.path.append(str(Path(__file__).resolve().parent.parent))  # for utils.py
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))  # prioritize local slurm-tests utils.py
 from utils import assert_all, soft_assert  # noqa: E402
🤖 Prompt for AI Agents
from utils import assert_all, soft_assert # noqa: E402🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/slurm-tests/unified_tts/check_results.py` around lines 20 - 21, The
current sys.path.append + from utils import assert_all, soft_assert can pick up
an unrelated installed utils; replace this with a deterministic local import:
either prepend the test directory using sys.path.insert(0,
str(Path(__file__).resolve().parent.parent)) before the import, or (preferable)
load the local utils.py explicitly with importlib.util.spec_from_file_location /
importlib.util.module_from_spec and bind assert_all and soft_assert from that
loaded module; update the code around sys.path.append and from utils import ...
to use one of these deterministic approaches so the local utils.py is always
imported.
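The explicit-load alternative mentioned in the prompt can be sketched as a generic helper (assuming nothing about the repo layout):

```python
import importlib.util
from pathlib import Path

def load_module_from_path(name: str, path: Path):
    # Bypass sys.path entirely: load exactly the file at `path`,
    # so an unrelated installed `utils` can never shadow the local one.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

The test file would then bind `assert_all` and `soft_assert` from the returned module instead of a bare `from utils import ...`.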
def resolve_audio_path(audio_path: str, workspace: str) -> Path:
    path = Path(audio_path)
    if path.is_absolute():
        return path
    return Path(workspace) / "tts_outputs" / "audio" / path.name
Preserve relative path structure when resolving audio artifacts.
path.name drops subdirectories. If outputs include nested relative paths (or duplicate basenames), this can validate the wrong file.
🔧 Proposed fix
def resolve_audio_path(audio_path: str, workspace: str) -> Path:
path = Path(audio_path)
if path.is_absolute():
return path
- return Path(workspace) / "tts_outputs" / "audio" / path.name
+ workspace_path = Path(workspace)
+ candidates = (
+ workspace_path / path,
+ workspace_path / "tts_outputs" / path,
+ workspace_path / "tts_outputs" / "audio" / path,
+ )
+ for candidate in candidates:
+ if candidate.exists():
+ return candidate
+ return workspace_path / "tts_outputs" / "audio" / path🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/slurm-tests/unified_tts/check_results.py` around lines 38 - 43,
resolve_audio_path currently uses path.name which strips any subdirectory
structure and can cause collisions; update resolve_audio_path to append the full
relative Path (not path.name) under Path(workspace) / "tts_outputs" / "audio" so
nested relative paths are preserved (i.e., when path.is_absolute() is False,
return Path(workspace) / "tts_outputs" / "audio" / path), and optionally call
path = path.relative_to(path.anchor) or .resolve() only for absolute cases to
avoid path traversal surprises in resolve_audio_path.
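A minimal version of the suggested behavior, preserving the relative path instead of collapsing to the basename (a sketch, not the full candidate-probing fix):

```python
from pathlib import Path

def resolve_audio_path(audio_path: str, workspace: str) -> Path:
    path = Path(audio_path)
    if path.is_absolute():
        return path
    # Append the full relative path so "a/x.wav" and "b/x.wav" stay distinct.
    return Path(workspace) / "tts_outputs" / "audio" / path
```

With `path.name` both nested inputs would have mapped to the same `x.wav`, which is the collision the review flags.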
# sleep 10
python tests/slurm-tests/unified_asr/run_test.py --cluster $CLUSTER --workspace /workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_asr --expname_prefix unified_asr_$RUN_NAME
# sleep 10
python tests/slurm-tests/unified_tts/run_test.py --cluster $CLUSTER --workspace /workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_tts --expname_prefix unified_tts_$RUN_NAME
--code_path?
For now, pass a NeMo code tree explicitly via `--code_path`. When a newer NeMo image includes the required Magpie TTS modules, this manual `--code_path` override should no longer be necessary.
inference_ctor_params = inspect.signature(_InferenceConfig).parameters
inference_kwargs = {}
if "batch_size" in inference_ctor_params:
    inference_kwargs["batch_size"] = 16
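The signature-introspection trick quoted above can be demonstrated standalone; the `_InferenceConfig` class below is a hypothetical stand-in for the real config class:

```python
import inspect

class _InferenceConfig:  # hypothetical stand-in for the real config class
    def __init__(self, batch_size: int = 8, temperature: float = 1.0):
        self.batch_size = batch_size
        self.temperature = temperature

# Only pass batch_size if the constructor actually accepts it, so the
# call keeps working against versions of the class without that parameter.
inference_ctor_params = inspect.signature(_InferenceConfig).parameters
inference_kwargs = {}
if "batch_size" in inference_ctor_params:
    inference_kwargs["batch_size"] = 16
cfg = _InferenceConfig(**inference_kwargs)
```

If the parameter is absent from the signature, the kwargs dict stays empty and the constructor falls back to its own defaults.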
I still see some tricky places, but it looks aligned with the "fail fast" approach, and it all probably comes from user mistakes. Overall LGTM.
Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250) Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
<chtruong@nvidia.com> Co-authored-by: Nick Ludwig <nliudvig@nvidia.com> Co-authored-by: Wojciech Prazuch <wojciechprazuch3@gmail.com> Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com> Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com> Co-authored-by: Wei Du <wedu@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> commit a3d44dc Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 13 22:32:15 2026 -0800 Add --installation_command support to prepare_data (#1243) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> commit e80d524 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 12 17:26:00 2026 -0800 Fix CI disk space for Docker image builds (#1241) Signed-off-by: George Armstrong <georgea@nvidia.com> commit d22236c Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Feb 11 17:55:00 2026 -0800 Fix answerbench prompt parsing (#1235) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 2401628 Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 11 14:56:43 2026 -0800 feat: add lockfiles for 
reproducible sandbox builds (#1233) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5a0a84d Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Wed Feb 11 13:30:03 2026 -0800 removing datasets version restriction for LCB eval (#1230) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit ef0a890 Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Date: Wed Feb 11 12:03:16 2026 +0400 Gnalbandyan/add physics (#1214) Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> commit bd9d30c Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Tue Feb 10 15:13:27 2026 -0800 LCB generic prompting (#1215) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit 7d6c49a Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Sat Feb 7 08:45:46 2026 -0800 Add support for different variations of nemo-rl (#1220) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit b19ba96 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 6 21:40:56 2026 -0800 Add multi-node sandbox support for SLURM clusters (#1218) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 8950bb0 Author: anowaczynski-nvidia <anowaczynski@nvidia.com> Date: Sat Feb 7 01:38:00 2026 +0100 support structured outputs in hle judge for optional AA compatibility (#1186) Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b84f7a2 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 6 14:51:02 2026 -0800 A small update on running tests docs (#1219) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 8e838e1 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 5 18:01:35 2026 -0800 feat: add flag to disable sandbox replay (#1217) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5fd9085 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 5 
15:57:01 2026 -0800 Add an option to limit number of tool calls (#1216) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit d820200 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 3 10:43:55 2026 -0800 Add arena-hard v2 (#1205) Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: bzantium <ryumin93@gmail.com> commit a30920e Author: Igor Gitman <igitman@nvidia.com> Date: Mon Feb 2 10:53:55 2026 -0800 Fix mkdocs warnings (#1204) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 19d7788 Author: Ivan <imoshkov@nvidia.com> Date: Mon Feb 2 23:25:13 2026 +0500 Fix infinite wait in sandbox.wait_for_sandbox (#1206) Signed-off-by: i-vainn <imoshkov@nvidia.com> commit 3e65fbf Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Fri Jan 30 19:38:38 2026 -0800 Improve tts (#1203) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 250c862 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Fri Jan 30 22:12:29 2026 +0400 SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> commit 7ded756 Author: Ivan <imoshkov@nvidia.com> Date: Fri Jan 30 09:57:41 2026 +0500 Add proper token counting to code execution model (#1184) Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b986304 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Jan 29 17:57:07 2026 -0800 Upgrade containers (#1198) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 3b44f02 Author: Dan Lord <blahblahasdf@gmail.com> Date: Thu Jan 29 16:40:47 2026 -0800 Fix incorrect string format (#1199) Signed-off-by: dlord <dlord@nvidia.com> commit c4854b8 Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Thu Jan 29 13:43:36 2026 -0800 Update nemo-rl to latest (#1087) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor 
Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
…ackends (#1250) Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
…ackends (#1250) Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Summary
This PR adds a backend-agnostic unified inference server and integrates speech backends used in NeMo-Skills generation workflows.
Included:
- `nemo_skills.inference.server.serve_unified`
- `recipes/multimodal/server/unified_server.py`
- `recipes/multimodal/server/backends/*` (`magpie_tts`, `nemo_asr`)
- `tests/slurm-tests/unified_asr/*`
- `tests/slurm-tests/unified_tts/*`
- `tests/slurm-tests/run_all.sh`
- `tests/test_nemo_asr_backend.py`
- `tests/test_unified_server_audio_parser.py`

Scope and constraints
No changes to the `nemo_skills/inference/generate.py` public config surface.

Notable follow-up-compatible behavior
- `tests/slurm-tests/unified_tts/run_test.py` keeps the default server image as `nvcr.io/nvidia/nemo:25.11`; point `--code_path` to a NeMo tree that contains `magpietts_inference`. See `tests/slurm-tests/unified_tts/README.md` for details.

Validation
Unit tests
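For context on what `tests/test_unified_server_audio_parser.py` exercises, a message parser for OpenAI-style chat requests can be sketched as below. The helper name `parse_messages` and the exact content-part schema are assumptions; the sketch follows the common `input_audio` content-part convention with base64-encoded data.

```python
import base64


def parse_messages(messages: list[dict]):
    """Split OpenAI-style chat messages into (system_prompt, text, audio_bytes)."""
    system_prompt, texts, audio = None, [], None
    for msg in messages:
        content = msg.get("content", "")
        if msg.get("role") == "system":
            # System prompt is expected to be a plain string.
            system_prompt = content if isinstance(content, str) else ""
            continue
        if isinstance(content, str):
            texts.append(content)
            continue
        for part in content:  # list of typed content parts
            if part.get("type") == "text":
                texts.append(part["text"])
            elif part.get("type") == "input_audio":
                audio = base64.b64decode(part["input_audio"]["data"])
    return system_prompt, " ".join(texts), audio
```

A unit test along these lines would feed in a system message plus a user message mixing `text` and `input_audio` parts, then assert the extracted prompt, text, and decoded audio bytes.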
Slurm validation (rerun on Feb 25, 2026)
Ran ASR+TTS on both clusters with the local image override and the correct TTS `--code_path`.

DFW:
- 9662732, checker 9662734 -> `ALL TESTS PASSED`
- 9662733, checker 9662735 -> `ALL TESTS PASSED`

IAD (draco_oci):
- 8611354, checker 8611357 -> `ALL TESTS PASSED`
- 8611353, checker 8611356 -> `ALL TESTS PASSED`

Commit notes
Latest incremental updates in this PR branch include:
- Add unified TTS Slurm test README
- Add copyright header to NeMo ASR backend test

Summary by CodeRabbit
New Features
Documentation
Tests