Open
Introduces the unified NeMo inference server with a pluggable backend architecture. All backend-specific logic lives in backend modules, not the server. Backends declare their config via get_config_class() and can register additional routes via get_extra_routes().

- recipes/multimodal/server/unified_server.py: Generic FastAPI server with request batching and OpenAI-compatible /v1/chat/completions
- recipes/multimodal/server/backends/base.py: Abstract InferenceBackend with get_config_class(), get_extra_routes() for future extensibility
- recipes/multimodal/server/backends/__init__.py: Lazy-loading registry
- nemo_skills/inference/server/serve_unified.py: CLI entrypoint with YAML config support and backward-compatible CLI args

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
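To make the backend contract concrete, here is a minimal sketch of how an abstract backend and a lazy-loading registry could fit together. It is not the code from this PR: `BackendConfig`, `EchoBackend`, `register_backend`, `get_backend_class`, `build_config`, and the `generate()` method are hypothetical names chosen for illustration. Only `InferenceBackend`, `get_config_class()`, and `get_extra_routes()` come from the description above, and their real signatures may differ.

```python
import importlib
from abc import ABC, abstractmethod
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class BackendConfig:
    """Hypothetical base config; a real backend would declare its own fields."""
    model_path: str = ""
    batch_size: int = 1


class InferenceBackend(ABC):
    """Illustrative stand-in for the abstract class in backends/base.py."""

    def __init__(self, config: BackendConfig):
        self.config = config

    @classmethod
    def get_config_class(cls) -> type:
        # Each backend tells the server which dataclass describes its config.
        return BackendConfig

    def get_extra_routes(self) -> List[Tuple[str, Callable[..., Any]]]:
        # Optional (path, handler) pairs; the default is no extra routes.
        return []

    @abstractmethod
    def generate(self, requests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Process one batch of requests (assumed method name)."""


class EchoBackend(InferenceBackend):
    """Toy backend that echoes each prompt back."""

    def generate(self, requests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [{"text": r.get("prompt", "")} for r in requests]


# Registry maps a backend name to (module path, class name). Nothing is
# imported until the backend is requested, so heavy dependencies stay unloaded.
_BACKEND_REGISTRY: Dict[str, Tuple[str, str]] = {}


def register_backend(name: str, module_path: str, class_name: str) -> None:
    _BACKEND_REGISTRY[name] = (module_path, class_name)


def get_backend_class(name: str) -> type:
    if name not in _BACKEND_REGISTRY:
        raise ValueError(f"Unknown backend: {name!r}")
    module_path, class_name = _BACKEND_REGISTRY[name]
    module = importlib.import_module(module_path)  # the lazy import happens here
    return getattr(module, class_name)


def build_config(backend_cls: type, raw: Dict[str, Any]) -> Any:
    """Validate a raw dict against the backend's declared config class."""
    config_cls = backend_cls.get_config_class()
    known = {f.name for f in fields(config_cls)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return config_cls(**raw)
```

In this sketch the server only ever calls `get_backend_class()`, `build_config()`, and the backend's own methods, so it never needs to know which fields a given backend expects.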
Replace ~20 hard-coded backend-specific CLI arguments with a generic parse_extra_args() that converts unknown flags to a config dict. This makes serve_unified.py backend-agnostic: adding a new backend no longer requires editing the server entrypoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
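The idea of turning unknown flags into a config dict can be sketched as follows. This is an assumption about how such a helper might behave, not the PR's implementation: the type coercion rules, the `--key=value` form, the `split_args` helper, and the `--batch_size` flag are all illustrative. Only the name `parse_extra_args()` and the `--backend`, `--model`, and `--codec_model` flags appear in this PR.

```python
import argparse
from typing import Any, Dict, List, Tuple


def _coerce(value: str) -> Any:
    """Best-effort conversion of a CLI string to bool, int, or float."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_extra_args(unknown: List[str]) -> Dict[str, Any]:
    """Convert leftover '--key value', '--key=value', and bare '--flag' tokens."""
    config: Dict[str, Any] = {}
    i = 0
    while i < len(unknown):
        token = unknown[i]
        if not token.startswith("--"):
            raise ValueError(f"Unexpected positional argument: {token!r}")
        body = token[2:]
        if "=" in body:
            key, raw = body.split("=", 1)
            config[key.replace("-", "_")] = _coerce(raw)
            i += 1
        elif i + 1 < len(unknown) and not unknown[i + 1].startswith("--"):
            config[body.replace("-", "_")] = _coerce(unknown[i + 1])
            i += 2
        else:
            # A flag with no value is treated as a boolean switch.
            config[body.replace("-", "_")] = True
            i += 1
    return config


def split_args(argv: List[str]) -> Tuple[argparse.Namespace, Dict[str, Any]]:
    """Parse the server's own flags and hand everything else to the backend."""
    # allow_abbrev=False keeps argparse from matching unknown flags as prefixes.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--backend", required=True)
    parser.add_argument("--model")
    args, unknown = parser.parse_known_args(argv)
    return args, parse_extra_args(unknown)
```

With this shape, the entrypoint defines only the flags every backend shares, and the resulting dict can be validated against whatever class the selected backend returns from get_config_class().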
Implements MagpieTTSBackend with get_config_class() for the refactored unified server. Includes all bugfixes from the feature branch:

- Checkpoint + hparams loading (alternative to .nemo)
- Dummy wav for missing context audio
- Decoder cache reset per request batch
- HF resolve URL caching via huggingface_hub
- KV cache disabled to avoid shape mismatches
- Batch size configurable via config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
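One way the HF resolve URL caching could work is to split a resolve URL into its repo, revision, and filename, then let huggingface_hub's local cache serve repeat requests instead of re-downloading each time. The sketch below assumes model-repo URLs of the form https://huggingface.co/<repo_id>/resolve/<revision>/<filename>. `HFFileRef`, `parse_hf_resolve_url`, and `fetch_cached` are hypothetical names, and the PR may handle this differently.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class HFFileRef(NamedTuple):
    repo_id: str
    revision: str
    filename: str


def parse_hf_resolve_url(url: str) -> HFFileRef:
    """Split a model-repo resolve URL into repo_id, revision, and filename."""
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        raise ValueError(f"Not a huggingface.co URL: {url!r}")
    parts = parsed.path.strip("/").split("/")
    if "resolve" not in parts:
        raise ValueError(f"Not a resolve URL: {url!r}")
    if parts[0] in ("datasets", "spaces"):
        # Simplification: this sketch only handles model repositories.
        raise ValueError("Only model repositories are handled in this sketch")
    idx = parts.index("resolve")
    repo_id = "/".join(parts[:idx])
    if not repo_id or len(parts) < idx + 3:
        raise ValueError(f"Malformed resolve URL: {url!r}")
    revision = parts[idx + 1]
    filename = "/".join(parts[idx + 2:])
    return HFFileRef(repo_id, revision, filename)


def fetch_cached(url: str) -> str:
    """Return a local path for the file, downloading only on a cache miss."""
    # Imported lazily so the parser above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    ref = parse_hf_resolve_url(url)
    return hf_hub_download(
        repo_id=ref.repo_id,
        filename=ref.filename,
        revision=ref.revision,
    )
```

Because hf_hub_download stores files in the local Hugging Face cache, repeated requests that reference the same context audio would hit the disk rather than the network, which is the kind of behavior that avoids rate-limit responses.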
Summary
- InferenceBackend interface
- get_config_class() returning MagpieTTSConfig for YAML-based configuration

Bugfixes included
- .nemo file
- huggingface_hub to avoid 429s

Depends on

Files
- recipes/multimodal/server/backends/magpie_tts_backend.py: full backend implementation

Test plan
- --backend magpie_tts --model /path --codec_model /path

🤖 Generated with Claude Code