
Api call server #1189

Closed
Jorjeous wants to merge 14 commits into main from api_call_server

Conversation

@Jorjeous Jorjeous commented Jan 27, 2026

Add ability to process audio with an inference server.

Summary by CodeRabbit

  • New Features

    • Added APIMultimodal server type for audio processing with configurable audio chunking
    • Introduced apply_whisper_normalization option to control audio normalization behavior
    • Added audio_format field to generation configuration
  • Improvements

    • Enhanced import robustness for optional evaluators

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
     server_config = dict(self.cfg.server)
-    if needs_audio and server_config.get("server_type") not in ["vllm", "vllm_multimodal"]:
+    if needs_audio and server_config.get("server_type") not in ["vllm", "vllm_multimodal", "api_multimodal"]:
         LOG.warning(
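The guard in the diff above can be sketched as a standalone predicate. Names here (`needs_audio`, `server_config`, the helper itself) are illustrative, taken from the diff's context rather than the actual codebase:

```python
# Server types assumed able to consume audio inputs, per the diff above.
SUPPORTED_AUDIO_SERVERS = ["vllm", "vllm_multimodal", "api_multimodal"]


def should_warn_about_audio(needs_audio: bool, server_config: dict) -> bool:
    """Return True when audio is requested but the server type can't handle it."""
    return needs_audio and server_config.get("server_type") not in SUPPORTED_AUDIO_SERVERS
```

With this shape, the warning fires for unsupported servers but stays silent for the newly added `api_multimodal` type.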
@Jorjeous (Member Author)

Updated the warning, as it was misleading

@Jorjeous Jorjeous requested a review from Kipok January 27, 2026 14:58
Jorjeous commented Jan 27, 2026

@Kipok Please consider for review only the files without my comments;
files with comments will be reverted to the "main" branch state.

Jorjeous and others added 4 commits January 27, 2026 07:23
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@karpnv (Collaborator) left a comment

Minor changes needed

@Jorjeous Jorjeous marked this pull request as ready for review February 3, 2026 13:03
@greptile-apps (bot) left a comment

1 file reviewed, 2 comments

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

This PR introduces multimodal API audio support by adding an APIMultimodal model class with audio chunking capabilities, registers it in the inference ecosystem, extends generation configuration with audio format options, makes certain evaluators optional imports, and adds ASR normalization control to the audio evaluator.

Changes

  • Evaluator Import & Registration (nemo_skills/evaluation/evaluator/__init__.py)
    Wrapped ComputeEvalEvaluator and ruler imports in try/except blocks; deferred map population until import success. Added validation to detect overlaps between class- and function-based evaluator maps.
  • Audio Evaluator Configuration (nemo_skills/evaluation/evaluator/audio.py)
    Added an apply_whisper_normalization: bool = True flag to AudioEvaluatorConfig to conditionally gate Whisper normalization in ASR evaluation paths.
  • Generation Task Configuration (nemo_skills/inference/generate.py)
    Added an audio_format: str = "audio_url" field; updated the drop_content_types default to include both "audio_url" and "input_audio"; extended server type allowances to "api_multimodal" with audio settings passthrough.
  • Model Registry & Server Support (nemo_skills/inference/model/__init__.py, nemo_skills/pipeline/utils/server.py)
    Imported and registered the new APIMultimodal model class in the models mapping; added an api_multimodal = "api_multimodal" enum member to SupportedServers.
  • Multimodal API Model Implementation (nemo_skills/inference/model/api_multimodal.py)
    New module implementing APIMultimodal(OpenAIModel) with audio input encoding to base64, configurable audio chunking by duration threshold and task-type filtering, chunked generation with result aggregation, and message preprocessing that places audio blocks before text content.
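The audio handling described above (base64 encoding plus duration-based chunking) can be sketched roughly as follows. All names and the 30-second threshold are illustrative assumptions, not the actual APIMultimodal implementation:

```python
import base64


def encode_audio_base64(raw_bytes: bytes) -> str:
    """Encode raw audio bytes to a base64 string suitable for an API payload."""
    return base64.b64encode(raw_bytes).decode("utf-8")


def needs_chunking(duration_sec: float, max_chunk_sec: float = 30.0) -> bool:
    """Decide whether the audio exceeds the configured chunk-duration threshold."""
    return duration_sec > max_chunk_sec


def split_into_chunks(duration_sec: float, max_chunk_sec: float = 30.0):
    """Return (start, end) second boundaries covering the full duration."""
    bounds = []
    start = 0.0
    while start < duration_sec:
        end = min(start + max_chunk_sec, duration_sec)
        bounds.append((start, end))
        start = end
    return bounds
```

In the real class, each chunk would be sent through `generate_async` and the per-chunk results aggregated, as the sequence diagram below shows.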

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIMultimodal
    participant AudioProcessor
    participant OpenAIModel
    
    Client->>APIMultimodal: generate_async(prompt with audio)
    APIMultimodal->>APIMultimodal: _preprocess_messages_for_model()
    APIMultimodal->>APIMultimodal: convert audio refs to base64
    APIMultimodal->>APIMultimodal: _needs_audio_chunking()
    
    alt Audio Chunking Required
        APIMultimodal->>AudioProcessor: load audio, determine duration
        AudioProcessor-->>APIMultimodal: audio duration
        APIMultimodal->>APIMultimodal: split audio into chunks
        
        loop For Each Audio Chunk
            APIMultimodal->>OpenAIModel: generate_async(chunk)
            OpenAIModel-->>APIMultimodal: result (text, tokens)
            APIMultimodal->>APIMultimodal: aggregate result
        end
    else No Chunking Needed
        APIMultimodal->>OpenAIModel: generate_async(full message)
        OpenAIModel-->>APIMultimodal: result
    end
    
    APIMultimodal-->>Client: final aggregated result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • Kipok
  • gwarmstrong
  • melllinia
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 inconclusive

❌ Failed checks (1 inconclusive)
  • Title check ❓ Inconclusive: The title 'Api call server' is vague and generic, failing to convey the specific nature of the changes, which center on adding multimodal audio support to an inference API client. Resolution: consider a more descriptive title such as 'Add APIMultimodal server with audio chunking support' or 'Support audio processing in API multimodal inference'.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage ✅ Passed: Docstring coverage is 88.89%, which is sufficient. The required threshold is 80.00%.



@coderabbitai (bot) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_skills/evaluation/evaluator/__init__.py (1)

141-141: ⚠️ Potential issue | 🟡 Minor

Remove debug print statement.

This appears to be leftover debug code that should be removed or converted to a proper log statement.

Proposed fix
     if eval_type in EVALUATOR_CLASS_MAP:
         evaluator = get_evaluator_class(eval_type, eval_config)
-        print(f"evaluator: {evaluator}")
         return asyncio.run(evaluator.eval_full())

Or if logging is desired:

     if eval_type in EVALUATOR_CLASS_MAP:
         evaluator = get_evaluator_class(eval_type, eval_config)
-        print(f"evaluator: {evaluator}")
+        LOG.debug("evaluator: %s", evaluator)
         return asyncio.run(evaluator.eval_full())
🧹 Nitpick comments (4)
nemo_skills/evaluation/evaluator/audio.py (1)

511-531: Consider extracting repeated mode selection logic.

The pattern mode = config.normalization_mode if config.apply_whisper_normalization else "none" is duplicated across ASR-PC, ASR, and ASR_LEADERBOARD branches. This is acceptable given the localized scope, but could be extracted to a helper or computed once at the start of evaluate_sample if more task types need the same logic.

nemo_skills/inference/model/api_multimodal.py (3)

154-154: Use explicit | None type annotation for optional parameters.

Per PEP 484, using = None without | None in the type hint creates an implicit Optional which is discouraged.

Proposed fix
-    def _needs_audio_chunking(self, messages: list[dict], task_type: str = None) -> tuple[bool, str, float]:
+    def _needs_audio_chunking(self, messages: list[dict], task_type: str | None = None) -> tuple[bool, str, float]:
-        task_type: str = None,
+        task_type: str | None = None,

Also applies to: 291-291


230-230: Rename unused loop variable.

The chunk_idx variable is not used within the loop body. Rename to _chunk_idx or use _ to indicate it's intentionally unused.

Proposed fix
-        for chunk_idx, audio_chunk in enumerate(chunks):
+        for _chunk_idx, audio_chunk in enumerate(chunks):

Or if the index isn't needed at all:

-        for chunk_idx, audio_chunk in enumerate(chunks):
+        for audio_chunk in chunks:

315-321: Eliminate redundant audio preprocessing in the non-chunking path.

When generate_async takes the non-chunking path (lines 316-318), it preprocesses messages with content_text_to_list and _preprocess_messages_for_model, then calls super().generate_async(). The parent's generate_async calls self._build_chat_request_params (line 275 in base.py), triggering the same preprocessing again in the overridden method (lines 328-329).

While safe—content_text_to_list is idempotent after the first call—the redundant deep copies (lines 318 and 328) add unnecessary overhead. Preprocess messages only once before calling the parent, or remove preprocessing from the non-chunking path if _build_chat_request_params handles it universally.

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@Jorjeous Jorjeous requested a review from melllinia February 3, 2026 14:10
@greptile-apps (bot) left a comment

2 files reviewed, 4 comments

greptile-apps bot commented Feb 3, 2026

Additional Comments (1)

nemo_skills/inference/model/vllm_multimodal.py
content_text_to_list mutates the input message dict (modifying message["content"], deleting message["audio"]/message["audios"]). Same mutation issue as in api_multimodal.py - this violates the principle of avoiding silent bugs through mutation.
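A non-mutating variant of the kind the comment asks for could look like this sketch. The field names (`content`, `audio`, `audios`) come from the comment; the function body is an illustrative assumption, not the project's actual content_text_to_list:

```python
import copy


def content_text_to_list(message: dict) -> dict:
    """Return a new message dict with string content wrapped in a list,
    leaving the caller's dict untouched."""
    msg = copy.deepcopy(message)
    if isinstance(msg.get("content"), str):
        msg["content"] = [{"type": "text", "text": msg["content"]}]
    # Drop audio fields from the copy only, never from the input.
    for key in ("audio", "audios"):
        msg.pop(key, None)
    return msg
```

Returning a copy keeps the transformation idempotent and avoids the silent-mutation bug the review flags.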

Jorjeous and others added 2 commits February 4, 2026 01:56
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@greptile-apps (bot) left a comment

5 files reviewed, 5 comments

greptile-apps bot commented Feb 4, 2026

Additional Comments (4)

nemo_skills/evaluation/evaluator/__init__.py
print(f"evaluator: {evaluator}") introduces an unconditional stdout side-effect in the evaluator execution path. This can break callers/pipelines that expect clean stdout (e.g., JSON output) and bypasses the project’s logging patterns.

    # (remove debug print; use logging if needed)

nemo_skills/inference/model/__init__.py
server_type is normalized for the registry lookup (server_type.lower()), but later checks use the original casing (e.g., if server_type == "trtllm" ...). If a caller passes TRTLLM, the model loads but the trtllm-specific validation/behavior is skipped.

if server_type.lower() == "trtllm" and kwargs.get("enable_soft_fail", False):
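One way to apply the fix consistently is to normalize once at the top of the lookup function, so the registry lookup and every later comparison agree on casing. This is a sketch with illustrative names, not the actual nemo_skills code:

```python
def get_model(server_type: str, **kwargs):
    """Normalize casing once so 'TRTLLM' behaves exactly like 'trtllm'."""
    server_type = server_type.lower()
    if server_type == "trtllm" and kwargs.get("enable_soft_fail", False):
        # trtllm-specific behavior now runs regardless of the caller's casing.
        kwargs["soft_fail"] = True
    return server_type, kwargs
```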

nemo_skills/inference/generate.py
This cleanup assumes litellm.cache.cache always has force_save(). If litellm.cache is configured differently (or the cache wrapper shape changes), generation can crash during teardown when enable_litellm_cache=True. Safer to call getattr(litellm.cache, "cache", None) / hasattr(..., "force_save") or rely on the cache implementation’s public API.
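The defensive teardown the comment suggests could be sketched like this, with a SimpleNamespace standing in for the real litellm module so the pattern is self-contained:

```python
from types import SimpleNamespace


def safe_force_save(litellm) -> bool:
    """Flush the cache only when the expected wrapper shape is present."""
    cache = getattr(litellm, "cache", None)
    inner = getattr(cache, "cache", None)
    if inner is not None and hasattr(inner, "force_save"):
        inner.force_save()
        return True
    return False


# Well-shaped cache: force_save is invoked.
saved = []
fake = SimpleNamespace(
    cache=SimpleNamespace(cache=SimpleNamespace(force_save=lambda: saved.append(True)))
)
safe_force_save(fake)
```

A differently configured cache (or none at all) then degrades to a no-op instead of crashing during teardown.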


tests/test_vllm_audio.py
This fixture patches VLLMMultimodalModel.__init__ to lambda: None, so the object never runs VLLMModel/BaseModel initialization. That makes the test brittle (it can pass while real construction fails) and may mask regressions tied to init-time defaults.

If the goal is to unit-test _preprocess_messages_for_model, consider constructing a minimal instance without patching __init__ (or patch only the specific heavyweight parts called by __init__) so the object’s invariants match production.
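The suggested alternative (patch only the heavyweight parts so `__init__` still runs and the object's invariants hold) can be illustrated with a toy class; HeavyClient and _connect are hypothetical stand-ins, not the real VLLMMultimodalModel:

```python
from unittest import mock


class HeavyClient:
    def __init__(self):
        # Heavyweight setup we want to avoid in unit tests.
        self.session = self._connect()
        # Init-time default that tests should still exercise.
        self.prefix = ">> "

    def _connect(self):
        raise RuntimeError("network unavailable in tests")

    def format(self, text: str) -> str:
        return self.prefix + text


# Patch only the expensive call; __init__ still runs and sets defaults.
with mock.patch.object(HeavyClient, "_connect", return_value=None):
    client = HeavyClient()
```

Unlike replacing `__init__` with `lambda: None`, this keeps construction-time attributes like `prefix` in place, so the test object matches production invariants.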

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@greptile-apps (bot) left a comment

2 files reviewed, 2 comments

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@greptile-apps (bot) left a comment

No files reviewed, no comments

@Jorjeous Jorjeous closed this Feb 5, 2026
