move audio processing into model #1137

Merged
karpnv merged 4 commits into audio_bin from georgea/update-audio_bin
Dec 23, 2025

Conversation


@gwarmstrong gwarmstrong commented Dec 19, 2025

Key Components

1. AudioProcessorConfig (nested dataclass)

@nested_dataclass(kw_only=True)
class AudioProcessorConfig:
    data_dir: str = ""              # Base directory for audio files
    enable_chunking: bool = True    # Whether to chunk long audio
    chunk_task_types: list[str] | None = None
    chunk_threshold_sec: int = 30   # Chunk audio longer than this

2. AudioProcessor (wrapper class)

class AudioProcessor:
    def __init__(self, model, config, eval_config=None, eval_type=None):
        self.model = model
        # Resolves data_dir from config or eval_config
        ...
    
    async def generate_async(self, prompt, task_type=None, **kwargs):
        if isinstance(prompt, list):
            # Check if chunking needed
            # Convert audio files to base64
            # Handle chunking if audio is long
        return await self.model.generate_async(prompt=prompt, **kwargs)

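The comments in `generate_async` summarize the flow; a self-contained sketch of the two audio steps (base64 conversion and threshold-based chunking) might look like the following. Helper names here are hypothetical, not taken from the PR:

```python
import base64

def encode_audio(path: str) -> str:
    """Read an audio file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def split_into_chunks(duration_sec: float, threshold_sec: int) -> list[tuple[float, float]]:
    """Split a long recording into (start, end) windows no longer than threshold_sec."""
    chunks, start = [], 0.0
    while start < duration_sec:
        end = min(start + threshold_sec, duration_sec)
        chunks.append((start, end))
        start = end
    return chunks
```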
3. Integration in generate.py

# In GenerateSolutionsConfig:
audio: AudioProcessorConfig | None = None  # Nested config, None = disabled

# In setup_llm():
if self.cfg.audio is not None:
    audio_supported_servers = {"vllm"}
    if server_type not in audio_supported_servers:
        raise ValueError(f"Audio not supported for {server_type}")
    llm = AudioProcessor(llm, self.cfg.audio, eval_config=..., eval_type=...)

Benefits

1. Clean Separation of Concerns

  • VLLMModel: ~150 lines, pure VLLM API client
  • AudioProcessor: ~350 lines, all audio logic in one place

2. Composability

Audio processing wraps the base model, so it works with any underlying model:

# Works with VLLM
llm = AudioProcessor(VLLMModel(...), config)

# Could work with our tool calling modules
llm = AudioProcessor(ToolCallingWrapper(VLLMModel(...)), config)

Signed-off-by: George Armstrong <georgea@nvidia.com>
@Jorjeous
Member

Probably should set the default chunking threshold to 60s, or decide automatically based on the model being used.
For Megatron it's 60s, for Qwen it's 30s.
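A minimal sketch of that suggestion, picking the threshold from the model name; the helper and the threshold table are hypothetical, not part of the PR:

```python
# Per-model defaults, as suggested in the review comment.
DEFAULT_THRESHOLDS_SEC = {"megatron": 60, "qwen": 30}

def chunk_threshold_for(model_name: str, fallback: int = 30) -> int:
    """Return a chunk threshold based on the model name, else the fallback."""
    name = model_name.lower()
    for key, threshold in DEFAULT_THRESHOLDS_SEC.items():
        if key in name:
            return threshold
    return fallback
```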


@karpnv karpnv left a comment


LGTM

@karpnv karpnv merged commit a88736e into audio_bin Dec 23, 2025
2 checks passed
@karpnv karpnv deleted the georgea/update-audio_bin branch December 23, 2025 00:41