mtmd : add Nemotron 3 Nano Omni support (parakeet) by danbev · Pull Request #22520 · ggml-org/llama.cpp

danbev · 2026-04-29T14:36:59Z

Overview

This is a work in progress. It will not be merged until the whisper.cpp/parakeet.cpp PR has been merged. Working on both in parallel helps surface improvements and pain points that can feed back in both directions.

This commit adds support for the subsampling and encoder parts of the Nemotron 3 Nano Omni model.

Additional information

The Parakeet subsampling and encoder code was taken from parakeet.cpp, which is currently a pull request against whisper.cpp. I've tried to copy the code as closely as possible, to hopefully make it easy to patch between these two projects later.

Refs: ggml-org/whisper.cpp#3735

I have read and agree with the contributing guidelines
AI usage disclosure: No

For testing, a converted model can be found here and can be run using the following command:

llama-mtmd-cli -hf danbev/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16-mtmd-GGUF --no-warmup --audio jfk.wav -p "Transcribe this audio clip, only the transcription and nothing else."

ngxson

looks good, I'm leaving some early-review comments

This commit removes the generation of the relative positional tensor in the model conversion script and instead computes it in the encoder graph. This is only done for the window of positions required for the current audio sample.
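The idea of building the table only for the current sample's window can be sketched on the CPU as follows. This is a minimal illustration, assuming Conformer-style sinusoidal relative positional embeddings. The helper name `rel_pos_window`, its parameters, and the row layout are hypothetical and are not taken from the PR, whose encoder graph may compute this differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): builds sinusoidal relative positional
// embeddings for one audio sample of n_frames encoder frames.
// Rows cover relative positions n_frames-1, ..., 0, ..., -(n_frames-1), so the
// result holds (2*n_frames - 1) rows of d_model floats in row-major order.
std::vector<float> rel_pos_window(std::size_t n_frames, std::size_t d_model) {
    assert(d_model % 2 == 0 && "sketch assumes an even embedding width");
    if (n_frames == 0) {
        return {};
    }
    const std::size_t n_pos = 2 * n_frames - 1;
    std::vector<float> out(n_pos * d_model, 0.0f);
    for (std::size_t p = 0; p < n_pos; ++p) {
        // relative position represented by row p
        const float pos = (float) (n_frames - 1) - (float) p;
        for (std::size_t i = 0; i < d_model; i += 2) {
            // assumed frequency schedule: the classic Transformer one
            const float inv_freq = std::exp(-std::log(10000.0f) * (float) i / (float) d_model);
            out[p * d_model + i]     = std::sin(pos * inv_freq);
            out[p * d_model + i + 1] = std::cos(pos * inv_freq);
        }
    }
    return out;
}
```

For a 3-frame sample with `d_model = 4`, `rel_pos_window(3, 4)` returns 5 rows of 4 values, and the middle row corresponds to relative position 0. The table size scales with the length of the current sample rather than with a fixed maximum stored in the model file.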

This commit adds a function to get access to the clip_model. It also removes the two functions clip_get_mel_filter_tensor and clip_get_window_tensor(const struct clip_ctx * ctx), whose callers can now use clip_get_model to access the model tensors they need.
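The design choice here, one accessor for the whole model instead of one getter per tensor, can be illustrated with a small stand-alone sketch. All types and functions below (`demo_tensor`, `demo_model`, `demo_ctx`, `demo_get_model`, `demo_count_preproc_values`) are hypothetical stand-ins invented for illustration. They are not the real declarations from clip.h, and the actual clip_get_model signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical stand-ins, not the real clip.h types.
struct demo_tensor {
    std::vector<float> data;
};

struct demo_model {
    demo_tensor mel_filters; // mel filterbank weights
    demo_tensor window;      // analysis window
};

struct demo_ctx {
    demo_model model;
};

// Single accessor that stands in for one getter per tensor.
const demo_model * demo_get_model(const demo_ctx * ctx) {
    assert(ctx != nullptr);
    return &ctx->model;
}

// A caller reaches whichever tensors it needs through the model.
std::size_t demo_count_preproc_values(const demo_ctx * ctx) {
    const demo_model * model = demo_get_model(ctx);
    return model->mel_filters.data.size() + model->window.data.size();
}
```

With this shape, adding another preprocessing tensor to the model does not require a new public getter. The trade-off is that callers see the whole model structure.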

ngxson

looking good so far

ngxson reviewed Apr 29, 2026

Comment thread convert_hf_to_gguf.py Outdated

Comment thread tools/mtmd/mtmd-audio.cpp Outdated

github-actions bot added the examples and python labels Apr 29, 2026

ngxson reviewed Apr 30, 2026

Comment thread tools/mtmd/clip.h Outdated

ngxson reviewed Apr 30, 2026

Comment thread tools/mtmd/mtmd-audio.cpp Outdated

danbev added 2 commits April 30, 2026 14:50

mtmd : read mel_filters and window into hparams

8e279f4

ngxson reviewed Apr 30, 2026

Comment thread tools/mtmd/mtmd-audio.cpp Outdated

Comment thread tools/mtmd/clip.cpp Outdated

danbev added 14 commits May 1, 2026 10:16

mtmd : use set_input_f32 lambda [no ci]

ffd1b99

mtmd : add better asserts for mel_filters and hann window [no ci]

8af100f

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

b5a35e0

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

7ed9294

mtmd : add missing size_t cast

49658ba

mtmd : change type of pad to size_t

9a8398e

mtmd : zero initialize samples_padded

6ba52fc

mtmd : remove unused ctx member from parakeet preprocessor

385b2d4

mtmd : make log_mel_spectrogram_parakeet_worker_thread private static

cef7ff7

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

681a199

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

44cb51f

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

0cd9e16

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio [no ci]

78e28f4

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

96b1326

danbev self-assigned this Jun 2, 2026

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

656437b

danbev wants to merge 19 commits into ggml-org:master from danbev:nemotron-3-omni-mtmd-audio


2 participants
