mtmd: mtmd_audio_streaming_istft for audio output #18645

Merged
ngxson merged 1 commit into ggml-org:master from tdakhran:tarek/dev/istft-upstream on Jan 6, 2026
tdakhran (Contributor) commented Jan 6, 2026

Change is decoupled from #18641.

LFM2.5-Audio-1.5B (https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) needs a streaming ISTFT to generate output audio.

  • add a streaming ISTFT class (mtmd_audio_streaming_istft) that uses overlap-add for audio reconstruction
  • replace the global audio cache with a per-instance cache; the model requires two independent caches, one for preprocessing (audio input) and one for the ISTFT (audio output)
  • unify FFT and IFFT into a single templated implementation that supports both forward and inverse transforms
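
For readers unfamiliar with overlap-add, the idea behind the first two bullets can be sketched roughly as follows. This is an illustration only, not the code from this PR: the class name `streaming_istft_sketch`, its `push` method, the periodic Hann window, the full (two-sided) spectrum input, and the naive O(n²) inverse DFT are all assumptions made to keep the example short. The point it shows is that each instance owns its own accumulation buffers, so an input-side and an output-side object never share state.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

constexpr float sketch_pi = 3.14159265358979f;

// Hypothetical sketch; this is NOT the mtmd_audio_streaming_istft interface.
class streaming_istft_sketch {
public:
    streaming_istft_sketch(std::size_t n_fft, std::size_t hop)
        : n_fft_(n_fft), hop_(hop), window_(n_fft), acc_(n_fft, 0.0f), norm_(n_fft, 0.0f) {
        assert(hop > 0 && hop <= n_fft);
        for (std::size_t i = 0; i < n_fft; ++i) {
            // periodic Hann window (assumed; the real window may differ)
            window_[i] = 0.5f - 0.5f * std::cos(2.0f * sketch_pi * i / n_fft);
        }
    }

    // Consume one spectrum of n_fft complex bins and return `hop` finished samples.
    std::vector<float> push(const std::vector<std::complex<float>> & spec) {
        assert(spec.size() == n_fft_);
        for (std::size_t n = 0; n < n_fft_; ++n) {
            // naive inverse DFT for one output sample (a real FFT would replace this)
            std::complex<float> sum(0.0f, 0.0f);
            for (std::size_t k = 0; k < n_fft_; ++k) {
                sum += spec[k] * std::polar(1.0f, 2.0f * sketch_pi * k * n / n_fft_);
            }
            const float x = sum.real() / static_cast<float>(n_fft_);
            acc_[n]  += x * window_[n];           // overlap-add of the windowed frame
            norm_[n] += window_[n] * window_[n];  // running window-energy normalizer
        }

        // The first `hop` positions will not be touched by later frames: emit them.
        std::vector<float> out(hop_);
        for (std::size_t i = 0; i < hop_; ++i) {
            out[i] = norm_[i] > 1e-8f ? acc_[i] / norm_[i] : 0.0f;
        }

        // Slide both per-instance buffers left by `hop` and clear the tail.
        std::copy(acc_.begin() + hop_, acc_.end(), acc_.begin());
        std::fill(acc_.end() - hop_, acc_.end(), 0.0f);
        std::copy(norm_.begin() + hop_, norm_.end(), norm_.begin());
        std::fill(norm_.end() - hop_, norm_.end(), 0.0f);
        return out;
    }

private:
    std::size_t        n_fft_;
    std::size_t        hop_;
    std::vector<float> window_;
    std::vector<float> acc_;   // overlap-add accumulator, owned by this instance
    std::vector<float> norm_;  // matching sum of squared window values
};
```

Under these assumptions, a caller would construct one object per output stream and call `push` once per decoded frame, appending each returned chunk to the output buffer. Dividing by the accumulated squared window is one common way to undo an analysis window applied on the forward side; the merged implementation may normalize differently.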

ngxson (Collaborator) left a comment

nice, thanks! I ran the test and confirmed that this doesn't break other models

@ngxson ngxson merged commit ccbc84a into ggml-org:master Jan 6, 2026
71 of 72 checks passed