parakeet : add support for NVIDIA Parakeet #3735
ffmpeg --enable-parakeet instructions

To try this out we first need to check out this PR's branch:
$ git clone -b parakeet-support https://github.com/danbev/whisper.cpp.git

Then we build and install the parakeet library to a directory named
$ cat build-install.sh
#!/bin/bash
set -e
build_dir=build
install_dir=build-install
rm -rf ${install_dir}
mkdir -p ${install_dir}
cmake -S . -B ${build_dir} -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/home/danbev/work/ai/whisper-work/${install_dir} \
-DGGML_BACKEND_DIR=/home/danbev/work/ai/whisper-work/${install_dir}/lib \
-DBUILD_SHARED_LIBS=ON \
-DGGML_USE_CPU=ON \
-DGGML_CPU_ALL_VARIANTS=ON \
-DWHISPER_ALL_WARNINGS=ON \
-DWHISPER_FATAL_WARNINGS=ON \
-DGGML_BACKEND_DL=ON \
-DGGML_CUDA=ON \
-DCMAKE_CUDA_ARCHITECTURES="89-real" \
-DGGML_CPU_AARCH64=OFF \
-DGGML_CUDA_F16=ON
cmake --build ${build_dir} -j 8
cmake --install ${build_dir} --prefix ${install_dir}

Then we need to check out the following FFmpeg branch:
$ git clone -b parakeet.cpp https://github.com/danbev/FFmpeg.git

And then build FFmpeg using the following configuration options and we explicitly
$ export PKG_CONFIG_PATH="/home/danbev/work/ai/whisper-work/build-install/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
$ ./configure --prefix=/usr --enable-version3 --disable-shared --enable-gpl \
--enable-nonfree --enable-static --enable-pthreads --enable-filters \
--enable-openssl --enable-runtime-cpudetect --enable-libvpx --enable-libx264 \
--enable-libx265 --enable-libspeex --enable-libfreetype --enable-fontconfig \
--enable-libzimg --enable-libvorbis --enable-libwebp --enable-libfribidi \
--enable-libharfbuzz --enable-libass --enable-whisper --enable-parakeet
$ make

To run we need to set
$ export LD_LIBRARY_PATH=/home/danbev/work/ai/whisper-work/build-install/lib/:$LD_LIBRARY_PATH

After that it should be possible to run using the following command:
$ ./ffmpeg -i gb1.wav -loglevel quiet -af parakeet=model=ggml-parakeet-tdt-0.6b-v3.bin:use_gpu=1:destination=- -f null -
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11903 MiB):
Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes, VRAM: 11903 MiB
load_backend: loaded CUDA backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cuda.so
load_backend: loaded CPU backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cpu-alderlake.so
My fellow Americans, this day has brought terrible news and great sadness to our country. At nine o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Shavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when spaceflight has come to seem almost routine, it is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the Earth. Because of their courage and daring and idealism, we will miss them all the more. All Americans today are thinking as well of the families of these men and women who have been given this sudden shock and grief. You're not alone. Our entire nation grieves with you, and those you love will always have the respect and gratitude of this country. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name, because of his great power and mighty strength, not one of them is missing. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. 
May God bless the grieving families, and may God continue to bless America.

The ffmpeg branch contains a build.sh script and a run-parakeet.sh script that might be easier to modify than trying to copy and paste the above commands. |
3d04340 to ad6274f
|
This would be a great addition! Looking forward to it! |
…[no ci] This commit removes the generation of the relative positional tensor in the model conversion script and instead computes it in the encoder graph. This is only done for the window of positions required for the current audio sample. This was suggested in the mtmd integration of parakeet and the same approach is used there.
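As a rough illustration of computing relative positions on demand, the sketch below builds a sinusoidal table only for the 2T-1 relative offsets that a sequence of T frames can produce, rather than for a fixed maximum length. It assumes the standard Transformer-style sinusoidal formulation with base 10000, an even model dimension, and offsets ordered from T-1 down to -(T-1). The function name `rel_pos_table` is hypothetical, and the actual encoder graph would express this with ggml ops rather than a plain loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sinusoidal table for the relative offsets needed by
// n_frames frames. Row r holds the encoding of offset (n_frames - 1) - r,
// so offsets run from n_frames-1 down to -(n_frames-1).
// Assumes d_model is even; an odd trailing column would stay zero.
static std::vector<std::vector<float>> rel_pos_table(size_t n_frames, size_t d_model) {
    std::vector<std::vector<float>> table;
    if (n_frames == 0) {
        return table;
    }
    const size_t n_pos = 2 * n_frames - 1;
    table.assign(n_pos, std::vector<float>(d_model, 0.0f));
    for (size_t r = 0; r < n_pos; ++r) {
        const double pos = double(n_frames - 1) - double(r);
        for (size_t i = 0; i + 1 < d_model; i += 2) {
            // Assumed frequency schedule: 10000^(-i / d_model)
            const double freq = std::exp(-std::log(10000.0) * double(i) / double(d_model));
            table[r][i]     = float(std::sin(pos * freq));
            table[r][i + 1] = float(std::cos(pos * freq));
        }
    }
    return table;
}
```

The table grows with the length of the current audio sample, which is the property the commit message describes; nothing has to be stored in the converted model file.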
This prepares for LibriSpeech testing, which will be enabled in a follow-up commit.
The result from running the tests was:
```console
$ cat parakeet-tdt-0.6b-v3.txt
WER: 1.96%
```
…no ci] Remove hardcoded build-cuda-89-release and just use build like whisper.cpp does.
This commit updates the parakeet requirements, which were out of date because I've been using a virtual environment on Linux/macOS that already contains torch and numpy. This also fixes the reading of the model configuration, which was failing on Windows.
|
LGTM, Is anyone about to review and merge? |
Thanks for the review. I still have a few things to sort out, but I hope to be able to merge this early next week. In hindsight, I was a bit quick to move this out of draft. |
This commit adds a function to reset the parakeet state that can be reused instead of duplicating code. It also resets the LSTM state, which was not done by parakeet_full, leading to incorrect transcriptions when called multiple times.
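The failure mode described here, recurrent state leaking from one call into the next, can be sketched as follows. The struct and function names (`decoder_state`, `reset_state`, `transcribe`) are hypothetical and do not mirror the actual parakeet code, and the recurrent update is a toy stand-in. The point is only that a single reset helper, called at the start of every full run, clears all per-utterance state including the LSTM hidden and cell vectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-context decoding state.
struct decoder_state {
    std::vector<float> lstm_h;  // LSTM hidden state
    std::vector<float> lstm_c;  // LSTM cell state
    std::vector<int>   tokens;  // tokens emitted so far

    explicit decoder_state(size_t n) : lstm_h(n, 0.0f), lstm_c(n, 0.0f) {}
};

// One place that clears everything a previous run may have left behind.
static void reset_state(decoder_state & st) {
    std::fill(st.lstm_h.begin(), st.lstm_h.end(), 0.0f);
    std::fill(st.lstm_c.begin(), st.lstm_c.end(), 0.0f);
    st.tokens.clear();
}

// Toy stand-in for a full transcription run. Without the reset_state call,
// a second run would start from the first run's LSTM state and could
// produce different tokens for the same input.
static int transcribe(decoder_state & st, const std::vector<float> & samples) {
    reset_state(st);
    for (float s : samples) {
        st.lstm_h[0] = 0.5f * st.lstm_h[0] + s;   // toy recurrent update
        st.lstm_c[0] += st.lstm_h[0];
        st.tokens.push_back(st.lstm_c[0] > 1.0f ? 1 : 0);
    }
    return (int) st.tokens.size();
}
```

With the reset in place, calling `transcribe` twice on the same context and the same samples yields identical tokens.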
|
I found that we can register |
Thanks for pointing that out, I'll take a closer look. |
|
@KitaitiMakoto The callbacks were indeed missing and I've updated this now. |
|
Yay, I tried just after you pushed new commits and they worked for Ruby binding: https://github.com/danbev/whisper.cpp/compare/parakeet-support...KitaitiMakoto:whisper.cpp:ruby-parakeet?expand=1 Thank you. |
|
I wonder how we should test Parakeet model on CI. parakeet-tdt-0.6b-v3.bin on Hugging Face is 2.51GiB and I think it's too large to download on every CI run. |
I've only been running the tests manually. I'll take a look at what options we have. |
This commit adds the n_conv_kernel model parameter which is currently missing from the model conversion.
|
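Thank you for the reply. I think it would be practical to create a small dummy model similar to for-tests-ggml-base.bin and use it to test that the model loads correctly, along with some of its attributes, while omitting the actual transcription results in CI. I don't intend to block the merge with this question. It's just exploratory. Thanks. |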
|
Yes, I've actually done this now and am trying it out 👍
On Mon, 25 May 2026 at 09:17, KITAITI Makoto wrote:
> Thank you for reply.
> I think it would be practical to create a small dummy model such as for-tests-ggml-base.bin and use it to test that the model loads correctly and some of its attributes, while omitting the actual transcription results in CI.
> I don't intend to block the merge with this question. It's just exploratory.
> Thanks.
|
This commit also changes the sample to be jfk.wav instead of gb1.wav, which is not a checked-in sample.
|
The 2 failing CI jobs are also present on the master branch. I'm looking into them in #3829. |
Commit 9164f9c ("parakeet : remove flash attention param") removed the flash attention options, but I forgot to remove them from the readme.
|
Not sure how it compares, but there's been another ggml based parakeet implementation published here: https://github.com/mudler/parakeet.cpp |
I saw this for the first time the other day but have not looked closer at it. |
This commit adds exception handling to the parakeet_model_load function call. The motivation is to keep exceptions from escaping this function, since it is part of the extern "C" interface; instead, the error is logged and nullptr is returned. Refs: ggml-org#3831
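The pattern described in this commit message, catching everything at the C boundary and returning a null pointer, might look roughly like the sketch below. All names (`example_model`, `load_model_impl`, `example_model_load`, `example_model_free`) are placeholders rather than the real parakeet API, and the logging goes straight to stderr instead of through the project's logging callbacks.

```cpp
#include <cstdio>
#include <exception>
#include <stdexcept>
#include <string>

// Placeholder model type; opaque to C callers.
struct example_model {
    std::string path;
};

// Internal C++ loader that is allowed to throw.
static example_model * load_model_impl(const char * path) {
    if (path == nullptr || path[0] == '\0') {
        throw std::runtime_error("empty model path");
    }
    return new example_model{path};
}

// C-facing entry point: no exception may cross this boundary, so every
// failure is logged and reported to the caller as a null pointer.
extern "C" example_model * example_model_load(const char * path) {
    try {
        return load_model_impl(path);
    } catch (const std::exception & e) {
        std::fprintf(stderr, "%s: failed to load model: %s\n", __func__, e.what());
    } catch (...) {
        std::fprintf(stderr, "%s: failed to load model: unknown exception\n", __func__);
    }
    return nullptr;
}

extern "C" void example_model_free(example_model * model) {
    delete model;
}
```

Callers written in C, or bindings such as Ruby, then only need to check the returned pointer against null rather than deal with C++ exceptions unwinding through foreign frames.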
This commit applies the same changes as Commit ef24de1 ("cmake : do not assume /usr/lib library installation. (ggml-org#3693)") but for parakeet. Refs: ggml-org#3693
This is a work in progress to support the Parakeet model.
Usage instructions can be found in examples/parakeet-cli.