Releases: ggerganov/whisper.cpp
v1.7.2
Overview
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
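For context, here is a minimal sketch of the kind of configuration this change unlocks, using the public C API from `whisper.h`. The model path, audio buffer, and the beam/processor counts are placeholders chosen for illustration, not values from this release.

```c
// Minimal sketch: run beam search with more beams and several parallel processors.
// "pcmf32" is assumed to hold 16 kHz mono float samples; the model path is a placeholder.
#include <stdio.h>
#include "whisper.h"

int transcribe_parallel(const float * pcmf32, int n_samples) {
    struct whisper_context * ctx = whisper_init_from_file_with_params(
            "models/ggml-base.en.bin", whisper_context_default_params());
    if (ctx == NULL) {
        return 1;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);
    wparams.beam_search.beam_size = 8; // larger beam counts no longer hit the ggml_context limit

    // split the audio into chunks and process them with 4 parallel processors
    if (whisper_full_parallel(ctx, wparams, pcmf32, n_samples, 4) != 0) {
        whisper_free(ctx);
        return 1;
    }

    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```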
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
- ci : use local ggml by @ggerganov in #2567
- sycl: fix example build by @stsydow in #2570
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2
v1.7.2-pre
Overview
This is a pre-release because there have been some reports of memory leaks that I haven't had time to investigate and confirm. If these are resolved in the next few days, the fixes will be added to the official v1.7.2 release next week.
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2-pre
v1.7.1
Overview
- Fix Vulkan crashes
- Performance stats for Vulkan on RTX 2060
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_1 | 1 | 0 | 141.03 | 4.63 | 4.18 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | medium | 1 | 0 | 472.66 | 7.55 | 11.35 | 0.56 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_0 | 1 | 0 | 395.55 | 9.81 | 10.64 | 0.49 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_1 | 1 | 0 | 398.85 | 10.16 | 10.15 | 0.50 | 9f346d0 |
RTX 2060 | VULKAN | medium-dis | 1 | 0 | 427.26 | 1.26 | 1.20 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | large-v2 | 1 | 0 | 924.60 | 12.36 | 18.56 | 1.01 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_0 | 1 | 0 | 774.21 | 17.25 | 17.17 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_1 | 1 | 0 | 779.75 | 17.44 | 16.27 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-dis | 1 | 0 | 833.35 | 1.38 | 1.56 | 0.10 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo | 1 | 0 | 839.90 | 2.11 | 2.70 | 0.16 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo-q5_0 | 1 | 0 | 705.49 | 3.22 | 2.53 | 0.14 | 9f346d0 |
What's Changed
- Retry allocation with fallback flags by @SRHMorris in #2451
New Contributors
- @SRHMorris made their first contribution in #2451
Full Changelog: v1.7.0...v1.7.1
Binaries
https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590
v1.7.0
Overview
- Fix crashes with high number of beams
- Reduce overall VRAM usage
- Optimize Encoder performance
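If you want to compare your own hardware against the tables that follow, a minimal sketch using the public C API is shown below; `whisper_print_timings()` reports per-stage times (the tables themselves come from the repository's bench tooling, so treat this only as a rough equivalent).

```c
// Minimal sketch: print per-stage timings after a run. whisper_print_timings()
// reports encode/decode times, comparable in spirit to the Enc./Dec. columns below.
// Model path and audio buffer are placeholders.
#include "whisper.h"

void transcribe_and_time(const float * pcm, int n_samples) {
    struct whisper_context * ctx = whisper_init_from_file_with_params(
            "models/ggml-medium.bin", whisper_context_default_params());
    if (ctx == NULL) {
        return;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    if (whisper_full(ctx, wparams, pcm, n_samples) == 0) {
        whisper_print_timings(ctx); // load / mel / encode / decode timings
    }

    whisper_free(ctx);
}
```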
Some performance numbers for this release:
M2 Ultra
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 8.37 | 1.44 | 0.48 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.81 | 1.46 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.80 | 1.47 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 1 | 16.11 | 1.96 | 0.74 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 16.38 | 1.99 | 0.78 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 16.72 | 2.00 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 1 | 41.26 | 3.88 | 1.66 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 46.91 | 4.02 | 1.76 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 47.05 | 4.00 | 1.73 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 1 | 111.29 | 7.79 | 3.63 | 0.11 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 129.78 | 7.71 | 3.85 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 129.29 | 7.71 | 3.87 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 99.27 | 1.09 | 0.43 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 198.81 | 11.54 | 5.59 | 0.20 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 236.18 | 11.12 | 6.11 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 235.88 | 11.14 | 6.01 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 177.41 | 1.21 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 178.92 | 1.89 | 0.83 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 211.44 | 1.73 | 0.90 | 0.04 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 0 | 10.04 | 1.37 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 0 | 10.02 | 1.36 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 0 | 11.08 | 1.37 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 0 | 17.84 | 1.93 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 0 | 18.57 | 1.92 | 0.81 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 0 | 18.66 | 1.93 | 0.82 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 0 | 48.26 | 3.95 | 1.73 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 0 | 53.68 | 3.99 | 1.85 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 0 | 53.86 | 4.00 | 1.82 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 0 | 130.09 | 8.01 | 3.82 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 0 | 148.18 | 7.92 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 0 | 147.95 | 7.94 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 0 | 116.97 | 1.11 | 0.42 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 0 | 232.43 | 12.34 | 5.87 | 0.22 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 0 | 269.72 | 11.68 | 6.44 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 0 | 269.71 | 11.82 | 6.36 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 0 | 209.25 | 1.25 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 0 | 211.09 | 1.98 | 0.84 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 0 | 244.23 | 1.81 | 0.92 | 0.04 | 6a94163 |
Ryzen 9 5950X + RTX 2060
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 1 | 7.35 | 0.78 | 0.24 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 1 | 6.45 | 0.67 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 6.39 | 0.66 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 1 | 10.20 | 0.88 | 0.30 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 1 | 11.38 | 0.92 | 0.21 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 1 | 11.76 | 0.91 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 1 | 33.06 | 2.00 | 0.56 | 0.03 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 1 | 35.84 | 1.84 | 0.43 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 1 | 36.89 | 1.82 | 0.42 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 1 | 90.65 | 4.54 | 1.13 | 0.08 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 104.01 | 3.80 | 0.91 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 1 | 107.98 | 3.72 | 0.87 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 1 | 79.08 | 0.68 | 0.17 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 1 | 162.00 | 7.52 | 1.92 | 0.14 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 1 | 184.59 | 5.64 | 1.50 | 0.16 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 1 | 193.85 | 5.55 | 1.44 | 0.17 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 1 | 140.75 | 0.84 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 1 | 143.38 | 1.29 | 0.36 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 1 | 163.30 | 0.93 | 0.28 | 0.03 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 0 | 12.49 | 0.87 | 0.23 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 0 | 10.65 | 0.78 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 10.82 | 0.77 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 0 | 18.97 | 1.04 | 0.34 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 0 | 20.22 | 1.09 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 0 | 20.48 | 1.07 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 0 | 59.52 | 2.37 | 0.70 | 0.05 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 0 | 62.98 | 2.23 | 0.60 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 0 | 63.64 | 2.21 | 0.59 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 0 | 161.53 | 5.36 | 1.53 | 0.13 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 174.96 | 4.64 | 1.32 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 0 | 178.42 | 4.57 | 1.29 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 0 | 149.65 | 0.75 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 0 | 280.55 | 8.74 | 2.51 | 0.23 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 0 | 306.87 | 6.92 | 2.08 | 0.25 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 0 | 314.25 | 6.82 | 2.02 | 0.26 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 0 | 259.39 | 0.91 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 0 | 261.83 | 1.44 | 0.41 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 0 | 282.99 | 1.09 | 0.33 | 0.04 | 6a94163 |
Vulkan:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKA... |
v1.6.2
Overview
Bugfix when using multiple `whisper_state` instances in parallel: #2182
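For reference, a minimal sketch of the parallel-state pattern that this fix concerns: a single `whisper_context` shared between workers, each running with its own `whisper_state` via `whisper_full_with_state()`. Thread creation and joining are omitted and error handling is minimal.

```c
// Minimal sketch: one shared whisper_context, one whisper_state per worker.
#include <stdio.h>
#include "whisper.h"

int transcribe_with_own_state(struct whisper_context * ctx, const float * pcm, int n_samples) {
    struct whisper_state * state = whisper_init_state(ctx);
    if (state == NULL) {
        return 1;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    const int ret = whisper_full_with_state(ctx, state, wparams, pcm, n_samples);
    if (ret == 0) {
        for (int i = 0; i < whisper_full_n_segments_from_state(state); ++i) {
            printf("%s\n", whisper_full_get_segment_text_from_state(state, i));
        }
    }

    whisper_free_state(state);
    return ret;
}
```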
What's Changed
- Update ruby bindings by @taf2 in #2154
- Update server.cpp by @dvaldivia in #2181
- Revert "whisper : remove extra backend instance (huh?)" by @ggerganov in #2182
New Contributors
- @dvaldivia made their first contribution in #2181
Full Changelog: v1.6.1...v1.6.2
v1.6.1
Minor release adding initial ffmpeg support in the examples #2133 (thx @WilliamTambellini)
What's Changed
- ci: Update build.yml to suppress warnings about node.js versions by @tamo in #2166
- node : add flash_attn param by @pprobst in #2170
- Add support for decoding input with ffmpeg (Linux) by @WilliamTambellini in #2133
New Contributors
- @WilliamTambellini made their first contribution in #2133
Full Changelog: v1.6.0...v1.6.1
v1.6.0
Overview
- Can optionally enable Flash Attention for faster processing on CUDA and Metal devices (#2152)
- Faster ppc64 performance (40aeeee) (not tested)
- Fix `main` slowdown bug (#2070)
Shoutout to @JohannesGaessler for contributing efficient FA CUDA kernels
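A minimal sketch of enabling Flash Attention through the C API, assuming the `flash_attn` flag in `whisper_context_params` is the switch introduced by #2152 (the examples expose it as the `-fa` / `--flash-attn` option); check `whisper.h` in your checkout for the exact field names.

```c
// Minimal sketch: opt in to Flash Attention when creating the context.
#include "whisper.h"

struct whisper_context * init_with_flash_attn(const char * model_path) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu    = true; // FA applies to the CUDA and Metal backends
    cparams.flash_attn = true; // enable the Flash Attention kernels

    return whisper_init_from_file_with_params(model_path, cparams);
}
```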
Some performance numbers for this release:
M1 Pro
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M1 Pro | METAL | tiny | 1 | 0 | 39.21 | 1.74 | 0.61 | 0.04 | 22c96b4 |
M1 Pro | METAL | base | 1 | 0 | 70.76 | 2.60 | 0.93 | 0.06 | 22c96b4 |
M1 Pro | METAL | small | 1 | 0 | 217.28 | 6.42 | 2.14 | 0.17 | 22c96b4 |
M1 Pro | METAL | medium | 1 | 0 | 596.74 | 14.43 | 4.75 | 0.45 | 22c96b4 |
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M1 Pro | METAL | tiny | 1 | 1 | 30.77 | 1.59 | 0.54 | 0.03 | 22c96b4 |
M1 Pro | METAL | base | 1 | 1 | 60.42 | 2.29 | 0.81 | 0.05 | 22c96b4 |
M1 Pro | METAL | small | 1 | 1 | 183.82 | 5.12 | 1.81 | 0.14 | 22c96b4 |
M1 Pro | METAL | medium | 1 | 1 | 517.92 | 11.60 | 4.01 | 0.38 | 22c96b4 |
M2 Ultra
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 12.32 | 1.35 | 0.49 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 11.65 | 1.30 | 0.51 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 12.08 | 1.30 | 0.51 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | base | 1 | 0 | 17.58 | 1.90 | 0.76 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 18.89 | 1.86 | 0.79 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 20.69 | 1.88 | 0.79 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | small | 1 | 0 | 49.32 | 3.85 | 1.71 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 54.91 | 3.81 | 1.82 | 0.06 | 22c96b4 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 54.92 | 3.81 | 1.79 | 0.06 | 22c96b4 |
M2 ULTRA | METAL | medium | 1 | 0 | 134.34 | 8.04 | 3.82 | 0.13 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 151.68 | 7.59 | 4.07 | 0.14 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 151.58 | 7.67 | 4.07 | 0.14 | 22c96b4 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 120.82 | 1.07 | 0.41 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 235.63 | 12.27 | 5.85 | 0.22 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 273.38 | 11.17 | 6.40 | 0.26 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 272.44 | 11.32 | 6.29 | 0.26 | 22c96b4 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 212.51 | 1.20 | 0.47 | 0.02 | 22c96b4 |
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 9.07 | 1.33 | 0.45 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 9.74 | 1.33 | 0.47 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.93 | 1.31 | 0.46 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | base | 1 | 1 | 15.75 | 1.87 | 0.71 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 17.04 | 1.83 | 0.74 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 17.17 | 1.83 | 0.74 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | small | 1 | 1 | 42.33 | 3.64 | 1.60 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 47.61 | 3.63 | 1.70 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 47.70 | 3.66 | 1.68 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | medium | 1 | 1 | 114.42 | 7.53 | 3.55 | 0.11 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 132.63 | 7.02 | 3.77 | 0.13 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 132.28 | 7.10 | 3.76 | 0.13 | 22c96b4 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 102.34 | 1.01 | 0.42 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 203.01 | 11.03 | 5.45 | 0.20 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 240.05 | 10.18 | 5.98 | 0.23 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 239.22 | 10.23 | 5.87 | 0.23 | 22c96b4 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 181.14 | 1.14 | 0.48 | 0.02 | 22c96b4 |
Ryzen 9 5950X + RTX 2060
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
Ryzen 9 5950X | AVX2 | tiny | 8 | 0 | 195.29 | 1.57 | 0.51 | 0.26 | 22c96b4 |
Ryzen 9 5950X | AVX2 | tiny-q5_0 | 8 | 0 | 213.33 | 1.10 | 0.50 | 0.30 | 22c96b4 |
Ryzen 9 5950X | AVX2 | tiny-q5_1 | 8 | 0 | 219.38 | 1.18 | 0.53 | 0.32 | 22c96b4 |
Ryzen 9 5950X | AVX2 | base | 8 | 0 | 424.85 | 3.71 | 1.03 | 0.46 | 22c96b4 |
Ryzen 9 5950X | AVX2 | base-q5_0 | 8 | 0 | 473.61 | 1.81 | 0.82 | 0.52 | 22c96b4 |
Ryzen 9 5950X | AVX2 | base-q5_1 | 8 | 0 | 484.14 | 1.92 | 0.85 | 0.56 | 22c96b4 |
Ryzen 9 5950X | AVX2 | small | 8 | 0 | 1458.32 | 12.66 | 3.09 | 1.26 | 22c96b4 |
Ryzen 9 5950X | AVX2 | small-q5_0 | 8 | 0 | 1673.22 | 6.42 | 2.18 | 1.45 | 22c96b4 |
Ryzen 9 5950X | AVX2 | small-q5_1 | 8 | 0 | 1724.78 | 6.72 | 2.32 | 1.52 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium | 8 | 0 | 4333.87 | 36.80 | 8.56 | 3.37 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium-q5_0 | 8 | 0 | 5194.09 | 19.21 | 5.71 | 3.97 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium-q5_1 | 8 | 0 | 5450.39 | 20.01 | 5.99 | 4.17 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium-dis | 8 | 0 | 3995.19 | 5.08 | 1.21 | 0.55 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2 | 8 | 0 | 8056.16 | 69.74 | 16.11 | 6.13 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2-q5_0 | 8 | 0 | 9799.58 | 35.16 | 10.49 | 7.28 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2-q5_1 | 8 | 0 | ms | 36.74 | 11.02 | 7.65 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2-dis | 8 | 0 | 7490.03 | 7.40 | 1.70 | 0.72 | 22c96b4 |
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 8 | 0 | 12.54 | 0.93 | 0.29 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 8 | 0 | 12.73 | 0.98 | 0.24 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 8 | 0 | 12.72 | 0.99 | 0.24 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | base | 8 | 0 | 24.14 | 1.28 | 0.41 | 0.03 | 22c96b4 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 8 | 0 | 24.58 | 1.38 | 0.35 | 0.03 | 22c96b4 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 8 | 0 | 24.58 | 1.37 | 0.35 | 0.03 | 22c96b4 |
RTX 2060 | AVX2 CUDA | small | 8 | 0 | 74.70 | 2.91 | 0.84 | 0.07 | 22c96b4 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 8 | 0 | 76.12 | 2.84 | 0.77 | 0.08 | 22c96b4 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 8 | 0 | 76.14 | 2.84 | 0.76 | 0.08 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium | 8 | 0 | 200.69 | 6.46 | 1.83 | 0.17 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 8 | 0 | 204.80 | 5.90 | 1.65 | 0.19 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 8 | 0 | 205.61 | 5.85 | 1.61 | 0.19 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium-dis | 8 | 0 | 186.17 | 0.86 | 0.24 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2 | 8 | 0 | 347.22 | 10.36 | 2.82 | 0.29 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 8 | 0 | 357.06 | 8.81 | 2.58 | 0.34 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 8 | 0 | 356.97 | 8.62 | 2.49 | 0.33 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 8 | 0 | 318.05 | 1.03 | 0.34 | 0.04 | 22c96b4 |
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 8 | 1 | 7.21 | 0.76 | 0.29 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 8 | 1 | 7.42 | 0.82 | 0.18 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 7.38 | 0.82 | 0.18 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | ... |
v1.5.5
Overview
Many small incremental updates + Token level timestamps with DTW by @denersc in #1485
Feedback is welcome!
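If you want to try the DTW token timestamps from the C API, a hedged sketch is shown below; the `dtw_*` context parameters and the `t_dtw` field of `whisper_token_data` follow #1485, so verify the exact names against `whisper.h` in your version. The alignment-heads preset must match the loaded model.

```c
// Minimal sketch: enable DTW token-level timestamps and read them back.
#include <stdio.h>
#include "whisper.h"

void print_token_times(const char * model_path, const float * pcm, int n_samples) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.dtw_token_timestamps = true;
    cparams.dtw_aheads_preset    = WHISPER_AHEADS_BASE_EN; // e.g. for ggml-base.en.bin

    struct whisper_context * ctx = whisper_init_from_file_with_params(model_path, cparams);
    if (ctx == NULL) {
        return;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    if (whisper_full(ctx, wparams, pcm, n_samples) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
                const whisper_token_data td = whisper_full_get_token_data(ctx, i, j);
                // t_dtw is roughly the moment the token was emitted (-1 when not computed)
                printf("%s -> %lld\n", whisper_full_get_token_text(ctx, i, j), (long long) td.t_dtw);
            }
        }
    }

    whisper_free(ctx);
}
```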
Full Changelog: v1.5.4...v1.5.5
What's Changed
- server : fix server temperature + add temperature_inc by @ggerganov in #1729
- main : add cli option to disable system prints by @ggerganov in #1740
- server: add request path by @eschmidbauer in #1741
- Optional Piper TTS support for talk-llama example. by @RhinoDevel in #1749
- fix/1748 by @nank1ro in #1750
- Don't compute timestamps when not printing them. by @ghindle in #1755
- Add more parameters to server api by @ghindle in #1754
- Add SetInitialPrompt method to go bindings by @blib in #1753
- ggml : fix 32-bit ARM compat for IQ2_XS by @ggerganov in #1758
- refactor: get all scripts to be POSIX Compliant by @sonphantrung in #1725
- whisper : load the model into multiple buffers of max size 1GB by @ggerganov in #1763
- rebase against your -np changes (thx) and add better python file to be used on the command line or as library by @contractorwolf in #1744
- examples/talk-llama: Add optional commandline parameter to set the bot name. by @RhinoDevel in #1764
- server : fix building and simplify lib deps on Windows by @przemoc in #1772
- talk-llama: optional wake-up command and audio confirmation by @Rakksor in #1765
- examples/server: implement "verbose_json" format with token details by @rmmh in #1781
- whisper.android: Return output from benchmarks by @luciferous in #1785
- libwhisper.so should be position independent by @trixirt in #1792
- Docs: try to make model options / model install methods clearer by @mrienstra in #1806
- common : fix input buffer check by @ggerganov in #1812
- Update Makefile by @jwijffels in #1813
- Add fields to `verbose_json` response and show examples on the home page by @JacobLinCool in #1802
- common: fix wav buffer detection by @JacobLinCool in #1819
- Add macOS deployment target option to Makefile by @didzis in #1839
- Expose CUDA device setting in public API by @didzis in #1840
- whisper.android: How to build with CLBlast by @luciferous in #1809
- server: Allow CORS request with authorization headers by @valenting in #1850
- Embed Metal library source into compiled binary by @didzis in #1842
- added audio_ctx argument to main and server examples by @dscripka in #1857
- whisper : fix external encoder by @ggerganov in #1860
- swift : package no longer use ggml dependency by @ggerganov in #1861
- fix openvino setup docs by @jumpers775 in #1874
- clean up common code in examples by @felrock in #1871
- main : check if input files exist before proceeding by @Theldus in #1872
- Linking issue fix via Makefile when CUBLAS enabled in the WSL #1876 by @lbluep in #1878
- main : fix file existence check in main.cpp by @Theldus in #1889
- openvino : fix convert-whisper-to-openvino.py for v2023.0.0 (#1870) by @st-gr in #1890
- ggml : 32-bit arm compat by @ggerganov in #1891
- Add SYCL logic in whisper by @abhilash1910 in #1863
- talk and talk-llama: Pass text_to_speak as a file by @tamo in #1865
- Stream.wasm: Fix invalid memory access when no segments are returned by @Andrews54757 in #1902
- Update README to Recommend MacOS Sonoma for Core ML to avoid hallucination by @gavin1818 in #1917
- Add library versioning by @kenneth-ge in #1352
- Fix SF(segment fault) issue in Android JNI by @zhouwg in #1929
- Fix typo in source file whisper.cpp by @zhouwg in #1925
- bench:fix typo by @zhouwg in #1933
- Auto lowercase language parameter by @F1L1Pv2 in #1928
- ggml : try fix 32-bit arm compat by @ggerganov in #1938
- whisper : make beam candidate sort more stable by @josharian in #1943
- bindings/go : add linker flags to make metal work by @josharian in #1944
- whisper : improve beam search candidate diversity by @josharian in #1947
- whisper : document whisper_batch.n_seq_id by @josharian in #1942
- Rename --audio-context to --audio-ctx, as per help text by @joliss in #1953
- [DRAFT] Token level timestamps with DTW (#375) by @denersc in #1485
- Fedora dependencies needed (SDL2) by @Man2Dev in #1970
- libcuda.so.1 in PATH in Docker Container by @tiagofassoni in #1966
- ruby : fix build by @ggerganov in #1980
- Improve support for distil-large-v3 by @sanchit-gandhi in #1982
- whisper : improve handling of prompts by @ggerganov in #1981
- sync : ggml by @ggerganov in #2001
- Implemented command-style grammar in the main example. by @ulatekh in #1998
- Use pkg-config for OpenBLAS by @przemoc in #1778
- ci : add building in MSYS2 environments (Windows) by @przemoc in #1994
- Support CUDA versions < 11.1 by @primenko-v in #2020
- Create solution folders in the CMake build by @ulatekh in #2004
- Allow a regular expression to describe tokens to suppress by @ulatekh in #1997
- "main" example now allows a response-file as the sole parameter by @ulatekh in #2019
- Support for CPU BLAS build via Intel MKL by @slashlib in #2024
- Set stdin to binary mode on Windows. Fixes #2023 by @rotemdan in #2025
- Fix file-handle leak in read_wav() by @ulatekh in #2026
- Fix DTW memory access by @bradmurray-dt in #2012
- whisper: update grammar-parser.cpp by @eltociear in #2058
- fix missing reference to "model" variable in actual shell command run in whisper.nvim by @sixcircuit in #2049
- build : detect AVX512 in Makefile, add AVX512 option in CMake by @didzis in #2043
- feature/no timestamps node by @pprobst in #2048
- Update embedded Metal library generation process to include dependency by @didzis in #2045
- server.cpp: add dtw by @eschmidbauer in #2044
New Contributors
- @eschmidbauer made their first contribution in #1741
- @RhinoDevel made their first contribution in #1749
- @nank1ro made their first contribution in #1750
- @ghindle made their first contribution in #1755
- @blib made their first contribution in #1753
- @sonphantrung made their first contribution in #1725
- @contractorwolf made their first contribution in #1744
- @Rakksor made their first contribution in #1765
- @rmmh made their f...
v1.5.4
v1.5.3
Overview
Minor maintenance release:
- Fix CUDA issues where the transcription produces garbage
- Fix quantized models to work with the CUDA backend
- Allow using `whisper.cpp` and `llama.cpp` together in SwiftUI projects
What's Changed
- Update bench.py by @ForkedInTime in #1655
- cmake : Resolve quantized model issue when CUBLAS enabled by @bobqianic in #1667
- examples : Revert CMakeLists.txt for talk-llama by @bobqianic in #1669
- CI : Add coverage for talk-llama when WHISPER_CUBLAS=1 by @bobqianic in #1672
- ci: build and push docker image by @OpenWaygate in #1674
- sync : ggml (ggml_scale, ggml_row_size, etc.) by @ggerganov in #1677
- Replace `WHISPER_PRINT_DEBUG` with `WHISPER_LOG_DEBUG` by @bobqianic in #1681
- download: Fix large q5 model name by @dimopep in #1695
- sync : ggml (VMM, sync-ggml-am.sh, dotprod ARM fixes) by @ggerganov in #1691
- whisper : replace `tensor->n_dims` with `ggml_n_dims(tensor)` by @bobqianic in #1694
- Build with CLBlast by @tamo in #1576
- docker : Fix the Publishing of the CUDA Docker Image by @bobqianic in #1704
- emscripten: fix "Stack Overflow!" by @Huguet57 in #1713
- sync : ggml by @ggerganov in #1717
- Add error handling to graph_compute by @finnvoor in #1714
- Updates Package.swift to use ggml as package dependency by @1-ashraful-islam in #1701
New Contributors
- @ForkedInTime made their first contribution in #1655
- @OpenWaygate made their first contribution in #1674
- @dimopep made their first contribution in #1695
- @Huguet57 made their first contribution in #1713
- @1-ashraful-islam made their first contribution in #1701
Full Changelog: v1.5.2...v1.5.3