Releases: SYSTRAN/faster-whisper
faster-whisper 1.0.3
Upgrade the Silero-VAD model to the latest V5 version (#884)
Silero-VAD V5 release: https://github.com/snakers4/silero-vad/releases/tag/v5.0
- The window_size_samples parameter is now fixed at 512.
- Use the state variable instead of the previous h and c variables.
- The internal logic changed slightly: some context (part of the previous chunk) is now passed along with the current chunk.
- The dimensions of the state variable changed from 64 to 128.
- Replace the ONNX file with the V5 version.
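The context-passing change can be pictured with a small NumPy sketch. This is an illustration, not the library's implementation; the 64-sample context size is an assumption for 16 kHz audio.

```python
import numpy as np

def iter_vad_windows(audio, window_size=512, context_size=64):
    """Yield fixed 512-sample windows, each prefixed with the tail of the
    previous chunk, mimicking how Silero-VAD V5 carries context between
    chunks. context_size is an assumed value for illustration."""
    context = np.zeros(context_size, dtype=audio.dtype)
    for start in range(0, len(audio), window_size):
        chunk = audio[start:start + window_size]
        if len(chunk) < window_size:  # zero-pad the final short chunk
            chunk = np.pad(chunk, (0, window_size - len(chunk)))
        yield np.concatenate([context, chunk])
        context = chunk[-context_size:]
```

Each yielded window is therefore 576 samples long: 64 samples of context followed by the current 512-sample chunk.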
Other changes
faster-whisper 1.0.2
- Add support for distil-large-v3 (#755): the latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.
- Benchmarks (#773): introduces functionality to benchmark memory usage, Word Error Rate (WER), and speed in faster-whisper.
- Support initializing more Whisper model args (#807)
- Small bug fixes
- New features from the original OpenAI Whisper project
faster-whisper 1.0.1
faster-whisper 1.0.0
- Support the distil-whisper model (#557): robust knowledge distillation of the Whisper model via large-scale pseudo-labelling. For more detail: https://github.com/huggingface/distil-whisper
- Upgrade the CTranslate2 version to 4.0 to support CUDA 12 (#694)
- Upgrade the PyAV version to 11.* to support Python 3.12.x (#679)
- Small bug fixes
- New improvements from the original OpenAI Whisper project
faster-whisper 0.10.1
Fix the broken tag v0.10.0
faster-whisper 0.10.0
- Support "large-v3" model with
- The ability to load
feature_size/num_mels
and other frompreprocessor_config.json
- A new language token for Cantonese (
yue
)
- The ability to load
- Update
CTranslate2
requirement to include the latest version 3.22.0 - Update
tokenizers
requirement to include the latest version 0.15 - Change the hub to fetch models from Systran organization
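As an illustration of the kind of values now read from preprocessor_config.json, here is a minimal sketch; the helper name is hypothetical, and the fallback of 80 mel bins matches pre-large-v3 Whisper models (large-v3 itself uses 128).

```python
import json

def load_num_mels(config_path):
    # Hypothetical helper for illustration: read the mel-filterbank size
    # from a model's preprocessor_config.json, falling back to the 80 bins
    # used by pre-large-v3 Whisper models.
    with open(config_path) as f:
        cfg = json.load(f)
    return cfg.get("feature_size", cfg.get("num_mels", 80))
```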
faster-whisper 0.9.0
- Add function faster_whisper.available_models() to list the available model sizes
- Add model property supported_languages to list the languages accepted by the model
- Improve the error message for invalid task and language parameters
- Update the tokenizers requirement to include the latest version 0.14
faster-whisper 0.8.0
Expose new transcription options
Some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:
- repetition_penalty to penalize the score of previously generated tokens (set > 1 to penalize)
- no_repeat_ngram_size to prevent repetitions of n-grams with this size
Some values that were previously hardcoded in the transcription method:
- prompt_reset_on_temperature to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)
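How repetition_penalty acts on decoder scores can be sketched as follows. This is an illustrative re-implementation of the standard penalty scheme, not CTranslate2's actual code.

```python
import numpy as np

def apply_repetition_penalty(logits, previous_ids, penalty=1.2):
    """Illustrative sketch: with penalty > 1, scores of previously generated
    tokens are pushed down (positive scores divided by the penalty, negative
    scores multiplied by it); unseen tokens are left untouched."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    for tok in set(previous_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits
```

With penalty=1.0 the scores are unchanged, which is why values greater than 1 are needed to actually discourage repetition.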
Other changes
- Fix a possible memory leak when decoding audio with PyAV by forcing the garbage collector to run
- Add property duration_after_vad in the returned TranscriptionInfo object
- Add a "large" alias for the "large-v2" model
- Log a warning when the model is English-only but the language parameter is set to something else
faster-whisper 0.7.1
- Fix a bug related to no_speech_threshold: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered as non-speech
- Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability
faster-whisper 0.7.0
Improve word-level timestamps heuristics
Some recent improvements from openai-whisper are ported to faster-whisper:
- Squash long words at window and sentence boundaries (openai/whisper@255887f)
- Improve timestamp heuristics (openai/whisper@f572f21)
Support download of user converted models from the Hugging Face Hub
The WhisperModel constructor now accepts any repository ID as an argument, for example:
model = WhisperModel("username/whisper-large-v2-ct2")
The utility function download_model
has been updated similarly.
Other changes
- Accept an iterable of token IDs for the argument initial_prompt (useful to include timestamp tokens in the prompt)
- Avoid computing higher temperatures when no_speech_threshold is met (same as openai/whisper@e334ff1)
- Fix truncated output when using a prefix without disabling timestamps
- Update the minimum required CTranslate2 version to 3.17.0 to include the latest fixes