From 6a5302b9580f0c9bfe5d82375f14b71b3beab8e9 Mon Sep 17 00:00:00 2001
From: Isaac Chung
Date: Sat, 14 Feb 2026 17:37:04 +0000
Subject: [PATCH] [MAEB] Add audio task installation instructions to docs

Document FFmpeg and transformers>=4.57.6 requirements for users running
audio tasks with datasets>=4. The datasets library v4+ uses torchcodec
for audio processing which requires FFmpeg to be installed.

Fixes #4023

Co-Authored-By: Claude Opus 4.5
---
 docs/installation.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/installation.md b/docs/installation.md
index 3973adc4e8..944bc9de83 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -28,6 +28,46 @@ If you want to run certain models implemented within mteb you will often need so
 
 If a specific model requires a dependency it will raise an error with the recommended installation. To see full list of available models you can look at the [models overview](./overview/available_models/text.md).
 
+## Audio Tasks
+
+If you want to run audio tasks, install the audio dependencies:
+
+=== "pip"
+    ```bash
+    pip install mteb[audio]
+    ```
+
+=== "uv"
+    ```bash
+    uv add "mteb[audio]"
+    ```
+
+### Additional Requirements for `datasets>=4`
+
+If you are using `datasets>=4`, you will need to:
+
+1. **Install FFmpeg**: The `datasets` library version 4+ uses `torchcodec` for audio processing, which requires FFmpeg to be installed on your system.
+
+    === "macOS"
+        ```bash
+        brew install ffmpeg
+        ```
+
+    === "Ubuntu/Debian"
+        ```bash
+        sudo apt-get install ffmpeg
+        ```
+
+    === "Windows"
+        Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to your PATH.
+
+2. **Use `transformers>=4.57.6`**: Due to compatibility issues with `datasets>=4`, you need a recent version of transformers:
+    ```bash
+    pip install "transformers>=4.57.6"
+    ```
+
+If you are using `datasets<4`, no additional requirements are needed beyond the `mteb[audio]` installation.
+
 ## Migrating to uv (for Contributors)
 
 If you're a contributor currently using pip, here's how to migrate to uv for faster dependency management:
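
The audio requirements documented in this patch could also be checked programmatically. The following is a minimal sketch and is not part of mteb or of this patch: `release_tuple` and `audio_requirement_problems` are hypothetical helper names, and the version parsing is a deliberately crude stand-in for a full version parser. It encodes only what the added docs state: with `datasets>=4`, FFmpeg must be on the PATH and `transformers` must be at least 4.57.6; with `datasets<4`, nothing further is needed.

```python
import re
import shutil
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '4.57.6.dev0' -> (4, 57, 6).

    Hypothetical helper: ignores pre-release/dev suffixes entirely.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def audio_requirement_problems(
    datasets_version: str, transformers_version: str, ffmpeg_found: bool
) -> list[str]:
    """Return human-readable problems; an empty list means the documented
    requirements are met. Hypothetical helper, not an mteb API."""
    problems: list[str] = []
    # Per the docs in this patch, datasets<4 needs nothing beyond mteb[audio].
    if release_tuple(datasets_version) < (4,):
        return problems
    if not ffmpeg_found:
        problems.append("FFmpeg not found on PATH (required by torchcodec)")
    if release_tuple(transformers_version) < (4, 57, 6):
        problems.append(
            f"transformers {transformers_version} is older than 4.57.6"
        )
    return problems


if __name__ == "__main__":
    # Only this block touches the real environment; it assumes both
    # packages are installed (metadata.version raises otherwise).
    found = audio_requirement_problems(
        metadata.version("datasets"),
        metadata.version("transformers"),
        shutil.which("ffmpeg") is not None,
    )
    print("\n".join(found) or "Audio requirements look satisfied.")
```

Keeping the decision logic in a pure function means it can be exercised without any of the packages installed; only the `__main__` block queries installed versions and the PATH.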