A simple text-to-speech CLI tool using the Kokoro speech synthesis engine.
Note: This repository also includes an experimental version (chatterbox-reader.py) in the #chatterbox branch, which uses the streaming version of Chatterbox instead of Kokoro. That version is currently very slow and not suitable for practical use. The #moshi branch contains a version using Moshi TTS, which is even more experimental and not recommended for use.
- High-quality text-to-speech synthesis
- Multiple voice options with different languages and genders
- Automatic model download and caching
- Interactive mode with command history
- Speed control
- Support for reading from files, URLs, or standard input
- Install uv for Python package management.
- Clone or download this repository.

    git clone https://github.com/yourusername/kokoro-reader.git
    cd kokoro-reader

    uv run kokoro-reader.py [options]

The required model files will be automatically downloaded to ~/.cache/kokoro-reader on first run.
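The download-once behaviour described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `ensure_cached`, `default_download`, and the `.part` temporary suffix are hypothetical names chosen for this example, and only the cache directory comes from this README.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve

# Cache directory named in this README.
CACHE_DIR = Path.home() / ".cache" / "kokoro-reader"


def default_download(url: str, dest: Path) -> None:
    """Fetch url into dest using only the standard library."""
    urlretrieve(url, dest)


def ensure_cached(
    url: str,
    cache_dir: Path = CACHE_DIR,
    download: Callable[[str, Path], None] = default_download,
) -> Path:
    """Return the local path for url, downloading it only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download under a temporary name so an interrupted run never leaves
        # a truncated file that later looks like a valid cache entry.
        partial = target.with_name(target.name + ".part")
        download(url, partial)
        partial.replace(target)
    return target
```

Passing the downloader as a parameter keeps the caching logic testable without network access; the real tool lists requests and tqdm among its dependencies and may structure this step differently.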
| Option | Description |
|---|---|
| -f, --file FILE | Input text file |
| -u, --url URL | URL to extract text from |
| -v, --voice VOICE | Voice to use (default: af_bella) |
| -s, --speed SPEED | Speech speed (default: 0.8) |
| -l, --lang LANG | Language (default: en-us) |
| -i, --interactive | Run in interactive mode |
Note: If neither -f/--file nor -u/--url is provided, text is read from standard input (stdin).
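For readers who want to see how such options fit together, here is a minimal argparse sketch that mirrors the table and the stdin fallback. The flags and defaults are taken from this README; the function names `build_parser` and `read_input` are hypothetical, and the actual script may parse its arguments differently.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the options and defaults listed in the table."""
    parser = argparse.ArgumentParser(prog="kokoro-reader.py")
    parser.add_argument("-f", "--file", help="Input text file")
    parser.add_argument("-u", "--url", help="URL to extract text from")
    parser.add_argument("-v", "--voice", default="af_bella", help="Voice to use")
    parser.add_argument("-s", "--speed", type=float, default=0.8, help="Speech speed")
    parser.add_argument("-l", "--lang", default="en-us", help="Language")
    parser.add_argument("-i", "--interactive", action="store_true",
                        help="Run in interactive mode")
    return parser


def read_input(args: argparse.Namespace, stdin=sys.stdin) -> str:
    """Pick the text source: file first, then URL, otherwise stdin."""
    if args.file:
        with open(args.file, encoding="utf-8") as handle:
            return handle.read()
    if args.url:
        # Web page extraction is left out of this sketch.
        raise NotImplementedError("URL extraction is outside this sketch")
    return stdin.read()
```

The order in which the file and URL options are checked is an assumption of this sketch; the README does not say what happens when both are given.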
Read text from a file:

    uv run kokoro-reader.py -f mytext.txt

Read text from a URL (extracts main content from web pages):

    uv run kokoro-reader.py -u https://example.com/article

Use a specific voice:

    uv run kokoro-reader.py -v bf_emma -f mytext.txt

Adjust speech speed:

    uv run kokoro-reader.py -s 1.2 -f mytext.txt

Read from stdin:

    echo "Hello, world!" | uv run kokoro-reader.py

Run in interactive mode:

    uv run kokoro-reader.py -i

In interactive mode, you can:
- Enter text directly (end with /EOT)
- Read text from files or URLs without exiting the application
- Use arrow keys to navigate input history
- Edit input with left/right arrow keys
- Use commands to change voice, language and speed settings
| Command | Description |
|---|---|
| TEXT | Enter text directly (must end with /EOT) |
| /f PATH | Read text from file |
| /u URL | Read text from URL |
| /v VOICE | Change voice |
| /v? | Show available voices with grade C or better |
| /l LANG | Change language |
| /s SPEED | Change speed |
| /q | Quit |
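The command set above could be dispatched along these lines. This is an illustrative sketch only: `Command`, `parse_line`, and `collect_text` are hypothetical names, and the real tool builds its shell on prompt_toolkit and may handle input differently.

```python
from dataclasses import dataclass

# Slash commands that take one argument, mapped to an action name.
_COMMANDS = {"/f": "file", "/u": "url", "/v": "voice", "/l": "lang", "/s": "speed"}


@dataclass
class Command:
    name: str           # "file", "url", "voice", "list_voices", "lang", "speed", "quit", "text"
    argument: str = ""


def parse_line(line: str) -> Command:
    """Classify one input line as a slash command or as plain text."""
    stripped = line.strip()
    if stripped == "/q":
        return Command("quit")
    if stripped == "/v?":
        return Command("list_voices")
    head, _, rest = stripped.partition(" ")
    if head in _COMMANDS and rest.strip():
        return Command(_COMMANDS[head], rest.strip())
    return Command("text", line)


def collect_text(lines) -> str:
    """Join lines until one ends with /EOT; the marker itself is dropped."""
    parts = []
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("/EOT"):
            parts.append(stripped[: -len("/EOT")])
            break
        parts.append(line)
    return "\n".join(parts).strip()
```

For example, `parse_line("/s 1.2")` yields a `speed` command with the argument `"1.2"`, which the caller would still need to convert to a number and validate.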
You can view all available high-quality voices by using the /v? command in interactive mode or by checking the tables below. The tables cover American English, British English, and Italian voices, organized by gender and sorted by quality.
For additional languages and voice options, see the official documentation: https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
Voices are graded from A (best) to F (worst). Only voices with grade C or better are recommended.
American English:

| Voice Name | Gender | Grade | Description |
|---|---|---|---|
| af_heart | Female | A | Best overall voice quality |
| af_bella | Female | A- | Default voice, excellent quality |
| af_nicole | Female | B- | Good quality |
| am_fenrir | Male | C+ | Best male voice for American English |

British English:

| Voice Name | Gender | Grade |
|---|---|---|
| bf_emma | Female | B- |
| bf_isabella | Female | C |
| bm_fable | Male | C |

Italian:

| Voice Name | Gender | Grade |
|---|---|---|
| if_sara | Female | C |
| im_nicola | Male | C |
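The "grade C or better" rule can be expressed as a small ranking function. The sketch below assumes a letter scale of A, B, C, D, F with optional + and - modifiers; the voice data is copied from the tables above, while `grade_rank` and `recommended` are hypothetical helpers, not part of the tool.

```python
# Voice grades copied from the tables in this README.
VOICES = {
    "af_heart": "A", "af_bella": "A-", "af_nicole": "B-", "am_fenrir": "C+",
    "bf_emma": "B-", "bf_isabella": "C", "bm_fable": "C",
    "if_sara": "C", "im_nicola": "C",
}


def grade_rank(grade: str) -> int:
    """Map a grade such as 'B-' to an integer; lower means better."""
    letter, modifier = grade[0], grade[1:]
    return "ABCDF".index(letter) * 3 + {"+": 0, "": 1, "-": 2}[modifier]


def recommended(voices: dict[str, str], minimum: str = "C") -> list[str]:
    """Return voice names graded at least `minimum`, best first."""
    limit = grade_rank(minimum)
    kept = [name for name, grade in voices.items() if grade_rank(grade) <= limit]
    return sorted(kept, key=lambda name: grade_rank(voices[name]))
```

Because Python's sort is stable, voices with the same grade keep the order in which they appear in the dictionary.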
Voice names follow a specific naming convention:
- First letter: language (a=American, b=British, i=Italian)
- Second letter: gender (f=female, m=male)
- Followed by underscore and a name (e.g., af_bella = American Female Bella)
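The naming convention is simple enough to decode mechanically. The following sketch covers only the three language prefixes listed here; `describe_voice` is a hypothetical helper written for illustration, and voices for other languages in the upstream documentation would need additional entries.

```python
LANGUAGES = {"a": "American", "b": "British", "i": "Italian"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice: str) -> str:
    """Expand a voice identifier such as 'af_bella' into a readable label."""
    prefix, separator, name = voice.partition("_")
    if not separator or len(prefix) != 2 or not name:
        raise ValueError(f"not a valid voice name: {voice!r}")
    language, gender = prefix
    if language not in LANGUAGES or gender not in GENDERS:
        raise ValueError(f"unknown prefix in voice name: {voice!r}")
    return f"{LANGUAGES[language]} {GENDERS[gender]} {name.capitalize()}"


print(describe_voice("af_bella"))  # American Female Bella
```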
For complete voice listings and documentation, see https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
- Python 3.12 or higher
- kokoro-onnx - Core speech synthesis library
- sounddevice - Audio output
- requests - Network requests for downloading models
- tqdm - Progress bars for downloads
- prompt_toolkit - Interactive shell interface
- trafilatura - Web content extraction for URL reading
The repository includes chatterbox-reader.py, which is an experimental implementation using the streaming version of Chatterbox (https://github.com/davidbrowne17/chatterbox-streaming). This version offers:
- Different voice generation approach based on Chatterbox-streaming
- Emotion exaggeration controls
- Similar interface to the main Kokoro reader
However, be aware that:
- It is significantly slower than the main Kokoro implementation
- Currently not suitable for practical daily use
- Limited voice options (no voice selection parameter)
To try the experimental version:

    uv run chatterbox-reader.py [options]

- Kokoro Speech Synthesis Engine - The underlying TTS library
- Kokoro-82M Model - Source of voice data and voice quality information
- Chatterbox Streaming - Used in the experimental version
- Trafilatura - Used for extracting readable content from web pages
- This project was developed with assistance from Claude 3.7 Sonnet (via Copilot) using the Zed Editor's Agentic feature