
AnkushRathour

What is this Python project?

audiomaker is a Python package for local text-to-speech generation, designed specifically to handle long-form input by automatically splitting it into manageable chunks and seamlessly merging the resulting audio files.

Unlike many other TTS tools, audiomaker is:

  • 🆓 Free and open source
  • 💻 Runs on your own machine with no API keys or paid cloud account (the edge-tts backend does still need an internet connection to reach Microsoft's speech service)
  • 🔊 Capable of generating hours-long audio from large .txt files (e.g., books, scripts)

This makes it ideal for creating:

  • Audiobooks and narrated blog posts
  • Podcast episodes and interviews
  • Long-form video voiceovers or lectures

PyPI: https://pypi.org/project/audiomaker/
GitHub: https://github.com/AnkushRathour/AudioMaker

🔧 Features

  • Uses Microsoft Edge TTS (edge-tts) for high-quality neural voices, supporting multiple languages and voice styles
  • Automatically chunks large input texts based on configurable parameters (e.g., chunk size, pause duration)
  • Merges generated audio chunks into a single smooth audio file without audible gaps
  • Provides both CLI and Python API interfaces for flexible usage
  • Supports output in common audio formats like .mp3 and .wav
  • Includes built-in progress tracking and error handling for robust long-form synthesis
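
The chunking step described above can be illustrated with a minimal sketch. This is not audiomaker's actual implementation or API: `chunk_text` and `max_chars` are hypothetical names. The sketch assumes that splitting on sentence-ending punctuation (`.`, `!`, `?`) is good enough, and it hard-splits any single sentence longer than the limit.

```python
import re


def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper for illustration only, not audiomaker's API.
    Sentences are kept whole whenever they fit within the limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:].lstrip()
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            # The sentence still fits in the chunk being built.
            current = candidate
        else:
            # Flush the full chunk and start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("One. Two two. Three three three.", max_chars=20)` keeps the first two sentences together and places the third in its own chunk. Cutting on sentence boundaries keeps each request small for the TTS backend while avoiding mid-sentence seams, which would otherwise sound unnatural when the chunks are played back to back.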

Tested successfully on very large .txt files, producing more than 4 hours of continuous audio.

What's the difference between this Python project and similar ones?

  • Unlike cloud TTS services (e.g., Google TTS, AWS Polly), audiomaker needs no API keys, account setup, or cloud billing
  • Compared to gTTS, it supports longer inputs reliably
  • It uses Microsoft Edge TTS neural voices rather than the basic system speech engines that pyttsx3 wraps
  • It automates chunking and merging, so arbitrarily long texts need no manual splitting
  • It provides an end-to-end pipeline that keeps user effort to a minimum for long-form audio generation
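
Merging can be sketched with Python's standard-library `wave` module. This assumes every chunk was saved as a WAV file with the same channel count, sample width, and sample rate. `merge_wavs` and `pause_ms` are hypothetical names echoing the "pause duration" parameter mentioned above, not audiomaker's API. Note that edge-tts produces MP3 by default, so MP3 chunks would first need decoding (for example with pydub, which relies on ffmpeg).

```python
import wave


def merge_wavs(inputs: list[str], output: str, pause_ms: int = 300) -> None:
    """Concatenate same-format WAV files, inserting pause_ms of silence between them.

    Hypothetical helper for illustration only, not audiomaker's API.
    """
    if not inputs:
        raise ValueError("no input files given")
    # Take the reference format from the first chunk.
    with wave.open(str(inputs[0]), "rb") as first:
        params = first.getparams()
    fmt = (params.nchannels, params.sampwidth, params.framerate)

    # Build the silent gap: 8-bit WAV is unsigned (silence = 0x80),
    # wider sample widths are signed (silence = 0x00).
    silence_frames = int(params.framerate * pause_ms / 1000)
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silence_byte * (silence_frames * params.nchannels * params.sampwidth)

    with wave.open(str(output), "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        for i, path in enumerate(inputs):
            with wave.open(str(path), "rb") as chunk:
                if (chunk.getnchannels(), chunk.getsampwidth(), chunk.getframerate()) != fmt:
                    raise ValueError(f"{path} does not match the first file's format")
                out.writeframes(chunk.readframes(chunk.getnframes()))
            # Add the pause between chunks, but not after the last one.
            if i < len(inputs) - 1 and silence:
                out.writeframes(silence)
```

Copying raw frames without re-encoding avoids the small artifacts that lossy re-compression can add at chunk boundaries. Setting `pause_ms=0` joins the chunks back to back.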

Anyone who agrees with this pull request may submit an Approve review.

[`audiomaker`](https://pypi.org/project/audiomaker/) is a Python package for **local text-to-speech generation**, designed to handle **long-form input** by automatically splitting it into chunks and merging the resulting audio seamlessly.

Unlike many other TTS tools, `audiomaker` is:
- 🆓 Free and open source
- 💻 Runs locally with no API keys or usage limits (the edge-tts backend does still need internet access)
- 🔊 Capable of generating **hours-long audio** from large `.txt` files

This makes it ideal for creating audiobooks, narrated content, or long-form podcasts.

**PyPI:** https://pypi.org/project/audiomaker/  
**GitHub:** https://github.com/AnkushRathour/AudioMaker
@Avalux-07

  • Audio
    • audioread - Cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.
    • dejavu - Audio fingerprinting and recognition.
    • matchering - A library for automated reference audio mastering.
    • mingus - An advanced music theory and notation package with MIDI file and playback support.
    • pyAudioAnalysis - Audio feature extraction, classification, segmentation and applications.
    • pydub - Manipulate audio with a simple and easy high level interface.
    • TimeSide - Open web audio processing framework.
  • Metadata
    • beets - A music library manager and MusicBrainz tagger.
    • eyeD3 - A tool for working with audio files, specifically MP3 files containing ID3 metadata.
    • mutagen - A Python module to handle audio metadata.
    • tinytag - A library for reading music metadata of MP3, OGG, FLAC and Wave files.
