🎙️ Norðlenski hreimurinn

Real-time Icelandic Speech Recognition powered by Whisper AI

🌟 Overview

WhisperSST.is is a fully local web application that transcribes Icelandic speech using a fine-tuned version of OpenAI's Whisper model. All processing runs on your own machine: no cloud services are involved, and an internet connection is needed only for the initial model download. Your audio data never leaves your computer.

Note: This application is currently in development, so bugs are expected.

✨ Features

  • 🎤 Record and transcribe audio directly from your microphone
  • 📁 Upload and process audio files (WAV, MP3, M4A, FLAC)
  • 🔒 100% local processing - no cloud or internet needed
  • 🚀 Fast, efficient transcription
  • 🔊 Instant audio playback
  • 📱 User-friendly interface
  • 🇮🇸 Specialized for Icelandic language
  • 💻 Runs on your hardware (CPU/GPU)
  • 📝 Timestamped transcriptions
  • 💾 Export to TXT and SRT formats
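
As an illustration of the SRT export listed above, here is a minimal sketch of how timestamped segments could be rendered as SRT text. The function names and the `(start, end, text)` tuple layout are assumptions made for this example; the README does not show how the application stores segments or writes its exports.

```python
from typing import Iterable, Tuple


def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render hypothetical (start, end, text) segments as SRT cues.

    Each cue is a running index, a timing line, and the text;
    cues are separated by a blank line.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)
```

Calling `segments_to_srt([(0.0, 2.5, "Góðan daginn")])` yields a single cue numbered 1 with the timing line `00:00:00,000 --> 00:00:02,500`.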

🚀 Future Development

  • 🎙️ Live transcription feature for real-time speech-to-text conversion
  • 📊 Support for more audio formats
  • 🧠 Improved accuracy through model fine-tuning
  • 📚 Batch processing for multiple files
  • 📖 Custom vocabulary support
  • 👥 Speaker diarization
  • ⏱️ Word-level timestamps
  • 📄 Export to more formats (DOCX, PDF)
  • 🇮🇸 Icelandic translation of the user interface
  • 🎵 Add sample audio files for testing and demonstration
  • 🧪 Done: a test audio file is available at tests/demo/test_vedur.mp3

🛠️ Setup Instructions

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended, but CPU works too)
  • Microphone access
  • Internet connection (only for initial model download)
  • ~4GB disk space for models
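
The Python version and disk space requirements above can be checked before installing. The following is a small sketch using only the standard library; the function name `check_prerequisites` and the exact 4 GiB threshold are assumptions for illustration and are not part of the project.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)              # "Python 3.8+" from the list above
MIN_FREE_BYTES = 4 * 1024 ** 3   # "~4GB disk space", taken here as 4 GiB (assumption)


def check_prerequisites(path: str = ".") -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    # Free space on the filesystem that contains `path`.
    free = shutil.disk_usage(path).free
    if free < MIN_FREE_BYTES:
        problems.append(f"Only {free / 1024 ** 3:.1f} GiB free, about 4 GiB needed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

This does not check for a GPU or microphone; those depend on packages that are only available after installation.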

Privacy & Security

  • 🔒 100% local processing - your audio never leaves your computer
  • 🚫 No cloud services or API calls
  • 💻 All transcription happens on your machine
  • 🔐 No internet needed after model download
  • 🎯 No external dependencies for core functionality

System Dependencies

Ubuntu/Debian

sudo apt-get update
sudo apt-get install portaudio19-dev python3-pyaudio

macOS

brew install portaudio

Windows

The required libraries are typically included with Python packages.

Installation

  1. Clone the repository:
git clone https://github.com/Magnussmari/whisperSSTis.git
cd whisperSSTis
  2. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install Python dependencies:
pip install -r requirements.txt
  4. Start the application:
python launcher.py

Development Setup

For developers who want to contribute or modify the application:

  1. Set up your development environment:
# Clone the repository
git clone https://github.com/Magnussmari/whisperSSTis.git
cd whisperSSTis

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
  2. Project Structure:
  • app.py: Main Streamlit application
  • launcher.py: GUI launcher for the application
  • whisperSSTis/: Core module containing audio and transcription logic
  • setup_dependencies.sh / setup_dependencies.bat: System dependency installation scripts
  • TODO.md: Current development tasks and future plans
  3. Running in Development Mode:
# Run with launcher GUI
python launcher.py

# Run Streamlit directly
streamlit run app.py
  4. Development Guidelines:
  • Follow PEP 8 style guidelines
  • Add docstrings for new functions
  • Update TODO.md for new features/fixes
  • Test changes with different audio inputs

Running Tests

To run the unit tests:

# Install test dependencies
pip install pytest pytest-mock

# Run tests from project root
pytest

Troubleshooting

Common Issues

  • Application won't start:

    • Run the setup script for your platform
    • Make sure you have extracted all files from the downloaded package
    • Try running as administrator
    • Check your antivirus isn't blocking the application
  • No audio input:

    • Run the setup script to install audio dependencies
    • Check your microphone is properly connected
    • Allow microphone access in your system settings
    • Select the correct input device in the application
  • Slow transcription:

    • A GPU is recommended but not required
    • First launch may be slow while loading the model
    • Try adjusting chunk size for better performance
    • Models are cached locally for faster subsequent runs
  • PortAudio Error:

    • Run setup_dependencies.sh (macOS/Linux) or setup_dependencies.bat (Windows)
    • Windows: Install Visual C++ Redistributable if prompted
    • Linux: Run sudo apt-get install portaudio19-dev python3-pyaudio
    • macOS: Run brew install portaudio
  • Missing Dependencies:

    • Run the setup script for your platform
    • Check the error message for specific missing packages
    • For Windows, ensure Visual C++ Redistributable is installed
    • For Linux, install required system packages using your package manager
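
To make the "chunk size" advice under Slow transcription more concrete, here is a sketch of splitting audio samples into fixed-length chunks. It is an illustration only: the README does not show how the application chunks audio, so the function name and parameters are hypothetical. The 16 kHz default reflects the sample rate Whisper models expect.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def iter_chunks(
    samples: Sequence[float],
    chunk_seconds: float,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each at most `chunk_seconds` long.

    The final chunk may be shorter than the others.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]
```

Shorter chunks lower peak memory use and return first results sooner, while longer chunks give the model more context per pass; which works better depends on your hardware.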

For more help, check the issues page or create a new issue.

💻 Technical Details

  • Frontend: Streamlit (local web interface)
  • Speech Recognition: Fine-tuned Whisper model (runs locally)
  • Audio Processing: PortAudio, PyAudio
  • ML Framework: PyTorch, Transformers
  • Privacy: All processing done locally on your machine

👥 Credits

Developer

  • Magnus Smari Smarason

Model Credits

Technologies

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

🔒 Security Considerations

External Model

The application relies on a pre-trained model downloaded from Hugging Face. Hugging Face is a reputable host, but any third-party model carries some risk, since you are trusting files that the project's author did not produce.

FFmpeg

The use of ffmpeg-python and pydub introduces a dependency on FFmpeg, which is a complex library with a history of vulnerabilities.

unsafe_allow_html=True

Although used only for styling in app.py, this flag could be a vulnerability if user input is ever incorporated into the HTML without sanitization.

Library Versions

It is not known whether the library versions installed from requirements.txt contain known vulnerabilities.

Note: When in doubt, use the application offline to minimize security risks.

🔧 Recommendations

Monitor for Vulnerabilities

Regularly check for vulnerabilities in the listed dependencies, especially ffmpeg-python, pydub, transformers, and streamlit. Update to newer versions if vulnerabilities are found.
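
A first step toward such monitoring is knowing which versions are actually installed. The sketch below reports them using the standard library's `importlib.metadata` (available from Python 3.8). The helper name `installed_versions` is an assumption; checking the reported versions against a vulnerability database is left to a dedicated tool.

```python
from importlib import metadata

# Packages named in the recommendation above.
WATCHED = ("ffmpeg-python", "pydub", "transformers", "streamlit")


def installed_versions(packages=WATCHED) -> dict:
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```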

Consider Model Verification

If possible, implement a mechanism to verify the integrity of the downloaded model (e.g., by checking its hash) before loading it.
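
One way to do this, sketched here with the standard library only, is to compare a file's SHA-256 digest against a value recorded when the model was first vetted. The function names and the idea of a stored expected digest are assumptions for illustration; the application does not currently ship such a check.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_file(path: str, expected_sha256: str) -> bool:
    """Return True if the file's digest matches the expected hex digest."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

The expected digest has to come from a source you trust, such as a value you recorded yourself after the first download; a hash fetched alongside the model offers little protection.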

Review unsafe_allow_html Usage

Ensure that unsafe_allow_html=True is only used for trusted content (like static styles) and never for user-provided data. If user data needs to be displayed, use proper sanitization techniques.
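
As a sketch of that sanitization, the standard library's `html.escape` can neutralize markup in a transcript or file name before it is embedded in HTML. The function name and the `transcript` CSS class are hypothetical; only the escaping step is the point here.

```python
import html


def render_transcript_html(text: str) -> str:
    """Wrap user-derived text in a styled div, escaping it first.

    html.escape replaces &, <, > and quote characters, so the text
    cannot introduce tags or attributes of its own.
    """
    safe_text = html.escape(text)
    return f'<div class="transcript">{safe_text}</div>'
```

The returned string could then be passed to `st.markdown(..., unsafe_allow_html=True)`, because only the static wrapper is treated as markup.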

Input Validation

Although the app is local, it's good practice to validate user inputs. For example, check the file type and size of uploaded audio files.
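
A minimal sketch of such a check follows. The allowed extensions are the formats listed under Features; the 200 MiB size limit and the function name are assumptions chosen for the example.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}  # formats listed under Features
MAX_BYTES = 200 * 1024 * 1024  # hypothetical limit of 200 MiB


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload has an unexpected type or size."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    if size_bytes <= 0:
        raise ValueError("File is empty")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"File is larger than {MAX_BYTES} bytes")
```

An extension check only filters obvious mistakes; the file contents are still parsed by the audio libraries, so keeping those up to date remains important.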

Error Handling

Ensure that temporary files are always deleted, even in case of errors. The current code seems to handle this correctly, but it's worth double-checking.
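
The usual pattern for this is a `try`/`finally` block around the work that uses the file. The sketch below shows the pattern in isolation; `with_temp_audio` and its `process` callback are hypothetical names and are not taken from the application's code.

```python
import os
import tempfile


def with_temp_audio(data: bytes, process, suffix: str = ".wav"):
    """Write `data` to a temporary file, call process(path), and always clean up.

    The file is removed whether `process` returns normally or raises.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return process(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Any exception raised by `process` still propagates to the caller; the `finally` clause only guarantees that the temporary file is gone by then.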

Developed with ❤️ for the Icelandic language community