Real-time Icelandic Speech Recognition powered by Whisper AI
WhisperSST.is is a fully local web application that provides real-time Icelandic speech recognition using a fine-tuned version of OpenAI's Whisper model. It runs entirely on your machine: no cloud services are involved, and an internet connection is needed only for the initial model download. Your audio data never leaves your computer, which keeps your recordings private.
Note: This application is currently in development, so bugs are expected.
- Record and transcribe audio directly from your microphone
- Upload and process audio files (WAV, MP3, M4A, FLAC)
- 100% local processing - no cloud or internet needed
- Fast, efficient transcription
- Instant audio playback
- User-friendly interface
- Specialized for Icelandic language
- Runs on your hardware (CPU/GPU)
- Timestamped transcriptions
- Export to TXT and SRT formats
- Live transcription feature for real-time speech-to-text conversion
- Support for more audio formats
- Improved accuracy through model fine-tuning
- Batch processing for multiple files
- Custom vocabulary support
- Speaker diarization
- Word-level timestamps
- Export to more formats (DOCX, PDF)
- Icelandic translation of the user interface
- Add sample audio files for testing and demonstration
- Added test audio file located at `tests/demo/test_vedur.mp3`
- Python 3.8+
- CUDA-capable GPU (recommended, but CPU works too)
- Microphone access
- Internet connection (only for initial model download)
- ~4GB disk space for models
- 100% local processing - your audio never leaves your computer
- No cloud services or API calls
- All transcription happens on your machine
- No internet needed after model download
- No external dependencies for core functionality
Linux (Debian/Ubuntu):
sudo apt-get update
sudo apt-get install portaudio19-dev python3-pyaudio
macOS:
brew install portaudio
Windows:
The required libraries are typically included with the Python packages.
- Clone the repository:
git clone https://github.com/Magnussmari/whisperSSTis.git
cd whisperSSTis
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install Python dependencies:
pip install -r requirements.txt
- Start the application:
python launcher.py
For developers who want to contribute or modify the application:
- Set up your development environment:
# Clone the repository
git clone https://github.com/Magnussmari/whisperSSTis.git
cd whisperSSTis
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
- Project Structure:
  - `app.py`: Main Streamlit application
  - `launcher.py`: GUI launcher for the application
  - `whisperSSTis/`: Core module containing audio and transcription logic
  - `setup_dependencies.sh/bat`: System dependency installation scripts
  - `TODO.md`: Current development tasks and future plans
- Running in Development Mode:
# Run with launcher GUI
python launcher.py
# Run Streamlit directly
streamlit run app.py
- Development Guidelines:
- Follow PEP 8 style guidelines
- Add docstrings for new functions
- Update TODO.md for new features/fixes
- Test changes with different audio inputs
To run the unit tests:
# Install test dependencies
pip install pytest pytest-mock
# Run tests from project root
pytest
- Application won't start:
  - Run the setup script for your platform
  - Make sure you have extracted all files from the downloaded package
  - Try running as administrator
  - Check your antivirus isn't blocking the application
- No audio input:
  - Run the setup script to install audio dependencies
  - Check your microphone is properly connected
  - Allow microphone access in your system settings
  - Select the correct input device in the application
- Slow transcription:
  - A GPU is recommended but not required
  - First launch may be slow while loading the model
  - Try adjusting chunk size for better performance
  - Models are cached locally for faster subsequent runs
- PortAudio Error:
  - Run `setup_dependencies.sh` (macOS/Linux) or `setup_dependencies.bat` (Windows)
  - Windows: Install Visual C++ Redistributable if prompted
  - Linux: Run `sudo apt-get install portaudio19-dev python3-pyaudio`
  - macOS: Run `brew install portaudio`
- Missing Dependencies:
  - Run the setup script for your platform
  - Check the error message for specific missing packages
  - For Windows, ensure Visual C++ Redistributable is installed
  - For Linux, install required system packages using your package manager
For more help, check the issues page or create a new issue.
- Frontend: Streamlit (local web interface)
- Speech Recognition: Fine-tuned Whisper model (runs locally)
- Audio Processing: PortAudio, PyAudio
- ML Framework: PyTorch, Transformers
- Privacy: All processing done locally on your machine
- Magnus Smari Smarason
- Original Whisper Model: OpenAI
- Icelandic Fine-tuned Model: Carlos Daniel Hernandez Mena
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
The application relies on a pre-trained model from Hugging Face. While Hugging Face is generally reputable, there's always a risk with using third-party models.
The use of `ffmpeg-python` and `pydub` introduces a dependency on FFmpeg, which is a complex library with a history of vulnerabilities.
Although `unsafe_allow_html=True` is used only for styling in `app.py`, this flag could become a vulnerability if user input is ever incorporated into the HTML without sanitization.
It is unknown whether the installed versions of these libraries have known vulnerabilities.
Note: When in doubt, use the application offline to minimize security risks.
Regularly check for vulnerabilities in the listed dependencies, especially `ffmpeg-python`, `pydub`, `transformers`, and `streamlit`. Update to newer versions if vulnerabilities are found.
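As a starting point for that check, the sketch below prints the versions of the named packages that are installed in the active environment, so they can be compared against published advisories. It uses only the standard library; the helper name `installed_versions` is hypothetical and not part of this project.

```python
from importlib import metadata

# Packages named in the note above; extend the tuple as requirements.txt changes.
WATCHED_PACKAGES = ("ffmpeg-python", "pydub", "transformers", "streamlit")


def installed_versions(packages=WATCHED_PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This only reports what is installed; deciding whether a version is vulnerable still requires consulting an advisory source.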
If possible, implement a mechanism to verify the integrity of the downloaded model (e.g., by checking its hash) before loading it.
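A minimal sketch of such a check, assuming you have recorded a known-good SHA-256 digest for a downloaded model file yourself. The function names are hypothetical, and the application does not necessarily expose the model as a single file, so treat this as an illustration of the idea only.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path, expected_sha256):
    """Raise ValueError if the file's SHA-256 differs from the pinned value."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"Checksum mismatch for {Path(path).name}: got {actual}")
```

Calling `verify_model_file` before loading the weights turns a silent substitution into a hard failure at startup.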
Ensure that `unsafe_allow_html=True` is only used for trusted content (like static styles) and never for user-provided data. If user data needs to be displayed, use proper sanitization techniques.
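One simple form of sanitization is to escape the text before it is placed inside any HTML fragment. The sketch below uses the standard library's `html.escape`; the function name and the `transcript` CSS class are hypothetical and not taken from `app.py`.

```python
import html


def render_transcript_html(transcript):
    """Escape transcript text before placing it inside an HTML fragment."""
    safe_text = html.escape(transcript, quote=True)
    return f'<div class="transcript">{safe_text}</div>'
```

The returned string could then be passed to `st.markdown(..., unsafe_allow_html=True)`, since any markup in the transcript itself has been neutralized. Where no custom styling is needed, displaying the text without the flag avoids the issue entirely.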
Although the app is local, it's good practice to validate user inputs. For example, check the file type and size of uploaded audio files.
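A minimal sketch of such a check follows. The allowed extensions mirror the formats listed in the feature list; the 200 MB limit and the function name are assumptions chosen for illustration.

```python
from pathlib import Path

# Formats named in the feature list.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}
# Assumed limit for illustration; tune it for your hardware.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def validate_upload(filename, size_bytes):
    """Return None if the upload looks acceptable, else a short error message."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type: {extension or 'none'}"
    if size_bytes <= 0:
        return "File is empty"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "File is too large"
    return None
```

An extension check does not prove the content is valid audio, so decoding errors still need to be handled when the file is actually opened.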
Ensure that temporary files are always deleted, even in case of errors. The current code seems to handle this correctly, but it's worth double-checking.
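One way to make that guarantee explicit is a context manager whose `finally` clause removes the file whether or not the body raises. This is a sketch under that assumption, not a description of the current code; the name `temporary_audio_file` is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio_file(data, suffix=".wav"):
    """Write bytes to a temp file, yield its path, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs on normal exit and when the caller's block raises.
        if os.path.exists(path):
            os.remove(path)
```

Any transcription step that needs a file path can run inside `with temporary_audio_file(audio_bytes) as path:` and rely on cleanup happening afterwards.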
Developed with ❤️ for the Icelandic language community