A Flask-based REST API that detects human voice activity in audio files. Perfect for those moments when you need to know if someone's actually speaking or if it's just your neighbour's cat knocking things over.
- Audio classification into three categories:
  - Blank (no sound)
  - Background noise only
  - Human voice with background noise
- Continuous learning through user feedback
- JWT-based authentication
- Support for WAV, MP3, and AAC audio formats
- Noise reduction preprocessing
- Feature extraction using MFCCs, spectral centroids, and chroma features
The system follows a modular architecture with these core components:
- Feature Extraction: Uses librosa to extract meaningful audio features (MFCCs, spectral centroids, chroma)
- Model: RandomForest classifier trained on the extracted features
- API: Flask-based REST endpoints for training, classification, and feedback
- Authentication: JWT-based token system for secure access
- Python 3.8+
- FFmpeg (for audio processing)
- Clone the repository
- Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Set your JWT secret key in create_jwt_token.py:
SECRET_KEY = "your_secure_secret_key"
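A minimal sketch of how a token could be generated with PyJWT (the payload claims and expiry shown are assumptions for illustration, not necessarily what create_jwt_token.py actually does):

```python
import datetime
import jwt  # PyJWT

SECRET_KEY = "your_secure_secret_key"  # replace with your own secret

def create_jwt_token(user_id, hours_valid=24):
    # Illustrative payload; the repository's actual claims may differ.
    payload = {
        "sub": user_id,
        "exp": datetime.datetime.now(datetime.timezone.utc)
               + datetime.timedelta(hours=hours_valid),
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")

token = create_jwt_token("demo-user")
print(token)
```

The resulting token is what goes into the `Authorization: Bearer ...` header in the examples below.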
POST /train
- Train the model with new audio files and labels
- Can also use accumulated feedback data if no new files are provided
POST /upload-audio
- Upload an audio file for classification
- Returns classification result and confidence score
POST /feedback
- Submit feedback for improving model accuracy
- Stores audio file and correct label for future retraining
Training with new files:
curl -X POST \
http://localhost:5000/train \
-H 'Authorization: Bearer YOUR_JWT_TOKEN' \
-F '[email protected]' \
-F '[email protected]' \
-F 'labels=1' \
-F 'labels=2'
Classifying an audio file:
curl -X POST \
http://localhost:5000/upload-audio \
-H 'Authorization: Bearer YOUR_JWT_TOKEN' \
-F '[email protected]'
The system uses a Random Forest Classifier with:
- 100 estimators
- Feature set combining MFCCs, spectral centroids, and chroma features
- Noise reduction preprocessing using the noisereduce library
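The training step can be sketched roughly as follows. Only the 100-estimator Random Forest and the feature set are stated by this project; the data shown here is synthetic placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: one 26-dimensional feature vector per clip
# (13 MFCCs + 1 spectral centroid + 12 chroma bins), with labels
# 0 = blank, 1 = background noise only, 2 = voice with background.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 26))
y = rng.integers(0, 3, size=60)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# predict_proba yields per-class probabilities, which can serve as the
# confidence score the API returns alongside the classification
probs = model.predict_proba(X[:1])
print(probs.shape)
```

Taking the maximum of `predict_proba` is a common way to derive a single confidence value for the predicted class.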
├── main.py # Flask application and API endpoints
├── voice_activity_detector.py # Core ML functionality
├── create_jwt_token.py # JWT token generation
├── requirements.txt # Project dependencies
└── feedback_audio/ # Directory for feedback audio files
The API implements comprehensive error handling for:
- Invalid file types
- Missing JWT tokens
- Model not found scenarios
- Invalid feedback formats
- File processing errors
Errors are logged to errors.log for debugging and monitoring.
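A sketch of how logging to errors.log might be wired up with Python's standard logging module (the handler configuration and logger name are assumptions, not the repository's exact setup):

```python
import logging

# Log errors to errors.log; in the Flask app this logger would be used
# inside except blocks and @app.errorhandler functions.
logger = logging.getLogger("vad_api")
handler = logging.FileHandler("errors.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

try:
    raise ValueError("unsupported file type: .ogg")
except ValueError as exc:
    logger.error("File processing error: %s", exc)
```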
- Always validate audio files before processing
- Use error handling for robust production deployment
- Regularly retrain the model with feedback data
- Monitor the feedback queue size
- Back up the trained model regularly
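Backing up the trained model can be done with joblib, which scikit-learn recommends for persisting estimators. The file name and toy training data below are illustrative:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier

# Toy data just so the model is fitted; in practice this would be the
# feature vectors and labels from the training endpoint.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]], [0, 0, 0, 1, 1, 1])

# Persist the fitted model; keeping dated copies makes rollback easy
joblib.dump(model, "vad_model.joblib")

restored = joblib.load("vad_model.joblib")
print(restored.predict([[1.0]]))
```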
- Add model versioning
- Implement batch processing for large audio files
- Add confidence scores to classifications
- Implement real-time streaming classification
- Add more granular voice activity categories
- Change the default SECRET_KEY
- Implement rate limiting
- Add request size limitations
- Implement proper file cleanup
- Consider adding API key rotation
Feel free to open issues and pull requests. Just make sure your code doesn't classify heavy metal as "background noise" - we've been there, it wasn't pretty.
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This software depends on other packages that may be licensed under different terms. The MIT license above applies only to the original code in this repository. Notably, this project uses FFmpeg which is licensed under the LGPL/GPL license.