An app exploring two different methods of speech recognition/transcription.

ichbtrv/svelte-openai-whisper-speech-recognition-api

Svelte Voice Notes Transcription

A SvelteKit web application that demonstrates two approaches to speech-to-text transcription: the browser's native Speech Recognition API and OpenAI's Whisper API. Users can record, transcribe, and manage voice notes with either method.

Voice Notes App Interface

Demo

Watch the demo video

Features

  • Dual transcription methods:
    • Browser's native Speech Recognition API for real-time transcription
    • OpenAI's Whisper API for high-accuracy transcription
  • Voice recording controls (start, stop, pause, resume)
  • Note management system (create, save, load, delete)
  • Real-time transcription display
  • Persistent storage of notes using localStorage

Architecture

Core Components

  1. Speech Handlers:

    • SpeechHandler: Manages browser-native speech recognition
    • SpeechHandlerOpenAi: Handles Whisper API integration
  2. State:

    • VoiceNotesHandler: Manages note storage and retrieval
    • Svelte context API for state sharing
  3. UI Components:

    • Recorder: Controls for voice recording
    • CreateDialog: Note creation interface
    • LoadNoteDialog: Note loading interface

Implementation Details

Browser Speech Recognition

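// Uses the Web Speech API; some browsers expose it only under the webkit prefix
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
this.recognition = new SpeechRecognition();
this.recognition.continuous = true;      // keep listening across pauses in speech
this.recognition.interimResults = true;  // emit partial results while the user is still speaking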
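
With continuous and interimResults both enabled, each result event carries a mix of finalized and still-changing segments. The sketch below shows one way to separate them. collectTranscripts and the *Like types are hypothetical names for illustration, not part of the repository; the types mirror only the parts of the Web Speech API result shape that the helper reads, so it can run outside a browser.

```typescript
// Structural stand-ins for SpeechRecognitionAlternative / SpeechRecognitionResult.
interface AlternativeLike {
    transcript: string;
    confidence: number;
}

interface ResultLike {
    isFinal: boolean;
    0: AlternativeLike; // the top-ranked alternative
}

// Hypothetical helper: walks the results from startIndex and splits the text
// into segments the recognizer has finalized and segments that may still change.
export function collectTranscripts(
    results: ArrayLike<ResultLike>,
    startIndex = 0
): { finalText: string; interimText: string } {
    let finalText = '';
    let interimText = '';
    for (let i = startIndex; i < results.length; i++) {
        const result = results[i];
        if (result.isFinal) {
            finalText += result[0].transcript;
        } else {
            interimText += result[0].transcript;
        }
    }
    return { finalText, interimText };
}
```

In a browser, a handler along the lines of recognition.onresult = (event) => collectTranscripts(event.results, event.resultIndex) would feed it; how the repository's SpeechHandler actually wires this up is not shown here.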

OpenAI Whisper Integration

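// Wraps the recorded audio in a File and posts it to the app's /api/transcribe endpoint
async transcribeAudio(audioBlob: Blob): Promise<string> {
    const file = new File([audioBlob], 'recording.webm', { type: MIME_TYPE });
    const formData = new FormData();
    formData.append('file', file);

    const response = await fetch('/api/transcribe', {
        method: 'POST',
        body: formData
    });
    if (!response.ok) {
        throw new Error(`Transcription request failed with status ${response.status}`);
    }
    // Assumption: the endpoint responds with JSON shaped like { text: string }
    const { text } = await response.json();
    return text;
}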
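
The client call above targets the app's own /api/transcribe route, which presumably forwards the audio to OpenAI so the API key stays on the server. The repository's route handler is not shown here, so the following is only a sketch of what that forwarding step could look like. buildWhisperRequest and transcribeWithWhisper are hypothetical names; the URL, the whisper-1 model name, and the { text } response shape come from OpenAI's audio transcription API.

```typescript
const OPENAI_TRANSCRIPTION_URL = 'https://api.openai.com/v1/audio/transcriptions';

// Hypothetical helper: assembles the multipart request for the transcription endpoint.
export function buildWhisperRequest(
    audio: Blob,
    apiKey: string
): { url: string; headers: Record<string, string>; body: FormData } {
    const body = new FormData();
    body.append('file', audio, 'recording.webm'); // filename extension tells the API the format
    body.append('model', 'whisper-1');
    return {
        url: OPENAI_TRANSCRIPTION_URL,
        headers: { Authorization: `Bearer ${apiKey}` },
        body
    };
}

// Hypothetical helper: sends the request and returns the transcribed text.
export async function transcribeWithWhisper(audio: Blob, apiKey: string): Promise<string> {
    const { url, headers, body } = buildWhisperRequest(audio, apiKey);
    const response = await fetch(url, { method: 'POST', headers, body });
    if (!response.ok) {
        throw new Error(`Whisper request failed with status ${response.status}`);
    }
    // The default response format is JSON with a single "text" field.
    const data = (await response.json()) as { text: string };
    return data.text;
}
```

Keeping request construction separate from the network call makes the former easy to check without contacting the API.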

Key Features Implementation

Recording Controls

The recording interface provides:

  • Start/Stop recording
  • Pause/Resume recording
  • Real-time transcription display
  • Error handling and user feedback
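
The start, stop, pause, and resume controls map naturally onto the three states that the browser's MediaRecorder reports (inactive, recording, paused). As a rough illustration of how invalid actions could be rejected and surfaced as user feedback, here is a small transition table. nextRecorderState is a hypothetical helper, not code from the repository.

```typescript
type RecorderState = 'inactive' | 'recording' | 'paused';
type RecorderAction = 'start' | 'pause' | 'resume' | 'stop';

// Allowed transitions; any action missing from a state's entry is invalid there.
const TRANSITIONS: Record<RecorderState, Partial<Record<RecorderAction, RecorderState>>> = {
    inactive: { start: 'recording' },
    recording: { pause: 'paused', stop: 'inactive' },
    paused: { resume: 'recording', stop: 'inactive' }
};

// Hypothetical helper: returns the next state or throws so the UI can report the error.
export function nextRecorderState(state: RecorderState, action: RecorderAction): RecorderState {
    const next = TRANSITIONS[state][action];
    if (next === undefined) {
        throw new Error(`Cannot ${action} while ${state}`);
    }
    return next;
}
```

A component could use the same table to disable buttons whose action is not available in the current state.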

Note Management

Notes are managed through the VoiceNotesHandler class:

  • Create new notes with titles and transcriptions
  • Update existing notes
  • Delete notes
  • Load and display saved notes
  • Persistent storage using localStorage
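
The repository's VoiceNotesHandler is not reproduced here. As a minimal sketch of the same idea, the class below keeps notes as a JSON array under a single storage key. NotesStore, StorageLike, the VoiceNote fields, and the 'voice-notes' key are all assumptions made for illustration. Storage is injected so that window.localStorage can be passed in the browser and an in-memory stand-in elsewhere.

```typescript
interface VoiceNote {
    id: string;
    title: string;
    transcription: string;
}

// The subset of the Web Storage interface that the store relies on.
interface StorageLike {
    getItem(key: string): string | null;
    setItem(key: string, value: string): void;
}

const STORAGE_KEY = 'voice-notes'; // hypothetical key name

export class NotesStore {
    private readonly storage: StorageLike;

    constructor(storage: StorageLike) {
        this.storage = storage;
    }

    // Returns all saved notes, or an empty list if nothing valid is stored.
    list(): VoiceNote[] {
        const raw = this.storage.getItem(STORAGE_KEY);
        if (raw === null) return [];
        try {
            const parsed: unknown = JSON.parse(raw);
            return Array.isArray(parsed) ? (parsed as VoiceNote[]) : [];
        } catch {
            return [];
        }
    }

    // Creates the note, or replaces an existing note with the same id.
    save(note: VoiceNote): void {
        const others = this.list().filter((n) => n.id !== note.id);
        this.storage.setItem(STORAGE_KEY, JSON.stringify([...others, note]));
    }

    // Deletes the note with the given id, if present.
    remove(id: string): void {
        const remaining = this.list().filter((n) => n.id !== id);
        this.storage.setItem(STORAGE_KEY, JSON.stringify(remaining));
    }
}

// In-memory stand-in for localStorage, useful for tests and server-side rendering.
export function createMemoryStorage(): StorageLike {
    const data = new Map<string, string>();
    return {
        getItem: (key) => data.get(key) ?? null,
        setItem: (key, value) => {
            data.set(key, value);
        }
    };
}
```

In the browser this would be constructed as new NotesStore(window.localStorage).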

Usage

  1. Starting a Recording:

    • Click the "Start Recording" button
    • Grant microphone permissions when prompted
    • Speak into your microphone
  2. Managing Recordings:

    • Use the pause/resume button to temporarily stop recording
    • Click "Stop Recording" to finish
    • Save the transcription as a note
  3. Managing Notes:

    • Create new notes with the "+" button
    • Load existing notes using the load dialog
    • Edit transcriptions directly in the textarea
    • Copy transcriptions to clipboard
    • Delete unwanted notes

Setup

  1. Clone the repository
  2. Install dependencies:
    pnpm install
  3. Set up environment variables (for example, in a .env file in the project root):
    OPENAI_API_KEY=your_api_key_here
  4. Start the development server:
    pnpm dev

Configuration

The app includes configurable parameters:

  • MIN_CHUNK_SIZE: Minimum size for audio chunks
  • DEFAULT_INTERVAL: Default recording interval
  • DEFAULT_CONFIDENCE: Default confidence threshold for transcription
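
The actual values and units of these parameters are not stated here, so the numbers below are placeholders only. The sketch illustrates how such settings are typically applied: skipping audio chunks that are too small to be worth sending, and discarding recognition results below a confidence threshold. shouldSendChunk and isConfident are hypothetical helpers, and the byte and millisecond units are assumptions.

```typescript
// Placeholder values for illustration; check the source for the real ones.
export const MIN_CHUNK_SIZE = 1024; // assumed unit: bytes
export const DEFAULT_INTERVAL = 5000; // assumed unit: milliseconds between chunks
export const DEFAULT_CONFIDENCE = 0.7; // assumed range: 0 to 1

// Hypothetical helper: true when a chunk is large enough to be worth transcribing.
export function shouldSendChunk(sizeInBytes: number, minSize = MIN_CHUNK_SIZE): boolean {
    return sizeInBytes >= minSize;
}

// Hypothetical helper: true when a recognition result meets the confidence threshold.
export function isConfident(confidence: number, threshold = DEFAULT_CONFIDENCE): boolean {
    return confidence >= threshold;
}
```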
