Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Moonshine ASR #1099

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add support for Moonshine ASR #1099

wants to merge 2 commits into from

Conversation

xenova
Copy link
Collaborator

@xenova xenova commented Dec 14, 2024

This PR adds support for Moonshine, a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. They are well-suited to real-time, on-device applications like live transcription and voice command recognition, and will be perfect for in-browser usage. This PR is using a dev branch of transformers by @eustlb (huggingface/transformers#34784), and a dev branch of Optimum for ONNX conversion.

Example usage:

With pipeline API:

import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline("automatic-speech-recognition", "onnx-community/moonshine-tiny-ONNX");
const output = await transcriber("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav");
console.log(output);
// { text: 'And so my fellow Americans ask not what your country can do for you as what you can do for your country.' }

Without pipeline API:

import { MoonshineForConditionalGeneration, AutoProcessor, read_audio } from "@huggingface/transformers";

// Load model and processor
const model_id = "onnx-community/moonshine-tiny-ONNX";
const model = await MoonshineForConditionalGeneration.from_pretrained(model_id, {
    dtype: "q4",
});
const processor = await AutoProcessor.from_pretrained(model_id);

// Load audio and prepare inputs
const audio = await read_audio("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav", 16000);
const inputs = await processor(audio);

// Generate outputs
const outputs = await model.generate({ ...inputs, max_new_tokens: 100 });

// Decode outputs
const decoded = processor.batch_decode(outputs, { skip_special_tokens: true });
console.log(decoded[0]);
// And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

closes #990

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@xenova
Copy link
Collaborator Author

xenova commented Dec 14, 2024

Model works with WebGPU too, and I've adapted this real-time demo to work with model. Significantly faster than the whisper version. 🔥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for moonshine ASR models
2 participants