Nemotron ASR Support for Streaming #1997
Merged
Changes from 24 commits
70 commits
31d6779 nemotron support (nenad1002)
9026781 ONNX 2 good version (nenad1002)
8c6f4ed Nemotron support (nenad1002)
9dd6212 Support 4 (nenad1002)
8b0de45 First stream (nenad1002)
0d83168 Overlap support (nenad1002)
d2ff912 Nemotron support stream 3 (nenad1002)
c7ed0c9 Mi fix (nenad1002)
b83b84f Move mel stuff to separate file (nenad1002)
ff275b4 Remove mel spectogram (nenad1002)
32001f1 Revert non-needed changes (nenad1002)
131db0c Make sure genai_config.json defines model params (nenad1002)
b262003 Point to latest extensions (nenad1002)
5cc511d Add tests (nenad1002)
5cf0c59 Add a better test (nenad1002)
f670d36 Remove text tokenizer and sr to genaiconfig (nenad1002)
6e23d87 Remove dead code (nenad1002)
fd47344 Abstract streaming ASR class (nenad1002)
98d6e54 remove processor (nenad1002)
06d05d0 Fix merge conflict (nenad1002)
46a166d Clean more code (nenad1002)
8a5e912 Clean up examples (nenad1002)
e10086f Performance optimizations (nenad1002)
5eb10ff More cleaning (nenad1002)
89b8bd5 Try removing warning (nenad1002)
880143b Add flag to tests (nenad1002)
092d212 fix formatting (nenad1002)
67e649c Resolve Copilot comments (nenad1002)
5be81c9 Fix formatting issue (nenad1002)
b3c6411 Merge branch 'main' into nebanfic/nemotron-support-stream-3 (nenad1002)
165037b Remove soundfile (nenad1002)
5097afd Remove dead tokenzier code (nenad1002)
98e81b7 Adjust genai config to our exported models (nenad1002)
0a6d87b Resolve more comments (nenad1002)
70f4e23 Avoid memset, memcpy and manual copy on GPU and whenever possible, ri… (nenad1002)
4d8a0f5 Add consistency (nenad1002)
96dafca Big improvement - cache locality for frames (nenad1002)
2499ab4 Csharp support (nenad1002)
51a61c7 Add a check to the factory for StreamingASR (nenad1002)
9e28df9 nemotron generator (nenad1002)
8a1bef0 remove ProcessChunk from model.h (nenad1002)
154b5aa remove generate_next_tokens() (nenad1002)
571f300 Rename processor (nenad1002)
f61fc0a C# sample and remove unnecessary files (nenad1002)
8596fc1 Fix all (nenad1002)
b762766 more fixes (nenad1002)
e2ab1e7 samples change (nenad1002)
4059341 Introduce NamedTensors on streaming processor (nenad1002)
ca9a9f3 Remove speech section in genai_config (nenad1002)
282a9f0 Reverse NativeMethods.cs formatting (nenad1002)
dc46428 Some refactoring (nenad1002)
7c636ba Make streaming processor abstract class (nenad1002)
33f809b set_inputs (nenad1002)
66ee360 Copilot suggestions (nenad1002)
27ce2e5 Examples changes (nenad1002)
8723d3b More comments resolved (nenad1002)
96ce812 SubStates (nenad1002)
df8cb9b More changes (nenad1002)
c5ed7df Resolvimg more comments (nenad1002)
02c5fde Mass copy (nenad1002)
101113d Copilot fixes (nenad1002)
640e9af Merge conflict fix (nenad1002)
7d023ef Potential fix for code scanning alert no. 798: Unused local variable (nenad1002)
7283dd1 Fix clang (nenad1002)
a3f77e4 Run clang (nenad1002)
658e8de fix tests (nenad1002)
78b84a2 Resolve comments (nenad1002)
ff890a9 Add C++ example (nenad1002)
9473ab8 Semicolon on another line (nenad1002)
8e683f3 Add C# sample readme (nenad1002)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| # Copyright (c) Microsoft Corporation. All rights reserved. | ||
| # Licensed under the MIT License. | ||
| | ||
| import argparse | ||
| import os | ||
| import sys | ||
| import time | ||
| import re | ||
| import numpy as np | ||
| import onnxruntime_genai as og | ||
| | ||
| SAMPLE_RATE = 16000 | ||
| CHUNK_SAMPLES = 8960 | ||
| CHUNK_DURATION = CHUNK_SAMPLES / SAMPLE_RATE | ||
| | ||
| | ||
| def load_audio(audio_path): | ||
|     import soundfile as sf | ||
|     audio, sr = sf.read(audio_path, dtype="float32") | ||
|     if len(audio.shape) > 1: | ||
|         audio = audio.mean(axis=1) | ||
|     if sr != SAMPLE_RATE: | ||
|         import scipy.signal | ||
|         num_samples = int(len(audio) * SAMPLE_RATE / sr) | ||
|         audio = scipy.signal.resample(audio, num_samples).astype(np.float32) | ||
|     return audio | ||
| | ||
| | ||
| def load_tokenizer(model_path): | ||
|     import sentencepiece as spm | ||
|     path = os.path.join(model_path, "tokenizer.model") | ||
|     if not os.path.exists(path): | ||
|         return None | ||
|     sp = spm.SentencePieceProcessor() | ||
|     sp.Load(path) | ||
|     return sp | ||
| | ||
| | ||
| def parse_token_ids(raw_text): | ||
|     return [int(m.group(1)) for m in re.finditer(r'<(\d+)>', raw_text)] | ||
| | ||
| | ||
| def simulate_microphone(model_path, audio_path): | ||
|     audio = load_audio(audio_path) | ||
|     duration = len(audio) / SAMPLE_RATE | ||
|     num_chunks = (len(audio) + CHUNK_SAMPLES - 1) // CHUNK_SAMPLES | ||
|     print(f"Audio: {duration:.1f}s | {num_chunks} chunks × {CHUNK_DURATION*1000:.0f}ms") | ||
| | ||
|     config = og.Config(model_path) | ||
|     model = og.Model(config) | ||
|     sp = load_tokenizer(model_path) | ||
|     asr = og.StreamingASR(model) | ||
| | ||
|     print("-" * 60) | ||
|     stream_start = time.time() | ||
| | ||
|     for i in range(0, len(audio), CHUNK_SAMPLES): | ||
|         chunk = audio[i:i + CHUNK_SAMPLES] | ||
|         if len(chunk) < CHUNK_SAMPLES: | ||
|             chunk = np.pad(chunk, (0, CHUNK_SAMPLES - len(chunk))) | ||
|         chunk = chunk.astype(np.float32) | ||
|         raw_text = asr.transcribe_chunk(chunk) | ||
|         if raw_text: | ||
|             print(raw_text, end="", flush=True) | ||
| | ||
|     for _ in range(4): | ||
|         silence = np.zeros(CHUNK_SAMPLES, dtype=np.float32) | ||
|         raw_text = asr.transcribe_chunk(silence) | ||
|         if raw_text: | ||
|             print(raw_text, end="", flush=True) | ||
| | ||
|     total_wall = time.time() - stream_start | ||
| | ||
|     full_raw = asr.get_transcript() | ||
|     if sp: | ||
|         all_ids = parse_token_ids(full_raw) | ||
|         final_text = sp.Decode(all_ids) if all_ids else full_raw | ||
|     else: | ||
|         final_text = full_raw | ||
| | ||
|     print(f"\n{'=' * 60}") | ||
|     print(f" {final_text.strip()}") | ||
|     print(f"{'=' * 60}") | ||
|     print(f" Audio: {duration:.2f}s | Wall: {total_wall:.2f}s | RTF: {duration/total_wall:.2f}x") | ||
| | ||
| | ||
| def main(): | ||
|     parser = argparse.ArgumentParser() | ||
|     parser.add_argument("--model_path", type=str, required=True) | ||
|     parser.add_argument("--audio_file", type=str, required=True) | ||
|     args = parser.parse_args() | ||
|     if not os.path.exists(args.audio_file): | ||
|         print(f"Error: {args.audio_file} not found") | ||
|         sys.exit(1) | ||
|     simulate_microphone(args.model_path, args.audio_file) | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
|     main() | ||
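
Note on the chunking loop in this sample: the audio is cut into fixed windows of 8960 samples (560 ms at 16 kHz), and the last partial window is zero-padded to full length. The sketch below isolates that step so it can be checked without a model. `iter_padded_chunks` is a hypothetical helper name introduced here for illustration; it is not part of `onnxruntime_genai` or of this PR, and the synthetic one-second signal is an assumption standing in for a real recording.

```python
import numpy as np

SAMPLE_RATE = 16000    # Hz, same value as the sample in this diff
CHUNK_SAMPLES = 8960   # 560 ms at 16 kHz, same value as the sample


def iter_padded_chunks(audio, chunk_samples=CHUNK_SAMPLES):
    """Yield fixed-size float32 chunks, zero-padding the final partial one.

    Hypothetical helper for illustration only (not an onnxruntime_genai API).
    """
    for start in range(0, len(audio), chunk_samples):
        chunk = audio[start:start + chunk_samples]
        if len(chunk) < chunk_samples:
            # np.pad defaults to constant mode with zeros.
            chunk = np.pad(chunk, (0, chunk_samples - len(chunk)))
        yield chunk.astype(np.float32)


# One second of synthetic audio (all ones) stands in for a real file.
audio = np.ones(SAMPLE_RATE, dtype=np.float32)
chunks = list(iter_padded_chunks(audio))

# Ceiling division, as in simulate_microphone, predicts the chunk count.
num_chunks = (len(audio) + CHUNK_SAMPLES - 1) // CHUNK_SAMPLES
```

With 16000 input samples this gives two chunks: the first is entirely signal, the second holds the remaining 7040 samples followed by 1920 zeros. Each chunk could then be passed to a streaming call such as the `transcribe_chunk` used in the diff above.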