Nemotron ASR Support for Streaming #1997

Merged
kunal-vaishnavi merged 70 commits into main from nebanfic/nemotron-support-stream-3 on Mar 17, 2026

@nenad1002
Contributor

@nenad1002 nenad1002 commented Mar 2, 2026

Comment thread test/python/test_onnxruntime_genai_api.py Fixed
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 9 comments.


Comment thread src/generators.cpp
Comment thread src/models/nemotron_speech.cpp
Comment thread src/models/nemotron_speech.cpp
Comment thread src/models/nemotron_streaming_processor.cpp
Comment thread test/c_api_tests.cpp
Comment thread test/python/test_onnxruntime_genai_api.py
Comment thread test/python/test_onnxruntime_genai_e2e.py
Comment thread src/python/python.cpp
Comment thread src/models/nemotron_speech.cpp Outdated
Comment thread examples/python/nemotron_speech.py Fixed
Comment thread examples/python/nemotron_speech.py Fixed
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Comment thread examples/python/nemotron_speech.py Fixed
rui-ren
rui-ren previously approved these changes Mar 13, 2026
@nenad1002
Contributor Author

@kunal-vaishnavi can we sign this off?

Comment thread src/ort_genai_c.h Outdated
Comment thread nuget.config Outdated
Comment thread CMakeLists.txt Outdated
Comment thread src/generators.cpp
Comment thread src/generators.cpp Outdated
Comment thread src/config.cpp
@kunal-vaishnavi
Contributor

> @kunal-vaishnavi can we sign this off?

There are a few issues with the ONNX models that may change the input/output structure.

Decoder

[Image: decoder graph showing target_length_orig and target_length]

From inspecting the graph, `target_length_orig` and `target_length` are not used at all: the only operation on them is an Identity op that goes from `target_length_orig` to `target_length`. Both the input and the output can be removed from the graph.
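The pruning described above can be sketched on a toy graph. This is only an illustration under stated assumptions: `Node`, `Graph`, and `prune_passthrough_identities` are hypothetical stand-ins rather than the ONNX API, and the names `targets` and `decoder_output` are invented for the example. Only `target_length_orig` and `target_length` come from the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op_type: str
    inputs: list
    outputs: list


@dataclass
class Graph:
    inputs: list
    outputs: list
    nodes: list = field(default_factory=list)


def prune_passthrough_identities(graph):
    """Drop Identity nodes that only copy a graph input to a graph output."""
    # Map every tensor name to the nodes that read it.
    consumers = {}
    for node in graph.nodes:
        for name in node.inputs:
            consumers.setdefault(name, []).append(node)

    kept = []
    for node in graph.nodes:
        is_passthrough = (
            node.op_type == "Identity"
            and len(node.inputs) == 1
            and node.inputs[0] in graph.inputs
            and node.outputs[0] in graph.outputs
            and len(consumers.get(node.inputs[0], [])) == 1  # read only by this node
            and node.outputs[0] not in consumers             # read by no other node
        )
        if is_passthrough:
            graph.inputs.remove(node.inputs[0])
            graph.outputs.remove(node.outputs[0])
        else:
            kept.append(node)
    graph.nodes = kept
    return graph


# Hypothetical decoder: one real op plus the unused length pass-through.
decoder = Graph(
    inputs=["targets", "target_length_orig"],
    outputs=["decoder_output", "target_length"],
    nodes=[
        Node("LSTM", ["targets"], ["decoder_output"]),
        Node("Identity", ["target_length_orig"], ["target_length"]),
    ],
)
prune_passthrough_identities(decoder)
print(decoder.inputs, decoder.outputs)
```

Removing the pair changes the model signature, so any caller that feeds `target_length_orig` or reads `target_length` would need to be updated at the same time.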

Joiner

[Images: end of the encoder graph and end of the decoder graph]

These screenshots show the end of the encoder and the end of the decoder. The decoder contains a Squeeze op with axes = [1] before the final Transpose op. The Transpose op has perm = [1, 2, 0], so its input (i.e. the output of the Squeeze op) must be 3D. Because axes is specified on the Squeeze op, only that axis is removed, which implies that the input to the Squeeze op (i.e. the output of the LSTM op) is 4D. Therefore, the Squeeze op removes a dimension of size 1 to go from [A, 1, B, C] to [A, B, C].
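The shape reasoning above can be traced with a small shape-only sketch. It assumes the default ONNX LSTM output layout `[seq_length, num_directions, batch_size, hidden_size]` for the `[A, 1, B, C]` tensor. The concrete sizes are made up for illustration and are not read from the Nemotron model, and `squeeze` and `transpose` are local helpers, not library calls.

```python
def squeeze(shape, axes):
    """Drop the listed axes; ONNX Squeeze requires each of them to have size 1."""
    for ax in axes:
        if shape[ax] != 1:
            raise ValueError(f"axis {ax} has size {shape[ax]}, expected 1")
    return tuple(d for i, d in enumerate(shape) if i not in axes)


def transpose(shape, perm):
    """Reorder dimensions: output axis i takes input axis perm[i]."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError(f"perm {perm} does not match rank {len(shape)}")
    return tuple(shape[p] for p in perm)


# Illustrative sizes only (assumptions, not taken from the actual model).
seq_len, num_directions, batch, hidden = 5, 1, 2, 640

lstm_y = (seq_len, num_directions, batch, hidden)   # 4D: [A, 1, B, C]
squeezed = squeeze(lstm_y, axes=[1])                # 3D: [A, B, C]
decoder_out = transpose(squeezed, perm=[1, 2, 0])   # 3D: [B, C, A]

print(lstm_y, "->", squeezed, "->", decoder_out)
```

A three-element `perm` is only valid for a rank-3 input, which is why the Squeeze input has to be 4D; calling `transpose(lstm_y, [1, 2, 0])` on the 4D shape raises a `ValueError` in this sketch.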

[Image: joiner model graph]

Here is the joiner model. The joiner receives 3D inputs from both the encoder and the decoder and returns a 4D output. The decoder input is 3D and the `Unsqueeze` op in the decoder path has `axes = [1]`, so the `Unsqueeze` op goes from `[A, B, C]` to `[A, 1, B, C]`.

That shape of [A, 1, B, C] is already available in the decoder, before its Squeeze op. If the Squeeze op is removed and the Transpose op is changed to handle a 4D tensor, then the output of the decoder can be 4D. The decoder input to the joiner model can then be 4D, and the Unsqueeze op in the decoder path can be removed. These changes eliminate two ops across two ONNX models.
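A minimal sketch of why the two ops can be dropped, tracking axis labels instead of sizes. It assumes the joiner's Unsqueeze is applied directly to the decoder output with no other op in between. The 4D permutation `[2, 1, 3, 0]` is derived here for illustration and does not come from the PR, and the helper functions are local, not library calls.

```python
def squeeze(axes_in, axes):
    """Drop the listed positions; each must be the size-1 axis labelled '1'."""
    for ax in axes:
        if axes_in[ax] != "1":
            raise ValueError(f"axis {ax} is {axes_in[ax]!r}, not a size-1 axis")
    return tuple(a for i, a in enumerate(axes_in) if i not in axes)


def unsqueeze(axes_in, axes):
    """Insert a size-1 axis at each listed position of the output."""
    out = list(axes_in)
    for ax in sorted(axes):
        out.insert(ax, "1")
    return tuple(out)


def transpose(axes_in, perm):
    """Output axis i takes input axis perm[i]."""
    if sorted(perm) != list(range(len(axes_in))):
        raise ValueError(f"perm {perm} does not match rank {len(axes_in)}")
    return tuple(axes_in[p] for p in perm)


lstm_y = ("A", "1", "B", "C")  # 4D LSTM output inside the decoder

# Current layout: Squeeze + Transpose in the decoder, Unsqueeze in the joiner.
current = unsqueeze(transpose(squeeze(lstm_y, [1]), [1, 2, 0]), [1])

# Proposed layout: a single 4D Transpose in the decoder, nothing in the joiner.
# perm = [2, 1, 3, 0] is an assumption worked out for this sketch.
proposed = transpose(lstm_y, [2, 1, 3, 0])

print(current, proposed)
```

Both paths leave the axes in the same order, and because the only axis added or removed has size 1, the element order is the same as well under the stated assumption.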

If we want both inputs to the joiner model to keep the same number of dimensions, then we can also move the Unsqueeze op in the encoder path to the output of the encoder. The joiner model will then take in two 4D inputs and produce one 4D output.

@nenad1002
Contributor Author

nenad1002 commented Mar 13, 2026

Right, we are aware of the length issue and are still waiting for the newest model; this is not related to this PR. I assume the following PR will deal with ONNX model finalization and any model signature change:
#2010

I can wait for the team's changes though if you prefer.

baijumeswani
baijumeswani previously approved these changes Mar 16, 2026
@kunal-vaishnavi kunal-vaishnavi enabled auto-merge (squash) March 16, 2026 22:12
@kunal-vaishnavi kunal-vaishnavi merged commit dc3f30b into main Mar 17, 2026
16 of 17 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the nebanfic/nemotron-support-stream-3 branch March 17, 2026 07:14