Add C# binding for GetNextTokens by kunal-vaishnavi · Pull Request #1865 · microsoft/onnxruntime-genai

kunal-vaishnavi · 2025-11-12T06:42:00Z

Description

This PR adds GetNextTokens as an API to the C# bindings.

Motivation and Context

Currently in C#, the user has to fetch the entire sequence and then index its last element to extract the newest token (i.e. `generator.GetSequence(0)[^1]`). In Python, the user only has to call `generator.get_next_tokens()` to get a list of the next tokens and then take the first token in that list (i.e. `generator.get_next_tokens()[0]`).

With this PR, the generation loop in the C# and Python examples now looks and runs similarly.
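The two access patterns described above can be sketched with a toy stand-in for the generator. `ToyGenerator` and the two `stream_via_*` helpers are hypothetical names made up for this illustration; they are not the onnxruntime-genai implementation and only mimic the shape of `get_sequence` and `get_next_tokens` for a small batch.

```python
class ToyGenerator:
    """Hypothetical stand-in; NOT the onnxruntime-genai Generator."""

    def __init__(self, scripted_tokens):
        # scripted_tokens[step][batch_index] is the token emitted at that step.
        self._script = scripted_tokens
        self._step = 0
        self._sequences = [[] for _ in scripted_tokens[0]]
        self._next_tokens = []

    def is_done(self):
        return self._step >= len(self._script)

    def generate_next_token(self):
        # Remember the newest token per batch entry and extend each sequence.
        self._next_tokens = list(self._script[self._step])
        for seq, tok in zip(self._sequences, self._next_tokens):
            seq.append(tok)
        self._step += 1

    def get_sequence(self, index):
        # Returns a copy of the whole sequence for one batch entry.
        return list(self._sequences[index])

    def get_next_tokens(self):
        # Returns only the newest token of every batch entry.
        return list(self._next_tokens)


def stream_via_sequence(gen):
    """Older pattern: fetch the full sequence, then take its last element."""
    out = []
    while not gen.is_done():
        gen.generate_next_token()
        out.append(gen.get_sequence(0)[-1])
    return out


def stream_via_next_tokens(gen):
    """Newer pattern: ask directly for the newest tokens."""
    out = []
    while not gen.is_done():
        gen.generate_next_token()
        out.append(gen.get_next_tokens()[0])
    return out


script = [[11, 21], [12, 22], [13, 23]]  # 3 steps, batch of 2
old_style = stream_via_sequence(ToyGenerator(script))
new_style = stream_via_next_tokens(ToyGenerator(script))
print(old_style, new_style)
```

In this toy, both loops stream the same tokens for batch entry 0. The second loop simply avoids materializing the whole sequence on every step.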

## Version 2

### Description

This PR updates the examples to show how EOS token id detection is handled with ONNX Runtime GenAI when generating tokens. With the addition of the [C# binding for GetNextTokens()](#1865), all of the published examples now cover the cases listed below in version 1 of this PR. Previously, the earlier PR mentioned different variations of the generation loop and all of the variations had an issue.

This PR also introduces new APIs for tracking token count and querying the generator params:

- `Generator.TokenCount`
- `Params.GetSearchNumber`
- `Params.GetSearchBool`

Additionally, this PR adds some missing Tokenizer APIs for Objective-C.

- `Tokenizer.GetBosTokenId`
- `Tokenizer.GetEosTokenIds`
- `Tokenizer.GetPadTokenId`

### Motivation and Context

This PR is a follow-up to the issue fixed in [an earlier PR](#1849). These APIs can be used by users to distinguish between the cases that `Generator.IsDone()` covers. For example:

```cpp
bool hit_eos = generator->IsDone() && (generator->TokenCount() < static_cast<int>(params->GetSearchNumber("max_length")));
bool hit_max_length = generator->IsDone() && (generator->TokenCount() == static_cast<int>(params->GetSearchNumber("max_length")));
```

## Version 1

### Description

This PR updates how EOS token id detection is handled with ONNX Runtime GenAI when generating tokens. A new API called `Generator.HitEOS()` is introduced to detect whether an EOS token id has been generated. Another API called `Generator.HitMaxLength()` is also introduced to detect whether the max length has been hit before the generation loop has completed.

### Motivation and Context

This PR is a follow-up to the issue fixed in [an earlier PR](#1849). The earlier PR mentions different variations of the generation loop but all of the variations have an issue.

There are two scenarios for terminating the generation loop: 1) hitting the EOS token id and completing the generation loop or 2) hitting the max length before the generation loop has completed. However, none of the variations adequately cover the two scenarios for terminating the generation loop.

#### 1. Original Generation Loop

```
while not IsDone():
    GenerateToken()
    GetLastToken()
    PrintLastToken()
```

Consider scenario 1 with this loop. After `GenerateToken()` produces the EOS token id, `GetLastToken()` will attempt to retrieve that token. However, ORT GenAI does not append the EOS token id to the list of sequences returned to the user (see the earlier PR for why). Instead, the second-to-last token will still be the last token in the list of sequences. Thus, `GetLastToken()` and `PrintLastToken()` will retrieve and again print the last token that the user saw.

#### 2. Return Early Generation Loop

```
while not IsDone():
    GenerateToken()
    if IsDone():
        break
    GetLastToken()
    PrintLastToken()
```

Consider scenario 2 with this loop. After `GenerateToken()` produces a token and the max length has been reached, the generator's state is marked as done. Then `IsDone()` will be true and the newest token won't be retrieved and printed since the loop is exited early.

#### 3. Infinite Generation Loop

```
while True:
    GenerateToken()
    if IsDone():
        break
    GetLastToken()
    PrintLastToken()
```

Consider scenario 2 with this loop. The same issue as the prior loop still applies. `GenerateToken()` will generate all of the tokens but once the max length is hit, `IsDone()` is true and the last token won't be retrieved and printed.

#### Conclusion

The reason that none of these generation loop variants work is because `IsDone()` currently covers both scenarios in one API and does not distinguish between them. One check needs to be in place in the condition of the while loop so that the loop continues, and another check needs to be after token generation to decide whether retrieving the last token should be done or not.

#### Solution

To fix this, a new API called `Generator.HitEOS()` is introduced. It returns `true` when the EOS token id is generated. The generation loop should be modified to the following.

```
while not IsDone():
    GenerateToken()
    if HitEOS():
        break
    GetLastToken()
    PrintLastToken()
```

If scenario 1 occurs in this loop, `HitEOS()` is `true` and the generation loop will exit early. If scenario 2 occurs in this loop, `HitEOS()` is `false` when the max length is reached. The last generated token can still be retrieved and printed. Then because the generator's state is done, `IsDone()` is `true` and the generation loop ends.

Here is a full end-to-end example demonstrating its usage.

```py
import onnxruntime_genai as og

model = og.Model("/path/to/model/folder")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=25)
generator = og.Generator(model, params)

tokens = tokenizer.encode("<|system|>You are a helpful AI assistant.<|end|><|user|>What color is the sky?<|end|><|assistant|>")
print(f"Prompt: {len(tokens)}")
generator.append_tokens(tokens)

count = 0
while not generator.is_done():
    generator.generate_next_token()
    count += 1
    if generator.hit_eos():
        break

    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)

print()
print(f"Generated: {count}")
print(f"Total: {len(tokens) + count}")
```

#### Scenario 1

Before with loop version 1:

```
Prompt: 18
The color of the sky can vary depending on the viewing conditions and the presence of particles and moisture in the atmosphere. On a clear day, the sky appears blue due to Rayleigh scattering, where the atmosphere scatters sunlight in all directions and blue wavelengths are scattered more than other colors because they travel as shorter, smaller waves. This scattering causes the sky to appear blue to an observer on the ground. However, the sky can also appear various shades of blue, gray, or even take on vibrant hues like red or orange just before or just after sunrise or sunset, due to the scattering of sunlight by particles and moisture in the atmosphere..
Generated: 128
Total: 146
```

After with `generator.hit_eos()`:

```
Prompt: 18
The color of the sky can vary depending on the viewing conditions and the presence of particles and moisture in the atmosphere. On a clear day, the sky appears blue due to Rayleigh scattering, where the atmosphere scatters sunlight in all directions and blue wavelengths are scattered more than other colors because they travel as shorter, smaller waves. This scattering causes the sky to appear blue to an observer on the ground. However, the sky can also appear various shades of blue, gray, or even take on vibrant hues like red or orange just before or just after sunrise or sunset, due to the scattering of sunlight by particles and moisture in the atmosphere.
Generated: 128
Total: 146
```

#### Scenario 2

Before with loop version 2:

```
Prompt: 18
The color of the sky can
Generated: 7
Total: 25
```

After with `generator.hit_eos()`:

```
Prompt: 18
The color of the sky can vary
Generated: 7
Total: 25
```
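The behaviour of the loop variants in both scenarios can be reproduced without a model by scripting the tokens. The sketch below is a toy simulation under stated assumptions: `ToyGenerator`, the `EOS` value, and the four loop helpers are hypothetical and are not the onnxruntime-genai implementation. The toy only encodes what the description states: an EOS token ends generation without being appended to the sequence, and reaching the max length also ends generation. The toy's `token_count()` is assumed to equal the current sequence length.

```python
EOS = 0  # hypothetical EOS token id for the toy model


class ToyGenerator:
    """Hypothetical stand-in; NOT the onnxruntime-genai Generator."""

    def __init__(self, scripted_tokens, max_length):
        self._script = iter(scripted_tokens)
        self.max_length = max_length
        self.sequence = []
        self._done = False
        self._hit_eos = False

    def is_done(self):
        return self._done

    def hit_eos(self):
        return self._hit_eos

    def token_count(self):
        # Assumption: the count equals the current sequence length.
        return len(self.sequence)

    def generate_next_token(self):
        token = next(self._script)
        if token == EOS:
            # EOS ends generation and is not appended to the sequence.
            self._done = True
            self._hit_eos = True
            return
        self.sequence.append(token)
        if len(self.sequence) >= self.max_length:
            self._done = True


def run(gen, should_break):
    """Generic loop: generate, optionally exit early, then read the last token."""
    out = []
    while not gen.is_done():
        gen.generate_next_token()
        if should_break(gen):
            break
        out.append(gen.sequence[-1])
    return out


def original_loop(gen):      # variant 1: never exits early
    return run(gen, lambda g: False)


def return_early_loop(gen):  # variant 2: exits whenever the generator is done
    return run(gen, lambda g: g.is_done())


def hit_eos_loop(gen):       # version 1 fix: exit early only on EOS
    return run(gen, lambda g: g.hit_eos())


def token_count_loop(gen):   # version 2 style: done before reaching max length
    return run(gen, lambda g: g.is_done() and g.token_count() < g.max_length)


eos_script = [5, 6, 7, EOS]     # scenario 1: EOS arrives before max length
length_script = [5, 6, 7, 8, 9]  # scenario 2: max length (3) arrives first

dup = original_loop(ToyGenerator(eos_script, max_length=10))
dropped = return_early_loop(ToyGenerator(length_script, max_length=3))
fixed_eos = hit_eos_loop(ToyGenerator(eos_script, max_length=10))
fixed_len = hit_eos_loop(ToyGenerator(length_script, max_length=3))
count_eos = token_count_loop(ToyGenerator(eos_script, max_length=10))
count_len = token_count_loop(ToyGenerator(length_script, max_length=3))
print(dup, dropped, fixed_eos, fixed_len, count_eos, count_len)
```

In this toy, the original loop repeats the final token in scenario 1 and the return-early loop drops the final token in scenario 2, while both the `hit_eos()` check and the token-count comparison yield the full sequence in either scenario.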

kunal-vaishnavi added 4 commits November 12, 2025 02:12

Add C# API for GetNextTokens

73e7ef9

Add native methods call

e4e2523

Remove extra newline

a56003c

Use GetNextTokens in C# examples

8f1f2ef

hanbitmyths previously approved these changes Nov 12, 2025

kunal-vaishnavi enabled auto-merge (squash) November 12, 2025 06:55

Merge branch 'main' into kvaishnavi/get-next-token

5dde034

kunal-vaishnavi dismissed hanbitmyths’s stale review via 5dde034 November 12, 2025 19:18

kunal-vaishnavi requested a review from hanbitmyths November 12, 2025 19:24

hanbitmyths approved these changes Nov 12, 2025

kunal-vaishnavi merged commit 9d7fe70 into main Nov 12, 2025
15 checks passed

kunal-vaishnavi deleted the kvaishnavi/get-next-token branch November 12, 2025 21:15

kunal-vaishnavi mentioned this pull request Jan 21, 2026

Update handling EOS token id detection #1925

Merged

kunal-vaishnavi merged 5 commits into main from kvaishnavi/get-next-token
