
Add C# binding for GetNextTokens #1865

Merged

kunal-vaishnavi merged 5 commits into main from kvaishnavi/get-next-token on Nov 12, 2025

@kunal-vaishnavi
Contributor

Description

This PR adds GetNextTokens as an API to the C# bindings.

Motivation and Context

Currently in C#, the user has to retrieve the entire sequence and then read its last element to get the next token (i.e. `generator.GetSequence(0)[^1]`). In Python, the user only has to call `generator.get_next_tokens()`, which returns a list of the next tokens, and read its first entry (i.e. `generator.get_next_tokens()[0]`).

With this PR, the generation loop in the C# and Python examples now looks and runs similarly.
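Here is a minimal Python sketch of the two access patterns. `ToyBatch` is a hypothetical in-memory stand-in, not the ONNX Runtime GenAI API, and it assumes that the "next tokens" are simply the newest token of each sequence. C#'s `[^1]` index-from-end operator corresponds to Python's `[-1]`.

```python
class ToyBatch:
    """Hypothetical stand-in holding the token ids of each sequence in a batch."""

    def __init__(self, sequences):
        self._sequences = [list(s) for s in sequences]

    def get_sequence(self, index):
        # Like GetSequence(index): returns a copy of the whole sequence.
        return list(self._sequences[index])

    def get_next_tokens(self):
        # Like GetNextTokens() (assumed): only the newest token of each sequence.
        return [seq[-1] for seq in self._sequences]


gen = ToyBatch([[101, 7, 8, 9]])
before = gen.get_sequence(0)[-1]  # C#: generator.GetSequence(0)[^1]
after = gen.get_next_tokens()[0]  # Python: generator.get_next_tokens()[0]
print(before, after)  # 9 9
```

Both patterns return the same token. The second one avoids copying the whole sequence just to read its last element.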

hanbitmyths previously approved these changes Nov 12, 2025
kunal-vaishnavi enabled auto-merge (squash) November 12, 2025 06:55
kunal-vaishnavi merged commit 9d7fe70 into main Nov 12, 2025
15 checks passed
kunal-vaishnavi deleted the kvaishnavi/get-next-token branch November 12, 2025 21:15
baijumeswani pushed a commit that referenced this pull request Jan 27, 2026
## Version 2

### Description

This PR updates the examples to show how EOS token id detection is
handled with ONNX Runtime GenAI when generating tokens. With the
addition of the [C# binding for
GetNextTokens()](#1865),
all of the published examples now cover the cases listed below in
version 1 of this PR. The earlier PR described several variations of
the generation loop, and each of them had an issue.

This PR also introduces new APIs for tracking token count and querying
the generator params:
- `Generator.TokenCount`
- `Params.GetSearchNumber`
- `Params.GetSearchBool`

Additionally, this PR adds some missing Tokenizer APIs for Objective-C.
- `Tokenizer.GetBosTokenId`
- `Tokenizer.GetEosTokenIds`
- `Tokenizer.GetPadTokenId`

### Motivation and Context

This PR is a follow-up to the issue fixed in [an earlier
PR](#1849). Users can use these APIs to distinguish between the two
cases that `Generator.IsDone()` covers.

For example:

```cpp
const int max_length = static_cast<int>(params->GetSearchNumber("max_length"));
// Done before max_length: an EOS token id ended generation.
bool hit_eos = generator->IsDone() && (generator->TokenCount() < max_length);
// Done exactly at max_length: the length limit ended generation.
bool hit_max_length = generator->IsDone() && (generator->TokenCount() == max_length);
```
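The same decision can be written as a small pure function, which makes the two cases easy to test in isolation. This is a hypothetical helper, not part of any binding: `token_count` stands in for `Generator.TokenCount`, `max_length` for `Params.GetSearchNumber("max_length")` cast to an integer, and the `ValueError` branch assumes the token count never exceeds `max_length`.

```python
def classify_termination(is_done: bool, token_count: int, max_length: int) -> str:
    """Hypothetical helper mirroring the C++ checks above."""
    if not is_done:
        return "running"
    if token_count < max_length:
        return "hit_eos"  # stopped early, so an EOS token id ended generation
    if token_count == max_length:
        return "hit_max_length"
    # Assumption: the generator never produces more than max_length tokens.
    raise ValueError("token_count should never exceed max_length")


print(classify_termination(True, 20, 25))   # hit_eos
print(classify_termination(True, 25, 25))   # hit_max_length
print(classify_termination(False, 20, 25))  # running
```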

## Version 1

### Description

This PR updates how EOS token id detection is handled with ONNX Runtime
GenAI when generating tokens. A new API called `Generator.HitEOS()` is
introduced to detect whether an EOS token id has been generated. Another
API called `Generator.HitMaxLength()` is also introduced to detect
whether the max length has been hit before the generation loop has
completed.

### Motivation and Context

This PR is a follow-up to the issue fixed in [an earlier
PR](#1849). The
earlier PR mentions several variations of the generation loop, but each
of them has an issue.

There are two scenarios for terminating the generation loop: 1) hitting
the EOS token id and completing the generation loop, or 2) hitting the
max length before the generation loop has completed. However, none of
the variations adequately covers both scenarios.

#### 1. Original Generation Loop

```
while not IsDone():
    GenerateToken()
    GetLastToken()
    PrintLastToken()
```

Consider scenario 1 with this loop. After `GenerateToken()` produces the
EOS token id, `GetLastToken()` attempts to retrieve that token.
However, ORT GenAI does not append the EOS token id to the list of
sequences returned to the user (see the earlier PR for why). The token
generated just before the EOS token id therefore remains the last token
in the sequence, so `GetLastToken()` retrieves it again and
`PrintLastToken()` prints it a second time.

#### 2. Return Early Generation Loop

```
while not IsDone():
    GenerateToken()
    if IsDone():
        break
    GetLastToken()
    PrintLastToken()
```

Consider scenario 2 with this loop. Suppose `GenerateToken()` produces a
token and the sequence reaches the max length. The generator's state is
marked as done, so the `IsDone()` check inside the loop is `true` and the
loop exits before the newest token is retrieved and printed.

#### 3. Infinite Generation Loop

```
while True:
    GenerateToken()
    if IsDone():
        break
    GetLastToken()
    PrintLastToken()
```

Consider scenario 2 with this loop. The same issue as in the previous
loop applies. `GenerateToken()` generates every token, but once the max
length is hit, `IsDone()` is `true` and the loop breaks before the last
token is retrieved and printed.

#### Conclusion

None of these generation loop variants works because `IsDone()`
currently covers both scenarios in one API and does not distinguish
between them. One check needs to be in the condition of the while loop
to decide whether the loop continues, and another check needs to come
after token generation to decide whether the last token should be
retrieved.
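The failure modes can be reproduced with a small Python simulation. `ToyGenerator` is a hypothetical stand-in, not the ORT GenAI API. It models only the behavior described above: the EOS token id ends generation but is never appended to the visible sequence, and the generator is done once the sequence reaches `max_length`. The token ids and the `EOS = 0` value are arbitrary.

```python
EOS = 0  # hypothetical EOS token id for this simulation


class ToyGenerator:
    """Hypothetical stand-in for an ORT GenAI generator (not the real API)."""

    def __init__(self, prompt, planned_tokens, max_length):
        self.sequence = list(prompt)
        self._planned = iter(planned_tokens)  # tokens the "model" emits, in order
        self.max_length = max_length
        self._done = False

    def is_done(self):
        return self._done

    def generate_next_token(self):
        token = next(self._planned)
        if token == EOS:
            self._done = True  # EOS ends generation but is not appended
            return
        self.sequence.append(token)
        if len(self.sequence) >= self.max_length:
            self._done = True

    def get_last_token(self):
        return self.sequence[-1]


def loop_1(gen):  # 1. Original generation loop
    printed = []
    while not gen.is_done():
        gen.generate_next_token()
        printed.append(gen.get_last_token())
    return printed


def loop_2(gen):  # 2. Return early generation loop
    printed = []
    while not gen.is_done():
        gen.generate_next_token()
        if gen.is_done():
            break
        printed.append(gen.get_last_token())
    return printed


def loop_3(gen):  # 3. Infinite generation loop
    printed = []
    while True:
        gen.generate_next_token()
        if gen.is_done():
            break
        printed.append(gen.get_last_token())
    return printed


def scenario_1():  # the model emits EOS well before max_length
    return ToyGenerator(prompt=[1, 2], planned_tokens=[10, 11, EOS], max_length=100)


def scenario_2():  # max_length is reached before the model emits EOS
    return ToyGenerator(prompt=[1, 2], planned_tokens=[10, 11, 12, 13], max_length=5)


for loop in (loop_1, loop_2, loop_3):
    print(loop.__name__, loop(scenario_1()), loop(scenario_2()))
# loop_1 [10, 11, 11] [10, 11, 12]   scenario 1 repeats token 11
# loop_2 [10, 11] [10, 11]           scenario 2 drops token 12
# loop_3 [10, 11] [10, 11]           scenario 2 drops token 12
```

Each loop gets exactly one scenario wrong. The single `is_done()` signal cannot tell the loop whether there is a new token to print.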

#### Solution

To fix this, a new API called `Generator.HitEOS()` is introduced. It
returns `true` when the EOS token id is generated. The generation loop
should be modified to the following.

```
while not IsDone():
    GenerateToken()
    if HitEOS():
        break
    GetLastToken()
    PrintLastToken()
```

If scenario 1 occurs in this loop, `HitEOS()` is `true` and the
generation loop will exit early. If scenario 2 occurs in this loop,
`HitEOS()` is `false` when the max length is reached. The last generated
token can still be retrieved and printed. Then because the generator's
state is done, `IsDone()` is `true` and the generation loop ends.
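Here is a minimal Python simulation of the proposed loop. It uses a hypothetical `ToyGenerator` stand-in (not the ORT GenAI API) whose `hit_eos()` models the proposed `Generator.HitEOS()`. Reaching `max_length` marks the generator as done but leaves `hit_eos()` false.

```python
EOS = 0  # hypothetical EOS token id for this simulation


class ToyGenerator:
    """Hypothetical stand-in; hit_eos() models the proposed Generator.HitEOS()."""

    def __init__(self, prompt, planned_tokens, max_length):
        self.sequence = list(prompt)
        self._planned = iter(planned_tokens)  # tokens the "model" emits, in order
        self.max_length = max_length
        self._done = False
        self._hit_eos = False

    def is_done(self):
        return self._done

    def hit_eos(self):
        return self._hit_eos

    def generate_next_token(self):
        token = next(self._planned)
        if token == EOS:
            self._done = self._hit_eos = True  # EOS is not appended
            return
        self.sequence.append(token)
        if len(self.sequence) >= self.max_length:
            self._done = True  # done, but hit_eos() stays False


    def get_last_token(self):
        return self.sequence[-1]


def fixed_loop(gen):
    printed = []
    while not gen.is_done():
        gen.generate_next_token()
        if gen.hit_eos():
            break  # scenario 1: no new visible token to print
        printed.append(gen.get_last_token())  # also runs at max_length
    return printed


s1 = ToyGenerator(prompt=[1, 2], planned_tokens=[10, 11, EOS], max_length=100)
s2 = ToyGenerator(prompt=[1, 2], planned_tokens=[10, 11, 12, 13], max_length=5)
print(fixed_loop(s1))  # [10, 11]
print(fixed_loop(s2))  # [10, 11, 12]
```

In scenario 1, no token is printed twice. In scenario 2, the token that reaches `max_length` is still printed before `is_done()` ends the loop.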

Here is a full end-to-end example demonstrating its usage.

```py
import onnxruntime_genai as og

model = og.Model("/path/to/model/folder")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=25)

generator = og.Generator(model, params)

tokens = tokenizer.encode("<|system|>You are a helpful AI assistant.<|end|><|user|>What color is the sky?<|end|><|assistant|>")
print(f"Prompt: {len(tokens)}")
generator.append_tokens(tokens)

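count = 0  # counts every generate_next_token() call, including the one that produces EOS
while not generator.is_done():
    generator.generate_next_token()
    count += 1
    # Stop without printing: the EOS token id is not appended to the sequence.
    if generator.hit_eos():
        break

    # Otherwise (including when max_length was just reached), print the new token.
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)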

print()
print(f"Generated: {count}")
print(f"Total: {len(tokens) + count}")
```

#### Scenario 1

Before, with generation loop 1:

```
Prompt: 18
The color of the sky can vary depending on the viewing conditions and the presence of particles and moisture in the atmosphere. On a clear day, the sky appears blue due to Rayleigh scattering, where the atmosphere scatters sunlight in all directions and blue wavelengths are scattered more than other colors because they travel as shorter, smaller waves. This scattering causes the sky to appear blue to an observer on the ground. However, the sky can also appear various shades of blue, gray, or even take on vibrant hues like red or orange just before or just after sunrise or sunset, due to the scattering of sunlight by particles and moisture in the atmosphere..
Generated: 128
Total: 146
```

After, with `generator.hit_eos()`:

```
Prompt: 18
The color of the sky can vary depending on the viewing conditions and the presence of particles and moisture in the atmosphere. On a clear day, the sky appears blue due to Rayleigh scattering, where the atmosphere scatters sunlight in all directions and blue wavelengths are scattered more than other colors because they travel as shorter, smaller waves. This scattering causes the sky to appear blue to an observer on the ground. However, the sky can also appear various shades of blue, gray, or even take on vibrant hues like red or orange just before or just after sunrise or sunset, due to the scattering of sunlight by particles and moisture in the atmosphere.
Generated: 128
Total: 146
```

#### Scenario 2

Before, with generation loop 2:

```
Prompt: 18
The color of the sky can
Generated: 7
Total: 25
```

After, with `generator.hit_eos()`:

```
Prompt: 18
The color of the sky can vary
Generated: 7
Total: 25
```