tokenization: pool should report unrecoverable failures by evacchi · Pull Request #210 · llm-d/llm-d-kv-cache

evacchi · 2025-12-11T13:14:17Z

While working on llm-d/llm-d-router#505, I noticed that a misconfigured tokenizer might report an error that never bubbles up, causing the test to hang indefinitely while waiting on the internal task queue.

In this PR:

- We add a FatalInitError wrapper representing an unrecoverable error (e.g. a tokenizer initialization error).
- We add an err field to tokenizationResponse.

On error:

- If task.ResultCh != nil, we send a task { err }.
- Pool#processTask(), in addition to checking err != nil, checks the type of the error; if it is unrecoverable, the task is forgotten instead of being rate limited.
- Pool#Tokenize() now returns ([]uint32, error), which is handled in GetPodScores().

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

Copilot

Pull request overview

This PR enhances the tokenization pool to properly handle and report unrecoverable initialization failures. Previously, misconfigured tokenizers would fail silently, causing tests to hang indefinitely while waiting on internal task queues. The changes introduce a FatalInitError wrapper to distinguish fatal initialization errors from transient failures, ensuring they're immediately reported to callers rather than being indefinitely retried.

Key changes:

Introduced FatalInitError type to represent unrecoverable tokenizer initialization errors
Modified Pool#Tokenize() to return errors to callers via the new err field in tokenizationResponse
Updated worker loop to forget tasks with fatal errors instead of rate-limiting them for retry

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/tokenization/pool.go | Added `FatalInitError` type, error reporting in `processTask()`, and error handling in worker loop; updated `Tokenize()` signature to return errors |
| pkg/tokenization/tokenizer.go | Wrapped tokenizer initialization errors with `FatalInitError` in `Encode()` method |
| pkg/tokenization/pool_test.go | Added `TestPool_RunIntegrationFailed` to verify error handling for misconfigured tokenizers; updated benchmark to handle new error return |
| pkg/kvcache/indexer.go | Updated `GetPodScores()` to handle and propagate tokenization errors |

Copilot · 2025-12-11T13:19:11Z

+	err error
+}
+
+func (fe FatalInitError) Error() string {


The Error method could panic if fe.err is nil. While this may not happen in normal operation, defensive programming suggests adding a nil check to prevent potential panics.

Suggested change

-func (fe FatalInitError) Error() string {
+func (fe FatalInitError) Error() string {
+	if fe.err == nil {
+		return "fatal init error: <nil>"
+	}

evacchi · Dec 12, 2025

I don't think this will happen in practice... 🤔

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

vMaroon · 2025-12-13T12:34:24Z

Hi @evacchi - thank you for starting this.

I think a part of this PR will have a conflict with the ongoing #192 - perhaps we can focus this one on reporting encoding errors, and leave loading for after #192 if needed?

evacchi · 2025-12-29T10:05:18Z

#192 definitely solves the issue with loading, raising an error at instantiation time instead of delaying it to task processing time. I think this issue can be considered solved.
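The idea of failing at instantiation time rather than at task processing time could look roughly like the sketch below. It is not the code from #192: `Config`, `Pool`, `NewPool`, and the empty-model-name check are hypothetical stand-ins chosen only to show the shape of a constructor that reports errors up front.

```go
package main

import (
	"errors"
	"fmt"
)

// Config and Pool are hypothetical, simplified types for illustration.
type Config struct {
	ModelName string
}

type Pool struct {
	cfg Config
}

// NewPool validates its configuration up front and returns an error
// immediately, so no task is ever queued against a broken tokenizer.
func NewPool(cfg Config) (*Pool, error) {
	if cfg.ModelName == "" {
		return nil, errors.New("tokenization pool: model name must not be empty")
	}
	return &Pool{cfg: cfg}, nil
}

func main() {
	if _, err := NewPool(Config{}); err != nil {
		fmt.Println("construction failed:", err)
	}
}
```

With this shape, a misconfiguration surfaces to the caller before any worker starts, so the retry-versus-forget decision never arises for loading failures.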

tokenization: pool should report unrecoverable failures

41cd06f

Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

Copilot AI review requested due to automatic review settings December 11, 2025 13:14

evacchi requested review from dannyharnik, elevran, kfirtoledo and vMaroon as code owners December 11, 2025 13:14

Copilot started reviewing on behalf of evacchi December 11, 2025 13:14

evacchi mentioned this pull request Dec 11, 2025

test: add precise_prefix_cache_test llm-d/llm-d-router#505

Merged

Copilot AI reviewed Dec 11, 2025

Apply suggestions from code review

4bbd294

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>

evacchi closed this Dec 29, 2025

evacchi deleted the test-tokenization-pool-error-reporting branch December 29, 2025 10:05

guygir pushed a commit to guygir/llm-d-kv-cache-manager that referenced this pull request Apr 20, 2026

fix: rename pd-profile-handler to use underscore to match other src f…

9cba539

…iles (llm-d#210)

evacchi wants to merge 2 commits into llm-d:main from evacchi:test-tokenization-pool-error-reporting

3 participants
