tokenization: pool should report unrecoverable failures#210
Conversation
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR enhances the tokenization pool to properly handle and report unrecoverable initialization failures. Previously, misconfigured tokenizers would fail silently, causing tests to hang indefinitely while waiting on internal task queues. The changes introduce a FatalInitError wrapper to distinguish fatal initialization errors from transient failures, ensuring they're immediately reported to callers rather than being indefinitely retried.
Key changes:
- Introduced
FatalInitErrortype to represent unrecoverable tokenizer initialization errors - Modified
Pool#Tokenize()to return errors to callers via the newerrfield intokenizationResponse - Updated worker loop to forget tasks with fatal errors instead of rate-limiting them for retry
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/tokenization/pool.go | Added FatalInitError type, error reporting in processTask(), and error handling in worker loop; updated Tokenize() signature to return errors |
| pkg/tokenization/tokenizer.go | Wrapped tokenizer initialization errors with FatalInitError in Encode() method |
| pkg/tokenization/pool_test.go | Added TestPool_RunIntegrationFailed to verify error handling for misconfigured tokenizers; updated benchmark to handle new error return |
| pkg/kvcache/indexer.go | Updated GetPodScores() to handle and propagate tokenization errors |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| err error | ||
| } | ||
|
|
||
| func (fe FatalInitError) Error() string { |
There was a problem hiding this comment.
The Error method could panic if fe.err is nil. While this may not happen in normal operation, defensive programming suggests adding a nil check to prevent potential panics.
| func (fe FatalInitError) Error() string { | |
| func (fe FatalInitError) Error() string { | |
| if fe.err == nil { | |
| return "fatal init error: <nil>" | |
| } |
There was a problem hiding this comment.
I don't think this will happen in practice... 🤔
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
|
#192 definitely solves the issue with loading, raising an error at instantiation time instead of delaying it to task processing time. I think this issue can be considered solved. |
while working on llm-d/llm-d-router#505 I noticed that a misconfigured Tokenizers might report an error, but the error would not bubble up, causing the test to hang indefinitely, waiting on the internal task queue.
In this PR:
FatalInitErrorwrapper, representing a nonrecoverable error (e.g. initialization error of the tokenizer)errfield totokenizationResponseon error:
task.ResultCh != nilwe send atask { err }Pool#processTask()in addition to checkingerr != nil, checks the type of the error; if it is unrecoverable, the task is forgotten instead of being rate limited.Pool#Tokenize()now returns([]uint32, error), handled inGetPodScores()