QVAC-13536 Fixing benchmark workflow errors (Embeddings + LLM) #690
Merged
gianni-cor merged 53 commits on Mar 5, 2026
Try #1. Adding tokenizer proxy to provide vocab size.
Try #2. More fixes and logs.
Try #3. Limit device to only cpu or gpu.
This reverts commit a461e69.
This reverts commit 9951195.
Fix/benchmark pipeline errors
Fix/benchmark pipeline errors
Fix/benchmark pipeline errors
Improve error handling on the server. Added retry in case of context …
Make retries self-adjustable
Adding some more checks and limiting the datasets temporarily
Test: trying to narrow down the error
Exclude failing datasets from embed benchmark
Changing bench model for LLM
57ef730 to f9e3d83
gianni-cor requested changes on Mar 4, 2026
gianni-cor requested changes on Mar 5, 2026
gianni-cor approved these changes on Mar 5, 2026
dev-nid approved these changes on Mar 5, 2026
Proletter pushed a commit that referenced this pull request on May 24, 2026
* Try #1. Adding tokenizer proxy to provide vocab size.
* Try #2. More fixes and logs.
* Try #3. Limit device to only cpu or gpu.
* Revert "Try #2. More fixes and logs." This reverts commit a461e69.
* Revert "Try #1. Adding tokenizer proxy to provide vocab size." This reverts commit 9951195.
* Fixing pipeline logging
* Add more logs
* Fixing bench logging
* Add more error handling and logging
* Improve error handling on the server. Added retry in case of context overflow.
* Make retries self-adjustable
* Adding some more checks and limiting the datasets temporarily
* Test: trying to narrow down the error
* Exclude failing datasets from embed benchmark
* Clean up the code
* Changing bench model for LLM
* Try #1. Adding tokenizer proxy to provide vocab size.
* Try #2. More fixes and logs.
* Try #3. Limit device to only cpu or gpu.
* Revert "Try #2. More fixes and logs." This reverts commit a461e69.
* Revert "Try #1. Adding tokenizer proxy to provide vocab size." This reverts commit 9951195.
* Fixing pipeline logging
* Add more logs
* Fixing bench logging
* Add more error handling and logging
* Improve error handling on the server. Added retry in case of context overflow.
* Make retries self-adjustable
* Adding some more checks and limiting the datasets temporarily
* Test: trying to narrow down the error
* Exclude failing datasets from embed benchmark
* Clean up the code
* Changing bench model for LLM
* Minor fixes for clarity
* Removing unused vars
* Removing unused imports
* Removing unused python deps

---------

Co-authored-by: gianni <gianfranco.cordella@tether.io>
Note
The embedding datasets TRECCOVID and FiQA2018 still fail with a generic error message and no meaningful server logs. This suggests that a crash may be happening somewhere in the C++ layer (possibly a segfault).
For now, these two datasets are disabled.
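A minimal sketch of how such a temporary exclusion might be expressed in the benchmark runner. The names `EXCLUDED_EMBED_DATASETS` and `select_datasets` are hypothetical and are not taken from this PR; only the dataset names come from the description.

```python
# Hypothetical helper: the PR's actual dataset configuration is not shown here.
# Datasets named in the note above as crashing without useful server logs.
EXCLUDED_EMBED_DATASETS = frozenset({"TRECCOVID", "FiQA2018"})


def select_datasets(requested, excluded=EXCLUDED_EMBED_DATASETS):
    """Split the requested dataset names into (kept, skipped), preserving order."""
    kept = [name for name in requested if name not in excluded]
    skipped = [name for name in requested if name in excluded]
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = select_datasets(
        ["ArguAna", "NFCorpus", "TRECCOVID", "SciFact", "FiQA2018", "SCIDOCS"]
    )
    print("running:", kept)
    print("skipped (temporarily disabled):", skipped)
```

Returning the skipped names alongside the kept ones makes it easy to log what was left out, so the exclusion stays visible until the underlying crash is understood.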
🎯 What problem does this PR solve?
📝 How does it solve it?
* … (HTTP 422, code: CONTEXT_OVERFLOW, retryable details including sequence/context metadata).
* … (cpu explicit, otherwise gpu).
* … ArguAna, NFCorpus, SciFact, SCIDOCS, …

🧪 How was it tested?
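For reference, the context-overflow handling mentioned in the solution notes above (an HTTP 422 response carrying code CONTEXT_OVERFLOW and sequence/context metadata, with self-adjusting retries) could look roughly like this on the client side. This is a sketch under stated assumptions, not the PR's code: the `details` field names `sequence_length` and `context_size`, the `send` callable, and the function names are all hypothetical, and truncating by characters stands in for whatever adjustment the real client makes.

```python
class ContextOverflowError(Exception):
    """Raised when the server keeps reporting that the input exceeds the context."""

    def __init__(self, sequence_length, context_size):
        super().__init__(
            f"context overflow: sequence={sequence_length}, context={context_size}"
        )
        self.sequence_length = sequence_length
        self.context_size = context_size


def parse_overflow(status, body):
    """Return a ContextOverflowError if the response signals overflow, else None."""
    if status == 422 and body.get("code") == "CONTEXT_OVERFLOW":
        details = body.get("details", {})  # hypothetical field names below
        return ContextOverflowError(
            details.get("sequence_length", 0), details.get("context_size", 0)
        )
    return None


def call_with_shrinking_input(send, text, max_attempts=4):
    """Call send(text) -> (status, body), shortening text after each overflow.

    The cut is sized from the reported sequence/context ratio (with a 10%
    margin); if the metadata is missing, the text is halved instead.
    """
    last_error = None
    for _ in range(max_attempts):
        status, body = send(text)
        last_error = parse_overflow(status, body)
        if last_error is None:
            return status, body, text
        seq, ctx = last_error.sequence_length, last_error.context_size
        if seq > 0 and ctx > 0:
            new_len = len(text) * ctx * 9 // (seq * 10)
        else:
            new_len = len(text) // 2
        new_len = max(1, min(new_len, len(text) - 1))
        if new_len >= len(text):
            break  # cannot shrink any further
        text = text[:new_len]
    if last_error is None:
        raise ValueError("max_attempts must be at least 1")
    raise last_error
```

Sizing the cut from the reported metadata usually converges in one retry, whereas a fixed halving step can either overshoot or need several rounds.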