
fix: mistral embedding regression fix #21913

Merged
Fridge003 merged 3 commits into main from embedding-regression-fix on Apr 4, 2026

Conversation


@dougyster dougyster commented Apr 2, 2026

Motivation

Embedding models using LlamaTokenizerFast (e.g. intfloat/e5-mistral-7b-instruct) regressed in cosine similarity from ~1.0 to ~0.33 after the transformers v5.3.0 upgrade (#17784). The regression was identified by bisecting between v0.5.9 (good) and v0.5.10rc0 (bad), pinning the first bad commit to d1e95af ("Upgrade transformers==5.3.0 (#17784)").

Root cause

The transformers v5 upgrade introduced _fix_v5_add_bos_eos_token() to restore add_bos_token/add_eos_token flags that v5 strips when a tokenizer.json post-processor is present. The fix correctly handles add_bos_token for generative models like DeepSeek-V3, but incorrectly also restored add_eos_token for fast tokenizers.

For fast tokenizers (PreTrainedTokenizerFast), add_eos_token was never applied via the Python-level attribute in transformers v4 — EOS behavior was defined in tokenizer.json's post-processor. In v5, both sglang and HF vanilla end up with add_eos_token=False. By restoring add_eos_token=True only in sglang (but not in the HF reference), the tokenizers diverge:

  • sglang (with fix): EOS token appended → last-token pooling selects the EOS embedding
  • HF reference: no EOS → last-token pooling selects the last content token embedding
  • Result: cosine similarity ~0.33 instead of ~1.0
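The divergence in the bullets above can be illustrated with synthetic hidden states (a minimal sketch: the random arrays stand in for real model activations, and `last_token_pool` is an illustrative helper, not sglang's actual pooling code):

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray) -> np.ndarray:
    # Last-token pooling: the embedding is the hidden state at the final position.
    return hidden_states[-1]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
content = rng.standard_normal((5, 64))  # hidden states for 5 content tokens
eos = rng.standard_normal(64)           # hidden state at an appended EOS position

ref = last_token_pool(content)                    # HF reference: no EOS appended
bad = last_token_pool(np.vstack([content, eos]))  # pre-fix sglang: EOS appended

print(round(cosine(ref, ref), 3))  # 1.0: same position pooled in both runs
print(round(cosine(ref, bad), 3))  # far from 1.0: a different position was pooled
```

The two pooled vectors come from entirely different sequence positions, which is why the observed similarity collapses rather than degrading gradually.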

The model's tokenizer_config.json has "add_eos_token": true because Mistral uses EOS as a generation stop token — but for LlamaTokenizerFast, this was always handled by the post-processor, not the Python attribute. The attribute existed in v4 but was a no-op for fast tokenizers.

Modifications

In _fix_v5_add_bos_eos_token(), skip restoring add_eos_token when the loaded tokenizer is a PreTrainedTokenizerFast instance. This matches v5 vanilla behavior (which also doesn't restore it) and keeps sglang consistent with the HF reference for fast tokenizers. add_bos_token restoration is unaffected.

# fast tokenizers never applied add_eos_token via the Python attribute in v4
# — restoring it diverges from HF vanilla v5 and breaks embedding model accuracy
if attr == "add_eos_token" and isinstance(tokenizer, PreTrainedTokenizerFast):
    config_val = _V4_DEFAULTS["add_eos_token"]  # False
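For context, the guard's decision logic can be sketched as a standalone function (hypothetical helper name and simplified signature; only the `_V4_DEFAULTS` lookup mirrors the snippet above):

```python
# Hypothetical stand-in for sglang's restoration logic; names are illustrative.
_V4_DEFAULTS = {"add_bos_token": True, "add_eos_token": False}

def resolve_restored_value(attr: str, config_val: bool, tokenizer_is_fast: bool) -> bool:
    """Value to restore for a v4 bos/eos flag that transformers v5 stripped."""
    if attr == "add_eos_token" and tokenizer_is_fast:
        # Fast tokenizers handle EOS in tokenizer.json's post-processor;
        # restoring the Python-level flag would append EOS a second way.
        return _V4_DEFAULTS["add_eos_token"]  # False
    return config_val

print(resolve_restored_value("add_eos_token", True, True))   # fast: False
print(resolve_restored_value("add_eos_token", True, False))  # slow: True (config honored)
print(resolve_restored_value("add_bos_token", True, True))   # bos restoration unaffected: True
```

Slow (Python-level) tokenizers keep the config-driven behavior, which is why only the fast-tokenizer path needed the guard.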

Accuracy Tests

Tested on intfloat/e5-mistral-7b-instruct with a full clean reinstall (sglang + sglang-kernel uninstalled and reinstalled from scratch):

| Test set | Similarities (before fix) | Similarities (after fix) |
|----------|---------------------------|--------------------------|
| Set 1    | 0.334, 0.387, 0.431       | 1.000, 1.000, 1.000      |
| Set 2    | 0.334, 0.387, 0.431       | 1.000, 1.000, 1.000      |
| Set 3    | 0.334, 0.387, 0.431       | 1.000, 1.000, 1.000      |

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.



@JustinTong0323 JustinTong0323 left a comment


LGTM. Root cause analysis is clear and the fix is minimal and correct. For fast tokenizers, add_eos_token should remain False to match HF reference behavior. Good bisection work.


@Kangyan-Zhou

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Apr 2, 2026
@dougyster dougyster force-pushed the embedding-regression-fix branch from 091dce5 to 619e344 on April 3, 2026 05:38
@Fridge003 Fridge003 merged commit a94c380 into main Apr 4, 2026
360 of 425 checks passed
@Fridge003 Fridge003 deleted the embedding-regression-fix branch April 4, 2026 07:11
sundar24295s pushed a commit to sundar24295s/sglang that referenced this pull request Apr 4, 2026