fix: add missing bos_token to example templates#11432
fix: add missing bos_token to example templates#11432toslunar wants to merge 12 commits intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
43f2c32 to
1a7f9e3
Compare
|
Thanks for fixing, can you add some unit tests to ensure that the BOS token isn't added twice in this case? |
|
Added a test. Is it OK to place it to a new directory ( |
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
13f56c1 to
263ab89
Compare
|
I would just name the directory |
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
|
Please update the failing tests to consider the BOS token. |
|
Oops, this is unexpected to me. I'll fix the enumeration of Jinja files. https://buildkite.com/vllm/fastcheck/builds/10539#0193fc47-aa1b-4af4-8c53-a27c0af8deb1 |
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
|
I think you also need to update the entrypoints tests to consider the BOS token. |
|
Actually now that I think more about it... wouldn't it be better to instead update |
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
|
I suppose this PR could be incremental and it only resolves the inconsistency among example templates.
Sorry. I didn't get the concrete suggestion. Do you mean making
|
I mean that we can remove the BOS token before passing the prompt text into |
BREAKING CHANGE: chat_template usually needs BOS now. Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
|
Let me run tests with |
Signed-off-by: Toshiki Kataoka <kataoka@pelements.jp>
|
It seems the current unittest doesn't have opinion on BOS in My concerns are what algorithm to decide the need of BOS for backward compatibility and how to notify the choice to users. I'd ask for comments. My random thoughts are below. For users who trust the HF chat behavior, I'd like to see at least a warning message if vLLM encodes inputs differently than For users who trust vLLM's existing behavior, fallback BOS is necessary and I'd like to be notified on release notes. I suppose it's too noisy to emit a warning message if the fallback occurs and emit another warning message otherwise. In another experience of mine, I had to rerun a bunch of API chat generation because I naively updated the vLLM version and the change by #4688 was almost silent. |
|
Personally, I prefer aligning with HF's behavior since that's what most people are used to. If we find that BOS would be missing using the original code, we can emit a warning + release notes as you stated. @robertgshaw2-neuralmagic @njhill any thoughts on this? To unify the tokenization code, I think we should actually tokenize the prompt inside |
@toslunar To clarify, are you talking about |
|
Hi, I would appreciate if there would be at least This inconsistency between Ideally I'd stick to Andrej's suggestions to strictly disable every "convenience" shortcuts that HF tokenizers can do. In this case |
|
This pull request has merge conflicts that must be resolved before it can be |
|
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you! |
|
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you! |
This PR updates chat template examples to align with the latest (>=0.4.3)
/v1/chat/completionsspec on BOS.Some chat templates don't contain
bos_token, especially when it's written before v0.4.3. According to the excellent investigation #9519, the chat serving does not add BOS again, while awkwardness remains in the offline chat interface (to be fixed).