
Conversation

@schoennenbeck
Contributor

For a lot of tokenizers, Tokenizer.apply_chat_template with continue_final_message=True raises a "ValueError: substring not found" if the final message starts or ends with whitespace. Here is some example code that exhibits the issue:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4")
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Great question! The capital of France is "},
]
tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=False,
    continue_final_message=True,
)

This is because the apply_chat_template method looks for the full final message in the rendered chat, but many modern chat templates (in particular the Llama 3.1 chat template) actually trim messages before rendering.
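
Roughly what goes wrong, as a minimal standalone sketch (the template snippet and special tokens are only illustrative, not the library code):

final_message = "Great question! The capital of France is "   # note the trailing space
rendered_chat = (
    "<|start_header_id|>assistant<|end_header_id|>\n\n"       # illustrative Llama 3.1 framing
    + final_message.strip()                                    # template renders roughly "{{ content | trim }}"
    + "<|eot_id|>"
)

try:
    rendered_chat.rindex(final_message)                        # the failing lookup
except ValueError as err:
    print(err)                                                 # -> substring not found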

This PR strips the final message before looking it up in the rendered string, which fixes the issue. However, it means the continuation can now happen at a slightly different spot than the user intended. I believe this is the best way to address the issue, but I would also be open to simply raising a more descriptive error in this case, so the user can strip the last message themselves.
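
In other words, the change amounts to something like the following sketch (the function name is made up for illustration; this is not the exact diff):

def cut_to_continuation(rendered_chat: str, final_message: str) -> str:
    # Sketch of the idea in this PR: strip the final message so it matches what
    # whitespace-trimming templates actually render, then cut the rendered chat
    # right after it so generation continues from that point.
    final_message = final_message.strip()
    end = rendered_chat.rindex(final_message) + len(final_message)
    return rendered_chat[:end]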

Either way I think the current failure mode is not ideal.

If required I could also add a test for this behaviour.
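For reference, such a test might look roughly like this (a sketch only; the checkpoint and assertion mirror the reproduction above):

from transformers import AutoTokenizer

def test_continue_final_message_with_trailing_whitespace():
    # Hypothetical test sketch: the call should no longer raise once the final
    # message is stripped before the lookup.
    tokenizer = AutoTokenizer.from_pretrained(
        "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"
    )
    messages = [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Great question! The capital of France is "},
    ]
    rendered = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=False,
        continue_final_message=True,
        tokenize=False,
    )
    # The prompt should end with the (trimmed) final message so generation continues it.
    assert rendered.endswith("Great question! The capital of France is")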

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Potential reviewers (based on affected area)

@schoennenbeck
Contributor Author

An alternative, possibly more robust, approach would be to only do the stripping if the full final message string cannot be found in the rendered chat as is.
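
Something along these lines, reusing the sketch from the PR description (again just an illustration, not actual library code):

def cut_to_continuation(rendered_chat: str, final_message: str) -> str:
    # Fallback variant: keep the exact match when the raw content is present, and
    # only strip when it cannot be found (e.g. because the template trimmed it).
    try:
        end = rendered_chat.rindex(final_message) + len(final_message)
    except ValueError:
        final_message = final_message.strip()
        end = rendered_chat.rindex(final_message) + len(final_message)
    return rendered_chat[:end]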

Member

@Rocketknight1 left a comment


@schoennenbeck this is a really good fix, thank you! I think we don't need the more robust solution - because of how whitespace is tokenized, I think it will be quite hard to end your message in multiple spaces and still get a good continuation.

(We're having some CI issues, but hopefully we'll resolve them later today and then I can merge this)

@Rocketknight1 merged commit f2846ad into huggingface:main Oct 17, 2024
@Rocketknight1
Member

@schoennenbeck merged! Thanks again for a clean and helpful PR!

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
huggingface#34214)

* Strip final message

* Do full strip instead of rstrip

* Retrigger CI

---------

Co-authored-by: Matt <[email protected]>