Format prompts Using Chat Templates in SequenceGeneratorAdapter #987

Open
lapp0 opened this issue Jun 19, 2024 · 0 comments

Related: #756

What behavior of the library made you think about the improvement?

Currently, when using outlines.generate, chat templates aren't applied by default, so users have to write the template markers into their prompts by hand, which is awkward and unintuitive. For example, a well-structured input for a Llama 3 model might look like:

generator = outlines.generate.json(...)

my_prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nProvide me JSON Data<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""
generator(my_prompt)

I'd prefer:

generator("Provide me JSON Data")
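To make the gap concrete, here is a minimal sketch of the formatting step that users currently do by hand. The helper name format_llama3_user_prompt is hypothetical and not part of outlines; it only reproduces the Llama 3 markers from the example prompt above for a single user message. In practice the template would more likely come from the model's tokenizer (for instance via apply_chat_template in transformers) rather than being hard-coded.

```python
def format_llama3_user_prompt(user_message: str) -> str:
    """Hypothetical helper: wrap one user message in the Llama 3 markers shown above."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        # Open the assistant turn so the model answers instead of continuing the user text.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Builds the same string as my_prompt in the example above.
prompt = format_llama3_user_prompt("Provide me JSON Data")
```

With something like this applied inside the generator, the call site could shrink to the plain-text form shown above.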

Why We Should Apply Chat Templates by Default

Without a chat template, the model treats the prompt as the start of a monologue and simply continues it, whereas the chat template format generally follows a query-response structure.

No Chat Template
>>> output = model.generate(**tokenizer("What is 1 + 1?", return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
"<s> What is 1 + 1?\n\nThis question has been asked by many people, but I don't understand the answer.\n\nCould"
>>> output = model.generate(**tokenizer("Give me a random color:", return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
'<s> Give me a random color:\n\n- Response: A random color can be represented in hexadecimal format as #RRGGBB,'

With Chat Template
>>> output = model.generate(**tokenizer('<s><|user|> What is 1 + 1?<|end|><|assistant|>', return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
'<s><s><|user|> What is 1 + 1?<|end|><|assistant|> 1 + 1 equals 2. This is a basic arithmetic addition problem. When you'
>>> output = model.generate(**tokenizer('<s><|user|> Give me a random color:<|end|><|assistant|>', return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
"<s><s><|user|> Give me a random color:<|end|><|assistant|> The random color I'll describe for you is a vibrant shade of teal, with"

How would you like it to behave?

By default, generator(prompt) should apply the chat template.

The current behavior should remain available via generator(prompt, raw=True).

Alternatively, it might make sense to put the raw argument on the function that constructs the generator (e.g. outlines.generate.text(model, raw=True)).
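A rough sketch of the proposed call-time behavior, assuming a thin wrapper around an existing generator. Everything here is hypothetical: ChatTemplateGenerator, echo_generate, and example_template are illustrative names, not outlines APIs, and the template only mimics the markers used in the examples above.

```python
from typing import Callable


class ChatTemplateGenerator:
    """Hypothetical wrapper illustrating the proposal; not part of the outlines API."""

    def __init__(
        self,
        generate: Callable[[str], str],
        apply_template: Callable[[str], str],
    ) -> None:
        self._generate = generate
        self._apply_template = apply_template

    def __call__(self, prompt: str, raw: bool = False) -> str:
        # raw=True keeps the current behavior: the prompt is passed through untouched.
        final_prompt = prompt if raw else self._apply_template(prompt)
        return self._generate(final_prompt)


# Stand-ins for a real model and a real chat template, for demonstration only.
def echo_generate(prompt: str) -> str:
    return prompt


def example_template(message: str) -> str:
    return f"<|user|> {message}<|end|><|assistant|>"


generator = ChatTemplateGenerator(echo_generate, example_template)
templated = generator("What is 1 + 1?")            # template applied by default
untouched = generator("What is 1 + 1?", raw=True)  # opt out, as today
```

Placing raw on the constructing function instead would amount to fixing this flag once in __init__ rather than accepting it on each call.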
