Update outlines support to v0.1.4 #10490
Conversation
Hi @Treparme, could you please share how you tested? I was also working on this upgrade yesterday and ran into a breaking change due to the introduction of outlines-core. Here is my setup, followed by the error I hit.
Client:

from pydantic import BaseModel
from openai import OpenAI


class Info(BaseModel):
    name: str
    age: int


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
model_id = client.models.list().data[0].id
print("Model ID:", model_id)

completion = client.beta.chat.completions.parse(
    model=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)

Error (truncated):
Hi @mgoin, we run it in a different way. Something similar to the snippet below works:

self.outlines_tokenizer = TransformerTokenizer(
    AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
)
logits_processor = JSONLogitsProcessor(schema=json_schema, tokenizer=self.outlines_tokenizer)
# note: as written, the next line overrides the JSONLogitsProcessor assigned above
logits_processor = build_vllm_logits_processor(self.tokenizer_data, parser)
sampling_params = SamplingParams(
    temperature=1e-6,
    max_tokens=2000,
    logits_processors=[logits_processor],
    logprobs=5,
)
results_generator = self.engine.generate(final_prompt, sampling_params, request_id, lora_request=lora)

This works.
I believe
Hmmm annoying
This works and skips the dependency issue (ignores it).
Here's the PR that changed the interface: dottxt-ai/outlines-core#40. I'll sort out what change we need on the vLLM side.
The change is trivial, but with it in place I hit dottxt-ai/outlines#1274. It sounds like we just need to wait for another release with a fix, and then we can move forward.
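For illustration, here is my reading of the kind of change involved (a sketch based on the outlines-core 0.1 API, not the actual vLLM diff; build_guide, regex_string, and tokenizer are placeholder names): RegexGuide construction moves from the plain constructor to a from_regex classmethod.

from outlines.fsm.guide import RegexGuide  # in outlines >= 0.1 this is backed by outlines-core


def build_guide(regex_string, tokenizer):
    # outlines < 0.1: the guide was built directly via the constructor
    #   return RegexGuide(regex_string, tokenizer)
    # outlines >= 0.1 (outlines-core): construction moved to a classmethod
    return RegexGuide.from_regex(regex_string, tokenizer)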
I tested outlines with their fix (which was to just remove the cache usage). It worked after I removed vLLM's usage of the same API. I updated my branch with that change.
Upgrades (or rather loosens) the outlines dependency.
Out of the box, vLLM then supports a newer outlines version, which improves speed.
FIX #10489
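As a quick sanity check after installing with the loosened pin (a sketch, not part of the PR; assumes a standard pip install), the resolved outlines version can be confirmed like this:

from importlib.metadata import version

# With the loosened pin, pip should be able to resolve an outlines 0.1.x release.
print(version("outlines"))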