Update outlines support to v0.1.4 #10490
Conversation
Hi @Treparme, could you please share how you tested? I was also working on this upgrade yesterday and ran into a breaking change due to the introduction of outlines-core. Here is my setup, followed by the error I hit.
Client:

from pydantic import BaseModel
from openai import OpenAI


class Info(BaseModel):
    name: str
    age: int


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
model_id = client.models.list().data[0].id
print("Model ID:", model_id)

completion = client.beta.chat.completions.parse(
    model=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)

Error (truncated):
Hi @mgoin, we run it in a different way. Something similar to the snippet below works:

self.outlines_tokenizer = TransformerTokenizer(
    AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
)
logits_processor = JSONLogitsProcessor(schema=json_schema, tokenizer=self.outlines_tokenizer)
# note: as written, the next line overrides the JSONLogitsProcessor assigned above
logits_processor = build_vllm_logits_processor(self.tokenizer_data, parser)
sampling_params = SamplingParams(
    temperature=1e-6,
    max_tokens=2000,
    logits_processors=[logits_processor],
    logprobs=5,
)
results_generator = self.engine.generate(final_prompt, sampling_params, request_id, lora_request=lora)

This works.
I believe
Hmmm annoying
This works and skips the dependency issue (ignores it).
Here's the PR that changed the interface: dottxt-ai/outlines-core#40. I'll sort out what change we need on the vLLM side.
The change is trivial, but with it in place I hit dottxt-ai/outlines#1274. It sounds like we just need to wait for another release with a fix, and then we can move forward.
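For illustration, here is my reading of the kind of change involved (a sketch based on the outlines-core 0.1 API, not the actual vLLM diff; build_guide, regex_string, and tokenizer are placeholder names): RegexGuide construction moves from the plain constructor to a from_regex classmethod.

from outlines.fsm.guide import RegexGuide  # in outlines >= 0.1 this is backed by outlines-core


def build_guide(regex_string, tokenizer):
    # outlines < 0.1: the guide was built directly via the constructor
    #   return RegexGuide(regex_string, tokenizer)
    # outlines >= 0.1 (outlines-core): construction moved to a classmethod
    return RegexGuide.from_regex(regex_string, tokenizer)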
I tested outlines with their fix (which was to just remove the cache usage). It worked after I removed vLLM's usage of the same API. I updated my branch with that change.
Upgrades (or rather loosens) the outlines dependency.
Out of the box, vLLM then supports a newer outlines version, which improves speed.
FIX #10489
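As a quick sanity check after installing with the loosened pin (a sketch, not part of the PR; assumes a standard pip install), the resolved outlines version can be confirmed like this:

from importlib.metadata import version

# With the loosened pin, pip should be able to resolve an outlines 0.1.x release.
print(version("outlines"))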