Accelerate not installed in Docker Image #1106
Comments
Thank you, happy to review a PR!
Additionally,
@rlouf I would be happy to create a PR for the docker setup. First, though, I want to fully answer @lapp0's question, as it may be important for why I would like to use accelerate. I would prefer to use vLLM. I created a simple pydantic use case for vLLM, transformers, and serve; the code and output for each are below. Since running these, I added params to the vLLM example and it started returning valid JSON, but I was expecting it to work out of the box, since vLLM has default parameters and outlines should be restricting output to the correct schema(?)

Server Call Code:
import requests
from pydantic import BaseModel

# Define the Book model
class Book(BaseModel):
    title: str
    author: str
    year: int

# Define the request parameters
ip_address = "localhost"
port = "8000"
prompt = "Create a book entry with the fields title, author, and year"
schema = Book.model_json_schema()

# Create the request body
outlines_request = {
    "prompt": prompt,
    "schema": schema
}

print("Prompt: ", prompt)

# Make the API call
response = requests.post(f"http://{ip_address}:{port}/generate/", json=outlines_request)

# Check if the request was successful
if response.status_code == 200:
    result = response.json()
    print("Result:", result["text"])
else:
    print(f"Error: {response.status_code}, {response.text}")

Server command:
Output:
VLLM Code:
from outlines import models, generate
from pydantic import BaseModel
from vllm import SamplingParams

class Book(BaseModel):
    title: str
    author: str
    year: int

print("\n\npydantic_vllm_example\n\n")

model = models.vllm("microsoft/Phi-3-mini-128k-instruct", max_model_len=25000)
params = SamplingParams(temperature=0, top_k=-1)  # note: defined here but not passed to the generator below
generator = generate.json(model, Book)

prompt = "Create a book entry with the fields title, author, and year"
result = generator(prompt)
print("Prompt:", prompt)
print("Result:", result)

Output:
Transformers Code:
from outlines import models, generate
from pydantic import BaseModel
from outlines.samplers import greedy

class Book(BaseModel):
    title: str
    author: str
    year: int

model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")

print("\n\npydantic_transformers_example\n\n")

generator = generate.json(model, Book)
prompt = "Create a book entry with the fields title, author, and year"
result = generator(prompt)
print("Prompt:", prompt)
print("Result:", result)

Output:
I'll look into the bug with JSON handling in vLLM.
Hi, just checking in here. Any updates on when this or the relevant PRs might be finished? Mainly asking as it affects a content schedule where we would be talking about outlines. Thanks!
Have you tried using vLLM's structured output feature in their OpenAI-compatible API? They use outlines under the hood.
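As a rough illustration of that suggestion, here is a minimal sketch, assuming a vLLM OpenAI-compatible server is already running locally on port 8000 and serving the same Phi-3 model; the guided_json field in extra_body is vLLM's guided-decoding extension to the OpenAI chat API.

from openai import OpenAI
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str
    year: int

# Assumes a vLLM OpenAI-compatible server, e.g. started with:
#   vllm serve microsoft/Phi-3-mini-128k-instruct
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-3-mini-128k-instruct",
    messages=[{"role": "user", "content": "Create a book entry with the fields title, author, and year"}],
    # guided_json constrains the output to the given JSON schema
    extra_body={"guided_json": Book.model_json_schema()},
)
print(response.choices[0].message.content)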
I plan on getting there at some point soon but was waiting on this. I don't view using Outlines directly and Outlines via vLLM as mutually exclusive for our purposes, as we were looking to make pieces about both :). I was thinking the original outlines post would be a good intro for both of them. Also, I saw the release of outlines-core, which could be another cool thing to put into the post as well. I'm happy to go down the path of vLLM for this in the meantime!
Happy to review a PR that adds accelerate.
Describe the issue as clearly as possible:
Not sure if this is a bug or a feature request, but accelerate apparently isn't installed in the Docker image. This means one can either use transformers with no GPU acceleration, or vLLM, and vLLM currently doesn't have feature parity with transformers from what I can tell (e.g. generate.json()). Running the code outside of the image with the library plus accelerate works. Running pip install accelerate inside the container also fixes the issue, and the marginal download seems very small.
Steps/code to reproduce the bug:
Expected result:
Error message:
Outlines/Python version information:
Docker Image Version Hash:
98c8512bd46f
Version information
Context for the issue:
I'm trying to write a post on using Outlines with Vast, and Vast needs everything to run from a docker container. It would be great if users could start their workloads in the container without needing to install accelerate first.
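Until the image itself ships accelerate, a minimal workaround sketch is to extend the published image and install accelerate on top. The image tag below is a placeholder, not confirmed from this issue; substitute the image actually being pulled (the one with hash 98c8512bd46f).

# Hypothetical Dockerfile extending the published outlines image;
# "outlinesdev/outlines:latest" is a placeholder tag.
FROM outlinesdev/outlines:latest
RUN pip install --no-cache-dir accelerate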