Closed as not planned
Labels: bug, stale, structured-output
Description
Your current environment
Output of `pip list`
Package                           Version
--------------------------------- -------------
absl-py                           2.1.0
accelerate                        1.1.1
aiofiles                          23.2.1
aiohappyeyeballs                  2.4.4
aiohttp                           3.11.9
aiohttp-cors                      0.7.0
aiosignal                         1.3.1
airportsdata                      20241001
annotated-types                   0.7.0
anyio                             4.6.2.post1
astor                             0.8.1
attrs                             24.2.0
bert-score                        0.3.13
bitsandbytes                      0.44.1
blake3                            1.0.1
cachetools                        5.5.0
certifi                           2024.8.30
charset-normalizer                3.4.0
chex                              0.1.87
click                             8.1.7
cloudpickle                       3.1.0
colorful                          0.5.6
compressed-tensors                0.8.1
contourpy                         1.3.1
cycler                            0.12.1
datasets                          3.1.0
demjson3                          3.0.6
depyf                             0.18.0
dill                              0.3.8
diskcache                         5.6.3
distlib                           0.3.9
distro                            1.9.0
einops                            0.8.0
etils                             1.11.0
evaluate                          0.4.3
fastapi                           0.115.5
fbgemm_gpu                        1.0.0
ffmpy                             0.4.0
filelock                          3.16.1
flash-attn                        2.7.0.post2
flax                              0.10.2
fonttools                         4.55.0
frozenlist                        1.5.0
fsspec                            2024.9.0
gguf                              0.10.0
google-api-core                   2.24.0
google-auth                       2.37.0
googleapis-common-protos          1.66.0
gradio                            5.7.1
gradio_client                     1.5.0
grpcio                            1.69.0
h11                               0.14.0
httpcore                          1.0.7
httptools                         0.6.4
httpx                             0.27.2
huggingface-hub                   0.26.3
humanize                          4.11.0
idna                              3.10
importlib_metadata                8.5.0
importlib_resources               6.4.5
iniconfig                         2.0.0
interegular                       0.3.3
jax                               0.4.35
jaxlib                            0.4.35
Jinja2                            3.1.4
jiter                             0.8.0
joblib                            1.4.2
jsonschema                        4.23.0
jsonschema-specifications         2024.10.1
kiwisolver                        1.4.7
lark                              1.2.2
Levenshtein                       0.26.1
linkify-it-py                     2.0.3
litellm                           1.53.7
llvmlite                          0.43.0
lm-format-enforcer                0.10.9
markdown-it-py                    3.0.0
MarkupSafe                        2.1.5
matplotlib                        3.9.3
mdit-py-plugins                   0.4.2
mdurl                             0.1.2
memray                            1.15.0
mistral_common                    1.5.1
ml_dtypes                         0.5.0
more-itertools                    10.5.0
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.18.6
multidict                         6.1.0
multiprocess                      0.70.16
nest-asyncio                      1.6.0
networkx                          3.4.2
nltk                              3.9.1
numba                             0.60.0
numpy                             1.26.4
nvidia-cublas-cu12                12.4.5.8
nvidia-cuda-cupti-cu12            12.4.127
nvidia-cuda-nvrtc-cu12            12.4.127
nvidia-cuda-runtime-cu12          12.4.127
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.2.1.3
nvidia-curand-cu12                10.3.5.147
nvidia-cusolver-cu12              11.6.1.9
nvidia-cusparse-cu12              12.3.1.170
nvidia-ml-py                      12.560.30
nvidia-nccl-cu12                  2.21.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.4.127
openai                            1.56.0
opencensus                        0.11.4
opencensus-context                0.1.3
opencv-python-headless            4.10.0.84
opt_einsum                        3.4.0
optax                             0.2.4
orbax-checkpoint                  0.10.1
orjson                            3.10.12
outlines                          0.1.11
outlines_core                     0.1.26
packaging                         24.2
pandas                            2.2.3
partial-json-parser               0.2.1.1.post4
pillow                            10.4.0
pip                               24.3.1
platformdirs                      4.3.6
pluggy                            1.5.0
prometheus_client                 0.21.0
prometheus-fastapi-instrumentator 7.0.0
propcache                         0.2.1
proto-plus                        1.25.0
protobuf                          3.20.3
psutil                            6.1.0
py-cpuinfo                        9.0.0
py-spy                            0.4.0
pyairports                        2.1.1
pyarrow                           18.1.0
pyasn1                            0.6.1
pyasn1_modules                    0.4.1
pybind11                          2.13.6
pycountry                         24.6.1
pydantic                          2.10.2
pydantic_core                     2.27.1
pydub                             0.25.1
Pygments                          2.18.0
pyinfer                           0.0.3
pyparsing                         3.2.0
pytest                            8.3.4
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
python-multipart                  0.0.12
pytz                              2024.2
PyYAML                            6.0.2
pyzmq                             26.2.0
RapidFuzz                         3.10.1
ray                               2.39.0
referencing                       0.35.1
regex                             2024.11.6
requests                          2.32.3
rich                              13.9.4
rouge_score                       0.1.2
rpds-py                           0.22.0
rsa                               4.9
ruff                              0.8.1
sacremoses                        0.1.1
safehttpx                         0.1.6
safetensors                       0.4.5
ScandEval                         14.2.0.dev0
scikit-learn                      1.5.2
scipy                             1.14.1
semantic-version                  2.10.0
sentencepiece                     0.2.0
seqeval                           1.2.2
setuptools                        75.6.0
shellingham                       1.5.4
simplejson                        3.19.3
six                               1.16.0
smart-open                        7.1.0
sniffio                           1.3.1
starlette                         0.41.3
sympy                             1.13.1
tabulate                          0.9.0
tenacity                          9.0.0
tensorstore                       0.1.69
termcolor                         2.5.0
textual                           1.0.0
threadpoolctl                     3.5.0
tiktoken                          0.7.0
tokenizers                        0.21.0
tomlkit                           0.12.0
toolz                             1.0.0
torch                             2.5.1
torchvision                       0.20.1
tqdm                              4.67.1
transformers                      4.48.0
triton                            3.1.0
typer                             0.14.0
typing_extensions                 4.12.2
tzdata                            2024.2
uc-micro-py                       1.0.3
urllib3                           2.2.3
uvicorn                           0.32.1
uvloop                            0.21.0
virtualenv                        20.28.1
vllm                              0.6.6.post1
watchfiles                        1.0.0
websockets                        12.0
wheel                             0.45.1
wrapt                             1.17.1
xformers                          0.0.28.post3
xgrammar                          0.1.9
xxhash                            3.5.0
yarl                              1.18.3
zipp                              3.21.0

Output of `lsb_release -a`
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.1 LTS
Release:        24.04
Codename:       noble

Output of `nvidia-smi -L`
GPU 0: NVIDIA GeForce RTX 3090 Ti (UUID: GPU-555d27a4-2596-131d-330e-1d8aecba86f6)

Model Input Dumps
N/A
🐛 Describe the bug
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams
from pydantic import BaseModel
from transformers import AutoTokenizer


# Schema used to constrain the generated JSON.
class Person(BaseModel):
    name: str
    description: str


prompt = """<s>[INST] <<SYS>>
You are a json text extractor. return the following json {"name": "the game name", "description": "description of the game in around 400 words"}
<</SYS>>
{ CD Projekt Red is ramping up production on The Witcher 4, and of course it's looking into using AI } [/INST]"""

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Submit 2000 copies of the prompt with JSON-guided decoding via the Outlines backend.
result = llm.generate(
    prompts=[prompt] * 2000,
    sampling_params=SamplingParams(
        temperature=0.6,
        max_tokens=1024,
        guided_decoding=GuidedDecodingParams(json=Person.model_json_schema(), backend="outlines"),
    ),
)
print(result)

Running the above with vLLM <= 0.6.4 versus vLLM > 0.6.4 shows massive differences in both time and CPU memory consumption. The older versions consume ~4 GB of RAM, while the newer ones consume >50 GB of RAM and also take orders of magnitude longer to even start generating.
Related to this Outlines issue: dottxt-ai/outlines#1351.
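To put numbers on the time and memory difference described above, something like the following could be wrapped around the `llm.generate(...)` call on each vLLM version. This is only a minimal sketch using the standard library: the helper name `measure` is hypothetical, the `resource` module is Unix-only, and `ru_maxrss` reports the peak resident set size of the current process over its whole lifetime, so memory held by separate worker processes (if any) is not counted.

```python
import resource
import sys
import time
from typing import Any, Callable, Tuple


def measure(fn: Callable[[], Any]) -> Tuple[Any, float, float]:
    """Run fn once and return (result, elapsed seconds, peak RSS in MiB).

    `measure` is a hypothetical helper, not part of vLLM or Outlines.
    """
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    # Peak resident set size of this process since it started.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return result, elapsed, peak / divisor


if __name__ == "__main__":
    # Stand-in workload; in the reproduction this would instead be
    # something like: measure(lambda: llm.generate(...))
    _, seconds, peak_mib = measure(lambda: sum(range(1_000_000)))
    print(f"elapsed: {seconds:.3f} s, peak RSS: {peak_mib:.1f} MiB")
```

Running the same script once per installed vLLM version and comparing the two printed lines gives a rough before/after figure; a sampling profiler would be needed to see where the memory actually goes.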