Issues: vllm-project/vllm
[Bug]: IBM Granite 3.1 tool parser fails · bug · #11402 · opened Dec 22, 2024 by K-Mistele
[RFC]: Fully SPMD Execution for Offline Inference · RFC · #11400 · opened Dec 21, 2024 by eric-haibin-lin
[Installation]: cannot install vLLM with the OpenVINO backend · installation · #11398 · opened Dec 21, 2024 by yuzisun
[Bug]: [v0.6.5] Streaming tool call responses with the hermes template are inconsistent with the non-stream version · bug · #11392 · opened Dec 21, 2024 by elementary-particle
Where does the default KV cache size of 43328 come from, and how can I change it? · usage · #11391 · opened Dec 21, 2024 by george66s
[Usage]: How do I run offline batch inference with Llama 405B BF16 across multiple nodes (via SLURM)? · usage · #11379 · opened Dec 20, 2024 by aflah02
[Bug]: Guided decoding crashes for GLM-4 model · bug · #11377 · opened Dec 20, 2024 by frankang
[Bug]: vLLM crashes on tokenized embedding input · bug · #11375 · opened Dec 20, 2024 by FriedrichBethke
[Bug]: vllm serve fails when passing the --skip-tokenizer-init flag · bug · #11374 · opened Dec 20, 2024 by ishitamed19
[Feature]: Will vLLM support FlashAttention-3? · feature request · #11372 · opened Dec 20, 2024 by jorgeantonio21
[Bug]: Prefix caching doesn't work for LLaVA-OneVision · bug · #11371 · opened Dec 20, 2024 by sleepwalker2017
[Bug]: Serving occasionally fails with RuntimeError: CUDA error: an illegal memory access was encountered · bug · #11366 · opened Dec 20, 2024 by pangr
[Feature]: Add support for attention score output · feature request · #11365 · opened Dec 20, 2024 by WoutDeRijck
[Misc]: What is 'residual' used for in the IntermediateTensors class? · misc · #11364 · opened Dec 20, 2024 by oldcpple
[Bug]: Priority scheduling has no effect on throughput: higher-priority requests achieve no higher token/s than requests without a priority set · bug · #11361 · opened Dec 20, 2024 by kar9999
[Feature]: meta-llama/Prompt-Guard-86M usage raises a ValueError · feature request · #11360 · opened Dec 20, 2024 by burakaktas35
[Bug]: vLLM 0.6.3.post1 crashes when deploying Qwen2-VL 72B · bug · #11356 · opened Dec 20, 2024 by xxlight
[Bug]: V100 cannot use --enable-chunked-prefill with dtype float16, but can with dtype float32 · bug · #11352 · opened Dec 20, 2024 by warlockedward
[New Model]: answerdotai/ModernBERT-large · new model · #11347 · opened Dec 19, 2024 by pooyadavoodi
[Bug]: no profiler output when VLLM_TORCH_PROFILER_DIR is set for vllm serve · bug · #11346 · opened Dec 19, 2024 by ziyang-arch
[Performance]: 1P1D Disaggregation performance · performance · #11345 · opened Dec 19, 2024 by Jeffwan
[Bug]: PaliGemma 2 model loading error · bug · #11343 · opened Dec 19, 2024 by mmderakhshani