Issues: vllm-project/vllm
#6145 [Bug]: When using tp for inference, an error occurs: Worker VllmWorkerProcess pid 3283517 died, exit code: -15. (bug, opened Jul 4, 2024 by B-201)
#6141 [Usage]: Internal server error when serving LoRA adapters with the OpenAI-compatible vLLM server (usage, opened Jul 4, 2024 by ebi64)
#6137 [Bug]: Spec. decode fails for requests with n>1 or best_of>1 (bug, opened Jul 4, 2024 by tdoublep)
#6135 [Bug]: Phi-3 long context (longrope) doesn't work with fp8 kv cache (bug, opened Jul 4, 2024 by jphme)
#6134 [Installation]: Couldn't find CUDA library root. (installation, opened Jul 4, 2024 by CodexDive)
#6129 [Bug]: Disable log requests and disable log stats do not work (bug, opened Jul 4, 2024 by wufxgtihub123)
#6128 [Usage]: Does vLLM now support embedding inputs? I could not find a related interface. (usage, opened Jul 4, 2024 by zhanghang-official)
#6126 [Bug]: RuntimeError: No suitable kernel. h_in=16 h_out=7392 dtype=Float out_dtype=BFloat16 (bug, opened Jul 4, 2024 by JJJJerry)
#6123 [Feature]: Multi-LoRA support for older NVIDIA GPUs (feature request, opened Jul 4, 2024 by wuisawesome)
#6116 [Bug]: Mixtral 8x7b FP8 encounters illegal memory access in custom_all_reduce.cuh (bug, opened Jul 3, 2024 by ferdiko)
#6106 [Bug]: Ray cluster segmentation fault (bug, opened Jul 3, 2024 by warlockedward)
#6105 GPU utilization goes down as concurrent requests increase (bug, opened Jul 3, 2024 by jerin-scalers-ai)
#6103 [Bug]: fused_moe_kernel compile bug (bug, opened Jul 3, 2024 by jeejeelee)
#6100 [Feature]: Add LoRA support for BloomForCausalLM (feature request, opened Jul 3, 2024 by wangzhe258369)
#6099 [Bug]: enable_prefix_caching causes a Triton crash (bug, opened Jul 3, 2024 by sweetning0809)
#6086 [Bug]: Flashinfer stuck with CUDA Graph (bug, opened Jul 3, 2024 by Juelianqvq)
#6084 [Misc]: Best practices for accelerating and deploying the Llava series & Phi3-Vision using vLLM (misc, opened Jul 3, 2024 by Jintao-Huang)
#6078 [Feature]: Add support for interchangeable radix attention (feature request, opened Jul 2, 2024 by yifan1130)
#6073 [Feature]: Add readiness endpoint /ready and return /health earlier (vLLM on Kubernetes) (feature request, opened Jul 2, 2024 by frittentheke)