CodeGen Converter #229
Closed
michaelfeil wants to merge 36 commits into vllm-project:main from
Conversation
Member
Thank you for your contribution! Please let us know when this PR is ready for review!
Co-authored-by: neubig <neubig@gmail.com>
Member
@michaelfeil Are you still working on the CodeGen model support? Do you need any help from our side?
Contributor
Author
I'll try to get back to you in the coming days!
Collaborator
Closed as this PR is too old.
yukavio pushed a commit to yukavio/vllm that referenced this pull request on Jul 3, 2024
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request on Sep 11, 2024
This fixes a very silly issue where mismatching values of the `warmup_mode` flag could cause graph recompilations and eventually memory leaks.
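As an illustration of that failure mode only (a minimal sketch with hypothetical names; `graph_cache` and `compile_fn` do not correspond to the actual HPU graph code), letting `warmup_mode` leak into a graph-cache key compiles every shape twice, and the extra graphs are never freed:

```python
# Illustrative sketch only; names are hypothetical, not vLLM's actual code.
graph_cache = {}

def get_compiled_graph(batch_size, seq_len, warmup_mode, compile_fn):
    # Bug pattern: warmup_mode is part of the cache key, so the same
    # (batch_size, seq_len) shape is compiled once during warmup and
    # again at runtime, doubling the cached graphs and leaking memory.
    key = (batch_size, seq_len, warmup_mode)
    if key not in graph_cache:
        graph_cache[key] = compile_fn(batch_size, seq_len)
    return graph_cache[key]

# The fix described above amounts to keeping warmup_mode out of the key
# (or passing a consistent value), so warmup and runtime share graphs.
```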
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request on Sep 15, 2025
This PR adds pooling support for vllm-ascend.
Tested with `bge-base-en-v1.5` via `encode`:
```python
from vllm import LLM
# Sample prompts.
prompts = [
"Hello, my name is",
"The president of the United States is",
"The capital of France is",
"The future of AI is",
]
# Create an LLM.
model = LLM(model="./bge-base-en-v1.5", enforce_eager=True)
# Generate embedding. The output is a list of EmbeddingRequestOutputs.
outputs = model.encode(prompts)
# Print the outputs.
for output in outputs:
    print(output.outputs.embedding)  # 768-dim embedding vector for bge-base-en-v1.5
```
Tested with the `embed` API:
```python
from vllm import LLM
llm = LLM(model="./bge-base-en-v1.5", task="embed")
(output,) = llm.embed("Hello, my name is")
embeds = output.outputs.embedding
print(f"Embeddings: {embeds!r} (size={len(embeds)})")
```
Related: vllm-project/vllm-ascend#200
## Known issue
The accuracy is not correct yet, since this feature relies on `enc-dec` support. That will be added in a follow-up PR by @MengqingCao.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
iwooook pushed a commit to moreh-dev/vllm that referenced this pull request on Nov 29, 2025
…uired (vllm-project#232)

Extend vLLM v0 by adding automatic compat-sampling fallbacks to device sampling. This offers substantial performance benefits when not all requests require advanced features like structured outputs. `always_compat_sampling` should now never need to be set in production: whenever something otherwise unsupported is detected, such as structured outputs, non-greedy sampling for models that don't support it on device, or non-uniform top-k/top-p on host or with a model that doesn't support it, the system switches to host compat sampling. This also means temperature, p, or k are never overridden to force them to be uniform across the batch. Also contains a fix for vllm-project#229 (it will fall back to compat sampling if non-uniform sampling is not supported on device or device sampling is disabled).
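A minimal sketch of the fallback decision described above, under assumed names (`SamplingRequest`, `use_host_compat_sampling`, and the capability flags are hypothetical stand-ins, not the real implementation):

```python
# Hypothetical sketch of the per-batch fallback decision; names do not
# correspond to real vLLM classes or functions.
from dataclasses import dataclass

@dataclass
class SamplingRequest:
    greedy: bool
    top_k: int
    top_p: float
    structured_output: bool

def use_host_compat_sampling(batch, device_supports_non_greedy, device_supports_non_uniform):
    """Return True if any request in the batch forces host compat sampling."""
    uniform = len({r.top_k for r in batch}) == 1 and len({r.top_p for r in batch}) == 1
    for r in batch:
        if r.structured_output:
            return True  # structured outputs are handled on host
        if not r.greedy and not device_supports_non_greedy:
            return True  # device cannot do non-greedy sampling for this model
    if not uniform and not device_supports_non_uniform:
        return True      # non-uniform top-k/top-p not supported on device
    return False         # everything supported: stay with device sampling
```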
dtrifiro added a commit to dtrifiro/vllm that referenced this pull request on Dec 11, 2025
Matching the upstream Dockerfile as of v0.9.0: https://github.com/vllm-project/vllm/blob/v0.9.0/docker/Dockerfile.rocm?plain=1#L124
This PR aims to integrate CodeGen. Work in progress, not ready.
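If the converter had landed, loading a CodeGen checkpoint would presumably look like any other vLLM model. The sketch below is hypothetical usage, not something this closed PR enables; `Salesforce/codegen-2B-mono` is a real Hugging Face checkpoint, but whether this invocation works depends on the integration being completed:

```python
# Hypothetical usage once CodeGen support is merged; this PR was closed
# before that happened, so treat this only as an illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="Salesforce/codegen-2B-mono")
params = SamplingParams(temperature=0.2, max_tokens=64)
outputs = llm.generate(["def fibonacci(n):"], params)
for out in outputs:
    print(out.outputs[0].text)
```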