forked from opendatahub-io/vllm
[do not merge] pr test for nm changes into 2.20 #107
Closed
Conversation
Signed-off-by: Alexander Matveev <[email protected]>
Signed-off-by: Alexander Matveev <[email protected]>
Signed-off-by: Chengji Yao <[email protected]>
Signed-off-by: Matthew Vine <[email protected]>
Signed-off-by: ElizaWszola <[email protected]>
Signed-off-by: ElizaWszola <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Chenyaaang <[email protected]>
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Signed-off-by: weizeng <[email protected]>
Signed-off-by: Mengqing Cao <[email protected]>
Signed-off-by: Cody Yu <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
add retries and get rid of progress meter
Signed-off-by: Gregory Shtrasberg <[email protected]>
…QS (vllm-project#15583)
Signed-off-by: Chengji Yao <[email protected]>
Signed-off-by: Bella kira <[email protected]>
…oject#15587)
Signed-off-by: ElizaWszola <[email protected]>
Signed-off-by: ElizaWszola <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Co-authored-by: ElizaWszola <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: ElizaWszola <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: youkaichao <[email protected]>
…lm-project#15616)
Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
… vllm/v1 (vllm-project#15211)
Signed-off-by: h-sugi <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
…6071)
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Michael Goin <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
…put queue (vllm-project#15906)"
This reverts commit 651cf0f.
Author: /build-from-odh

Author: /build-from-odh
* add PR pipeline
* add correct default value for additional build secret
* update pull request pipeline to use remote pipeline ref
* add 4h timeout
* call out the remote build platform
* rename pipeline, disable making an image index for PR pipeline
Author force-pushed the branch from c23a05b to 0fed6b0.
Author: /build-from-odh

3 similar comments

Author: /build-from-odh

Author: /build-from-odh

Author: /build-from-odh

Author: /build-from-odh

1 similar comment

Author: /build-from-odh
Upstream vllm-project/vllm changes included in this PR:

* Update pre-commit's isort version to remove warnings (vllm-project/vllm#13614)
* Use pre-commit to update requirements-test.txt (vllm-project/vllm#13617)
* [Bugfix] Add mm_processor_kwargs to chat-related protocols (vllm-project/vllm#13644)
* [V1][Metrics] Support vllm:cache_config_info (vllm-project/vllm#13299)
* [Metrics] Add --show-hidden-metrics-for-version CLI arg (vllm-project/vllm#13295)
* [Misc] Deprecate --dataset from benchmark_serving.py (vllm-project/vllm#13708)
* [Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set (vllm-project/vllm#12513)
* [Misc] Clean Up EngineArgs.create_engine_config (vllm-project/vllm#13734)
* [Misc][Chore] Clean Up AsyncOutputProcessing Logs (vllm-project/vllm#13780)
* Fix /v1/audio/transcriptions Bad Request Error (vllm-project/vllm#13811)
* Fix failing MyGemma2Embedding test (vllm-project/vllm#13820)
* DeepSeek V2/V3/R1 only place lm_head on last pp rank (vllm-project/vllm#13833)
* Add comments on accessing kv_cache and attn_metadata (vllm-project/vllm#13887)
* [CI/Build] Add examples/ directory to be labelled by mergify (vllm-project/vllm#13944)
* Deduplicate .pre-commit-config.yaml's exclude (vllm-project/vllm#13967)
* [V1] SupportsV0Only protocol for model definitions (vllm-project/vllm#13959)
* [Docs] Add pipeline_parallel_size to optimization docs (vllm-project/vllm#14059)
* [Doc] Consolidate whisper and florence2 examples (vllm-project/vllm#14050)
* [v1] Add __repr__ to KVCacheBlock to avoid recursive print (vllm-project/vllm#14081)
* Improve the docs for TransformersModel (vllm-project/vllm#14147)
* Fix head_dim not existing in all model configs (Transformers backend) (vllm-project/vllm#14141)
* [V0][Metrics] Remove unimplemented vllm:tokens_total (vllm-project/vllm#14134)
* Fix performance when --generation-config is not None (vllm-project/vllm#14223)
* [Frontend] Do prompt_logprobs clamping for chat as well as completions (vllm-project/vllm#14225)
* [Misc][V1] Avoid using envs.VLLM_USE_V1 in mm processing (vllm-project/vllm#14256)
* Deprecate best_of Sampling Parameter in anticipation for vLLM V1 (vllm-project/vllm#13997)
* [misc] Mention ray list nodes command to troubleshoot ray issues (vllm-project/vllm#14318)
* [Core] Optimizing cross-attention QKVParallelLinear computation (vllm-project/vllm#12325)
* Reinstate best_of for V0 (vllm-project/vllm#14356)
* [Bugfix] Correctly call cudaProfilerStop in benchmarks script (vllm-project/vllm#14183)
* Fix missing kv_caches and attn_metadata in OpenVINOCausalLM (vllm-project/vllm#14271)
* [core] add extra_args to SamplingParams (vllm-project/vllm#13300) (a usage sketch follows this list)
* Default to generation_config from model (vllm-project/vllm#12622)
* [Misc] add use_tqdm_on_load to reduce logs (vllm-project/vllm#14407)
* [Docs] Mention model_impl arg when explaining Transformers fallback (vllm-project/vllm#14552)
* Correct capitalisation: Github -> GitHub (vllm-project/vllm#14561)
* [V1][Bugfix] Fix handling of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL (vllm-project/vllm#14548)
* Correct capitalisation: VLLM -> vLLM (vllm-project/vllm#14562)
* [Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (vllm-project/vllm#14609)
* [Feature] Add vllm bench CLI (vllm-project/vllm#13993)
* [Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (vllm-project/vllm#14545)
* … not include_stop_str_in_output (vllm-project/vllm#14624)
* [Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (vllm-project/vllm#14681)
* [Misc][Minor] Simplify SamplingParams.__post_init__() (vllm-project/vllm#14772)
* [Misc] Clean up type annotation for SupportsMultiModal (vllm-project/vllm#14794)
* [Core] Expose API endpoint /is_sleeping (vllm-project/vllm#14312)
* [Doc] Add guidance for using ccache with pip install -e . in doc (vllm-project/vllm#14901)
* … main branch (#14692)
* … --seed option to offline multi-modal examples (#14934)
* … AutoModelForImageTextToText to load VLMs in tests (#14945)
* … logprobs in ChatCompletionRequest (#14352)
* … do_rescale warning when passing dummy data (#15107)
* … tokenizer_mode (#15040)
* … merge_async_iterators fast-path for single-prompt requests (#15150)
* … misc issues with link to forum (#15226)
* … extra_body as a way to pass vLLM-only parameters using the OpenAI client (#15240) (a usage sketch follows this list)
* … max_num_seqs is between cudagraph capture sizes (#15308)
* … disable-any-whitespace option support for xgrammar (#15316)
* … generation_config by default (#15281)
* … /v1/audio/transcriptions OpenAI API endpoint (#12909) (a usage sketch follows this list)
* … fastsafetensors loader for loading model weights (#10647)
* … TransformersModel (#12832)
* … auto fallback mode (#14779)
* … num_embeds (#15443)
* … SchedulerInterface type for engine scheduler field (#15499)
* … TransformersModel (#15467)
* … scatter_patch_features (#15559)
* … is_encoder_decoder_inputs with split_enc_dec_inputs (#15620)
* … mm_registry in compute_encoder_budget (#15621)
* … tpu label (#15634)
* … mm_hashes forgetting to be passed (#15668)
* … embed_is_patch for Idefics3 (#15696)
* … mm_counts for dummy data creation (#15703)
* … embed_is_patch mask for fuyu model (#15731)
* … _try_schedule_encoder_inputs for every request (#15778)
* … transformers to v4.50.3 (#13905)
* … MultiModalDataParser (#15828)
* … format.sh as it's been unsupported >70 days (#15884)
* … format.sh and make pre-commit installation simpler (#15890)
* … k_index is int64 for apply_top_k_only (#15907)
* … huggingface_hub to enable Xet downloads (#15873)
* … async_request_deepspeed_mii uses the OpenAI choices key (#15926)
* … tool_choice='required' (#13483)
* … huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] (#15969)
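For context on the extra_args item above (vllm-project/vllm#13300), here is a minimal sketch of how the new field might be used. It is not code from this PR: the model name and the dict key are illustrative assumptions, and extra_args is treated as an opaque pass-through dict that vLLM carries along rather than interprets itself.

```python
# Hedged sketch for vllm-project/vllm#13300: SamplingParams gains an
# extra_args dict that is carried through to custom components.
# "my_custom_option" is a hypothetical key, not a real vLLM parameter.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # illustrative small model

params = SamplingParams(
    temperature=0.8,
    max_tokens=32,
    extra_args={"my_custom_option": True},  # opaque pass-through data
)

outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```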
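Similarly, for the extra_body item (#15240): a minimal sketch of passing vLLM-only parameters through the official OpenAI Python client. The server URL, model name, and the top_k parameter are assumptions for illustration; extra_body is the openai-python argument that merges extra fields into the request JSON, which is how the server can see parameters the OpenAI spec does not define.

```python
# Hedged sketch for #15240: a vLLM-only parameter (here, top_k) rides
# along in extra_body, which openai-python merges into the request body.
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is running on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever the server loaded
    messages=[{"role": "user", "content": "Say hello."}],
    extra_body={"top_k": 20},  # not in the OpenAI spec; read by the server
)
print(completion.choices[0].message.content)
```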
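Finally, for the /v1/audio/transcriptions endpoint (#12909): a minimal sketch using the same client, assuming the server was started with a Whisper-style model; the model name and audio file path are illustrative.

```python
# Hedged sketch for #12909: the OpenAI-compatible transcription endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as audio:  # illustrative local file
    transcription = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",  # illustrative model name
        file=audio,
    )
print(transcription.text)
```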