Conversation

@mgoin (Contributor) commented Jun 7, 2024

vLLM is a high-throughput and memory-efficient open-source serving engine for LLMs.

vLLM is fast with:

  • State-of-the-art serving throughput
  • Efficient management of attention key and value memory with PagedAttention
  • Continuous batching of incoming requests
  • Fast model execution with CUDA/HIP graph
  • Quantization: FP8, GPTQ, AWQ, SqueezeLLM, FP8 KV Cache
  • Optimized CUDA kernels
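
For context, here is a minimal offline-inference sketch using vLLM's Python API (`LLM` and `SamplingParams`); the model name is only an illustrative placeholder:

```python
# Minimal sketch of offline inference with vLLM.
# The model name below is just an illustrative example.
from vllm import LLM, SamplingParams

prompts = ["Explain PagedAttention in one sentence."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Loads the model and allocates the paged KV cache on the GPU.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

# Incoming requests are continuously batched under the hood.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```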

Transparent logo: vllm-logo-text-light (attached image)

@coyotte508 requested a review from ngxson as a code owner November 14, 2024 22:22
@julien-c (Member) commented Feb 5, 2025

This was done in #693, closing.

@julien-c closed this Feb 5, 2025