# AI Infra Study Sessions

| Topic | Date | Pre-reading | Recording | Notes | Feedback & Homework |
| --- | --- | --- | --- | --- | --- |
| vLLM Quickstart | 2025-05-11 | Doc: vLLM | AI INFRA Study 01 - LLM Landscape Overview / vLLM Quickstart | 01-vllm-quickstart | |
| PagedAttention | 2025-05-25 | Blog: vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention<br>Paper: Efficient Memory Management for Large Language Model Serving with PagedAttention<br>Video: Fast LLM Serving with vLLM and PagedAttention | AI INFRA Study 02 - vLLM PagedAttention Paper Deep Dive | 02-pagedattention | 02-PagedAttention feedback |
| Prefix Caching | 2025-06-08 | Doc: Automatic Prefix Caching<br>Design Doc: Automatic Prefix Caching<br>Paper: SGLang: Efficient Execution of Structured Language Model Programs | AI INFRA Study 03 - How Prefix Caching Works | 03-prefix-caching | |
| Speculative Decoding | 2025-06-22 | Doc: Speculative Decoding<br>Blog: How Speculative Decoding Boosts vLLM Performance by up to 2.8x<br>Video: Hacker's Guide to Speculative Decoding in VLLM<br>Video: Speculative Decoding in vLLM<br>Paper: Accelerating Large Language Model Decoding with Speculative Sampling<br>Paper: Fast Inference from Transformers via Speculative Decoding | AI INFRA Study 04 - Speculative Decoding Implementations | 04-speculative-decoding | |
| Chunked-Prefills | 2025-07-13 | Doc: vLLM Chunked Prefill<br>Paper: SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills<br>Paper: DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference<br>Paper: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | AI INFRA Study 05 - Chunked Prefills | 05-chunked-prefills | 05-Chunked-Prefills feedback & homework |
| Disaggregating Prefill and Decoding | 2025-09-21 | Doc: Disaggregated Prefilling<br>Doc: vLLM Production Stack Disaggregated Prefill<br>Paper: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving<br>Paper: Splitwise: Efficient generative LLM inference using phase splitting<br>Video: vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM | AI INFRA Study 06 - Prefill/Decode Disaggregation Architecture | 06-disaggregating-prefill-and-decoding | 06-PD disaggregation feedback |
| LoRA Adapters | | Doc: LoRA Adapters<br>Paper: LoRA: Low-Rank Adaptation of Large Language Models | | | |
| Quantization | | | | | |
| Distributed Inference and Serving | | Doc: Distributed Inference and Serving | | | |

Discussion group (when requesting to join, please mention your purpose) · WeChat official account