# AI Infra Learning Sessions

This repository organizes the materials, recordings, and schedules for a series of AI-infra learning sessions.

| Topic | Date | Pre-study Materials | Recording | Docs | Feedback & Review Questions |
| --- | --- | --- | --- | --- | --- |
| vLLM Quickstart | 2025-05-11 | Doc: vLLM | AI INFRA Learning 01 - LLM Landscape Overview / vLLM Quickstart | 01-vllm-quickstart | |
| PagedAttention | 2025-05-25 | Blog: vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention<br>Paper: Efficient Memory Management for Large Language Model Serving with PagedAttention<br>Video: Fast LLM Serving with vLLM and PagedAttention | AI INFRA Learning 02 - vLLM PagedAttention Paper Deep Dive | 02-pagedattention | 02-PagedAttention Feedback |
| Prefix Caching | 2025-06-08 | Doc: Automatic Prefix Caching<br>Design Doc: Automatic Prefix Caching<br>Paper: SGLang: Efficient Execution of Structured Language Model Programs | AI INFRA Learning 03 - Prefix Caching Explained | 03-prefix-caching | |
| Speculative Decoding | 2025-06-22 | Doc: Speculative Decoding<br>Blog: How Speculative Decoding Boosts vLLM Performance by up to 2.8x<br>Video: Hacker's Guide to Speculative Decoding in VLLM<br>Video: Speculative Decoding in vLLM<br>Paper: Accelerating Large Language Model Decoding with Speculative Sampling<br>Paper: Fast Inference from Transformers via Speculative Decoding | AI INFRA Learning 04 - Speculative Decoding Implementation Approaches | 04-speculative-decoding | |
| Chunked-Prefills | 2025-07-13 | Doc: vLLM Chunked Prefill<br>Paper: SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills<br>Paper: DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference<br>Paper: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | AI INFRA Learning 05 - Chunked-Prefills | 05-chunked-prefills | 05-Chunked-Prefills Feedback & Review Questions |
| Disaggregating Prefill and Decoding | 2025-09-21 | Doc: Disaggregated Prefilling<br>Doc: vLLM Production Stack Disaggregated Prefill<br>Paper: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving<br>Paper: Splitwise: Efficient generative LLM inference using phase splitting<br>Video: vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM | AI INFRA Learning 06 - Prefill/Decode (PD) Disaggregation Architecture Deep Dive | 06-disaggregating-prefill-and-decoding | 06-PD Disaggregation Feedback |
| LoRA Adapters | | Doc: LoRA Adapters<br>Paper: LoRA: Low-Rank Adaptation of Large Language Models | | | |
| Quantization | | | | | |
| Distributed Inference and Serving | | Doc: Distributed Inference and Serving | | | |
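
As a companion to session 01 (vLLM Quickstart), here is a minimal offline-inference sketch using vLLM's Python API. The model name is an arbitrary example, and defaults vary across vLLM releases, so treat this as a starting point rather than a reference configuration.

```python
# Minimal vLLM offline-inference sketch (companion to session 01).
# Assumes vLLM is installed (pip install vllm) and a GPU is available;
# the model name below is an arbitrary example.
from vllm import LLM, SamplingParams

prompts = ["The key idea behind PagedAttention is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, output.outputs[0].text)
```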
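
Several sessions above revolve around engine features that can be toggled directly on the `LLM` constructor. A sketch of how those flags fit together is below; flag names and defaults have shifted across vLLM versions, so check the docs for the release you run.

```python
# Illustrative engine configuration touching on sessions 03 (prefix caching),
# 05 (chunked prefills), quantization, and distributed inference.
# Flag availability and defaults vary by vLLM version.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # arbitrary example model
    enable_prefix_caching=True,          # session 03: reuse KV cache across shared prompt prefixes
    enable_chunked_prefill=True,         # session 05: split long prefills into scheduler-friendly chunks
    tensor_parallel_size=2,              # distributed serving: shard weights across 2 GPUs
    # quantization="awq",                # quantization session: requires an AWQ-quantized checkpoint
)
```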
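
For the LoRA Adapters session, a sketch of attaching an adapter at inference time via vLLM's `LoRARequest` follows; the adapter name and local path are hypothetical placeholders.

```python
# Sketch of serving a LoRA adapter with vLLM (LoRA Adapters session).
# The adapter name and local path are hypothetical placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Summarize LoRA in one sentence."],
    SamplingParams(max_tokens=48),
    lora_request=LoRARequest("example_adapter", 1, "/path/to/lora/adapter"),
)
print(outputs[0].outputs[0].text)
```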

## Discussion Group (please note your purpose when requesting to join)

## WeChat Official Account

