docs/source/overview.md: 20 changes (10 additions, 10 deletions)
@@ -33,18 +33,18 @@ TensorRT LLM supports the latest and most popular LLM [architectures](https://nv
 TensorRT LLM strives to support the most popular models on **Day 0**.
 
 ### 🚀 **Advanced Optimization & Production Features**
-- **In-Flight Batching & Paged Attention**: {ref}`inflight-batching` eliminates wait times by dynamically managing request execution, processing context and generation phases together for maximum GPU utilization and reduced latency.
-- **Multi-GPU Multi-Node Inference**: Seamless distributed inference with tensor, pipeline, and expert parallelism across multiple GPUs and nodes through the Model Definition API.
-- **Advanced Quantization**:
+- **[In-Flight Batching & Paged Attention](./features/paged-attention-ifb-scheduler.md)**: In-flight batching eliminates wait times by dynamically managing request execution, processing context and generation phases together for maximum GPU utilization and reduced latency.
+- **[Multi-GPU Multi-Node Inference](./features/parallel-strategy.md)**: Seamless distributed inference with tensor, pipeline, and expert parallelism across multiple GPUs and nodes through the Model Definition API.
+- **[Advanced Quantization](./features/quantization.md)**:
   - **FP4 Quantization**: Native support on NVIDIA B200 GPUs with optimized FP4 kernels
   - **FP8 Quantization**: Automatic conversion on NVIDIA H100 GPUs leveraging Hopper architecture
-- **Speculative Decoding**: Multiple algorithms including EAGLE, MTP and NGram
-- **KV Cache Management**: Paged KV cache with intelligent block reuse and memory optimization
-- **Chunked Prefill**: Efficient handling of long sequences by splitting context into manageable chunks
-- **LoRA Support**: Multi-adapter support with HuggingFace and NeMo formats, efficient fine-tuning and adaptation
-- **Checkpoint Loading**: Flexible model loading from various formats (HuggingFace, NeMo, custom)
-- **Guided Decoding**: Advanced sampling with stop words, bad words, and custom constraints
-- **Disaggregated Serving (Beta)**: Separate context and generation phases across different GPUs for optimal resource utilization
+- **[Speculative Decoding](./features/speculative-decoding.md)**: Multiple algorithms including EAGLE, MTP and NGram
+- **[KV Cache Management](./features/kvcache.md)**: Paged KV cache with intelligent block reuse and memory optimization
+- **[Chunked Prefill](./features/paged-attention-ifb-scheduler.md)**: Efficient handling of long sequences by splitting context into manageable chunks
+- **[LoRA Support](./features/lora.md)**: Multi-adapter support with HuggingFace and NeMo formats, efficient fine-tuning and adaptation
+- **[Checkpoint Loading](./features/checkpoint-loading.md)**: Flexible model loading from various formats (HuggingFace, NeMo, custom)
+- **[Guided Decoding](./features/guided-decoding.md)**: Advanced sampling with stop words, bad words, and custom constraints
+- **[Disaggregated Serving (Beta)](./features/disagg-serving.md)**: Separate context and generation phases across different GPUs for optimal resource utilization
 
 ### 🔧 **Latest GPU Architecture Support**
 
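For orientation on the features this hunk links to: in-flight batching, the paged KV cache, and parallelism are all exercised through TensorRT LLM's high-level `LLM` API. Below is a minimal sketch assuming a released `tensorrt_llm` wheel; the model ID and sampling values are illustrative placeholders, not taken from this PR.

```python
# Minimal sketch of the LLM API behind the feature docs linked above.
# Assumption: the model ID below is a hypothetical example checkpoint.
from tensorrt_llm import LLM, SamplingParams

def main() -> None:
    # In-flight batching and the paged KV cache are managed by the runtime;
    # setting tensor_parallel_size > 1 enables multi-GPU tensor parallelism.
    llm = LLM(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative placeholder
        tensor_parallel_size=1,
    )
    sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["What is in-flight batching?"], sampling_params)
    for output in outputs:
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```

The same entry point accepts quantized checkpoints and the speculative-decoding and guided-decoding options described in the linked feature pages; see those pages for the exact configuration knobs.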