diff --git a/docs/source/blogs/media/tech_blog5_Picture15.png b/docs/source/blogs/media/tech_blog5_Picture15.png
341 KB
diff --git a/docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md b/docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
Lines changed: 15 additions & 0 deletions
@@ -18,6 +18,8 @@ By NVIDIA TensorRT-LLM Team
       - [ISL 4400 - OSL 1200 (Machine Translation Dataset)](#ISL-4400---OSL-1200-Machine-Translation-Dataset)
       - [ISL 8192 - OSL 256 (Synthetic Dataset)](#ISL-8192---OSL-256-Synthetic-Dataset)
       - [ISL 4096 - OSL 1024 (Machine Translation Dataset)](#ISL-4096---OSL-1024-Machine-Translation-Dataset)
+    - [Qwen 3](#Qwen-3)
+      - [ISL 8192 - OSL 1024 (Machine Translation Dataset)](#ISL-8192---OSL-1024-Machine-Translation-Dataset)
     - [Reproducing Steps](#Reproducing-Steps)
   - [Future Work](#Future-Work)
   - [Acknowledgement](#Acknowledgement)
@@ -260,6 +262,19 @@ In Figure 13 and 14, the E2E Pareto curves for aggregated serving and disaggrega
 
 For Pareto curves with MTP = 1, 2, 3, it can be observed that disaggregated results show a **1.7x** improvement over aggregated results at 50 tokens/sec/user (20 ms latency). Enabling MTP provides a larger speedup at higher concurrencies.
 
+### Qwen 3
+
+#### ISL 8192 - OSL 1024 (Machine Translation Dataset)
+
+<div align="center">
+<figure>
+  <img src="https://github.com/Shixiaowei02/TensorRT-LLM/blob/user/xiaoweis/blog/docs/source/blogs/media/tech_blog5_Picture15.png" width="640" height="auto" alt="Qwen 3 Pareto curves">
+</figure>
+</div>
+<p align="center"><sub><em>Figure 15. Qwen 3 Pareto curves.</em></sub></p>
+
+We also evaluated Qwen 3 on GB200 GPUs. The data indicate that disaggregated serving achieves speedups of 1.7x to 6.11x over aggregated serving.
+
 ### Reproducing Steps
 
 We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/scripts/disaggregated).