From 49f3fecda56239d7175bbb7ac46570cbe7b30e5f Mon Sep 17 00:00:00 2001
From: simone-dotolo <simonedotolo@libero.it>
Date: Wed, 4 Mar 2026 12:43:38 +0100
Subject: [PATCH 1/2] Fix GPU Worker count in Process Count Summary

Signed-off-by: simone-dotolo <simonedotolo@libero.it>
---
 docs/design/arch_overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/arch_overview.md b/docs/design/arch_overview.md
index 9c25368e5b25..08bebe68a962 100644
--- a/docs/design/arch_overview.md
+++ b/docs/design/arch_overview.md
@@ -122,7 +122,7 @@ For a deployment with `N` GPUs, `TP` tensor parallel size, `DP` data parallel si
 |---|---|---|
 | API Server | `A` (default `DP`) | Handles HTTP requests and input processing |
 | Engine Core | `DP` (default 1) | Scheduler and KV cache management |
-| GPU Worker | `N` (= `DP x TP`) | One per GPU, executes model forward passes |
+| GPU Worker | `N` (= `PP x TP`) | One per GPU, executes model forward passes |
 | DP Coordinator | 1 if `DP > 1`, else 0 | Load balancing across DP ranks |
 | **Total** | **`A + DP + N` (+ 1 if DP > 1)** | |
 

From 8f45087091c40e6fa9e4a94918e72dffd98bac93 Mon Sep 17 00:00:00 2001
From: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com>
Date: Wed, 4 Mar 2026 12:53:01 +0100
Subject: [PATCH 2/2] Include DP in GPU Worker count (`DP x PP x TP`)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com>
---
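Reviewer note (not part of the commit): the arithmetic in the Process Count
Summary table, as it reads after this patch, can be checked with a small
sketch. The function name `process_counts` and its parameter names are
hypothetical and are not taken from the codebase; the formulas are only those
in the table (`N = DP x PP x TP`, total `A + DP + N`, plus 1 if `DP > 1`).

```python
def process_counts(dp=1, pp=1, tp=1, api_servers=None):
    """Return per-type process counts following the summary table.

    Hypothetical helper for illustration only; it mirrors the table's
    formulas and does not call into any real library.
    """
    if min(dp, pp, tp) < 1:
        raise ValueError("dp, pp and tp must all be >= 1")

    # API Server: `A`, which the table says defaults to `DP`.
    a = dp if api_servers is None else api_servers
    # GPU Worker: one per GPU, N = DP x PP x TP.
    n = dp * pp * tp
    # DP Coordinator: only present when DP > 1.
    coordinator = 1 if dp > 1 else 0

    return {
        "api_server": a,
        "engine_core": dp,
        "gpu_worker": n,
        "dp_coordinator": coordinator,
        "total": a + dp + n + coordinator,
    }


if __name__ == "__main__":
    # DP=2, PP=2, TP=4: 2 API servers + 2 engine cores + 16 workers + 1 coordinator
    print(process_counts(dp=2, pp=2, tp=4)["total"])  # 21
```

With `DP = 1` the worker count reduces to `PP x TP` and no coordinator is
started, which is the case the first patch in this series covered.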
 docs/design/arch_overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/arch_overview.md b/docs/design/arch_overview.md
index 08bebe68a962..143cffc2655d 100644
--- a/docs/design/arch_overview.md
+++ b/docs/design/arch_overview.md
@@ -122,7 +122,7 @@ For a deployment with `N` GPUs, `TP` tensor parallel size, `DP` data parallel si
 |---|---|---|
 | API Server | `A` (default `DP`) | Handles HTTP requests and input processing |
 | Engine Core | `DP` (default 1) | Scheduler and KV cache management |
-| GPU Worker | `N` (= `PP x TP`) | One per GPU, executes model forward passes |
+| GPU Worker | `N` (= `DP x PP x TP`) | One per GPU, executes model forward passes |
 | DP Coordinator | 1 if `DP > 1`, else 0 | Load balancing across DP ranks |
 | **Total** | **`A + DP + N` (+ 1 if DP > 1)** | |