From b0c69f3782e2a49ab1b9103ed941484fd9a50a78 Mon Sep 17 00:00:00 2001 From: Ryan O'Leary Date: Wed, 29 Oct 2025 11:00:03 +0000 Subject: [PATCH 01/13] [Docs] Add guide for RayService Incremental Upgrade KubeRay feature Signed-off-by: Ryan O'Leary --- .../advanced-guides/incremental-upgrade.md | 211 ++++++++++++++++++ doc/source/serve/advanced-guides/index.md | 2 + 2 files changed, 213 insertions(+) create mode 100644 doc/source/serve/advanced-guides/incremental-upgrade.md diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/serve/advanced-guides/incremental-upgrade.md new file mode 100644 index 000000000000..8ed4e3d38948 --- /dev/null +++ b/doc/source/serve/advanced-guides/incremental-upgrade.md @@ -0,0 +1,211 @@ +(rayservice-incremental-upgrade)= +# RayService Zero-Downtime Incremental Upgrades + +This guide details how to configure and use the `NewClusterWithIncrementalUpgrade` strategy for a `RayService` with KubeRay. This feature was proposed in a [Ray Enhancement Proposal (REP)](https://github.com/ray-project/enhancements/blob/main/reps/2024-12-4-ray-service-incr-upgrade.md) and implemented with alpha support in KubeRay v1.5.0. If unfamiliar with RayServices and KubeRay, see the [RayService Quickstart](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/rayservice-quick-start.html). + +In previous versions of KubeRay, zero-downtime upgrades were supported only through the `NewCluster` strategy. This upgrade strategy involved scaling up a pending RayCluster with equal capacity as the active cluster, waiting until the updated Serve applications were healthy, and then switching traffic to the new RayCluster. While this upgrade strategy is reliable, it required users to scale 200% of their original clusters compute which can be prohibitive when dealing with expensive accelerator resources. + +The `NewClusterWithIncrementalUpgrade` strategy is designed for large-scale deployments, such as LLM serving, where duplicating resources for a standard blue/green deployment is not feasible due to resource constraints. Rather than creating a new `RayCluster` at 100% capacity, this strategy creates a new cluster and gradually scales its capacity up while simultaneously shifting user traffic from the old cluster to the new one. This gradual traffic migration enables users to safely scale their updated RayService while the old cluster auto-scales down, enabling users to save expensive compute resources and exert fine-grained control over the pace of their upgrade. This process relies on the Kubernetes Gateway API for fine-grained traffic splitting. + +## Quickstart: Performing an Incremental Upgrade + +### 1. Prerequisites + +Before you can use this feature, you **must** have the following set up in your Kubernetes cluster: + +1. **Gateway API CRDs:** The K8s Gateway API resources must be installed. You can typically install them with: + ```bash + kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml + ``` + + The RayService controller utilizes GA Gateway API resources such as a [Gateway](https://kubernetes.io/docs/concepts/services-networking/gateway/#api-kind-gateway) and [HTTPRoute](https://kubernetes.io/docs/concepts/services-networking/gateway/#api-kind-httproute) to safely split traffic during the upgrade. + +2. 
**A Gateway Controller:** Users must install a Gateway controller that implements the Gateway API, such as Istio, Contour, or a cloud-native implementation like GKE's Gateway controller. This feature should support any controller that implements Gateway API with support for `Gateway` and `HTTPRoute` CRDs, but is an alpha feature that's primarily been tested utilizing [Istio](https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/). +3. **A `GatewayClass` Resource:** Your cluster admin must create a `GatewayClass` resource that defines which controller to use. KubeRay will use this to create `Gateway` and `HTTPRoute` objects. + + **Example: Istio `GatewayClass`** + ```yaml + apiVersion: gateway.networking.k8s.io/v1 + kind: GatewayClass + metadata: + name: istio + spec: + controllerName: istio.io/gateway-controller + ``` + You will need to use the `metadata.name` (e.g. `istio`) in the `gatewayClassName` field of the `RayService` spec. + +4. **Ray Autoscaler:** Incremental upgrades require the Ray Autoscaler to be enabled in your `RayCluster` spec, as KubeRay manages the upgrade by adjusting the `target_capacity` for Ray Serve which adjusts the number of Serve replicas for each deployment. These Serve replicas are translated into a resource load which the Ray autoscaler considers when determining the number of Pods to provision with KubeRay. For information on enabling and configuring Ray autoscaling on Kubernetes, see [KubeRay Autoscaling](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/configuring-autoscaling.html). + +### 2. How it Works: The Upgrade Process + +Understanding the lifecycle of an incremental upgrade helps in monitoring and configuration. + +1. **Trigger:** You trigger an upgrade by updating the `RayService` spec, such as changing the container `image` or updating the `serveConfigV2`. +2. **Pending Cluster Creation:** KubeRay detects the change and creates a new, *pending* `RayCluster`. It sets this cluster's initial `target_capacity` (the percentage of serve replicas it should run) to `0%`. +3. **Gateway and Route Creation:** KubeRay creates a `Gateway` resource for your `RayService` and an `HTTPRoute` resource that initially routes 100% of traffic to the old, *active* cluster and 0% to the new, *pending* cluster. +4. **The Upgrade Loop Begins:** + The KubeRay controller now enters a loop that repeats three phases until the upgrade is complete. This loop ensures that the total cluster capacity only exceeds 100% by at most `maxSurgePercent`, preventing resource starvation. + + Let's use an example: `maxSurgePercent: 20` and `stepSizePercent: 5`. + + * **Initial State:** + * Active Cluster `target_capacity`: 100% + * Pending Cluster `target_capacity`: 0% + * **Total Capacity: 100%** + + --- + + **The Upgrade Cycle** + + * **Phase 1: Scale Up Pending Cluster (Capacity)** + * KubeRay checks the total capacity (100%) and sees it's $\le$ 100%. It increases the **pending** cluster's `target_capacity` by `maxSurgePercent`. + * Active `target_capacity`: 100% + * Pending `target_capacity`: 0% $\rightarrow$ **20%** + * **Total Capacity: 120%** + * The Ray Autoscaler begins provisioning pods for the pending cluster to handle 20% of the target load. + + * **Phase 2: Shift Traffic (HTTPRoute)** + * KubeRay waits for the pending cluster's new pods to be ready. + * Once ready, it begins to *gradually* shift traffic. Every `intervalSeconds`, it updates the `HTTPRoute` weights, moving `stepSizePercent` (5%) of traffic from the active to the pending cluster. 
+ * This continues until the *actual* traffic (`trafficRoutedPercent`) "catches up" to the *pending* cluster's `target_capacity` (20% in this example). + + * **Phase 3: Scale Down Active Cluster (Capacity)** + * Once Phase 2 is complete (`trafficRoutedPercent` == 20%), the loop runs again. + * KubeRay checks the total capacity (120%) and sees it's > 100%. It decreases the **active** cluster's `target_capacity` by `maxSurgePercent`. + * Active `target_capacity`: 100% $\rightarrow$ **80%** + * Pending `target_capacity`: 20% + * **Total Capacity: 100%** + * The Ray Autoscaler terminates pods on the active cluster as they become idle. + + --- + +5. **Completion & Cleanup:** + This cycle of **(Scale Up Pending $\rightarrow$ Shift Traffic $\rightarrow$ Scale Down Active)** continues until the pending cluster is at 100% `target_capacity` and 100% `trafficRoutedPercent`, and the active cluster is at 0%. + + KubeRay then promotes the pending cluster to active, updates the `HTTPRoute` to send 100% of traffic to it, and safely terminates the old `RayCluster`. + +### 3. Example `RayService` Configuration + +To use the feature, set the `upgradeStrategy.type` to `NewClusterWithIncrementalUpgrade` and provide the required options. + +```yaml +apiVersion: ray.io/v1 +kind: RayService +metadata: + name: example-rayservice +spec: + # This is the main configuration block for the upgrade + upgradeStrategy: + # 1. Set the type to NewClusterWithIncrementalUpgrade + type: "NewClusterWithIncrementalUpgrade" + clusterUpgradeOptions: + # 2. The name of your K8s GatewayClass + gatewayClassName: "istio" + + # 3. Capacity scaling: Increase new cluster's target_capacity + # by 20% in each scaling step. + maxSurgePercent: 20 + + # 4. Traffic shifting: Move 5% of traffic from old to new + # cluster every intervalSeconds. + stepSizePercent: 5 + + # 5. Interval seconds controls the pace of traffic migration during the upgrade. + intervalSeconds: 10 + + # This is your Serve config + serveConfigV2: | + applications: + - name: my_app + import_path: my_model:app + route_prefix: / + deployments: + - name: MyModel + num_replicas: 10 + ray_actor_options: + resources: { "GPU": 1 } + autoscaling_config: + min_replicas: 0 + max_replicas: 20 + + # This is your RayCluster config (autoscaling must be enabled) + rayClusterSpec: + enableInTreeAutoscaling: true + headGroupSpec: + # ... head spec ... + workerGroupSpecs: + - groupName: gpu-worker + replicas: 0 + minReplicas: 0 + maxReplicas: 20 + template: + # ... pod spec with GPU requests ... +``` + +### 4. Monitoring the Upgrade + +You can monitor the progress of the upgrade by inspecting the `RayService` status and the `HTTPRoute` object. + +1. **Check `RayService` Status:** + ```bash + kubectl describe rayservice example-rayservice + ``` + Look at the `Status` section. You will see both `Active Service Status` and `Pending Service Status`, which show the state of both clusters. Pay close attention to these two new fields: + * **`Target Capacity`:** The percentage of replicas KubeRay is *telling* this cluster to scale to. + * **`Traffic Routed Percent`:** The percentage of traffic KubeRay is *currently* sending to this cluster via the Gateway. + + During an upgrade, you will see `Target Capacity` on the pending cluster increase in steps (e.g., 20%, 40%) and `Traffic Routed Percent` gradually climb to meet it. + +2. **Check `HTTPRoute` Weights:** + You can also see the traffic weights directly on the `HTTPRoute` resource KubeRay manages. 
+ ```bash + kubectl get httproute example-rayservice-httproute -n -o yaml + ``` + Look at the `spec.rules.backendRefs`. You will see the `weight` for the old and new services change in real-time as the traffic shift (Phase 2) progresses. + +### 5. Rollback Support + +To roll back a failing or poorly performing upgrade, simply **update the `RayService` manifest back to the original configuration** (e.g., change the `image` back to the old tag). + +KubeRay's controller will detect that the "goal state" now matches the *active* (old) cluster. It will reverse the process: +1. Scale the active cluster's `target_capacity` back to 100%. +2. Shift all traffic back to the active cluster. +3. Scale down and terminate the *pending* (new) cluster. + +--- + +## API Overview (Reference) + +This section details the new and updated fields in the `RayService` CRD. + +### `RayService.spec.upgradeStrategy` + +| Field | Type | Description | Required | Default | +| :--- | :--- | :--- | :--- | :--- | +| `type` | `string` | The strategy to use for upgrades. Can be `NewCluster`, `None`, or `NewClusterWithIncrementalUpgrade`. | No | `NewCluster` | +| `clusterUpgradeOptions` | `object` | Container for incremental upgrade settings. **Required if `type` is `NewClusterWithIncrementalUpgrade`.** The `RayServiceIncrementalUpgrade` feature gate must be enabled. | No | `nil` | + +### `RayService.spec.upgradeStrategy.clusterUpgradeOptions` + +This block is required *only* if `type` is set to `NewClusterWithIncrementalUpgrade`. + +| Field | Type | Description | Required | Default | +| :--- | :--- | :--- | :--- | :--- | +| `maxSurgePercent` | `int32` | The percentage of *capacity* (Serve replicas) to add to the new cluster in each scaling step. For example, a value of `20` means the new cluster's `target_capacity` will increase in 20% increments (0% -> 20% -> 40%...). Must be between 0 and 100. | No | `100` | +| `stepSizePercent` | `int32` | The percentage of *traffic* to shift from the old to the new cluster during each interval. Must be between 0 and 100. | **Yes** | N/A | +| `intervalSeconds` | `int32` | The time in seconds to wait between shifting traffic by `stepSizePercent`. | **Yes** | N/A | +| `gatewayClassName` | `string` | The `metadata.name` of the `GatewayClass` resource KubeRay should use to create `Gateway` and `HTTPRoute` objects. | **Yes** | N/A | + +### `RayService.status.activeServiceStatus` & `RayService.status.pendingServiceStatus` + +Three new fields are added to both the `activeServiceStatus` and `pendingServiceStatus` blocks to provide visibility into the upgrade process. + +| Field | Type | Description | +| :--- | :--- | :--- | +| `targetCapacity` | `int32` | The target percentage of Serve replicas this cluster is *configured* to handle (from 0 to 100). This is controlled by KubeRay based on `maxSurgePercent`. | +| `trafficRoutedPercent` | `int32` | The *actual* percentage of traffic (from 0 to 100) currently being routed to this cluster's endpoint. This is controlled by KubeRay during an upgrade based on `stepSizePercent` and `intervalSeconds`. | +| `lastTrafficMigratedTime` | `metav1.Time` | A timestamp indicating the last time `trafficRoutedPercent` was updated. | + +#### Next steps: +* See [Deploy on Kubernetes](https://docs.ray.io/en/latest/serve/production-guide/kubernetes.html) for more information about deploying Ray Serve with KubeRay. +* See [Ray Serve Autoscaling](https://docs.ray.io/en/latest/serve/autoscaling-guide.html) to configure your Serve deployments to scale based on traffic load. 
\ No newline at end of file diff --git a/doc/source/serve/advanced-guides/index.md b/doc/source/serve/advanced-guides/index.md index 92802056f001..fb9663ab6964 100644 --- a/doc/source/serve/advanced-guides/index.md +++ b/doc/source/serve/advanced-guides/index.md @@ -7,6 +7,7 @@ app-builder-guide advanced-autoscaling performance +incremental-upgrade dyn-req-batch inplace-updates dev-workflow @@ -25,6 +26,7 @@ Use these advanced guides for more options and configurations: - [Pass Arguments to Applications](app-builder-guide) - [Advanced Ray Serve Autoscaling](serve-advanced-autoscaling) - [Performance Tuning](serve-perf-tuning) +- [RayService Zero-Downtime Incremental Upgrades](rayservice-incremental-upgrade) - [Dynamic Request Batching](serve-performance-batching-requests) - [In-Place Updates for Serve](serve-inplace-updates) - [Development Workflow](serve-dev-workflow) From daa184947aac30d5f91278a1cd61eab7d421b94b Mon Sep 17 00:00:00 2001 From: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Date: Wed, 29 Oct 2025 16:00:45 -0700 Subject: [PATCH 02/13] Update doc/source/serve/advanced-guides/incremental-upgrade.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> --- doc/source/serve/advanced-guides/incremental-upgrade.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/serve/advanced-guides/incremental-upgrade.md index 8ed4e3d38948..da03cb67ee6d 100644 --- a/doc/source/serve/advanced-guides/incremental-upgrade.md +++ b/doc/source/serve/advanced-guides/incremental-upgrade.md @@ -3,7 +3,7 @@ This guide details how to configure and use the `NewClusterWithIncrementalUpgrade` strategy for a `RayService` with KubeRay. This feature was proposed in a [Ray Enhancement Proposal (REP)](https://github.com/ray-project/enhancements/blob/main/reps/2024-12-4-ray-service-incr-upgrade.md) and implemented with alpha support in KubeRay v1.5.0. If unfamiliar with RayServices and KubeRay, see the [RayService Quickstart](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/rayservice-quick-start.html). -In previous versions of KubeRay, zero-downtime upgrades were supported only through the `NewCluster` strategy. This upgrade strategy involved scaling up a pending RayCluster with equal capacity as the active cluster, waiting until the updated Serve applications were healthy, and then switching traffic to the new RayCluster. While this upgrade strategy is reliable, it required users to scale 200% of their original clusters compute which can be prohibitive when dealing with expensive accelerator resources. +In previous versions of KubeRay, zero-downtime upgrades were supported only through the `NewCluster` strategy. This upgrade strategy involved scaling up a pending RayCluster with equal capacity as the active cluster, waiting until the updated Serve applications were healthy, and then switching traffic to the new RayCluster. While this upgrade strategy is reliable, it required users to scale 200% of their original cluster's compute resources which can be prohibitive when dealing with expensive accelerator resources. The `NewClusterWithIncrementalUpgrade` strategy is designed for large-scale deployments, such as LLM serving, where duplicating resources for a standard blue/green deployment is not feasible due to resource constraints. 
Rather than creating a new `RayCluster` at 100% capacity, this strategy creates a new cluster and gradually scales its capacity up while simultaneously shifting user traffic from the old cluster to the new one. This gradual traffic migration enables users to safely scale their updated RayService while the old cluster auto-scales down, enabling users to save expensive compute resources and exert fine-grained control over the pace of their upgrade. This process relies on the Kubernetes Gateway API for fine-grained traffic splitting. From 809c76ec534da9962c06fb7b080abe50726487b3 Mon Sep 17 00:00:00 2001 From: Ryan O'Leary Date: Fri, 31 Oct 2025 09:31:44 +0000 Subject: [PATCH 03/13] Fix review comments and add example setup Signed-off-by: Ryan O'Leary --- .../advanced-guides/incremental-upgrade.md | 88 +++++++++++++++---- 1 file changed, 73 insertions(+), 15 deletions(-) diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/serve/advanced-guides/incremental-upgrade.md index da03cb67ee6d..c028f867836e 100644 --- a/doc/source/serve/advanced-guides/incremental-upgrade.md +++ b/doc/source/serve/advanced-guides/incremental-upgrade.md @@ -5,7 +5,9 @@ This guide details how to configure and use the `NewClusterWithIncrementalUpgrad In previous versions of KubeRay, zero-downtime upgrades were supported only through the `NewCluster` strategy. This upgrade strategy involved scaling up a pending RayCluster with equal capacity as the active cluster, waiting until the updated Serve applications were healthy, and then switching traffic to the new RayCluster. While this upgrade strategy is reliable, it required users to scale 200% of their original cluster's compute resources which can be prohibitive when dealing with expensive accelerator resources. -The `NewClusterWithIncrementalUpgrade` strategy is designed for large-scale deployments, such as LLM serving, where duplicating resources for a standard blue/green deployment is not feasible due to resource constraints. Rather than creating a new `RayCluster` at 100% capacity, this strategy creates a new cluster and gradually scales its capacity up while simultaneously shifting user traffic from the old cluster to the new one. This gradual traffic migration enables users to safely scale their updated RayService while the old cluster auto-scales down, enabling users to save expensive compute resources and exert fine-grained control over the pace of their upgrade. This process relies on the Kubernetes Gateway API for fine-grained traffic splitting. +The `NewClusterWithIncrementalUpgrade` strategy is designed for large-scale deployments, such as LLM serving, where duplicating resources for a standard blue/green deployment is not feasible due to resource constraints. This feature minimizes resource usage during RayService CR upgrades while maintaining service availability. Below we explain the design and usage. + +Rather than creating a new `RayCluster` at 100% capacity, this strategy creates a new cluster and gradually scales its capacity up while simultaneously shifting user traffic from the old cluster to the new one. This gradual traffic migration enables users to safely scale their updated RayService while the old cluster auto-scales down, enabling users to save expensive compute resources and exert greater control over the pace of their upgrade. This process relies on the Kubernetes Gateway API for fine-grained traffic splitting. 
## Quickstart: Performing an Incremental Upgrade @@ -36,11 +38,76 @@ Before you can use this feature, you **must** have the following set up in your 4. **Ray Autoscaler:** Incremental upgrades require the Ray Autoscaler to be enabled in your `RayCluster` spec, as KubeRay manages the upgrade by adjusting the `target_capacity` for Ray Serve which adjusts the number of Serve replicas for each deployment. These Serve replicas are translated into a resource load which the Ray autoscaler considers when determining the number of Pods to provision with KubeRay. For information on enabling and configuring Ray autoscaling on Kubernetes, see [KubeRay Autoscaling](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/configuring-autoscaling.html). +#### Example: Setting up a RayService on kind: + +The following instructions detail the minimal steps to configure a cluster with KubeRay and trigger a zero-downtime incremental upgrade for a RayService. + +1. Create a kind cluster +```bash +kind create cluster --image=kindest/node:v1.29.0 +``` +We use `v1.29.0` which is known to be compatible with recent Istio versions. + +2. Install Gateway API CRDs +```bash +kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml +``` + +3. Install and Configure MetalLB for LoadBalancer on kind [optional] +```bash +kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.7/config/manifests/metallb-native.yaml +``` + +Create an `IPAddressPool` with the following spec for MetalLB [optional] +```yaml +echo "apiVersion: metallb.io/v1beta1 +kind: IPAddressPool +metadata: + name: kind-pool + namespace: metallb-system +spec: + addresses: + - 192.168.8.200-192.168.8.250 # adjust based on your subnets range +--- +apiVersion: metallb.io/v1beta1 +kind: L2Advertisement +metadata: + name: default + namespace: metallb-system +spec: + ipAddressPools: + - kind-pool" | kubectl apply -f - +``` + +4. Create a Gateway class with the following spec +```yaml +echo "apiVersion: gateway.networking.k8s.io/v1 +kind: GatewayClass +metadata: + name: istio +spec: + controllerName: istio.io/gateway-controller" | kubectl apply -f - +``` + +5. Install istio +``` +istioctl install --set profile=demo -y +``` + +6. Install the KubeRay operator, following [these instructions](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/kuberay-operator-installation.html). The minimum version for this guide is v1.5.0. + +7. Create a RayService with incremental upgrade enabled. +```bash +kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-service.incremental-upgrade.yaml +``` + +8. Update one of the fields under `rayClusterConfig` and re-apply the RayService to trigger a zero-downtime upgrade. + ### 2. How it Works: The Upgrade Process Understanding the lifecycle of an incremental upgrade helps in monitoring and configuration. -1. **Trigger:** You trigger an upgrade by updating the `RayService` spec, such as changing the container `image` or updating the `serveConfigV2`. +1. **Trigger:** You trigger an upgrade by updating the `RayService` spec, such as changing the container `image` or updating the `resources` used by a worker group in the `rayClusterSpec`. 2. **Pending Cluster Creation:** KubeRay detects the change and creates a new, *pending* `RayCluster`. It sets this cluster's initial `target_capacity` (the percentage of serve replicas it should run) to `0%`. 3. 
**Gateway and Route Creation:** KubeRay creates a `Gateway` resource for your `RayService` and an `HTTPRoute` resource that initially routes 100% of traffic to the old, *active* cluster and 0% to the new, *pending* cluster. 4. **The Upgrade Loop Begins:** @@ -62,10 +129,12 @@ Understanding the lifecycle of an incremental upgrade helps in monitoring and co * Active `target_capacity`: 100% * Pending `target_capacity`: 0% $\rightarrow$ **20%** * **Total Capacity: 120%** - * The Ray Autoscaler begins provisioning pods for the pending cluster to handle 20% of the target load. + * If the Ray Serve autoscaler is enabled, the Serve application will scale its `num_replicas` from `min_replicas` based on the new `target_capacity`. Without the Ray Serve autoscaler enabled, the new `target_capacity` value will directly adjust `num_replicas` for each Serve deployment. Depending on the updated value of`num_replicas`, the Ray Autoscaler will begin provisioning pods for the pending cluster to handle the updated resource load. * **Phase 2: Shift Traffic (HTTPRoute)** - * KubeRay waits for the pending cluster's new pods to be ready. + * KubeRay waits for the pending cluster's new pods to be ready. With the alpha version of this feature and + Ray Serve autoscaling, there may be a temporary drop in requests-per-second while worker Pods are being + created for the updated Ray serve replicas. * Once ready, it begins to *gradually* shift traffic. Every `intervalSeconds`, it updates the `HTTPRoute` weights, moving `stepSizePercent` (5%) of traffic from the active to the pending cluster. * This continues until the *actual* traffic (`trafficRoutedPercent`) "catches up" to the *pending* cluster's `target_capacity` (20% in this example). @@ -163,17 +232,6 @@ You can monitor the progress of the upgrade by inspecting the `RayService` statu ``` Look at the `spec.rules.backendRefs`. You will see the `weight` for the old and new services change in real-time as the traffic shift (Phase 2) progresses. -### 5. Rollback Support - -To roll back a failing or poorly performing upgrade, simply **update the `RayService` manifest back to the original configuration** (e.g., change the `image` back to the old tag). - -KubeRay's controller will detect that the "goal state" now matches the *active* (old) cluster. It will reverse the process: -1. Scale the active cluster's `target_capacity` back to 100%. -2. Shift all traffic back to the active cluster. -3. Scale down and terminate the *pending* (new) cluster. - ---- - ## API Overview (Reference) This section details the new and updated fields in the `RayService` CRD. From 75aacd138d41a339425e58d6729f67483347811d Mon Sep 17 00:00:00 2001 From: ryanaoleary Date: Thu, 4 Dec 2025 19:42:28 +0000 Subject: [PATCH 04/13] Remove [optional] from MetaLb step Signed-off-by: ryanaoleary --- doc/source/serve/advanced-guides/incremental-upgrade.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/serve/advanced-guides/incremental-upgrade.md index c028f867836e..b0dfd61bb1b9 100644 --- a/doc/source/serve/advanced-guides/incremental-upgrade.md +++ b/doc/source/serve/advanced-guides/incremental-upgrade.md @@ -53,7 +53,7 @@ We use `v1.29.0` which is known to be compatible with recent Istio versions. kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml ``` -3. Install and Configure MetalLB for LoadBalancer on kind [optional] +3. 
Install and Configure MetalLB for LoadBalancer on kind ```bash kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.7/config/manifests/metallb-native.yaml ``` From cb20e0b873b90791d40ecc61145ed3f78a4800bd Mon Sep 17 00:00:00 2001 From: ryanaoleary Date: Thu, 4 Dec 2025 20:01:51 +0000 Subject: [PATCH 05/13] Resolve remaining comments Signed-off-by: ryanaoleary --- .../advanced-guides/incremental-upgrade.md | 23 +++++++++++++++---- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/serve/advanced-guides/incremental-upgrade.md index b0dfd61bb1b9..3a53ab058e75 100644 --- a/doc/source/serve/advanced-guides/incremental-upgrade.md +++ b/doc/source/serve/advanced-guides/incremental-upgrade.md @@ -94,7 +94,12 @@ spec: istioctl install --set profile=demo -y ``` -6. Install the KubeRay operator, following [these instructions](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/kuberay-operator-installation.html). The minimum version for this guide is v1.5.0. +6. Install the KubeRay operator, following [these instructions](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/kuberay-operator-installation.html). The minimum version for this guide is v1.5.1. To use this feature, the `RayServiceIncrementalUpgrade` feature gate must be enabled. To enable the feature gate when installing the kuberay operator, run the following command: +```bash +helm install kuberay-operator kuberay/kuberay-operator --version v1.5.1 \ + --set featureGates\[0\].name=RayServiceIncrementalUpgrade \ + --set featureGates\[0\].enabled=true +``` 7. Create a RayService with incremental upgrade enabled. ```bash @@ -161,7 +166,7 @@ To use the feature, set the `upgradeStrategy.type` to `NewClusterWithIncremental apiVersion: ray.io/v1 kind: RayService metadata: - name: example-rayservice + name: rayservice-incremental-upgrade spec: # This is the main configuration block for the upgrade upgradeStrategy: @@ -211,13 +216,21 @@ spec: # ... pod spec with GPU requests ... ``` -### 4. Monitoring the Upgrade +### 4. Trigger the Upgrade + +Incremental upgrades are triggered exactly like standard zero-downtime upgrades in KubeRay: by modifying the `spec.rayClusterConfig` in your RayService Custom Resource. + +When KubeRay detects a change in the cluster specification (such as a new container image, modified resource limits, or updated environment variables), it calculates a new hash. If the hash differs from the active cluster and incremental upgrades are enabled, the `NewClusterWithIncrementalUpgrade` strategy is automatically initiated. + +Updates to the cluster specifications can occur by running `kubectl apply -f` on the updated YAML configuration file, or by directly editing the CR using `kubectl edit rayservice `. + +### 5. Monitoring the Upgrade You can monitor the progress of the upgrade by inspecting the `RayService` status and the `HTTPRoute` object. 1. **Check `RayService` Status:** ```bash - kubectl describe rayservice example-rayservice + kubectl describe rayservice rayservice-incremental-upgrade ``` Look at the `Status` section. You will see both `Active Service Status` and `Pending Service Status`, which show the state of both clusters. Pay close attention to these two new fields: * **`Target Capacity`:** The percentage of replicas KubeRay is *telling* this cluster to scale to. @@ -228,7 +241,7 @@ You can monitor the progress of the upgrade by inspecting the `RayService` statu 2. 
**Check `HTTPRoute` Weights:** You can also see the traffic weights directly on the `HTTPRoute` resource KubeRay manages. ```bash - kubectl get httproute example-rayservice-httproute -n -o yaml + kubectl get httproute rayservice-incremental-upgrade-httproute -o yaml ``` Look at the `spec.rules.backendRefs`. You will see the `weight` for the old and new services change in real-time as the traffic shift (Phase 2) progresses. From 2fd01c81b09c56694b1e20665f1669e7832a4987 Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Fri, 5 Dec 2025 16:16:56 +0800 Subject: [PATCH 06/13] update doc to explain how to upgrade safely Signed-off-by: Future-Outlier --- .../advanced-guides/incremental-upgrade.md | 66 ++++++++++++++++++- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/serve/advanced-guides/incremental-upgrade.md index 3a53ab058e75..25e6351a6a01 100644 --- a/doc/source/serve/advanced-guides/incremental-upgrade.md +++ b/doc/source/serve/advanced-guides/incremental-upgrade.md @@ -1,7 +1,7 @@ (rayservice-incremental-upgrade)= # RayService Zero-Downtime Incremental Upgrades -This guide details how to configure and use the `NewClusterWithIncrementalUpgrade` strategy for a `RayService` with KubeRay. This feature was proposed in a [Ray Enhancement Proposal (REP)](https://github.com/ray-project/enhancements/blob/main/reps/2024-12-4-ray-service-incr-upgrade.md) and implemented with alpha support in KubeRay v1.5.0. If unfamiliar with RayServices and KubeRay, see the [RayService Quickstart](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/rayservice-quick-start.html). +This guide details how to configure and use the `NewClusterWithIncrementalUpgrade` strategy for a `RayService` with KubeRay. This feature was proposed in a [Ray Enhancement Proposal (REP)](https://github.com/ray-project/enhancements/blob/main/reps/2024-12-4-ray-service-incr-upgrade.md) and implemented with alpha support in KubeRay v1.5.1. If unfamiliar with RayServices and KubeRay, see the [RayService Quickstart](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/rayservice-quick-start.html). In previous versions of KubeRay, zero-downtime upgrades were supported only through the `NewCluster` strategy. This upgrade strategy involved scaling up a pending RayCluster with equal capacity as the active cluster, waiting until the updated Serve applications were healthy, and then switching traffic to the new RayCluster. While this upgrade strategy is reliable, it required users to scale 200% of their original cluster's compute resources which can be prohibitive when dealing with expensive accelerator resources. @@ -137,8 +137,7 @@ Understanding the lifecycle of an incremental upgrade helps in monitoring and co * If the Ray Serve autoscaler is enabled, the Serve application will scale its `num_replicas` from `min_replicas` based on the new `target_capacity`. Without the Ray Serve autoscaler enabled, the new `target_capacity` value will directly adjust `num_replicas` for each Serve deployment. Depending on the updated value of`num_replicas`, the Ray Autoscaler will begin provisioning pods for the pending cluster to handle the updated resource load. * **Phase 2: Shift Traffic (HTTPRoute)** - * KubeRay waits for the pending cluster's new pods to be ready. 
With the alpha version of this feature and - Ray Serve autoscaling, there may be a temporary drop in requests-per-second while worker Pods are being + * KubeRay waits for the pending cluster's new pods to be ready. There may be a temporary drop in requests-per-second while worker Pods are being created for the updated Ray serve replicas. * Once ready, it begins to *gradually* shift traffic. Every `intervalSeconds`, it updates the `HTTPRoute` weights, moving `stepSizePercent` (5%) of traffic from the active to the pending cluster. * This continues until the *actual* traffic (`trafficRoutedPercent`) "catches up" to the *pending* cluster's `target_capacity` (20% in this example). @@ -245,6 +244,67 @@ You can monitor the progress of the upgrade by inspecting the `RayService` statu ``` Look at the `spec.rules.backendRefs`. You will see the `weight` for the old and new services change in real-time as the traffic shift (Phase 2) progresses. +## How to upgrade safely? + +Since this feature is alpha and rollback is not yet supported, we recommend conservative parameter settings to minimize risk during upgrades. + +### Recommended Parameters + +To upgrade safely, you should: +1. Scale up 1 worker pod in the new cluster and scale down 1 worker pod in the old cluster at a time +2. Make the upgrade process gradual to allow the Ray Serve autoscaler to adapt + +Based on these principles, we recommend: +- **maxSurgePercent**: Calculate based on the formula below +- **stepSizePercent**: Set to a value less than `maxSurgePercent` +- **intervalSeconds**: 60 + +### Calculating maxSurgePercent + +The `maxSurgePercent` determines the maximum percentage of additional resources that can be provisioned during the upgrade. To calculate the minimum safe value: + +\begin{equation} +\text{maxSurgePercent} = \frac{\text{resources per pod}}{\text{total cluster resources}} \times 100 +\end{equation} + +#### Example + +Consider a RayCluster with the following configuration: +- `excludeHeadService`: true +- Head pod: No GPU +- 5 worker pods, each with 1 GPU (total: 5 GPUs) + +For this cluster: +\begin{equation} +\text{maxSurgePercent} = \frac{1 \text{ GPU}}{5 \text{ GPUs}} \times 100 = 20\% +\end{equation} + +With `maxSurgePercent: 20`, the upgrade process ensures: +- The new cluster scales up **1 worker pod at a time** (20% of 5 = 1 pod) +- The old cluster scales down **1 worker pod at a time** +- Your cluster temporarily uses 6 GPUs during the transition (5 original + 1 new) + +This configuration guarantees you have sufficient resources to run at least one additional worker pod during the upgrade without resource contention. + +### Understanding intervalSeconds + +Set `intervalSeconds` to 60 seconds to give the Ray Serve autoscaler and Ray autoscaler sufficient time to: +- Detect load changes +- Make scaling decisions while respecting upscale/downscale delays +- Provision resources +- Allow replicas to transition states gracefully to "deploying" + +A larger interval prevents the upgrade controller from making changes faster than the autoscaler can react, reducing the risk of service disruption. + +### Example Configuration + +```yaml +upgradeStrategy: + maxSurgePercent: 20 # Calculated: (1 GPU / 5 GPUs) × 100 + stepSizePercent: 10 # Less than maxSurgePercent + intervalSeconds: 60 # Wait 1 minute between steps +``` + ## API Overview (Reference) This section details the new and updated fields in the `RayService` CRD. 
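The `maxSurgePercent` arithmetic above is easy to script when planning an upgrade. The following is a minimal sketch that reuses the illustrative sizing from this section (5 worker pods with 1 GPU each); the values are placeholders, so substitute the resource request of one worker pod and your cluster-wide total.

```bash
# Minimal sketch: derive a conservative maxSurgePercent from cluster sizing.
# The values mirror the example above (5 workers x 1 GPU each) and are
# placeholders; replace them with your own worker pod request and cluster total.
RESOURCES_PER_POD=1   # GPUs requested by a single worker pod
TOTAL_RESOURCES=5     # GPUs across all worker pods in the active cluster

# maxSurgePercent = (resources per pod / total cluster resources) * 100
MAX_SURGE_PERCENT=$(( RESOURCES_PER_POD * 100 / TOTAL_RESOURCES ))
echo "maxSurgePercent >= ${MAX_SURGE_PERCENT}"   # prints 20 for this example
```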
From ee7c37e4ed9793a0a4c247117aaea50951e3a404 Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Sun, 7 Dec 2025 14:13:40 +0800 Subject: [PATCH 07/13] add abrarsheikh and nick's advices Signed-off-by: Future-Outlier --- .../serve/advanced-guides/incremental-upgrade.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/serve/advanced-guides/incremental-upgrade.md index 25e6351a6a01..3002b82b67f7 100644 --- a/doc/source/serve/advanced-guides/incremental-upgrade.md +++ b/doc/source/serve/advanced-guides/incremental-upgrade.md @@ -58,7 +58,7 @@ kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/re kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.7/config/manifests/metallb-native.yaml ``` -Create an `IPAddressPool` with the following spec for MetalLB [optional] +Create an `IPAddressPool` with the following spec for MetalLB ```yaml echo "apiVersion: metallb.io/v1beta1 kind: IPAddressPool @@ -244,6 +244,11 @@ You can monitor the progress of the upgrade by inspecting the `RayService` statu ``` Look at the `spec.rules.backendRefs`. You will see the `weight` for the old and new services change in real-time as the traffic shift (Phase 2) progresses. +For example: +```yaml + +``` + ## How to upgrade safely? Since this feature is alpha and rollback is not yet supported, we recommend conservative parameter settings to minimize risk during upgrades. @@ -252,7 +257,7 @@ Since this feature is alpha and rollback is not yet supported, we recommend cons To upgrade safely, you should: 1. Scale up 1 worker pod in the new cluster and scale down 1 worker pod in the old cluster at a time -2. Make the upgrade process gradual to allow the Ray Serve autoscaler to adapt +2. Make the upgrade process gradual to allow the Ray Serve autoscaler and Ray autoscaler to adapt Based on these principles, we recommend: - **maxSurgePercent**: Calculate based on the formula below @@ -290,9 +295,11 @@ This configuration guarantees you have sufficient resources to run at least one Set `intervalSeconds` to 60 seconds to give the Ray Serve autoscaler and Ray autoscaler sufficient time to: - Detect load changes -- Make scaling decisions while respecting upscale/downscale delays +- Immediately scale replicas up or down to enforce new min_replicas and max_replicas limits (via target_capacity) + - Scale down replicas immediately if they exceed the new max_replicas + - Scale up replicas immediately if they fall below the new min_replicas - Provision resources -- Allow replicas to transition states gracefully to "deploying" +- Allow replicas to transition states gracefully to "UPDATING" A larger interval prevents the upgrade controller from making changes faster than the autoscaler can react, reducing the risk of service disruption. 
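To watch the weight shift described above as it happens, you can poll the `HTTPRoute` that KubeRay manages. This is a minimal sketch rather than part of the controller: the route name assumes the `<rayservice-name>-httproute` naming shown earlier in this guide, and the namespace is assumed to be `default`.

```bash
# Minimal sketch: print the backend Services and their weights every 10 seconds
# while traffic shifts from the old cluster to the new one. The route name and
# namespace are assumptions; adjust them to match your RayService.
ROUTE=rayservice-incremental-upgrade-httproute
NAMESPACE=default
while true; do
  kubectl get httproute "$ROUTE" -n "$NAMESPACE" \
    -o jsonpath='{range .spec.rules[0].backendRefs[*]}{.name}={.weight}  {end}{"\n"}'
  sleep 10
done
```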
From de2e2832180bab63a80933a923224f817509cd9a Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Sun, 7 Dec 2025 14:26:44 +0800 Subject: [PATCH 08/13] update Signed-off-by: Future-Outlier --- doc/source/cluster/kubernetes/user-guides.md | 2 ++ .../kubernetes/user-guides}/incremental-upgrade.md | 3 ++- doc/source/serve/advanced-guides/index.md | 2 -- 3 files changed, 4 insertions(+), 3 deletions(-) rename doc/source/{serve/advanced-guides => cluster/kubernetes/user-guides}/incremental-upgrade.md (99%) diff --git a/doc/source/cluster/kubernetes/user-guides.md b/doc/source/cluster/kubernetes/user-guides.md index 6876d08a77eb..b389baf62b41 100644 --- a/doc/source/cluster/kubernetes/user-guides.md +++ b/doc/source/cluster/kubernetes/user-guides.md @@ -8,6 +8,7 @@ Deploy Ray Serve Apps user-guides/rayservice-no-ray-serve-replica user-guides/rayservice-high-availability +user-guides/incremental-upgrade user-guides/observability user-guides/upgrade-guide user-guides/k8s-cluster-setup @@ -42,6 +43,7 @@ at the {ref}`introductory guide ` first. * {ref}`kuberay-rayservice` * {ref}`kuberay-rayservice-no-ray-serve-replica` * {ref}`kuberay-rayservice-ha` +* {ref}`rayservice-incremental-upgrade` * {ref}`kuberay-observability` * {ref}`kuberay-upgrade-guide` * {ref}`kuberay-k8s-setup` diff --git a/doc/source/serve/advanced-guides/incremental-upgrade.md b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md similarity index 99% rename from doc/source/serve/advanced-guides/incremental-upgrade.md rename to doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md index 3002b82b67f7..c9b678bf0d68 100644 --- a/doc/source/serve/advanced-guides/incremental-upgrade.md +++ b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md @@ -346,4 +346,5 @@ Three new fields are added to both the `activeServiceStatus` and `pendingService #### Next steps: * See [Deploy on Kubernetes](https://docs.ray.io/en/latest/serve/production-guide/kubernetes.html) for more information about deploying Ray Serve with KubeRay. -* See [Ray Serve Autoscaling](https://docs.ray.io/en/latest/serve/autoscaling-guide.html) to configure your Serve deployments to scale based on traffic load. \ No newline at end of file +* See [Ray Serve Autoscaling](https://docs.ray.io/en/latest/serve/autoscaling-guide.html) to configure your Serve deployments to scale based on traffic load. 
+ diff --git a/doc/source/serve/advanced-guides/index.md b/doc/source/serve/advanced-guides/index.md index 3bb773dcf731..2658e5c4dbf7 100644 --- a/doc/source/serve/advanced-guides/index.md +++ b/doc/source/serve/advanced-guides/index.md @@ -8,7 +8,6 @@ app-builder-guide advanced-autoscaling asyncio-best-practices performance -incremental-upgrade dyn-req-batch inplace-updates dev-workflow @@ -28,7 +27,6 @@ Use these advanced guides for more options and configurations: - [Advanced Ray Serve Autoscaling](serve-advanced-autoscaling) - [Asyncio and Concurrency best practices in Ray Serve](serve-asyncio-best-practices) - [Performance Tuning](serve-perf-tuning) -- [RayService Zero-Downtime Incremental Upgrades](rayservice-incremental-upgrade) - [Dynamic Request Batching](serve-performance-batching-requests) - [In-Place Updates for Serve](serve-inplace-updates) - [Development Workflow](serve-dev-workflow) From 65437b76088dbbb2b50e420a2cd86371d971d35f Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Sun, 7 Dec 2025 16:01:27 +0800 Subject: [PATCH 09/13] delete wrong explanation Signed-off-by: Future-Outlier --- doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md | 1 - 1 file changed, 1 deletion(-) diff --git a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md index c9b678bf0d68..aef636a462d8 100644 --- a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md +++ b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md @@ -299,7 +299,6 @@ Set `intervalSeconds` to 60 seconds to give the Ray Serve autoscaler and Ray aut - Scale down replicas immediately if they exceed the new max_replicas - Scale up replicas immediately if they fall below the new min_replicas - Provision resources -- Allow replicas to transition states gracefully to "UPDATING" A larger interval prevents the upgrade controller from making changes faster than the autoscaler can react, reducing the risk of service disruption. 
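One way to confirm that `intervalSeconds` leaves the autoscalers enough headroom is to watch how capacity and routed traffic move between steps. The following is a minimal sketch that assumes the example RayService name used in this guide and the `targetCapacity`/`trafficRoutedPercent` status fields described in the API reference; adjust the name, namespace, and polling interval for your deployment.

```bash
# Minimal sketch: poll capacity and routed traffic for both clusters during an
# upgrade. The RayService name and namespace are placeholders; the status field
# paths follow the API reference in this guide.
RAYSERVICE=rayservice-incremental-upgrade
NAMESPACE=default
JSONPATH='active:  capacity={.status.activeServiceStatus.targetCapacity}% traffic={.status.activeServiceStatus.trafficRoutedPercent}%{"\n"}pending: capacity={.status.pendingServiceStatus.targetCapacity}% traffic={.status.pendingServiceStatus.trafficRoutedPercent}%{"\n"}'
while true; do
  kubectl get rayservice "$RAYSERVICE" -n "$NAMESPACE" -o jsonpath="$JSONPATH"
  sleep 10
done
```

If `trafficRoutedPercent` lags far behind `targetCapacity` for several intervals, consider a larger `intervalSeconds` or a smaller `stepSizePercent`.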
From 79e6efa237f2f17a1c39f228cd3fbaf228d26cb1 Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Sun, 7 Dec 2025 16:02:15 +0800 Subject: [PATCH 10/13] Add http route weight Signed-off-by: Future-Outlier --- .../user-guides/incremental-upgrade.md | 62 ++++++++++++++++++- 1 file changed, 61 insertions(+), 1 deletion(-) diff --git a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md index aef636a462d8..74b553ded743 100644 --- a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md +++ b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md @@ -246,7 +246,67 @@ You can monitor the progress of the upgrade by inspecting the `RayService` statu For example: ```yaml - +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + creationTimestamp: "2025-12-07T07:42:24Z" + generation: 10 + name: stress-test-serve-httproute + namespace: default + ownerReferences: + - apiVersion: ray.io/v1 + blockOwnerDeletion: true + controller: true + kind: RayService + name: stress-test-serve + uid: 83a785cc-8745-4ccd-9973-2fc9f27000cc + resourceVersion: "3714" + uid: 660b14b5-78df-4507-b818-05989b1ef806 +spec: + parentRefs: + - group: gateway.networking.k8s.io + kind: Gateway + name: stress-test-serve-gateway + namespace: default + rules: + - backendRefs: + - group: "" + kind: Service + name: stress-test-serve-f6z4w-serve-svc + namespace: default + port: 8000 + weight: 90 + - group: "" + kind: Service + name: stress-test-serve-xclvf-serve-svc + namespace: default + port: 8000 + weight: 10 + matches: + - path: + type: PathPrefix + value: / +status: + parents: + - conditions: + - lastTransitionTime: "2025-12-07T07:42:24Z" + message: Route was valid + observedGeneration: 10 + reason: Accepted + status: "True" + type: Accepted + - lastTransitionTime: "2025-12-07T07:42:24Z" + message: All references resolved + observedGeneration: 10 + reason: ResolvedRefs + status: "True" + type: ResolvedRefs + controllerName: istio.io/gateway-controller + parentRef: + group: gateway.networking.k8s.io + kind: Gateway + name: stress-test-serve-gateway + namespace: default ``` ## How to upgrade safely? From d1537e9e199ba4a58223d2db5ddcbec51c974431 Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Sun, 7 Dec 2025 16:13:53 +0800 Subject: [PATCH 11/13] change kind dev order Signed-off-by: Future-Outlier --- .../user-guides/incremental-upgrade.md | 43 +++++++++++-------- 1 file changed, 25 insertions(+), 18 deletions(-) diff --git a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md index 74b553ded743..4c54d1307045 100644 --- a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md +++ b/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md @@ -48,12 +48,34 @@ kind create cluster --image=kindest/node:v1.29.0 ``` We use `v1.29.0` which is known to be compatible with recent Istio versions. -2. Install Gateway API CRDs +2. Install istio +``` +istioctl install --set profile=demo -y +``` + +3. Install Gateway API CRDs ```bash -kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml +kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml +``` + +4. 
Create a Gateway class with the following spec +```yaml +echo "apiVersion: gateway.networking.k8s.io/v1 +kind: GatewayClass +metadata: + name: istio +spec: + controllerName: istio.io/gateway-controller" | kubectl apply -f - ``` -3. Install and Configure MetalLB for LoadBalancer on kind +```yaml +kubectl get gatewayclass +NAME CONTROLLER ACCEPTED AGE +istio istio.io/gateway-controller True 4s +istio-remote istio.io/unmanaged-gateway True 3s +``` + +5. Install and Configure MetalLB for LoadBalancer on kind ```bash kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.7/config/manifests/metallb-native.yaml ``` @@ -79,21 +101,6 @@ spec: - kind-pool" | kubectl apply -f - ``` -4. Create a Gateway class with the following spec -```yaml -echo "apiVersion: gateway.networking.k8s.io/v1 -kind: GatewayClass -metadata: - name: istio -spec: - controllerName: istio.io/gateway-controller" | kubectl apply -f - -``` - -5. Install istio -``` -istioctl install --set profile=demo -y -``` - 6. Install the KubeRay operator, following [these instructions](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/kuberay-operator-installation.html). The minimum version for this guide is v1.5.1. To use this feature, the `RayServiceIncrementalUpgrade` feature gate must be enabled. To enable the feature gate when installing the kuberay operator, run the following command: ```bash helm install kuberay-operator kuberay/kuberay-operator --version v1.5.1 \ From 5295a2105df459b9742dfd63662f42b30a193d3b Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Mon, 8 Dec 2025 15:40:30 +0800 Subject: [PATCH 12/13] update doc ref name Signed-off-by: Future-Outlier --- doc/source/cluster/kubernetes/user-guides.md | 4 ++-- ...cremental-upgrade.md => rayservice-incremental-upgrade.md} | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) rename doc/source/cluster/kubernetes/user-guides/{incremental-upgrade.md => rayservice-incremental-upgrade.md} (99%) diff --git a/doc/source/cluster/kubernetes/user-guides.md b/doc/source/cluster/kubernetes/user-guides.md index b389baf62b41..e0a5513ccefd 100644 --- a/doc/source/cluster/kubernetes/user-guides.md +++ b/doc/source/cluster/kubernetes/user-guides.md @@ -8,7 +8,7 @@ Deploy Ray Serve Apps user-guides/rayservice-no-ray-serve-replica user-guides/rayservice-high-availability -user-guides/incremental-upgrade +user-guides/rayservice-incremental-upgrade user-guides/observability user-guides/upgrade-guide user-guides/k8s-cluster-setup @@ -43,7 +43,7 @@ at the {ref}`introductory guide ` first. 
* {ref}`kuberay-rayservice` * {ref}`kuberay-rayservice-no-ray-serve-replica` * {ref}`kuberay-rayservice-ha` -* {ref}`rayservice-incremental-upgrade` +* {ref}`kuberay-rayservice-incremental-upgrade` * {ref}`kuberay-observability` * {ref}`kuberay-upgrade-guide` * {ref}`kuberay-k8s-setup` diff --git a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md b/doc/source/cluster/kubernetes/user-guides/rayservice-incremental-upgrade.md similarity index 99% rename from doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md rename to doc/source/cluster/kubernetes/user-guides/rayservice-incremental-upgrade.md index 4c54d1307045..e6e46b227e96 100644 --- a/doc/source/cluster/kubernetes/user-guides/incremental-upgrade.md +++ b/doc/source/cluster/kubernetes/user-guides/rayservice-incremental-upgrade.md @@ -1,4 +1,4 @@ -(rayservice-incremental-upgrade)= +(kuberay-rayservice-incremental-upgrade)= # RayService Zero-Downtime Incremental Upgrades This guide details how to configure and use the `NewClusterWithIncrementalUpgrade` strategy for a `RayService` with KubeRay. This feature was proposed in a [Ray Enhancement Proposal (REP)](https://github.com/ray-project/enhancements/blob/main/reps/2024-12-4-ray-service-incr-upgrade.md) and implemented with alpha support in KubeRay v1.5.1. If unfamiliar with RayServices and KubeRay, see the [RayService Quickstart](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/rayservice-quick-start.html). From b33874691a8106d99e18840cdf06f05096730ac8 Mon Sep 17 00:00:00 2001 From: Future-Outlier Date: Mon, 8 Dec 2025 16:31:09 +0800 Subject: [PATCH 13/13] update Signed-off-by: Future-Outlier --- doc/source/cluster/kubernetes/user-guides/rayservice.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/source/cluster/kubernetes/user-guides/rayservice.md b/doc/source/cluster/kubernetes/user-guides/rayservice.md index 25586e1db36f..05eba6564430 100644 --- a/doc/source/cluster/kubernetes/user-guides/rayservice.md +++ b/doc/source/cluster/kubernetes/user-guides/rayservice.md @@ -231,6 +231,8 @@ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:800 (step-8-zero-downtime-upgrade-for-ray-clusters)= ## Step 8: Zero downtime upgrade for Ray clusters +This section describes the default `NewCluster` upgrade strategy. For large-scale deployments where duplicating resources isn't feasible, see [RayService incremental upgrade](kuberay-rayservice-incremental-upgrade) for the `NewClusterWithIncrementalUpgrade` strategy, which uses fewer resources during upgrades. + In Step 7, modifying `serveConfigV2` doesn't trigger a zero downtime upgrade for Ray clusters. Instead, it reapplies the new configurations to the existing RayCluster. However, if you modify `spec.rayClusterConfig` in the RayService YAML file, it triggers a zero downtime upgrade for Ray clusters.
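As a concrete illustration of the trigger described above, the sketch below patches the worker image under `spec.rayClusterConfig`, which changes the cluster spec hash and starts the upgrade. The `rayservice-sample` name matches the sample RayService in the surrounding guide, while the JSON-patch path and image tag are assumptions (a single worker group with a single container); adapt them to your spec, and update the head group image the same way if it should change too.

```bash
# Minimal sketch: trigger a zero-downtime upgrade by changing a field under
# spec.rayClusterConfig. The path assumes one worker group with one container,
# and the image tag is a placeholder.
kubectl patch rayservice rayservice-sample --type json -p '[
  {"op": "replace",
   "path": "/spec/rayClusterConfig/workerGroupSpecs/0/template/spec/containers/0/image",
   "value": "rayproject/ray:2.46.0"}
]'
```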