diff --git a/.cursor/.agent-tools/064c0caf-260b-43ce-89cd-87379b0cb213.txt b/.cursor/.agent-tools/064c0caf-260b-43ce-89cd-87379b0cb213.txt
new file mode 100644
index 000000000000..2d3f58bebb71
--- /dev/null
+++ b/.cursor/.agent-tools/064c0caf-260b-43ce-89cd-87379b0cb213.txt
@@ -0,0 +1 @@
+[{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2428247798","pull_request_review_id":3334343318,"id":2428247798,"node_id":"PRRC_kwDOBEmZvc6QvB72","diff_hunk":"@@ -0,0 +1,149 @@\n+(serve-external-scale-webhook)=\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. 
Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. 
For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    
)","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"ec8fb2e775e5765c81a20ee7a28c328d83bee157","user":{"login":"gemini-code-assist[bot]","id":176961590,"node_id":"BOT_kgDOCow4Ng","avatar_url":"https://avatars.githubusercontent.com/in/956858?v=4","gravatar_id":"","url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D","html_url":"https://github.com/apps/gemini-code-assist","followers_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/followers","following_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/following{/other_user}","gists_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/gists{/gist_id}","starred_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/subscriptions","organizations_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/orgs","repos_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/repos","events_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/events{/privacy}","received_events_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/received_events","type":"Bot","user_view_type":"public","site_admin":false},"body":"![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)\n\nIt's a good practice to include a `timeout` in network requests to prevent the call from hanging indefinitely.\n\n```suggestion\n    response = requests.post(\n        url,\n        headers=headers,\n        json={\"target_num_replicas\": target_replicas},\n        timeout=10\n    
)\n```","created_at":"2025-10-14T07:56:07Z","updated_at":"2025-10-14T07:56:07Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2428247798","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2428247798"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2428247798"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2428247798/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":116,"original_start_line":112,"start_side":"RIGHT","line":120,"original_line":116,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":116,"position":120,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2428247807","pull_request_review_id":3334343318,"id":2428247807,"node_id":"PRRC_kwDOBEmZvc6QvB7_","diff_hunk":"@@ -0,0 +1,149 @@\n+(serve-external-scale-webhook)=\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 
200","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"ec8fb2e775e5765c81a20ee7a28c328d83bee157","user":{"login":"gemini-code-assist[bot]","id":176961590,"node_id":"BOT_kgDOCow4Ng","avatar_url":"https://avatars.githubusercontent.com/in/956858?v=4","gravatar_id":"","url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D","html_url":"https://github.com/apps/gemini-code-assist","followers_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/followers","following_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/following{/other_user}","gists_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/gists{/gist_id}","starred_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/subscriptions","organizations_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/orgs","repos_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/repos","events_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/events{/privacy}","received_events_url":"https://api.github.com/users/gemini-code-assist%5Bbot%5D/received_events","type":"Bot","user_view_type":"public","site_admin":false},"body":"![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)\n\nInstead of checking only for status code `200`, using `response.ok` is more robust as it will correctly handle any successful HTTP status code in the 2xx range (e.g., 200, 201, 202).\n\n```suggestion\n    return 
response.ok\n```","created_at":"2025-10-14T07:56:07Z","updated_at":"2025-10-14T07:56:07Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2428247807","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2428247807"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2428247807"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2428247807/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":122,"original_line":118,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":118,"position":122,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429995097","pull_request_review_id":3336833217,"id":2429995097,"node_id":"PRRC_kwDOBEmZvc6Q1shZ","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling 
Webhook","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n# External scaling 
webhook\r\n```","created_at":"2025-10-14T17:50:53Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2429995097","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429995097"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2429995097"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429995097/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":7,"original_line":7,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":7,"position":7,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429995559","pull_request_review_id":3336833217,"id":2429995559,"node_id":"PRRC_kwDOBEmZvc6Q1son","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. 
This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n```","created_at":"2025-10-14T17:51:06Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2429995559","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429995559"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2429995559"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429995559/reactions","total_co
unt":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":11,"original_line":11,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":11,"position":11,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429996047","pull_request_review_id":3336833217,"id":2429996047,"node_id":"PRRC_kwDOBEmZvc6Q1swP","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_event
s_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n```","created_at":"2025-10-14T17:51:18Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2429996047","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429996047"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2429996047"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429996047/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":12,"original_line":12,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":12,"position":12,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429996629","pull_request_review_id":3336833217,"id":2429996629,"node_id":"PRRC_kwDOBEmZvc6Q1s5V","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. 
This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"No need for a heading for the introductory 
paragraph.","created_at":"2025-10-14T17:51:33Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2429996629","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429996629"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2429996629"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429996629/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":11,"original_line":11,"side":"RIGHT","in_reply_to_id":2429995559,"author_association":"CONTRIBUTOR","original_position":11,"position":11,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429997932","pull_request_review_id":3336833217,"id":2429997932,"node_id":"PRRC_kwDOBEmZvc6Q1tNs","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an 
error.","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\nAttempting to use both results in an 
error.\r\n```","created_at":"2025-10-14T17:52:05Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2429997932","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429997932"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2429997932"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429997932/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":39,"original_line":39,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":39,"position":39,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429998411","pull_request_review_id":3336833217,"id":2429998411,"node_id":"PRRC_kwDOBEmZvc6Q1tVL","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. 
Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n1. 
Open the Ray dashboard in your browser (typically at `http://localhost:8265`).\r\n```","created_at":"2025-10-14T17:52:18Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2429998411","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429998411"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2429998411"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2429998411/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":46,"original_line":46,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":46,"position":46,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430000707","pull_request_review_id":3336833217,"id":2430000707,"node_id":"PRRC_kwDOBEmZvc6Q1t5D","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. 
Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n### Path 
Parameters\r\n```","created_at":"2025-10-14T17:53:14Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430000707","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430000707"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430000707"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430000707/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":58,"original_line":58,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":58,"position":58,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430001155","pull_request_review_id":3336833217,"id":2430001155,"node_id":"PRRC_kwDOBEmZvc6Q1uAD","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. 
Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n### 
Headers\r\n```","created_at":"2025-10-14T17:53:23Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430001155","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430001155"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430001155"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430001155/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":62,"original_line":62,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":62,"position":62,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430002315","pull_request_review_id":3336833217,"id":2430002315,"node_id":"PRRC_kwDOBEmZvc6Q1uSL","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n### Request 
Body\r\n```","created_at":"2025-10-14T17:53:56Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430002315","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430002315"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430002315"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430002315/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":66,"original_line":66,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":66,"position":66,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430003400","pull_request_review_id":3336833217,"id":2430003400,"node_id":"PRRC_kwDOBEmZvc6Q1ujI","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n## Example: Predictive 
scaling\r\n```","created_at":"2025-10-14T17:54:25Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430003400","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430003400"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430003400"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430003400/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":81,"original_line":81,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":81,"position":81,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005021","pull_request_review_id":3336833217,"id":2430005021,"node_id":"PRRC_kwDOBEmZvc6Q1u8d","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 200\n+\n+```\n+\n+## Use cases\n+\n+The external scaling webhook is useful for several scenarios where you need custom scaling logic beyond what Ray Serve's built-in 
autoscaling provides:\n+\n+### Custom metric-based scaling\n+\n+Scale your deployments based on business or application metrics that Ray Serve doesn't track automatically:","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\nScale your deployments based on business or application metrics that Ray Serve doesn't track automatically like the 
following:\r\n```","created_at":"2025-10-14T17:55:02Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430005021","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005021"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430005021"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005021/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":132,"original_line":132,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":132,"position":132,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005448","pull_request_review_id":3336833217,"id":2430005448,"node_id":"PRRC_kwDOBEmZvc6Q1vDI","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 200\n+\n+```\n+\n+## Use cases\n+\n+The external scaling webhook is useful for several scenarios where you need custom scaling logic beyond what Ray Serve's built-in 
autoscaling provides:\n+\n+### Custom metric-based scaling\n+\n+Scale your deployments based on business or application metrics that Ray Serve doesn't track automatically:\n+\n+- External monitoring systems such as Prometheus, Datadog, or CloudWatch metrics.\n+- Database query latencies or connection pool sizes.\n+- Cost metrics to optimize for budget constraints.\n+\n+### Predictive and scheduled scaling\n+\n+Implement predictive scaling based on historical patterns or business schedules:","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\nImplement predictive scaling based on historical patterns or business schedules like the 
following:\r\n```","created_at":"2025-10-14T17:55:15Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430005448","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005448"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430005448"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005448/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":140,"original_line":140,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":140,"position":140,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005969","pull_request_review_id":3336833217,"id":2430005969,"node_id":"PRRC_kwDOBEmZvc6Q1vLR","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 200\n+\n+```\n+\n+## Use cases\n+\n+The external scaling webhook is useful for several scenarios where you need custom scaling logic beyond what Ray Serve's built-in 
autoscaling provides:\n+\n+### Custom metric-based scaling\n+\n+Scale your deployments based on business or application metrics that Ray Serve doesn't track automatically:\n+\n+- External monitoring systems such as Prometheus, Datadog, or CloudWatch metrics.\n+- Database query latencies or connection pool sizes.\n+- Cost metrics to optimize for budget constraints.\n+\n+### Predictive and scheduled scaling\n+\n+Implement predictive scaling based on historical patterns or business schedules:\n+\n+- Preemptive scaling before anticipated traffic spikes (such as daily or weekly patterns).","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n- Preemptive scaling before anticipated traffic spikes such as daily or weekly 
patterns.\r\n```","created_at":"2025-10-14T17:55:30Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430005969","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005969"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430005969"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430005969/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":142,"original_line":142,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":142,"position":142,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430006217","pull_request_review_id":3336833217,"id":2430006217,"node_id":"PRRC_kwDOBEmZvc6Q1vPJ","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 200\n+\n+```\n+\n+## Use cases\n+\n+The external scaling webhook is useful for several scenarios where you need custom scaling logic beyond what Ray Serve's built-in 
autoscaling provides:\n+\n+### Custom metric-based scaling\n+\n+Scale your deployments based on business or application metrics that Ray Serve doesn't track automatically:\n+\n+- External monitoring systems such as Prometheus, Datadog, or CloudWatch metrics.\n+- Database query latencies or connection pool sizes.\n+- Cost metrics to optimize for budget constraints.\n+\n+### Predictive and scheduled scaling\n+\n+Implement predictive scaling based on historical patterns or business schedules:\n+\n+- Preemptive scaling before anticipated traffic spikes (such as daily or weekly patterns).\n+- Event-driven scaling for known traffic events (such as sales, launches, or scheduled batch jobs).","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n- Event-driven scaling for known traffic events such as sales, launches, or scheduled batch 
jobs.\r\n```","created_at":"2025-10-14T17:55:38Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430006217","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430006217"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430006217"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430006217/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":143,"original_line":143,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":143,"position":143,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430006824","pull_request_review_id":3336833217,"id":2430006824,"node_id":"PRRC_kwDOBEmZvc6Q1vYo","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 200\n+\n+```\n+\n+## Use cases\n+\n+The external scaling webhook is useful for several scenarios where you need custom scaling logic beyond what Ray Serve's built-in 
autoscaling provides:\n+\n+### Custom metric-based scaling\n+\n+Scale your deployments based on business or application metrics that Ray Serve doesn't track automatically:\n+\n+- External monitoring systems such as Prometheus, Datadog, or CloudWatch metrics.\n+- Database query latencies or connection pool sizes.\n+- Cost metrics to optimize for budget constraints.\n+\n+### Predictive and scheduled scaling\n+\n+Implement predictive scaling based on historical patterns or business schedules:\n+\n+- Preemptive scaling before anticipated traffic spikes (such as daily or weekly patterns).\n+- Event-driven scaling for known traffic events (such as sales, launches, or scheduled batch jobs).\n+- Time-of-day based scaling profiles for predictable workloads.\n+\n+### Manual and operational control\n+\n+Direct control over replica counts for operational scenarios:","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site
_admin":false},"body":"```suggestion\r\nDirect control over replica counts for operational scenarios like the following:\r\n```","created_at":"2025-10-14T17:55:53Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430006824","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430006824"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430006824"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430006824/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":148,"original_line":148,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":148,"position":148,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430007271","pull_request_review_id":3336833217,"id":2430007271,"node_id":"PRRC_kwDOBEmZvc6Q1vfn","diff_hunk":"@@ -33,3 +34,4 @@ Use these advanced guides for more options and configurations:\n - [Run Applications in Different Containers](serve-container-runtime-env-guide)\n - [Use Custom Algorithm for Request Routing](custom-request-router)\n - [Troubleshoot multi-node GPU setups for serving LLMs](multi-node-gpu-troubleshooting)\n+- [External Scaling Webhook 
API](external-scaling-webhook)","path":"doc/source/serve/advanced-guides/index.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"angelinalg","id":122562471,"node_id":"U_kgDOB04npw","avatar_url":"https://avatars.githubusercontent.com/u/122562471?v=4","gravatar_id":"","url":"https://api.github.com/users/angelinalg","html_url":"https://github.com/angelinalg","followers_url":"https://api.github.com/users/angelinalg/followers","following_url":"https://api.github.com/users/angelinalg/following{/other_user}","gists_url":"https://api.github.com/users/angelinalg/gists{/gist_id}","starred_url":"https://api.github.com/users/angelinalg/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/angelinalg/subscriptions","organizations_url":"https://api.github.com/users/angelinalg/orgs","repos_url":"https://api.github.com/users/angelinalg/repos","events_url":"https://api.github.com/users/angelinalg/events{/privacy}","received_events_url":"https://api.github.com/users/angelinalg/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"```suggestion\r\n- [External scaling webhook 
API](external-scaling-webhook)\r\n```","created_at":"2025-10-14T17:56:06Z","updated_at":"2025-10-14T17:56:24Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2430007271","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430007271"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2430007271"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2430007271/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":37,"original_line":37,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":12,"position":12,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2436993731","pull_request_review_id":3346339993,"id":2436993731,"node_id":"PRRC_kwDOBEmZvc6RQZLD","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"abrarsheikh","id":5113943,"node_id":"MDQ6VXNlcjUxMTM5NDM=","avatar_url":"https://avatars.githubusercontent.com/u/5113943?v=4","gravatar_id":"","url":"https://api.github.com/users/abrarsheikh","html_url":"https://github.com/abrarsheikh","followers_url":"https://api.github.com/users/abrarsheikh/followers","following_url":"https://api.github.com/users/abrarsheikh/following{/other_user}","gists_url":"https://api.github.com/users/abrarsheikh/gists{/gist_id}","starred_url":"https://api.github.com/users/abrarsheikh/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/abrarsheikh/subscriptions","organizations_url":"https://api.github.com/users/abrarsheikh/orgs","repos_url":"https://api.github.com/users/abrarsheikh/repos","events_url":"https://api.github.com/users/abrarsheikh/events{/privacy}","received_events_url":"https://api.github.com/users/abrarsheikh/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"move this to a file if `doc_code` and use emphasize on `external_scaler_enabled: true` line. 
See https://github.com/ray-project/ray/blob/master/doc/source/serve/advanced-guides/dyn-req-batch.md?plain=1#L29","created_at":"2025-10-16T18:13:36Z","updated_at":"2025-10-16T18:36:49Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2436993731","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2436993731"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2436993731"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2436993731/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":23,"original_start_line":23,"start_side":"RIGHT","line":31,"original_line":31,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":31,"position":31,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2436997041","pull_request_review_id":3346339993,"id":2436997041,"node_id":"PRRC_kwDOBEmZvc6RQZ-x","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the 
application.","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"abrarsheikh","id":5113943,"node_id":"MDQ6VXNlcjUxMTM5NDM=","avatar_url":"https://avatars.githubusercontent.com/u/5113943?v=4","gravatar_id":"","url":"https://api.github.com/users/abrarsheikh","html_url":"https://github.com/abrarsheikh","followers_url":"https://api.github.com/users/abrarsheikh/followers","following_url":"https://api.github.com/users/abrarsheikh/following{/other_user}","gists_url":"https://api.github.com/users/abrarsheikh/gists{/gist_id}","starred_url":"https://api.github.com/users/abrarsheikh/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/abrarsheikh/subscriptions","organizations_url":"https://api.github.com/users/abrarsheikh/orgs","repos_url":"https://api.github.com/users/abrarsheikh/repos","events_url":"https://api.github.com/users/abrarsheikh/events{/privacy}","received_events_url":"https://api.github.com/users/abrarsheikh/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"The two points for negation of the other. 
Choose one of the points and remove the other?","created_at":"2025-10-16T18:14:45Z","updated_at":"2025-10-16T18:36:49Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2436997041","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2436997041"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2436997041"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2436997041/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":36,"original_start_line":36,"start_side":"RIGHT","line":37,"original_line":37,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":37,"position":37,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2437053726","pull_request_review_id":3346339993,"id":2437053726,"node_id":"PRRC_kwDOBEmZvc6RQn0e","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. 
Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"abrarsheikh","id":5113943,"node_id":"MDQ6VXNlcjUxMTM5NDM=","avatar_url":"https://avatars.githubusercontent.com/u/5113943?v=4","gravatar_id":"","url":"https://api.github.com/users/abrarsheikh","html_url":"https://github.com/abrarsheikh","followers_url":"https://api.github.com/users/abrarsheikh/followers","following_url":"https://api.github.com/users/abrarsheikh/following{/other_user}","gists_url":"https://api.github.com/users/abrarsheikh/gists{/gist_id}","starred_url":"https://api.github.com/users/abrarsheikh/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/abrarsheikh/subscriptions","organizations_url":"https://api.github.com/users/abrarsheikh/orgs","repos_url":"https://api.github.com/users/abrarsheikh/repos","events_url":"https://api.github.com/users/abrarsheikh/events{/privacy}","received_events_url":"https://api.github.com/users/abrarsheikh/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"instead of splitting the instructions into multiple steps provide one curl command that contains all 
details.","created_at":"2025-10-16T18:33:08Z","updated_at":"2025-10-16T18:36:49Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2437053726","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2437053726"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2437053726"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2437053726/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":52,"original_line":52,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":52,"position":52,"subject_type":"line"},{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2437057788","pull_request_review_id":3346339993,"id":2437057788,"node_id":"PRRC_kwDOBEmZvc6RQoz8","diff_hunk":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 200\n+\n+```\n+\n+## Use 
cases","path":"doc/source/serve/advanced-guides/external-scaling-webhook.md","commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","original_commit_id":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"abrarsheikh","id":5113943,"node_id":"MDQ6VXNlcjUxMTM5NDM=","avatar_url":"https://avatars.githubusercontent.com/u/5113943?v=4","gravatar_id":"","url":"https://api.github.com/users/abrarsheikh","html_url":"https://github.com/abrarsheikh","followers_url":"https://api.github.com/users/abrarsheikh/followers","following_url":"https://api.github.com/users/abrarsheikh/following{/other_user}","gists_url":"https://api.github.com/users/abrarsheikh/gists{/gist_id}","starred_url":"https://api.github.com/users/abrarsheikh/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/abrarsheikh/subscriptions","organizations_url":"https://api.github.com/users/abrarsheikh/orgs","repos_url":"https://api.github.com/users/abrarsheikh/repos","events_url":"https://api.github.com/users/abrarsheikh/events{/privacy}","received_events_url":"https://api.github.com/users/abrarsheikh/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"I suggest condensing the use-cases section into a paragraph, information here is too verbose and not adding much 
value.","created_at":"2025-10-16T18:34:36Z","updated_at":"2025-10-16T18:36:49Z","html_url":"https://github.com/ray-project/ray/pull/57698#discussion_r2437057788","pull_request_url":"https://api.github.com/repos/ray-project/ray/pulls/57698","_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments/2437057788"},"html":{"href":"https://github.com/ray-project/ray/pull/57698#discussion_r2437057788"},"pull_request":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"}},"reactions":{"url":"https://api.github.com/repos/ray-project/ray/pulls/comments/2437057788/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"start_line":null,"original_start_line":null,"start_side":null,"line":126,"original_line":126,"side":"RIGHT","author_association":"CONTRIBUTOR","original_position":126,"position":126,"subject_type":"line"}]
\ No newline at end of file
diff --git a/.cursor/.agent-tools/4017aade-a6b7-4ee3-a628-d0394bcd4c53.txt b/.cursor/.agent-tools/4017aade-a6b7-4ee3-a628-d0394bcd4c53.txt
new file mode 100644
index 000000000000..3bbc8a58c427
--- /dev/null
+++ b/.cursor/.agent-tools/4017aade-a6b7-4ee3-a628-d0394bcd4c53.txt
@@ -0,0 +1 @@
+{"url":"https://api.github.com/repos/ray-project/ray/pulls/57698","id":2912647974,"node_id":"PR_kwDOBEmZvc6tm3sm","html_url":"https://github.com/ray-project/ray/pull/57698","diff_url":"https://github.com/ray-project/ray/pull/57698.diff","patch_url":"https://github.com/ray-project/ray/pull/57698.patch","issue_url":"https://api.github.com/repos/ray-project/ray/issues/57698","number":57698,"state":"open","locked":false,"title":"add docs for post API","user":{"login":"harshit-anyscale","id":217743223,"node_id":"U_kgDODPp_dw","avatar_url":"https://avatars.githubusercontent.com/u/217743223?v=4","gravatar_id":"","url":"https://api.github.com/users/harshit-anyscale","html_url":"https://github.com/harshit-anyscale","followers_url":"https://api.github.com/users/harshit-anyscale/followers","following_url":"https://api.github.com/users/harshit-anyscale/following{/other_user}","gists_url":"https://api.github.com/users/harshit-anyscale/gists{/gist_id}","starred_url":"https://api.github.com/users/harshit-anyscale/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/harshit-anyscale/subscriptions","organizations_url":"https://api.github.com/users/harshit-anyscale/orgs","repos_url":"https://api.github.com/users/harshit-anyscale/repos","events_url":"https://api.github.com/users/harshit-anyscale/events{/privacy}","received_events_url":"https://api.github.com/users/harshit-anyscale/received_events","type":"User","user_view_type":"public","site_admin":false},"body":"- adding docs for POST API, here are more details: https://docs.google.com/document/d/1KtMUDz1O3koihG6eh-QcUqudZjNAX3NsqqOMYh3BoWA/edit?tab=t.0#heading=h.2vf4s2d7ca46\r\n\r\n- also, making changes for the external scaler enabled in the existing serve application docs\r\n\r\nto be merged after 
https://github.com/ray-project/ray/pull/57554","created_at":"2025-10-14T07:54:41Z","updated_at":"2025-10-17T07:28:28Z","closed_at":null,"merged_at":null,"merge_commit_sha":"e88caa994bc8ff9cd84cdde5f79854dbb3c144fc","assignee":{"login":"harshit-anyscale","id":217743223,"node_id":"U_kgDODPp_dw","avatar_url":"https://avatars.githubusercontent.com/u/217743223?v=4","gravatar_id":"","url":"https://api.github.com/users/harshit-anyscale","html_url":"https://github.com/harshit-anyscale","followers_url":"https://api.github.com/users/harshit-anyscale/followers","following_url":"https://api.github.com/users/harshit-anyscale/following{/other_user}","gists_url":"https://api.github.com/users/harshit-anyscale/gists{/gist_id}","starred_url":"https://api.github.com/users/harshit-anyscale/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/harshit-anyscale/subscriptions","organizations_url":"https://api.github.com/users/harshit-anyscale/orgs","repos_url":"https://api.github.com/users/harshit-anyscale/repos","events_url":"https://api.github.com/users/harshit-anyscale/events{/privacy}","received_events_url":"https://api.github.com/users/harshit-anyscale/received_events","type":"User","user_view_type":"public","site_admin":false},"assignees":[{"login":"harshit-anyscale","id":217743223,"node_id":"U_kgDODPp_dw","avatar_url":"https://avatars.githubusercontent.com/u/217743223?v=4","gravatar_id":"","url":"https://api.github.com/users/harshit-anyscale","html_url":"https://github.com/harshit-anyscale","followers_url":"https://api.github.com/users/harshit-anyscale/followers","following_url":"https://api.github.com/users/harshit-anyscale/following{/other_user}","gists_url":"https://api.github.com/users/harshit-anyscale/gists{/gist_id}","starred_url":"https://api.github.com/users/harshit-anyscale/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/harshit-anyscale/subscriptions","organizations_url":"https://api.github.com/users/harshit-anyscale/orgs","re
pos_url":"https://api.github.com/users/harshit-anyscale/repos","events_url":"https://api.github.com/users/harshit-anyscale/events{/privacy}","received_events_url":"https://api.github.com/users/harshit-anyscale/received_events","type":"User","user_view_type":"public","site_admin":false}],"requested_reviewers":[],"requested_teams":[],"labels":[{"id":2110730223,"node_id":"MDU6TGFiZWwyMTEwNzMwMjIz","url":"https://api.github.com/repos/ray-project/ray/labels/serve","name":"serve","color":"BFDADC","default":false,"description":"Ray Serve Related Issue"},{"id":3515394391,"node_id":"LA_kwDOBEmZvc7RiKlX","url":"https://api.github.com/repos/ray-project/ray/labels/docs","name":"docs","color":"1D76DB","default":false,"description":"An issue or change related to documentation"},{"id":6926977571,"node_id":"LA_kwDOBEmZvc8AAAABnOFKIw","url":"https://api.github.com/repos/ray-project/ray/labels/go","name":"go","color":"0E8A16","default":false,"description":"add ONLY when ready to merge, run all tests"}],"milestone":null,"draft":false,"commits_url":"https://api.github.com/repos/ray-project/ray/pulls/57698/commits","review_comments_url":"https://api.github.com/repos/ray-project/ray/pulls/57698/comments","review_comment_url":"https://api.github.com/repos/ray-project/ray/pulls/comments{/number}","comments_url":"https://api.github.com/repos/ray-project/ray/issues/57698/comments","statuses_url":"https://api.github.com/repos/ray-project/ray/statuses/1c22056eb240d871bb816191419f4a569ed3ae69","head":{"label":"ray-project:add-docs-for-post-api","ref":"add-docs-for-post-api","sha":"1c22056eb240d871bb816191419f4a569ed3ae69","user":{"login":"ray-project","id":22125274,"node_id":"MDEyOk9yZ2FuaXphdGlvbjIyMTI1Mjc0","avatar_url":"https://avatars.githubusercontent.com/u/22125274?v=4","gravatar_id":"","url":"https://api.github.com/users/ray-project","html_url":"https://github.com/ray-project","followers_url":"https://api.github.com/users/ray-project/followers","following_url":"https://api.github.com/use
rs/ray-project/following{/other_user}","gists_url":"https://api.github.com/users/ray-project/gists{/gist_id}","starred_url":"https://api.github.com/users/ray-project/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/ray-project/subscriptions","organizations_url":"https://api.github.com/users/ray-project/orgs","repos_url":"https://api.github.com/users/ray-project/repos","events_url":"https://api.github.com/users/ray-project/events{/privacy}","received_events_url":"https://api.github.com/users/ray-project/received_events","type":"Organization","user_view_type":"public","site_admin":false},"repo":{"id":71932349,"node_id":"MDEwOlJlcG9zaXRvcnk3MTkzMjM0OQ==","name":"ray","full_name":"ray-project/ray","private":false,"owner":{"login":"ray-project","id":22125274,"node_id":"MDEyOk9yZ2FuaXphdGlvbjIyMTI1Mjc0","avatar_url":"https://avatars.githubusercontent.com/u/22125274?v=4","gravatar_id":"","url":"https://api.github.com/users/ray-project","html_url":"https://github.com/ray-project","followers_url":"https://api.github.com/users/ray-project/followers","following_url":"https://api.github.com/users/ray-project/following{/other_user}","gists_url":"https://api.github.com/users/ray-project/gists{/gist_id}","starred_url":"https://api.github.com/users/ray-project/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/ray-project/subscriptions","organizations_url":"https://api.github.com/users/ray-project/orgs","repos_url":"https://api.github.com/users/ray-project/repos","events_url":"https://api.github.com/users/ray-project/events{/privacy}","received_events_url":"https://api.github.com/users/ray-project/received_events","type":"Organization","user_view_type":"public","site_admin":false},"html_url":"https://github.com/ray-project/ray","description":"Ray is an AI compute engine. 
Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.","fork":false,"url":"https://api.github.com/repos/ray-project/ray","forks_url":"https://api.github.com/repos/ray-project/ray/forks","keys_url":"https://api.github.com/repos/ray-project/ray/keys{/key_id}","collaborators_url":"https://api.github.com/repos/ray-project/ray/collaborators{/collaborator}","teams_url":"https://api.github.com/repos/ray-project/ray/teams","hooks_url":"https://api.github.com/repos/ray-project/ray/hooks","issue_events_url":"https://api.github.com/repos/ray-project/ray/issues/events{/number}","events_url":"https://api.github.com/repos/ray-project/ray/events","assignees_url":"https://api.github.com/repos/ray-project/ray/assignees{/user}","branches_url":"https://api.github.com/repos/ray-project/ray/branches{/branch}","tags_url":"https://api.github.com/repos/ray-project/ray/tags","blobs_url":"https://api.github.com/repos/ray-project/ray/git/blobs{/sha}","git_tags_url":"https://api.github.com/repos/ray-project/ray/git/tags{/sha}","git_refs_url":"https://api.github.com/repos/ray-project/ray/git/refs{/sha}","trees_url":"https://api.github.com/repos/ray-project/ray/git/trees{/sha}","statuses_url":"https://api.github.com/repos/ray-project/ray/statuses/{sha}","languages_url":"https://api.github.com/repos/ray-project/ray/languages","stargazers_url":"https://api.github.com/repos/ray-project/ray/stargazers","contributors_url":"https://api.github.com/repos/ray-project/ray/contributors","subscribers_url":"https://api.github.com/repos/ray-project/ray/subscribers","subscription_url":"https://api.github.com/repos/ray-project/ray/subscription","commits_url":"https://api.github.com/repos/ray-project/ray/commits{/sha}","git_commits_url":"https://api.github.com/repos/ray-project/ray/git/commits{/sha}","comments_url":"https://api.github.com/repos/ray-project/ray/comments{/number}","issue_comment_url":"https://api.github.com/repos/ray-project/ray/issues/comments{/number
}","contents_url":"https://api.github.com/repos/ray-project/ray/contents/{+path}","compare_url":"https://api.github.com/repos/ray-project/ray/compare/{base}...{head}","merges_url":"https://api.github.com/repos/ray-project/ray/merges","archive_url":"https://api.github.com/repos/ray-project/ray/{archive_format}{/ref}","downloads_url":"https://api.github.com/repos/ray-project/ray/downloads","issues_url":"https://api.github.com/repos/ray-project/ray/issues{/number}","pulls_url":"https://api.github.com/repos/ray-project/ray/pulls{/number}","milestones_url":"https://api.github.com/repos/ray-project/ray/milestones{/number}","notifications_url":"https://api.github.com/repos/ray-project/ray/notifications{?since,all,participating}","labels_url":"https://api.github.com/repos/ray-project/ray/labels{/name}","releases_url":"https://api.github.com/repos/ray-project/ray/releases{/id}","deployments_url":"https://api.github.com/repos/ray-project/ray/deployments","created_at":"2016-10-25T19:38:30Z","updated_at":"2025-10-17T05:23:53Z","pushed_at":"2025-10-17T06:33:34Z","git_url":"git://github.com/ray-project/ray.git","ssh_url":"git@github.com:ray-project/ray.git","clone_url":"https://github.com/ray-project/ray.git","svn_url":"https://github.com/ray-project/ray","homepage":"https://ray.io","size":610140,"stargazers_count":39362,"watchers_count":39362,"language":"Python","has_issues":true,"has_projects":true,"has_downloads":true,"has_wiki":false,"has_pages":false,"has_discussions":false,"forks_count":6780,"mirror_url":null,"archived":false,"disabled":false,"open_issues_count":3175,"license":{"key":"apache-2.0","name":"Apache License 
2.0","spdx_id":"Apache-2.0","url":"https://api.github.com/licenses/apache-2.0","node_id":"MDc6TGljZW5zZTI="},"allow_forking":true,"is_template":false,"web_commit_signoff_required":true,"topics":["data-science","deep-learning","deployment","distributed","hyperparameter-optimization","hyperparameter-search","large-language-models","llm","llm-inference","llm-serving","machine-learning","optimization","parallel","python","pytorch","ray","reinforcement-learning","rllib","serving","tensorflow"],"visibility":"public","forks":6780,"open_issues":3175,"watchers":39362,"default_branch":"master"}},"base":{"label":"ray-project:master","ref":"master","sha":"f6f14aacf38abfea27c3657e4a866a249c781ec0","user":{"login":"ray-project","id":22125274,"node_id":"MDEyOk9yZ2FuaXphdGlvbjIyMTI1Mjc0","avatar_url":"https://avatars.githubusercontent.com/u/22125274?v=4","gravatar_id":"","url":"https://api.github.com/users/ray-project","html_url":"https://github.com/ray-project","followers_url":"https://api.github.com/users/ray-project/followers","following_url":"https://api.github.com/users/ray-project/following{/other_user}","gists_url":"https://api.github.com/users/ray-project/gists{/gist_id}","starred_url":"https://api.github.com/users/ray-project/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/ray-project/subscriptions","organizations_url":"https://api.github.com/users/ray-project/orgs","repos_url":"https://api.github.com/users/ray-project/repos","events_url":"https://api.github.com/users/ray-project/events{/privacy}","received_events_url":"https://api.github.com/users/ray-project/received_events","type":"Organization","user_view_type":"public","site_admin":false},"repo":{"id":71932349,"node_id":"MDEwOlJlcG9zaXRvcnk3MTkzMjM0OQ==","name":"ray","full_name":"ray-project/ray","private":false,"owner":{"login":"ray-project","id":22125274,"node_id":"MDEyOk9yZ2FuaXphdGlvbjIyMTI1Mjc0","avatar_url":"https://avatars.githubusercontent.com/u/22125274?v=4","gravatar_id":"","url":"h
ttps://api.github.com/users/ray-project","html_url":"https://github.com/ray-project","followers_url":"https://api.github.com/users/ray-project/followers","following_url":"https://api.github.com/users/ray-project/following{/other_user}","gists_url":"https://api.github.com/users/ray-project/gists{/gist_id}","starred_url":"https://api.github.com/users/ray-project/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/ray-project/subscriptions","organizations_url":"https://api.github.com/users/ray-project/orgs","repos_url":"https://api.github.com/users/ray-project/repos","events_url":"https://api.github.com/users/ray-project/events{/privacy}","received_events_url":"https://api.github.com/users/ray-project/received_events","type":"Organization","user_view_type":"public","site_admin":false},"html_url":"https://github.com/ray-project/ray","description":"Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.","fork":false,"url":"https://api.github.com/repos/ray-project/ray","forks_url":"https://api.github.com/repos/ray-project/ray/forks","keys_url":"https://api.github.com/repos/ray-project/ray/keys{/key_id}","collaborators_url":"https://api.github.com/repos/ray-project/ray/collaborators{/collaborator}","teams_url":"https://api.github.com/repos/ray-project/ray/teams","hooks_url":"https://api.github.com/repos/ray-project/ray/hooks","issue_events_url":"https://api.github.com/repos/ray-project/ray/issues/events{/number}","events_url":"https://api.github.com/repos/ray-project/ray/events","assignees_url":"https://api.github.com/repos/ray-project/ray/assignees{/user}","branches_url":"https://api.github.com/repos/ray-project/ray/branches{/branch}","tags_url":"https://api.github.com/repos/ray-project/ray/tags","blobs_url":"https://api.github.com/repos/ray-project/ray/git/blobs{/sha}","git_tags_url":"https://api.github.com/repos/ray-project/ray/git/tags{/sha}","git_refs_url":"https://api.gith
ub.com/repos/ray-project/ray/git/refs{/sha}","trees_url":"https://api.github.com/repos/ray-project/ray/git/trees{/sha}","statuses_url":"https://api.github.com/repos/ray-project/ray/statuses/{sha}","languages_url":"https://api.github.com/repos/ray-project/ray/languages","stargazers_url":"https://api.github.com/repos/ray-project/ray/stargazers","contributors_url":"https://api.github.com/repos/ray-project/ray/contributors","subscribers_url":"https://api.github.com/repos/ray-project/ray/subscribers","subscription_url":"https://api.github.com/repos/ray-project/ray/subscription","commits_url":"https://api.github.com/repos/ray-project/ray/commits{/sha}","git_commits_url":"https://api.github.com/repos/ray-project/ray/git/commits{/sha}","comments_url":"https://api.github.com/repos/ray-project/ray/comments{/number}","issue_comment_url":"https://api.github.com/repos/ray-project/ray/issues/comments{/number}","contents_url":"https://api.github.com/repos/ray-project/ray/contents/{+path}","compare_url":"https://api.github.com/repos/ray-project/ray/compare/{base}...{head}","merges_url":"https://api.github.com/repos/ray-project/ray/merges","archive_url":"https://api.github.com/repos/ray-project/ray/{archive_format}{/ref}","downloads_url":"https://api.github.com/repos/ray-project/ray/downloads","issues_url":"https://api.github.com/repos/ray-project/ray/issues{/number}","pulls_url":"https://api.github.com/repos/ray-project/ray/pulls{/number}","milestones_url":"https://api.github.com/repos/ray-project/ray/milestones{/number}","notifications_url":"https://api.github.com/repos/ray-project/ray/notifications{?since,all,participating}","labels_url":"https://api.github.com/repos/ray-project/ray/labels{/name}","releases_url":"https://api.github.com/repos/ray-project/ray/releases{/id}","deployments_url":"https://api.github.com/repos/ray-project/ray/deployments","created_at":"2016-10-25T19:38:30Z","updated_at":"2025-10-17T05:23:53Z","pushed_at":"2025-10-17T06:33:34Z","git_url":"git://github.com
/ray-project/ray.git","ssh_url":"git@github.com:ray-project/ray.git","clone_url":"https://github.com/ray-project/ray.git","svn_url":"https://github.com/ray-project/ray","homepage":"https://ray.io","size":610140,"stargazers_count":39362,"watchers_count":39362,"language":"Python","has_issues":true,"has_projects":true,"has_downloads":true,"has_wiki":false,"has_pages":false,"has_discussions":false,"forks_count":6780,"mirror_url":null,"archived":false,"disabled":false,"open_issues_count":3175,"license":{"key":"apache-2.0","name":"Apache License 2.0","spdx_id":"Apache-2.0","url":"https://api.github.com/licenses/apache-2.0","node_id":"MDc6TGljZW5zZTI="},"allow_forking":true,"is_template":false,"web_commit_signoff_required":true,"topics":["data-science","deep-learning","deployment","distributed","hyperparameter-optimization","hyperparameter-search","large-language-models","llm","llm-inference","llm-serving","machine-learning","optimization","parallel","python","pytorch","ray","reinforcement-learning","rllib","serving","tensorflow"],"visibility":"public","forks":6780,"open_issues":3175,"watchers":39362,"default_branch":"master"}},"_links":{"self":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698"},"html":{"href":"https://github.com/ray-project/ray/pull/57698"},"issue":{"href":"https://api.github.com/repos/ray-project/ray/issues/57698"},"comments":{"href":"https://api.github.com/repos/ray-project/ray/issues/57698/comments"},"review_comments":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698/comments"},"review_comment":{"href":"https://api.github.com/repos/ray-project/ray/pulls/comments{/number}"},"commits":{"href":"https://api.github.com/repos/ray-project/ray/pulls/57698/commits"},"statuses":{"href":"https://api.github.com/repos/ray-project/ray/statuses/1c22056eb240d871bb816191419f4a569ed3ae69"}},"author_association":"CONTRIBUTOR","auto_merge":null,"active_lock_reason":null,"merged":false,"mergeable":true,"rebaseable":true,"mergeable_state":"bl
ocked","merged_by":null,"comments":0,"review_comments":22,"maintainer_can_modify":false,"commits":2,"additions":158,"deletions":1,"changed_files":3}
\ No newline at end of file
diff --git a/.cursor/.agent-tools/bb102264-6571-4877-84c7-35e4995687e3.txt b/.cursor/.agent-tools/bb102264-6571-4877-84c7-35e4995687e3.txt
new file mode 100644
index 000000000000..7f74073ddf18
--- /dev/null
+++ b/.cursor/.agent-tools/bb102264-6571-4877-84c7-35e4995687e3.txt
@@ -0,0 +1 @@
+[{"sha":"7eeb73f7f362f506a9d6358d9ff14f82c2fa0d48","filename":"doc/source/serve/advanced-guides/external-scaling-webhook.md","status":"added","additions":153,"deletions":0,"changes":153,"blob_url":"https://github.com/ray-project/ray/blob/1c22056eb240d871bb816191419f4a569ed3ae69/doc%2Fsource%2Fserve%2Fadvanced-guides%2Fexternal-scaling-webhook.md","raw_url":"https://github.com/ray-project/ray/raw/1c22056eb240d871bb816191419f4a569ed3ae69/doc%2Fsource%2Fserve%2Fadvanced-guides%2Fexternal-scaling-webhook.md","contents_url":"https://api.github.com/repos/ray-project/ray/contents/doc%2Fsource%2Fserve%2Fadvanced-guides%2Fexternal-scaling-webhook.md?ref=1c22056eb240d871bb816191419f4a569ed3ae69","patch":"@@ -0,0 +1,153 @@\n+(serve-external-scale-webhook)=\n+\n+:::{warning}\n+This API is in alpha and may change before becoming stable.\n+:::\n+\n+# External Scaling Webhook\n+\n+Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.\n+\n+## Overview\n+\n+The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. 
Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook allows you to scale based on any external criteria you define.\n+\n+## Prerequisites\n+\n+Before you can use the external scaling webhook, you must enable it in your Ray Serve application configuration:\n+\n+### Enable external scaler\n+\n+Set `external_scaler_enabled: true` in your application configuration:\n+\n+```yaml\n+applications:\n+  - name: my-app\n+    import_path: my_module:app\n+    external_scaler_enabled: true\n+    deployments:\n+      - name: my-deployment\n+        num_replicas: 1\n+```\n+\n+:::{warning}\n+External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.\n+\n+- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.\n+- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.\n+\n+Attempting to use both will result in an error.\n+:::\n+\n+### Get authentication token\n+\n+The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:\n+\n+1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).\n+2. Navigate to the Serve section.\n+3. Find and copy the authentication token for your application.\n+\n+## API endpoint\n+\n+The webhook is available at the following endpoint:\n+\n+```\n+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale\n+```\n+\n+**Path Parameters:**\n+- `application_name`: The name of your Serve application.\n+- `deployment_name`: The name of the deployment you want to scale.\n+\n+**Headers:**\n+- `Authorization` (required): Bearer token for authentication. 
Format: `Bearer <token>`\n+- `Content-Type` (required): Must be `application/json`\n+\n+**Request Body:**\n+\n+The following example shows the request body structure:\n+\n+```json\n+{\n+    \"target_num_replicas\": 5\n+}\n+```\n+\n+The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:\n+\n+- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.\n+\n+\n+## Example - Predictive scaling\n+\n+Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:\n+\n+```python\n+import requests\n+from datetime import datetime\n+\n+def predictive_scale(\n+    application_name: str,\n+    deployment_name: str,\n+    auth_token: str,\n+    serve_endpoint: str = \"http://localhost:8000\"\n+) -> bool:\n+    \"\"\"Scale based on time of day and historical patterns.\"\"\"\n+    hour = datetime.now().hour\n+    \n+    # Define scaling profile based on historical traffic patterns\n+    if 9 <= hour < 17:  # Business hours\n+        target_replicas = 10\n+    elif 17 <= hour < 22:  # Evening peak\n+        target_replicas = 15\n+    else:  # Off-peak hours\n+        target_replicas = 3\n+    \n+    url = (\n+        f\"{serve_endpoint}/api/v1/applications/{application_name}\"\n+        f\"/deployments/{deployment_name}/scale\"\n+    )\n+    \n+    headers = {\n+        \"Authorization\": f\"Bearer {auth_token}\",\n+        \"Content-Type\": \"application/json\"\n+    }\n+    \n+    response = requests.post(\n+        url,\n+        headers=headers,\n+        json={\"target_num_replicas\": target_replicas}\n+    )\n+    \n+    return response.status_code == 200\n+\n+```\n+\n+## Use cases\n+\n+The external scaling webhook is useful for several scenarios where you need custom scaling logic beyond what Ray Serve's built-in 
autoscaling provides:\n+\n+### Custom metric-based scaling\n+\n+Scale your deployments based on business or application metrics that Ray Serve doesn't track automatically:\n+\n+- External monitoring systems such as Prometheus, Datadog, or CloudWatch metrics.\n+- Database query latencies or connection pool sizes.\n+- Cost metrics to optimize for budget constraints.\n+\n+### Predictive and scheduled scaling\n+\n+Implement predictive scaling based on historical patterns or business schedules:\n+\n+- Preemptive scaling before anticipated traffic spikes (such as daily or weekly patterns).\n+- Event-driven scaling for known traffic events (such as sales, launches, or scheduled batch jobs).\n+- Time-of-day based scaling profiles for predictable workloads.\n+\n+### Manual and operational control\n+\n+Direct control over replica counts for operational scenarios:\n+\n+- Manual scaling for load testing or performance testing.\n+- Cost optimization by scaling down during off-peak hours or weekends.\n+- Development and staging environment management.\n+"},{"sha":"08669127d2640545a840badb6f06a95b75461f18","filename":"doc/source/serve/advanced-guides/index.md","status":"modified","additions":2,"deletions":0,"changes":2,"blob_url":"https://github.com/ray-project/ray/blob/1c22056eb240d871bb816191419f4a569ed3ae69/doc%2Fsource%2Fserve%2Fadvanced-guides%2Findex.md","raw_url":"https://github.com/ray-project/ray/raw/1c22056eb240d871bb816191419f4a569ed3ae69/doc%2Fsource%2Fserve%2Fadvanced-guides%2Findex.md","contents_url":"https://api.github.com/repos/ray-project/ray/contents/doc%2Fsource%2Fserve%2Fadvanced-guides%2Findex.md?ref=1c22056eb240d871bb816191419f4a569ed3ae69","patch":"@@ -16,6 +16,7 @@ deploy-vm\n multi-app-container\n custom-request-router\n multi-node-gpu-troubleshooting\n+external-scaling-webhook\n ```\n \n If you’re new to Ray Serve, start with the [Ray Serve Quickstart](serve-getting-started).\n@@ -33,3 +34,4 @@ Use these advanced guides for more options and 
configurations:\n - [Run Applications in Different Containers](serve-container-runtime-env-guide)\n - [Use Custom Algorithm for Request Routing](custom-request-router)\n - [Troubleshoot multi-node GPU setups for serving LLMs](multi-node-gpu-troubleshooting)\n+- [External Scaling Webhook API](external-scaling-webhook)"},{"sha":"60e5a2c046f8f071fb13f67dcba1f1d9c61d8c2f","filename":"doc/source/serve/production-guide/config.md","status":"modified","additions":3,"deletions":1,"changes":4,"blob_url":"https://github.com/ray-project/ray/blob/1c22056eb240d871bb816191419f4a569ed3ae69/doc%2Fsource%2Fserve%2Fproduction-guide%2Fconfig.md","raw_url":"https://github.com/ray-project/ray/raw/1c22056eb240d871bb816191419f4a569ed3ae69/doc%2Fsource%2Fserve%2Fproduction-guide%2Fconfig.md","contents_url":"https://api.github.com/repos/ray-project/ray/contents/doc%2Fsource%2Fserve%2Fproduction-guide%2Fconfig.md?ref=1c22056eb240d871bb816191419f4a569ed3ae69","patch":"@@ -40,7 +40,8 @@ applications:\n - name: ...\n   route_prefix: ...\n   import_path: ...\n-  runtime_env: ... \n+  runtime_env: ...\n+  external_scaler_enabled: ...\n   deployments:\n   - name: ...\n     num_replicas: ...\n@@ -99,6 +100,7 @@ These are the fields per `application`:\n - **`route_prefix`**: An application can be called via HTTP at the specified route prefix. It defaults to `/`. The route prefix for each application must be unique.\n - **`import_path`**: The path to your top-level Serve deployment (or the same path passed to `serve run`). The most minimal config file consists of only an `import_path`.\n - **`runtime_env`**: Defines the environment that the application runs in. Use this parameter to package application dependencies such as `pip` packages (see {ref}`Runtime Environments <runtime-environments>` for supported fields). The `import_path` must be available _within_ the `runtime_env` if it's specified. 
The Serve config's `runtime_env` can only use [remote URIs](remote-uris) in its `working_dir` and `py_modules`; it can't use local zip files or directories. [More details on runtime env](serve-runtime-env).\n+- **`external_scaler_enabled`**: Enables the external scaling webhook, which lets you scale deployments from outside the Ray cluster using a REST API. When enabled, you can't use built-in autoscaling (`autoscaling_config`) for any deployment in this application. Defaults to `False`. See [External Scaling Webhook](serve-external-scale-webhook) for details.\n - **`deployments (optional)`**: A list of deployment options that allows you to override the `@serve.deployment` settings specified in the deployment graph code. Each entry in this list must include the deployment `name`, which must match one in the code. If this section is omitted, Serve launches all deployments in the graph with the parameters specified in the code. See how to [configure serve deployment options](serve-configure-deployment).\n - **`args`**: Arguments that are passed to the [application builder](serve-app-builder-guide).\n "}]
\ No newline at end of file
diff --git a/doc/source/serve/advanced-guides/advanced-autoscaling.md b/doc/source/serve/advanced-guides/advanced-autoscaling.md
index 8b25cd0e2ce6..0e0cb4e204e5 100644
--- a/doc/source/serve/advanced-guides/advanced-autoscaling.md
+++ b/doc/source/serve/advanced-guides/advanced-autoscaling.md
@@ -670,6 +670,110 @@ In your policy, access custom metrics via:
 * **`ctx.aggregated_metrics[metric_name]`** — A time-weighted average computed from the raw metric values for each replica.
 
 
+(serve-external-scale-webhook)=
+## External scaling webhook
+
+::::{warning}
+This API is in alpha and may change before becoming stable.
+::::
+
+Scale a deployment's replicas programmatically from outside the Ray cluster through a REST endpoint. Use this endpoint when external metrics, scheduled or predictive logic, or manual operational control should drive scaling, rather than the built-in request-driven autoscaler or a custom policy.
+
+### Enable in your application config
+
+```yaml
+applications:
+  - name: my-app
+    import_path: my_module:app
+    external_scaler_enabled: true
+    deployments:
+      - name: my-deployment
+        num_replicas: 1
+```
+
+::::{warning}
+External scaling and built-in autoscaling are mutually exclusive for a given application.
+
+- If you set `external_scaler_enabled: true`, do not configure `autoscaling_config` on any deployment in that application.
+- If any deployment uses `autoscaling_config`, do not set `external_scaler_enabled: true` for the application.
+
+Attempting to use both results in an error.
+::::
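+
+The mutual-exclusion rule above can be checked before deploying. The following is a minimal sketch, not Ray's own validation: `find_autoscaling_conflicts` is a hypothetical helper that inspects an already-parsed Serve config dictionary and reports deployments that set `autoscaling_config` inside an application with `external_scaler_enabled: true`.
+
+```python
+from typing import Any, Dict, List
+
+
+def find_autoscaling_conflicts(config: Dict[str, Any]) -> List[str]:
+    """Return "app/deployment" names that violate the mutual-exclusion rule.
+
+    Hypothetical pre-deploy lint; Ray Serve performs its own validation.
+    """
+    conflicts = []
+    for app in config.get("applications", []):
+        if not app.get("external_scaler_enabled", False):
+            continue  # Built-in autoscaling is allowed for this application.
+        for deployment in app.get("deployments", []):
+            if deployment.get("autoscaling_config") is not None:
+                conflicts.append(f"{app['name']}/{deployment['name']}")
+    return conflicts
+```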
+
+### Authentication
+
+The endpoint requires a bearer token. Copy the token for your application from the Serve tab of the Ray Dashboard, then send it in the request header as `Authorization: Bearer <token>`.
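+
+One way to keep the token out of source code is to read it from the environment. This is a sketch under assumptions: the variable name `SERVE_SCALE_TOKEN` and the helper `build_scale_headers` are hypothetical, and only the header names and the `Bearer <token>` format come from this guide.
+
+```python
+import os
+from typing import Dict, Mapping, Optional
+
+
+def build_scale_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
+    """Build request headers for the scale endpoint from an environment mapping."""
+    env = os.environ if env is None else env
+    # SERVE_SCALE_TOKEN is a hypothetical variable name chosen for this example.
+    token = env.get("SERVE_SCALE_TOKEN", "").strip()
+    if not token:
+        raise ValueError("SERVE_SCALE_TOKEN is not set")
+    return {
+        "Authorization": f"Bearer {token}",
+        "Content-Type": "application/json",
+    }
+```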
+
+### API endpoint
+
+```
+POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale
+```
+
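+- Path parameters: `application_name` (the Serve application) and `deployment_name` (the deployment to scale)
+- Headers: `Authorization: Bearer <token>` and `Content-Type: application/json`, both required
+- Body: a JSON object with `target_num_replicas`, a required non-negative integer (see [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html)):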
+
+```json
+{ "target_num_replicas": 5 }
+```
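+
+The endpoint path and body can be assembled and validated before sending. This is a minimal sketch, assuming that `target_num_replicas` must be a non-negative integer; the helper name `build_scale_request` is hypothetical. It percent-encodes the path parameters and rejects invalid replica counts locally:
+
+```python
+from typing import Any, Dict, Tuple
+from urllib.parse import quote
+
+
+def build_scale_request(
+    base_url: str,
+    application_name: str,
+    deployment_name: str,
+    target_num_replicas: int,
+) -> Tuple[str, Dict[str, Any]]:
+    """Return the (url, json_body) pair for a scale request."""
+    # bool is a subclass of int, so exclude it explicitly.
+    if isinstance(target_num_replicas, bool) or not isinstance(target_num_replicas, int):
+        raise TypeError("target_num_replicas must be an integer")
+    if target_num_replicas < 0:
+        raise ValueError("target_num_replicas must be non-negative")
+    url = (
+        f"{base_url.rstrip('/')}/api/v1/applications/{quote(application_name, safe='')}"
+        f"/deployments/{quote(deployment_name, safe='')}/scale"
+    )
+    return url, {"target_num_replicas": target_num_replicas}
+```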
+
+### Minimal Python example (predictive scaling)
+
+```python
+import requests
+from datetime import datetime
+
+def predictive_scale(
+    application_name: str,
+    deployment_name: str,
+    auth_token: str,
+    serve_endpoint: str = "http://localhost:8000",
+    request_timeout_s: float = 10.0,
+) -> bool:
+    """Example: scale based on time-of-day profile.
+
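+    Returns True if the request succeeds, False on a timeout, connection error, or 4xx/5xx response.
+    """
+    hour = datetime.now().hour  # Local time on the machine running this script.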
+
+    if 9 <= hour < 17:  # Business hours
+        target_replicas = 10
+    elif 17 <= hour < 22:  # Evening peak
+        target_replicas = 15
+    else:  # Off-peak hours
+        target_replicas = 3
+
+    url = (
+        f"{serve_endpoint}/api/v1/applications/{application_name}"
+        f"/deployments/{deployment_name}/scale"
+    )
+
+    headers = {
+        "Authorization": f"Bearer {auth_token}",
+        "Content-Type": "application/json",
+    }
+
+    try:
+        response = requests.post(
+            url,
+            headers=headers,
+            json={"target_num_replicas": target_replicas},
+            timeout=request_timeout_s,
+        )
+        response.raise_for_status()  # Raises requests.HTTPError on 4xx/5xx.
+        return True
+    except requests.RequestException:  # Includes timeouts, connection errors, and HTTPError.
+        return False
+```
+
+### Common use cases
+
+- Custom metric scaling: signals from external monitoring (Prometheus, Datadog, or CloudWatch), database latency, or cost caps.
+- Predictive and scheduled scaling: anticipated traffic spikes, known event windows, or time-of-day profiles.
+- Manual and operational control: load tests, off-peak cost optimization, or staging environments.
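+
+For the custom metric case, the replica target is often derived from a load signal. The sketch below is illustrative only: `replicas_for_load`, its parameters, and the ceiling-division rule are assumptions, not part of Ray Serve. It converts an externally measured load (for example, queued jobs) into a clamped replica count that could be sent as `target_num_replicas`.
+
+```python
+import math
+
+
+def replicas_for_load(
+    load: float,
+    capacity_per_replica: float,
+    min_replicas: int = 1,
+    max_replicas: int = 20,
+) -> int:
+    """Return ceil(load / capacity_per_replica), clamped to [min_replicas, max_replicas]."""
+    if capacity_per_replica <= 0:
+        raise ValueError("capacity_per_replica must be positive")
+    if min_replicas < 0 or max_replicas < min_replicas:
+        raise ValueError("require 0 <= min_replicas <= max_replicas")
+    # Negative load readings are treated as zero load.
+    needed = math.ceil(max(load, 0.0) / capacity_per_replica)
+    return max(min_replicas, min(max_replicas, needed))
+```
+
+Clamping keeps a noisy or missing metric from scaling the deployment to zero or past a budget ceiling.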
+
+
 ### Application level autoscaling
 
 By default, each deployment in Ray Serve autoscales independently. When you have multiple deployments that need to scale in a coordinated way—such as deployments that share backend resources, have dependencies on each other, or need load-aware routing—you can define an **application-level autoscaling policy**. This policy makes scaling decisions for all deployments within an application simultaneously.
diff --git a/doc/source/serve/production-guide/config.md b/doc/source/serve/production-guide/config.md
index fa5eff346fe5..cf19cbd88ee6 100644
--- a/doc/source/serve/production-guide/config.md
+++ b/doc/source/serve/production-guide/config.md
@@ -40,7 +40,8 @@ applications:
 - name: ...
   route_prefix: ...
   import_path: ...
-  runtime_env: ... 
+  runtime_env: ...
+  external_scaler_enabled: ...
   deployments:
   - name: ...
     num_replicas: ...
@@ -99,6 +100,7 @@ These are the fields per `application`:
 - **`route_prefix`**: An application can be called via HTTP at the specified route prefix. It defaults to `/`. The route prefix for each application must be unique.
 - **`import_path`**: The path to your top-level Serve deployment (or the same path passed to `serve run`). The most minimal config file consists of only an `import_path`.
 - **`runtime_env`**: Defines the environment that the application runs in. Use this parameter to package application dependencies such as `pip` packages (see {ref}`Runtime Environments <runtime-environments>` for supported fields). The `import_path` must be available _within_ the `runtime_env` if it's specified. The Serve config's `runtime_env` can only use [remote URIs](remote-uris) in its `working_dir` and `py_modules`; it can't use local zip files or directories. [More details on runtime env](serve-runtime-env).
+- **`external_scaler_enabled`**: Enables the external scaling webhook, which lets you scale deployments from outside the Ray cluster using a REST API. When enabled, you can't use built-in autoscaling (`autoscaling_config`) for any deployment in this application. Defaults to `False`. See [External scaling webhook](serve-external-scale-webhook) for details.
 - **`deployments (optional)`**: A list of deployment options that allows you to override the `@serve.deployment` settings specified in the deployment graph code. Each entry in this list must include the deployment `name`, which must match one in the code. If this section is omitted, Serve launches all deployments in the graph with the parameters specified in the code. See how to [configure serve deployment options](serve-configure-deployment).
 - **`args`**: Arguments that are passed to the [application builder](serve-app-builder-guide).