This RFC outlines a proposal to add support for custom autoscaling policies. This will give users a new way to proactively scale deployment replicas based on any metrics they choose and provide better quality of service.
Problem
The current autoscaling in Serve only looks at the request queue size for each deployment and scales reactively based on traffic. Users need a more proactive way to scale deployments, and to scale on metrics other than the request queue size. Example use cases include, but are not limited to:
Scheduled bursts of requests arriving at a specific time of day. Serve should be able to scale replicas proactively before the burst to handle the requests.
Users who use a third-party autoscaler (e.g. KEDA) or metrics stored outside of Ray (e.g. CloudWatch, Datadog, etc.) want a way to integrate with Serve and scale accordingly before the traffic hits.
QoS guarantees, for example X clients sharing one replica and needing to scale proactively based on outside metrics to ensure the quality of service is met.
Proposal
Users will specify the policy in serve.yaml like below:
# serve.yaml
applications:
- name: default
  import_path: resnet:app
  deployments:
  - name: Model
    max_concurrent_queries: 5
    autoscaling_config:
      target_num_ongoing_requests_per_replica: 1
      min_replicas: 0
      initial_replicas: 0
      max_replicas: 200
      policy: "autoscaling_helpers:resnet_app"  # <-- New config to determine the scale for this deployment
Alternatively, users will also be able to pass the policy as a string or a callable directly to the serve.deployment decorator:
...
from autoscaling_helpers import resnet_app
@serve.deployment(
    ray_actor_options={"num_cpus": 1},
    max_concurrent_queries=5,
    autoscaling_config={
        "target_num_ongoing_requests_per_replica": 1,
        "min_replicas": 0,
        "initial_replicas": 0,
        "max_replicas": 200,
        "policy": resnet_app,  # Or the string "autoscaling_helpers:resnet_app"
    },
)
class Model:
    ...
And users can define the policy function like any of the ones below. The function will be called from inside the controller's event loop, and the deployment will be scaled based on the function's return value.
# autoscaling_helpers.py
import datetime
def resnet_app(context: AutoscalingContext) -> int:
    # Keep roughly 10 ongoing requests per replica, plus 10 standby replicas.
    return context.request_queue_size // 10 + 10
def burst_request_scaling(context: AutoscalingContext) -> int:
    # Scale based on the time of day.
    current_hour = datetime.datetime.utcnow().hour
    if 14 <= current_hour < 16:
        return 100
    return 5
def third_party_integration(context: AutoscalingContext) -> int:
    # Query external metrics and act on them.
    custom_metrics = CloudWatch.Client.get_metric_data("my_metrics...")  # getting metrics from outside Ray
    return custom_metrics.metrics1 * 2 + custom_metrics.metrics2 + 3
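As a further illustration only (not part of the proposal itself), a policy could also keep state across calls. The minimal sketch below uses the AutoscalingContext fields proposed under Changes further down (policy_state, current_num_ongoing_requests, and the capacity-adjusted bounds); the cooldown_policy name, the 300-second window, and the time.time() bookkeeping are arbitrary choices made up for this example.

# hypothetical_cooldown_policy.py -- illustrative sketch only
import time

def cooldown_policy(context: AutoscalingContext) -> int:
    # Desired replicas from the reactive signal: roughly one ongoing request per replica.
    desired = max(sum(context.current_num_ongoing_requests), 1)

    # Clamp to the capacity-adjusted bounds provided by the context.
    desired = min(max(desired, context.capacity_adjusted_min_replicas),
                  context.capacity_adjusted_max_replicas)

    # Use policy_state to remember the previous decision and enforce a
    # 300-second cooldown before scaling down.
    last_decision = context.policy_state.get("last_decision", desired)
    last_change_ts = context.policy_state.get("last_change_ts", 0.0)
    if desired < last_decision and time.time() - last_change_ts < 300:
        return last_decision  # still cooling down, keep the previous target

    context.policy_state["last_decision"] = desired
    context.policy_state["last_change_ts"] = time.time()
    return desired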
Changes
Add an optional policy config to the deployment's autoscaling_config. This will work similarly to import_path, where the value is a string pointing to a callable's import path that gets imported at deployment time.
Keep the current request-based scaling as the default policy.
When the config is passed via the serve.deployment decorator, either an import path string or a callable can be used.
Add a new AutoscalingPolicyManager and refactor the current autoscaling logic into this class (a rough sketch follows this list).
Add a new AutoscalingContext, which provides default metrics tracked in Ray such as the request queue size, the number of current replicas, the config, etc.
The proposed fields for AutoscalingContext are:
config: The AutoscalingConfig the deployment started with.
curr_target_num_replicas: The number of replicas that the deployment is currently trying to scale to.
current_num_ongoing_requests: List of the number of ongoing requests for each replica.
current_handle_queued_queries: The number of queries queued at the handle; if there are multiple handles, the maximum number of queued queries at a single handle is passed in.
capacity_adjusted_min_replicas: The min_replicas of the deployment adjusted by the target capacity.
capacity_adjusted_max_replicas: The max_replicas of the deployment adjusted by the target capacity.
policy_state: Python dictionary used to track state between autoscaling calls. A custom autoscaling policy can add and update any fields in it and reuse them on the next call.
last_scale_time: Updated by the autoscaling manager to track the timestamp of the last scaling action. Will be None if the deployment has not scaled yet.
app_name: The name of the application.
deployment_name: The name of the deployment.
Modify the update logic in the deployment state to call the new AutoscalingPolicyManager methods.
The policy call would need to run on a separate thread, with a timeout and logging, so that it does not block the controller's main event loop.
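To make the pieces above more concrete, here is a rough, non-authoritative sketch of what the AutoscalingContext and AutoscalingPolicyManager could look like. The field names mirror the list above; the _resolve_policy helper, the get_decision_num_replicas method name, and the clamping behavior are assumptions made for illustration, not the final implementation.

# sketch: proposed AutoscalingContext and AutoscalingPolicyManager (illustrative only)
import importlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Union

@dataclass
class AutoscalingContext:
    config: Any  # AutoscalingConfig the deployment started with
    curr_target_num_replicas: int
    current_num_ongoing_requests: List[int]
    current_handle_queued_queries: int
    capacity_adjusted_min_replicas: int
    capacity_adjusted_max_replicas: int
    policy_state: Dict[str, Any] = field(default_factory=dict)
    last_scale_time: Optional[float] = None
    app_name: str = ""
    deployment_name: str = ""

def _resolve_policy(policy: Union[str, Callable]) -> Callable:
    # Hypothetical helper: resolve a "module:attribute" import path string
    # to a callable, mirroring how import_path is handled today.
    if callable(policy):
        return policy
    module_name, attr = policy.split(":")
    return getattr(importlib.import_module(module_name), attr)

class AutoscalingPolicyManager:
    def __init__(self, policy: Union[str, Callable]):
        self._policy = _resolve_policy(policy)

    def get_decision_num_replicas(self, context: AutoscalingContext) -> int:
        # Call the user policy, then clamp to the capacity-adjusted bounds.
        decision = int(self._policy(context))
        return max(
            context.capacity_adjusted_min_replicas,
            min(decision, context.capacity_adjusted_max_replicas),
        )

The intent here is only to show the shape of the pieces: the policy string would be resolved once at deployment time (as with import_path), and the manager would be called on each autoscaling iteration from the controller.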
Although this seems not to take K8s-native scalers into account, for the particular KEDA use case users could still use the Metrics Server with a dummy ScaledObject to retrieve metrics and scale on them. It may not be ideal, but I thought it was worth mentioning that there is a way around it.