
[Serve] Autoscaler thrashing behavior when delay_s < look_back_period #26455

Open
simon-mo opened this issue Jul 11, 2022 · 1 comment
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks ray-team-created Ray Team created serve Ray Serve Related Issue

Comments

@simon-mo
Contributor

What happened + What you expected to happen

When downscale_delay_s and upscale_delay_s are less than look_back_period_s, there is odd behavior: replicas are repeatedly started and shut down until the metric stabilizes.
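The oscillation can be illustrated with a toy model. Note this is NOT Ray Serve's actual autoscaler code; the per-replica metric windows and the fallback to stale metrics when no replica is running are assumptions about the mechanism, written only to show how a 1 s delay against a 30 s look-back window can flip the decision every tick:

```python
import collections

# Toy model of the suspected mechanism (not Ray Serve's implementation).
# Assumptions: load is averaged over a look-back window, a freshly started
# replica begins with an empty window (average 0), and with zero replicas the
# controller sees the stale metrics recorded before the last replica was
# removed. With delay_s of 1 tick (< look-back), every decision is committed
# immediately instead of waiting for the window to drain.

LOOK_BACK = 30                                       # seconds kept in the window
stale = collections.deque([1.0], maxlen=LOOK_BACK)   # one finished request
replica_window = None
replicas = 0
events = []

for t in range(70):                        # one tick per second, no new traffic
    if replicas == 0:
        stale.append(0.0)                  # stale window drains very slowly
        if sum(stale) / len(stale) > 0:    # upscale_delay_s=1 -> act now
            replicas = 1
            replica_window = collections.deque(maxlen=LOOK_BACK)
            events.append((t, "add"))
    else:
        replica_window.append(0.0)         # new replica sees no traffic
        if sum(replica_window) / len(replica_window) == 0:  # downscale now
            replicas = 0
            events.append((t, "remove"))

print(events[:4])          # alternating add/remove, like the controller log
print("last event:", events[-1])
```

The add/remove pairs alternate every tick until the old request finally ages out of the stale window, which matches the ~30 s of churn in the log above.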

Log

(base) ➜  bugbash-autoscaling serve run app:a
2022-07-11 15:52:24,315	INFO scripts.py:253 -- Deploying from import path: "app:a".
2022-07-11 15:52:26,210	INFO services.py:1477 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=84130) INFO 2022-07-11 15:52:27,628 controller 84130 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=84130) INFO 2022-07-11 15:52:27,630 controller 84130 http_state.py:115 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
(HTTPProxyActor pid=84143) INFO:     Started server process [84143]
2022-07-11 15:52:29,245	SUCC scripts.py:266 -- Deployed successfully.
(ServeController pid=84130) INFO 2022-07-11 15:52:38,317 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84151) new replica
(HTTPProxyActor pid=84143) INFO 2022-07-11 15:52:39,966 http_proxy 127.0.0.1 http_proxy.py:311 - GET / 200 6166.0ms
(A pid=84151) INFO 2022-07-11 15:52:39,963 A A#reDXbd replica.py:467 - HANDLE __call__ OK 1004.6ms
(ServeController pid=84130) INFO 2022-07-11 15:52:40,117 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:52:42,247 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84156) new replica
(ServeController pid=84130) INFO 2022-07-11 15:52:44,057 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:52:46,196 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84158) new replica
(ServeController pid=84130) INFO 2022-07-11 15:52:47,975 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:52:50,140 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84163) new replica
(ServeController pid=84130) INFO 2022-07-11 15:52:51,972 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:52:54,112 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84166) new replica
(ServeController pid=84130) INFO 2022-07-11 15:52:56,030 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:52:58,161 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84168) new replica
(ServeController pid=84130) INFO 2022-07-11 15:52:59,960 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:53:02,083 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84170) new replica
(ServeController pid=84130) INFO 2022-07-11 15:53:03,883 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:53:06,025 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84172) new replica
(ServeController pid=84130) INFO 2022-07-11 15:53:07,810 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
(ServeController pid=84130) INFO 2022-07-11 15:53:09,939 controller 84130 deployment_state.py:1280 - Adding 1 replicas to deployment 'A'.
(A pid=84174) new replica
(ServeController pid=84130) INFO 2022-07-11 15:53:11,734 controller 84130 deployment_state.py:1303 - Removing 1 replicas from deployment 'A'.
# then stabilized -- the churn lasted roughly 30 s (15:52:40 to 15:53:11), on the order of the default look_back_period_s

Versions / Dependencies

master

Reproduction script

from ray import serve
import time


@serve.deployment
class A:
    def __init__(self):
        print("new replica")

    def __call__(self):
        time.sleep(1)
        return "hi"


config = {
    "min_replicas": 0,
    "max_replicas": 10,
    # both delays are well below the default look_back_period_s (30 s)
    "downscale_delay_s": 1,
    "upscale_delay_s": 1,
}

a = A.options(_autoscaling_config=config).bind()
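Until the root cause is fixed, a plausible workaround sketch is to keep both delays at or above the look-back window, so a decision is never committed while the averaged metric is still dominated by stale samples. The 30 s look-back value below is the documented default and is an assumption about the running Ray version, not something verified from this repro:

```python
# Hypothetical workaround sketch: keep the scaling delays >= the look-back
# window. Field names match the autoscaling config used in the repro above.
config = {
    "min_replicas": 0,
    "max_replicas": 10,
    "look_back_period_s": 30,   # assumed default
    "downscale_delay_s": 30,
    "upscale_delay_s": 30,
}

# Sanity check: no decision window is shorter than the metric window.
assert config["downscale_delay_s"] >= config["look_back_period_s"]
assert config["upscale_delay_s"] >= config["look_back_period_s"]
```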

Issue Severity

No response

@simon-mo simon-mo added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 11, 2022
@architkulkarni
Contributor

+1 independently ran into this

@richardliaw richardliaw added the serve Ray Serve Related Issue label Oct 7, 2022
@sihanwang41 sihanwang41 added the P2 Important issue, but not time-critical label Oct 26, 2022
@DmitriGekhtman DmitriGekhtman removed the triage Needs triage (eg: priority, bug/not-bug, and owning component) label Nov 14, 2022
@sihanwang41 sihanwang41 added P1 Issue that should be fixed within a few weeks and removed P2 Important issue, but not time-critical labels Mar 23, 2023
@richardliaw richardliaw added the ray-team-created Ray Team created label Apr 19, 2023

7 participants