-
Notifications
You must be signed in to change notification settings - Fork 678
[RayService] Support Incremental Zero-Downtime Upgrades #3166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 50 commits
5b9bc1d
6e880ff
d6781f8
21bc32a
f66b3a3
0c2e82c
f72e6e8
c25a3fe
5bd9ac1
9551928
14e73c5
424d4a0
9d6070b
5ceae50
dc5018f
e7af14b
bdcd401
0d145bc
ebbd280
7ac3371
fd5a657
a87fb1e
5e09293
3bc8ab5
df4e4fd
f665353
f5fb7ae
a553b1e
7e231db
acdcc8a
44faa8e
cbb1f25
3cd620f
64661fe
33060da
e89c1b4
023fd6c
629d0b6
65156a6
c19068b
5d953a8
3af80d6
c68c4cf
8e1ade4
f94b996
7f20ecb
e380bcc
4736990
638cbea
7f88d2f
b61b91f
71f19a9
c23f901
e935156
f04fee1
18be954
2073680
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,6 +22,9 @@ const ( | |
| type RayServiceUpgradeType string | ||
|
|
||
| const ( | ||
| // During upgrade, IncrementalUpgrade strategy will create an upgraded cluster to gradually scale | ||
| // and migrate traffic to using Gateway API. | ||
| IncrementalUpgrade RayServiceUpgradeType = "IncrementalUpgrade" | ||
|
||
| // During upgrade, NewCluster strategy will create new upgraded cluster and switch to it when it becomes ready | ||
| NewCluster RayServiceUpgradeType = "NewCluster" | ||
| // No new cluster will be created while the strategy is set to None | ||
|
|
@@ -57,10 +60,27 @@ var DeploymentStatusEnum = struct { | |
| UNHEALTHY: "UNHEALTHY", | ||
| } | ||
|
|
||
| // These options are currently only supported for the IncrementalUpgrade type. | ||
| type ClusterUpgradeOptions struct { | ||
| // The capacity of serve requests the upgraded cluster should scale to handle each interval. | ||
| // Defaults to 100%. | ||
| // +kubebuilder:default:=100 | ||
| MaxSurgePercent *int32 `json:"maxSurgePercent,omitempty"` | ||
| // The percentage of traffic to switch to the upgraded RayCluster at a set interval after scaling by MaxSurgePercent. | ||
| StepSizePercent *int32 `json:"stepSizePercent"` | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the reason for stepSizePercent and IntervalSeconds not having defaults?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wanted to require users to have to specify I could set them to some safe values like 25% and 10 seconds respectively if that's preferred just for ease of use. |
||
| // The interval in seconds between transferring StepSize traffic from the old to new RayCluster. | ||
| IntervalSeconds *int32 `json:"intervalSeconds"` | ||
| // The name of the Gateway Class installed by the Kubernetes Cluster admin. | ||
| GatewayClassName string `json:"gatewayClassName"` | ||
| } | ||
|
|
||
| type RayServiceUpgradeStrategy struct { | ||
| // Type represents the strategy used when upgrading the RayService. Currently supports `NewCluster` and `None`. | ||
| // +optional | ||
| Type *RayServiceUpgradeType `json:"type,omitempty"` | ||
| // ClusterUpgradeOptions defines the behavior of an IncrementalUpgrade. | ||
| // RayServiceIncrementalUpgrade feature gate must be enabled to set ClusterUpgradeOptions. | ||
| ClusterUpgradeOptions *ClusterUpgradeOptions `json:"clusterUpgradeOptions,omitempty"` | ||
| } | ||
|
|
||
| // RayServiceSpec defines the desired state of RayService | ||
|
|
@@ -130,6 +150,12 @@ type RayServiceStatus struct { | |
| // +optional | ||
| Applications map[string]AppStatus `json:"applicationStatuses,omitempty"` | ||
| // +optional | ||
| TargetCapacity *int32 `json:"targetCapacity,omitempty"` | ||
ryanaoleary marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| // +optional | ||
| TrafficRoutedPercent *int32 `json:"trafficRoutedPercent,omitempty"` | ||
| // +optional | ||
| LastTrafficMigratedTime *metav1.Time `json:"lastTrafficMigratedTime,omitempty"` | ||
Future-Outlier marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| // +optional | ||
| RayClusterName string `json:"rayClusterName,omitempty"` | ||
| // +optional | ||
| RayClusterStatus RayClusterStatus `json:"rayClusterStatus,omitempty"` | ||
|
|
@@ -184,8 +210,7 @@ const ( | |
| type RayService struct { | ||
| metav1.TypeMeta `json:",inline"` | ||
| metav1.ObjectMeta `json:"metadata,omitempty"` | ||
|
|
||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this change?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's the result of running |
||
| Spec RayServiceSpec `json:"spec,omitempty"` | ||
| Spec RayServiceSpec `json:"spec,omitempty"` | ||
| // +optional | ||
| Status RayServiceStatuses `json:"status,omitempty"` | ||
| } | ||
|
|
||
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe too late to change this, but wondering if
RollingUpgradebe a more appropriate name? I assume most people are more familiar with this term. WDYT @ryanaoleary @kevin85421 @MortalHappinessThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not blocking this PR, we can cahnge it during alpha phase)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late to reply to this, but I have no strong preference either way.
IncrementalUpgradeis what was used in the feature request and REP so that's why I stuck with it, but if there's a preference from any KubeRay maintainers or users I'm down to go through and change the feature name / all the related variable names.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @rueian for sharing opinion.
I think
RollingUpgradeis more a more straight forward name for me tooThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @kevin85421 since from offline discussion you seemed to have a preference against using
RollingUpgradehereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevin85421 what do you think about
ClusterUpgradeandClusterUpgradeOptions? I prefer to keep the upgrade term generic as the exact behavior could be changed in the future.@Future-Outlier was also wondering about the history of why we called it "incremental" upgrades.