Support slow Start mode in Envoy #13176
@@ -15,3 +15,4 @@ Load Balancing | |
| original_dst | ||
| zone_aware | ||
| subsets | ||
| slow_start | ||
| @@ -0,0 +1,57 @@ | ||||||
| .. _arch_overview_load_balancing_slow_start: | ||||||
|
|
||||||
|
|
||||||
| Slow start mode | ||||||
| =============== | ||||||
|
|
||||||
| Slow start mode is a configuration setting in Envoy that progressively increases the amount of traffic sent to newly added upstream endpoints. | ||||||
| With slow start disabled, Envoy sends a proportional amount of traffic to new upstream endpoints. | ||||||
|
Member
Can you fix the spelling in this doc as well, please? I'm not sure why the build didn't catch this. I thought it used to.
Member
Author
I was also surprised that spell check did not catch those; perhaps it does not cover new .rst files... Fixed (with online spell checker). |
||||||
| This can be undesirable for services that need warm-up time before serving full production load, and may result in request timeouts, data loss, and a degraded user experience. | ||||||
|
|
||||||
| Slow start mode is a mechanism that affects the load balancing weight of upstream endpoints and can be configured per upstream cluster. | ||||||
| Currently, slow start is supported in Round Robin and Least Request load balancer types. | ||||||
|
Member
nit: can you ref link to the relevant fields for each type? |
||||||
|
|
||||||
| Users can specify a :ref:`slow start window parameter<envoy_v3_api_field_config.cluster.v3.Cluster.CommonLbConfig.SlowStartConfig.slow_start_window>` (in seconds). If an endpoint's "cluster membership duration" (the amount of time since it joined the cluster) is within the configured window, the endpoint enters slow start mode. | ||||||
|
Member
Author
good catch, hard to spot |
||||||
| During the slow start window, the load balancing weight of an endpoint is scaled by the :ref:`time bias parameter<envoy_v3_api_field_config.cluster.v3.Cluster.CommonLbConfig.SlowStartConfig.time_bias>` and a time factor: | ||||||
| `weight = load_balancing_weight * time_bias * time_factor`. | ||||||
| The time factor is a value that increases as time progresses, and is calculated as: | ||||||
| `time_factor = (1 / slow_start_window_seconds) * endpoint_create_duration_seconds` | ||||||
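The two formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as written in this doc, not Envoy's implementation: the function names are hypothetical, and the handling of a non-positive window (treated as "slow start disabled") and of an endpoint past the window (unscaled weight) are assumptions made for the sketch.

```python
def time_factor(slow_start_window_seconds: float,
                endpoint_create_duration_seconds: float) -> float:
    """Fraction of the slow start window that has elapsed for an endpoint."""
    return (1 / slow_start_window_seconds) * endpoint_create_duration_seconds


def slow_start_weight(load_balancing_weight: float,
                      time_bias: float,
                      slow_start_window_seconds: float,
                      endpoint_create_duration_seconds: float) -> float:
    """Effective weight of an endpoint, following the formulas in the doc."""
    # Assumption: a non-positive window means slow start is not enabled.
    if slow_start_window_seconds <= 0:
        return load_balancing_weight
    # Past the window the weight is no longer scaled.
    if endpoint_create_duration_seconds > slow_start_window_seconds:
        return load_balancing_weight
    factor = time_factor(slow_start_window_seconds,
                         endpoint_create_duration_seconds)
    return load_balancing_weight * time_bias * factor


if __name__ == "__main__":
    # Same endpoint, 5 seconds after creation, with two different windows.
    print(slow_start_weight(100, 0.5, 10, 5))
    print(slow_start_weight(100, 0.5, 20, 5))
```

With the same elapsed time, the longer window yields the smaller weight, because the time factor grows more slowly.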
|
|
||||||
| The longer the slow start window, the less traffic is sent to the endpoint at any given moment within the window, because the time factor grows more slowly. | ||||||
|
|
||||||
| Once the slow start window elapses, the upstream endpoint exits slow start mode and receives a regular amount of traffic according to the load balancing algorithm. | ||||||
| Its load balancing weight is no longer scaled by the time bias. An endpoint also exits slow start mode if it leaves the cluster. | ||||||
|
|
||||||
| To reiterate, an endpoint enters slow start mode when: | ||||||
|
Member
Suggested change
|
||||||
| * If no active healthcheck is configured for the cluster: immediately, provided its cluster membership duration is within the slow start window. | ||||||
| * If an active healthcheck is configured for the cluster: when its cluster membership duration is within the slow start window and the endpoint has passed an active healthcheck. | ||||||
| If the endpoint does not pass an active healthcheck during the entire slow start window (measured from when it was added to the upstream cluster), it never enters slow start mode. | ||||||
|
|
||||||
| An endpoint exits slow start mode when: | ||||||
| * It leaves the cluster. | ||||||
| * Its cluster membership duration is greater than the slow start window. | ||||||
| * It does not pass an active healthcheck configured for the cluster. | ||||||
| The endpoint can re-enter slow start mode if it later passes an active healthcheck and its cluster membership duration is still within the slow start window. | ||||||
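The entry and exit conditions listed above can be read as a single predicate over an endpoint's current state. The sketch below is one hedged way to express that; `EndpointState` and `in_slow_start` are hypothetical names invented for illustration, and using `None` to mean "no active healthcheck configured" is an assumption of this sketch rather than anything taken from Envoy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointState:
    """Hypothetical snapshot of an endpoint, for illustration only."""
    in_cluster: bool
    membership_duration_seconds: float
    # None: no active healthcheck is configured for the cluster (assumption).
    # True/False: result of the most recent active healthcheck.
    passed_active_healthcheck: Optional[bool] = None


def in_slow_start(ep: EndpointState, slow_start_window_seconds: float) -> bool:
    """Return True if the endpoint is currently in slow start mode."""
    # Exit condition: the endpoint left the cluster.
    if not ep.in_cluster:
        return False
    # Exit condition: membership duration is greater than the window.
    if ep.membership_duration_seconds > slow_start_window_seconds:
        return False
    # No active healthcheck configured: being within the window is enough.
    if ep.passed_active_healthcheck is None:
        return True
    # Active healthcheck configured: the endpoint must currently pass it.
    return ep.passed_active_healthcheck


if __name__ == "__main__":
    window = 10.0
    print(in_slow_start(EndpointState(True, 3.0), window))
    print(in_slow_start(EndpointState(True, 3.0, False), window))
    print(in_slow_start(EndpointState(True, 11.0, True), window))
```

Because the predicate is evaluated against the current state, an endpoint that fails a healthcheck and later passes one while still inside the window is reported as back in slow start, matching the re-entry case described above.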
|
|
||||||
| Below is an example of how requests would be distributed across endpoints with the Round Robin load balancer, a slow start window of 10 seconds, no active healthcheck, and a time bias of 0.5. | ||||||
| Endpoint E1 has a statically configured initial weight of X and endpoint E2 a weight of Y; the actual numerical values are of no significance for this example. | ||||||
|
|
||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
|
Member
This diagram is pretty confusing to me. I get what's trying to be conveyed, but I'm not sure why the events are significant and the timestamps are hard to reason about. There's got to be a better way. Perhaps a graph with time on the x-axis and weights on the y-axis would be a bit easier to parse? Similar to the graph you have showing the effect of aggression, but each line should be a different endpoint.
Member
Author
have not started this one |
||||||
| | Timestamp   | Event              | E1 in slow | E2 in slow | E1 LB     | E2 LB    | LB decision | | ||||||
| |             |                    | start      | start      | weight    | weight   |             | | ||||||
| +=============+====================+============+============+===========+==========+=============+ | ||||||
| | 1           | E1 create          | YES        | --         | 0.5X      | --       | --          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
| | 11          | E2 create          | NO         | YES        | X         | 0.5Y     | --          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
| | 12          | LB select endpoint | NO         | YES        | X         | 0.5Y     | E1          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
| | 13          | LB select endpoint | NO         | YES        | X         | 0.5Y     | E1          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
| | 14          | LB select endpoint | NO         | YES        | X         | 0.5Y     | E1          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
| | 15          | LB select endpoint | NO         | YES        | X         | 0.5Y     | E2          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
| | 22          | LB select endpoint | NO         | NO         | X         | Y        | E1          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
| | 23          | LB select endpoint | NO         | NO         | X         | Y        | E2          | | ||||||
| +-------------+--------------------+------------+------------+-----------+----------+-------------+ | ||||||
Is there a reason this wasn't a Duration?
there is no smart reason for it