Support slow start mode in Envoy #13176
|
Planning to use a callback mechanism so that the EDF load balancer is aware of which hosts are in slow start mode. |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
@nezdolik lmk when you want a first pass on this! /wait |
| // Configuration for slow start mode. | ||
| // [#next-free-field: 3] | ||
| message SlowStartConfig { | ||
| google.protobuf.UInt32Value slow_start_window = 1; |
Thanks for the review @htuch, I will fix the API and docs once the PR is in a more mature state.
| } | ||
|
|
||
| enum EndpointWarmingPolicy { | ||
| NO_WAIT = 0; |
Please add comments to the enum values.
| WAIT_FOR_FIRST_PASSING_HC = 1; | ||
| } | ||
|
|
||
| // Configuration for slow start mode. |
Can you write some Envoy docs for this and link from here? I'd suggest translating the design doc into RST and then cleaning that up a bit for end users.
|
I'm interested in this, would |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
@sschepens currently this is the logic for This may not be the final version, but currently it checks host health flags. If passive HC keeps those flags up to date, it should work. |
|
@mattklein123 @snowp I need your initial thoughts on the suggested approach for tracking hosts in slow start. The code is still WIP and plenty of things will be reworked (e.g. TODOs, duplicated code, formatting fixes). |
mattklein123
left a comment
Thanks for working on this. The shape of this LGTM but I would definitely be interested in hearing from @snowp @antoniovicente @tonya11en if they have other impl ideas. Thank you!
/wait
| endpoint_warming_policy; | ||
| const uint32_t slow_start_window; | ||
| TimeSource& time_source_; | ||
| absl::node_hash_set<HostSharedPtr> hosts_in_slow_start_; |
You should be able to use a flat_hash_set here.
it's using absl::btree_set now
| // If all hosts are out of the window, we no longer need to track them and therefore we erase | ||
| // tracked hosts set. | ||
| if (current_time - latest_host_added_time > slow_start_window_ms) { | ||
| hosts_in_slow_start_.erase(hosts_in_slow_start_.begin(), hosts_in_slow_start_.end()); |
|
Just realised that storing only the time of the latest added host will not work, for example when a host is added to the cluster and then immediately removed. It needs to be a more complex data structure that supports ordering by time, querying for the latest time, and lookups by host. |
Assuming we stick with the high level approach, I think you could probably use |
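The three requirements named above (ordering by time, querying for the latest time, lookups by host) can be met by pairing a time-ordered container with a per-host index. The following is only a minimal sketch using standard library containers, not the structure the PR ended up with. `SlowStartTracker`, its method names, and the use of a string as the host key are all hypothetical and chosen for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical tracker for hosts in slow start; names are illustrative
// and are not taken from the PR.
class SlowStartTracker {
public:
  using TimePoint = std::chrono::steady_clock::time_point;

  // Start (or restart) tracking a host that was added at `added_at`.
  void add(const std::string& host, TimePoint added_at) {
    remove(host);
    by_host_[host] = by_time_.emplace(added_at, host);
  }

  // Stop tracking a host, e.g. when it is removed from the cluster.
  void remove(const std::string& host) {
    auto found = by_host_.find(host);
    if (found == by_host_.end()) {
      return;
    }
    by_time_.erase(found->second);
    by_host_.erase(found);
  }

  // Lookup by host.
  bool contains(const std::string& host) const { return by_host_.count(host) > 0; }

  // Add time of the most recently added tracked host, if any.
  std::optional<TimePoint> latestAddedTime() const {
    if (by_time_.empty()) {
      return std::nullopt;
    }
    return by_time_.rbegin()->first;
  }

  // Drop every host whose slow start window has elapsed at `now`.
  void expire(TimePoint now, std::chrono::milliseconds window) {
    while (!by_time_.empty() && now - by_time_.begin()->first > window) {
      by_host_.erase(by_time_.begin()->second);
      by_time_.erase(by_time_.begin());
    }
  }

  std::size_t size() const { return by_host_.size(); }

private:
  using TimeIndex = std::multimap<TimePoint, std::string>;
  TimeIndex by_time_;                                       // ordered by add time
  std::unordered_map<std::string, TimeIndex::iterator> by_host_; // host -> entry
};
```

Because removal goes through the per-host index, a host that is added and then immediately removed no longer influences `latestAddedTime()`, which is the failure case described in the comment above.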
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
Applied the latest review comments, fingers crossed that all checks will pass |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
@mattklein123 am not sure why |
mattklein123
left a comment
Awesome, thanks. Just one small comment and then let's ship!
/wait
| // 2021/08/15 17290 40349 add all host map to priority set for fast host | ||
| // searching |
Is this a merge issue? I don't think this should be deleted?
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
🎉 🎉 🎉 |
|
🎉 🎉 🎉 It's great. |
|
Woohoo! |
|
🎉 🎉 🎉 |
|
🎉 🎉 🎉 Coooooooool |
|
@nezdolik one question on this: I understand that for new deployments, when all the pods are in slow start mode, all of them receive a similar amount of traffic based on their host weights. So slow start mode is essentially not useful in that case and mostly makes sense when new pods come in, as in the HPA case. Is that correct? |
Correct @ramaraochavali |
|
Thank you |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Support progressive traffic increase in Envoy. The implementation follows the design doc: https://docs.google.com/document/d/1NiG1X0gbfFChjl1aL-EE1hdfYxKErjJ2688wJZaj5a0/edit
Additional Description: Please refer to RFC
Risk Level: Medium
Testing: Done
Docs Changes: Done
Release Notes: Done
Fixes #11050