redis: prefixed routing#5658
Conversation
4e1653a to
db5fdf5
Compare
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
@danielhochman per offline convo can you take a high level design pass on this and then I can do the coding review after that is done? @HenryYYang feel free to weigh in also. |
There was a problem hiding this comment.
nit: s/upream/upstream/
There was a problem hiding this comment.
i wonder if instead we should try to make this more closely match the API of the HTTP router, which would be to include a timeout field directly and instead of using the settings object.
regardless, think it would be good to allow use of a global default from the parent settings field so every route doesn't require its own settings field.
There was a problem hiding this comment.
Yes, that's why I opted of using a RouteConfiguration message to be able to set shared sane defaults for all routes. It seemed pretty similar to how the HTTP router does it.
Would you rather use the default redis_proxy.settings for all routes, or have a redis_proxy.route_config.settings be the default? The reason I'm asking this is because I've currently used the redis_proxy.cluster and redis_proxy.settings as the default route used to forward every commands that does not match any prefix.
There was a problem hiding this comment.
is this field used currently?
There was a problem hiding this comment.
I don't think I've used it yet, but we could potentially use it to push forwarding statistics? I was unsure what I could use it for right now, but noticed that every http routes had names and tried to align as much as I could both APIs.
There was a problem hiding this comment.
which field are you referring to from the HTTP router? i don't see name here:
envoy/api/envoy/api/v2/route/route.proto
Line 151 in a06b9a8
There was a problem hiding this comment.
Ah my bad, I think I've started with VirtualHost at first, then kept the name thinking it could be useful for instrumentation purposes as I didn't think VirtualHost were useful here.
envoy/api/envoy/api/v2/route/route.proto
Line 34 in a06b9a8
I can remove it, and add if we think it's useful later. WDYT?
There was a problem hiding this comment.
yes, that sounds good.
0898504 to
e11cc29
Compare
There was a problem hiding this comment.
let's move settings to the RouteAction, and also can we make it not required?
4e55b84 to
de74d49
Compare
|
@mattklein123 this is ready for a pass. API is in a pretty good state. I had some question of how similar we want to make this to the http conn man's Router's API, e.g. having |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for working on this. I dropped in a few doc/API level comments to start with. Let's get those squared away and then I will review the code. Thank you!
/wait
There was a problem hiding this comment.
nit: prefer flipping to case_insensitive and just using a bool type.
There was a problem hiding this comment.
Can you add documentation/comments on how this field interacts with the top level cluster name and conn pool settings? Is that a required catch all if no route matches? If not, should we consider deprecating that and requiring people to specify a single route entry that matches everything?
There was a problem hiding this comment.
The way I currently implemented it was by using the original cluster + settings tuple to create a catch all route. This way it doesn't break backward compatibility, but it makes it a bit harder to understand what it's used for.
I was also thinking that if no routes are defined, the default cluster + settings becomes mandatory. But if some routes are defined, they become optional since I think there's valid scenarios where you wouldn't want any catch all instance and fail loudly.
Any thoughts?
There was a problem hiding this comment.
Personally, I think it would be most clear to just deprecate the top level cluster/settings, and then make it an error if they are set when any routes are set (basically we should require people to be explicit if they are using routes). Then in the future we can just delete the deprecated option and then people just have to specify at least one route which would be a catch-all. WDYT?
There was a problem hiding this comment.
I gave this implementation a shot in f822dd8, let me know what you think!
There was a problem hiding this comment.
Can you potentially add arch level docs on this feature? That would help review the feature goals, etc. https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/redis
This feature will also need a release note.
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, this looks great. A few more API comments and then I will do the code review. Thank you!
/wait
There was a problem hiding this comment.
I think this can no longer be required, same for settings. We will need an app level check to make sure one or the other is set (until we remove these fields). Also, can you add these two fields to DEPRECATED.md?
There was a problem hiding this comment.
Instead of having an explicit catch all, can we just require that the routes list above has at least 1 element in it?
There was a problem hiding this comment.
I did it as an explicit catch all mostly because the RouteMatch doesn't really make sense for a catch all route. Unless we want to use a sentinel value such as * as a prefix? Or the route could have many match variants.
message Route {
oneof match {
Prefix prefix = 1 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
CatchAll catch_all = 2;
}
// Route request to some upstream cluster.
RouteAction route = 2 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
}WDYT?
There was a problem hiding this comment.
Personally, I think I would still require a RouteMatch, but inside I would make the match type a oneof, which currently has 2 values, prefix and "any_match", which is a bool that can only be set to true (search for this in the API, we do this elsewhere). This would allow us flexibility in the future to add regex, etc. Then for users that just want to do what they do today, they would do a single route match, with an any_match. WDYT?
There was a problem hiding this comment.
Yes definitely make sense to have a oneof type to support regex, or any other type of routes. However, since there can only be a 0..1 catch all route, I thought it would make sense to have it as an optional route at the RouteConfiguration level. Otherwise, I'm not sure how to handle if there's multiple any_match routes, do we forward only to the first one?
There was a problem hiding this comment.
Yes, I would just do FIFO matching like we do for HTTP. I think what I have proposed is most consistent with what we do already for HTTP FWIW.
There was a problem hiding this comment.
I think I would decide on whether you think you would favor performance (hash lookup, less matching options) or flexibility (FIFO). For Redis, favoring performance seems reasonable here, so I'm fine either way. I would just document how matches work if we are going to go for a hashing system.
There was a problem hiding this comment.
I've been thinking about this overnight, and I currently prefer this implementation for a few reasons, let me know if you agree with my reasoning 😄
-
The semantic for any matches is not used for the path or prefix matching, but for the tap service match predicates. I believe it provides a much more complex boolean logic API than what we really need here.
-
As for HTTP, a rule with
path: /is effectively a catch all. Here, we could do the same with using an any_match parameter, or using a sentinel such asprefix: *, but given that we can only have 0..1 value, I thought it was sane to embed this behavior directly in the protocol and not do any app-level check.
If I were to use an any_match parameter, I think I would change the protocol as such:
message PrefixMatch {
string prefix = 1 [(validate.rules).string.min_bytes = 1];
bool case_insensitive = 2;
}
// separate message for catch-all since case_insensitive does not really make sense for this variant.
message CatchAllMatch {
bool any_match;
}
// TODO: regex match
message Route {
oneof match {
PrefixMatch prefix = 1 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
CatchAllMatch catch_all = 2;
RegexMatch regex = 3;
}
RouteAction route = 4 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
}Given that the catch-all scenario is sort-of a lack of anything that matches, I think it makes sense to avoid defining a "match" expression for it.
Let me know what you think, I'll try to implement it as such today too to see what the implementation details would look like!
There was a problem hiding this comment.
Yeah I think given the fact that we want to support hash matching and not FIFO, this makes sense. Can you update the docs to make it clear that hashing is being used? This will also make it clear that we cannot support regex in the future (although even in the current implementation I'm not exactly sure how you would handle overlapping prefixes without more work?)
There was a problem hiding this comment.
In the current implementation, I'm using an ordered set to stored the prefix routes. Overlapping prefixes will be ignored, and it will look for the longest match first.
There was a problem hiding this comment.
SGTM, I would just fully document the expected behavior and I will review the code after!
mattklein123
left a comment
There was a problem hiding this comment.
In general looks great. A few more high level questions/comments. Thank you!
/wait
There was a problem hiding this comment.
This config doesn't actually validate this anymore so I think we need an actual error/exception here along with a test that checks it?
There was a problem hiding this comment.
nit: = default for consistency below.
There was a problem hiding this comment.
Can you add some comments here about the match semantics? AFAICT this is an iterative search over an ordered set, with longest match wins? Meaning, per our other discussion this is not going to give O(1) performance, it will still be an O(N) match, right? Can we document that fact in the public docs also?
There was a problem hiding this comment.
nit: I would return ConnPool::Instance& here instead of the pointer.
There was a problem hiding this comment.
nit: const, same next line (try to do const elsewhere if possible)
There was a problem hiding this comment.
On a second read through, it strikes me as a little odd that we have a per route/action connection pool. I think this is probably fine, but do we envision potentially having different routes go to the same cluster? Should we consider actually just having a single ConnPoolSettings that is used globally vs. per forward action? Or maybe it should be per cluster in some other map? Thoughts?
I'm also wondering if the code should be structured which maintains the connection pool map independently from the route table? Thoughts?
|
@mattklein123 I had to change a few things, I'm not 100% sure if I'm happy with the current implementation though. I'll try to spike on implementing regex based routing, and key migrations just to see if the abstractions makes sense.
Regarding this comment, I went ahead with using a |
I think this is my preference, or just have a single ConnPoolSettings structure that applies to everything. Do you think people will ever actually configure different settings? WDYT? |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this generally LGTM at a design level modulo the remaining comment, so if you don't mind let's fix that and then I can do the in-depth review and we can get this shipped! Thank you for your patience.
/wait
| message PrefixRoutes { | ||
| message Route { | ||
| // String prefix that must match the beginning of the keys. Envoy will always favor the | ||
| // longest matches. |
| } | ||
|
|
||
| // List of prefixes used to separate keys from different workloads to different clusters. Envoy | ||
| // will always favor the longest matches first. Prefixes are stored in a Trie and are accessed |
| // | ||
| // prefix_routes: | ||
| // routes: | ||
| // - prefix: "abc" # this prefix will be overwritten by the next entry |
There was a problem hiding this comment.
Yeah I would go ahead and add some way to detect this duplication and fail. I think it would be a better user experience and make this example unecessary and the description above simpler to understand. WDYT?
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for your patience while we nailed down the design. This looks really good to me. I have a bunch of small comments but hopefully can get this shipped soon. @HenryYYang PTAL if you have some time.
/wait
| bool remove_prefix = 2; | ||
|
|
||
| // Upstream cluster to forward the command to. | ||
| string cluster = 3; |
There was a problem hiding this comment.
Should this have a min-length validation?
|
|
||
| // List of **unique** prefixes used to separate keys from different workloads to different | ||
| // clusters. Envoy will always favor the longest match first in case of overlap. A catch-all | ||
| // cluster can be used to forward commands to when there is no match. Time complexity of the |
| // cluster: "cluster_a" | ||
| // - prefix: "abc" | ||
| // cluster: "cluster_b" | ||
| // # next 2 lines will break uniqueness constraint |
There was a problem hiding this comment.
I would probably delete this. I think the error message generated would be obvious.
| // # cluster: "cluster_c" | ||
| // - prefix: "a" | ||
| // cluster: "cluster_d" | ||
| // |
There was a problem hiding this comment.
Can you add further text to the example along the lines of "When using the above routes, the following prefixes would be sent to: ..." (With some examples)
source/common/common/utility.h
Outdated
| current = current->entries_[c].get(); | ||
| key++; | ||
| } | ||
| if (current->value_) { |
There was a problem hiding this comment.
This might change the behavior for current callers. Should overwrite_existing instead be a param?
source/common/common/utility.h
Outdated
| * @param key the key used to find. | ||
| * @return the value matching the longest prefix based on the key. | ||
| */ | ||
| Value findPrefix(const char* key) const { |
|
|
||
| NiceMock<Server::Configuration::MockFactoryContext> context; | ||
|
|
||
| EXPECT_THROW( |
There was a problem hiding this comment.
You can use EXPECT_THROW_WITH_MESSAGE here. Same elsewhere.
| } | ||
|
|
||
| if (value != nullptr) { | ||
| // TODO: remove_prefix |
There was a problem hiding this comment.
I think this is implemented?
| }; | ||
|
|
||
| typedef std::shared_ptr<Prefix> PrefixPtr; | ||
|
|
There was a problem hiding this comment.
nit: remove new line between members
source/common/common/utility.h
Outdated
| * @return false when a value already exists for the given key. | ||
| */ | ||
| void add(const char* key, Value value) { | ||
| bool add(const char* key, Value value) { |
There was a problem hiding this comment.
Please add some explicit unit tests for the functions in this file.
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thank you! Awesome work!
|
@maximebedard sorry needs a master merge. Can you do that we can ship? Thanks! /wait |
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
The change breaks the existing Redis operation, for example redis-cli -p [WHATEVER] GET 1 crashes Envoy. This reverts commit 046e989. Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
|
can we also make this prefix work with regular expression? For example ".*prefix-A" "prefix-[1-10]"? |
Description:
Adds prefixed routing as described in #5609. Similar to facebook/mcrouter, this PR implements prefix routes that will always choose longest prefix match if there are multiple possible matches, or use a default connection pool when available.
I'm not 100% sure this is the correct approach, but I was thinking that a router object would be owning all of the connection pools to the upstreams and would be able to dispatch the request to the proper connection pool. I've decided to create a router object that has the same public API as a connection pool, but with the following properties:
I've managed to preserved compatibility with the previous API by using the
cluster_nameandsettingsfield to create what I've called thewildcard route.I haven't taken a stance on if we should limit cross connection pool operations on multi-keys operations, maybe this could be added as a setting afterward? I don't have any use case for regex matching yet, but I think this class is extensible enough to add this feature when necessary.
Still very new to the project and to c++ in general, any initial feedback on the design/guidance would be truly appreciated!
What is missing from this PR and still needs to be done:
Risk Level: Medium
Testing: Work in progress.
Docs Changes: Work in progress.
Release Notes: Work in progress.
Fixes #5609