Skip to content

redis: prefixed routing#5658

Merged
mattklein123 merged 19 commits intoenvoyproxy:masterfrom
maximebedard:redis-prefixed-key-routing
Mar 26, 2019
Merged

redis: prefixed routing#5658
mattklein123 merged 19 commits intoenvoyproxy:masterfrom
maximebedard:redis-prefixed-key-routing

Conversation

@maximebedard
Copy link
Contributor

@maximebedard maximebedard commented Jan 18, 2019

Description:

Adds prefixed routing as described in #5609. Similar to facebook/mcrouter, this PR implements prefix routes that will always choose longest prefix match if there are multiple possible matches, or use a default connection pool when available.

I'm not 100% sure this is the correct approach, but I was thinking that a router object would be owning all of the connection pools to the upstreams and would be able to dispatch the request to the proper connection pool. I've decided to create a router object that has the same public API as a connection pool, but with the following properties:

  • 0..N Routes
  • 0..1 Wildcard route (a.k.a. the default upstream when no matches are found)

I've managed to preserved compatibility with the previous API by using the cluster_name and settings field to create what I've called the wildcard route.

I haven't taken a stance on if we should limit cross connection pool operations on multi-keys operations, maybe this could be added as a setting afterward? I don't have any use case for regex matching yet, but I think this class is extensible enough to add this feature when necessary.

Still very new to the project and to c++ in general, any initial feedback on the design/guidance would be truly appreciated!

What is missing from this PR and still needs to be done:

  • Add router tests
  • Improve api docs
  • Tweak the prefixing algorithm to match mcrouter's implementation
  • Add to public redis-proxy docs as a feature

Risk Level: Medium
Testing: Work in progress.
Docs Changes: Work in progress.
Release Notes: Work in progress.

Fixes #5609

@maximebedard maximebedard changed the title [WIP] redis: prefixed routing redis: prefixed routing Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: case_sensitive = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: = default

@maximebedard maximebedard force-pushed the redis-prefixed-key-routing branch 2 times, most recently from 4e1653a to db5fdf5 Compare January 23, 2019 19:48
@stale
Copy link

stale bot commented Jan 31, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 31, 2019
@mattklein123 mattklein123 self-assigned this Feb 3, 2019
@mattklein123
Copy link
Member

@danielhochman per offline convo can you take a high level design pass on this and then I can do the coding review after that is done? @HenryYYang feel free to weigh in also.

@mattklein123 mattklein123 removed the stale stalebot believes this issue/PR has not been touched recently label Feb 3, 2019
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/upream/upstream/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if instead we should try to make this more closely match the API of the HTTP router, which would be to include a timeout field directly and instead of using the settings object.

regardless, think it would be good to allow use of a global default from the parent settings field so every route doesn't require its own settings field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why I opted of using a RouteConfiguration message to be able to set shared sane defaults for all routes. It seemed pretty similar to how the HTTP router does it.

Would you rather use the default redis_proxy.settings for all routes, or have a redis_proxy.route_config.settings be the default? The reason I'm asking this is because I've currently used the redis_proxy.cluster and redis_proxy.settings as the default route used to forward every commands that does not match any prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this field used currently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've used it yet, but we could potentially use it to push forwarding statistics? I was unsure what I could use it for right now, but noticed that every http routes had names and tried to align as much as I could both APIs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which field are you referring to from the HTTP router? i don't see name here:

Copy link
Contributor Author

@maximebedard maximebedard Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah my bad, I think I've started with VirtualHost at first, then kept the name thinking it could be useful for instrumentation purposes as I didn't think VirtualHost were useful here.

message VirtualHost {

I can remove it, and add if we think it's useful later. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that sounds good.

@maximebedard maximebedard force-pushed the redis-prefixed-key-routing branch from 0898504 to e11cc29 Compare February 6, 2019 01:49
Copy link
Contributor

@danielhochman danielhochman Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move settings to the RouteAction, and also can we make it not required?

@maximebedard maximebedard force-pushed the redis-prefixed-key-routing branch from 4e55b84 to de74d49 Compare February 14, 2019 17:40
@danielhochman
Copy link
Contributor

@mattklein123 this is ready for a pass. API is in a pretty good state. I had some question of how similar we want to make this to the http conn man's Router's API, e.g. having ConnPoolSettings with a nested timeout field on the RouteAction vs. flattening the timeout field onto the RouteAction like http conn man. Let us know what you think.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I dropped in a few doc/API level comments to start with. Let's get those squared away and then I will review the code. Thank you!

/wait

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer flipping to case_insensitive and just using a bool type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add documentation/comments on how this field interacts with the top level cluster name and conn pool settings? Is that a required catch all if no route matches? If not, should we consider deprecating that and requiring people to specify a single route entry that matches everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I currently implemented it was by using the original cluster + settings tuple to create a catch all route. This way it doesn't break backward compatibility, but it makes it a bit harder to understand what it's used for.

I was also thinking that if no routes are defined, the default cluster + settings becomes mandatory. But if some routes are defined, they become optional since I think there's valid scenarios where you wouldn't want any catch all instance and fail loudly.

Any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think it would be most clear to just deprecate the top level cluster/settings, and then make it an error if they are set when any routes are set (basically we should require people to be explicit if they are using routes). Then in the future we can just delete the deprecated option and then people just have to specify at least one route which would be a catch-all. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this implementation a shot in f822dd8, let me know what you think!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you potentially add arch level docs on this feature? That would help review the feature goals, etc. https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/redis

This feature will also need a release note.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks great. A few more API comments and then I will do the code review. Thank you!

/wait

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can no longer be required, same for settings. We will need an app level check to make sure one or the other is set (until we remove these fields). Also, can you add these two fields to DEPRECATED.md?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having an explicit catch all, can we just require that the routes list above has at least 1 element in it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it as an explicit catch all mostly because the RouteMatch doesn't really make sense for a catch all route. Unless we want to use a sentinel value such as * as a prefix? Or the route could have many match variants.

message Route {
  oneof match {
    Prefix prefix = 1 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
    CatchAll catch_all = 2;
  }

  // Route request to some upstream cluster.
  RouteAction route = 2 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
}

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think I would still require a RouteMatch, but inside I would make the match type a oneof, which currently has 2 values, prefix and "any_match", which is a bool that can only be set to true (search for this in the API, we do this elsewhere). This would allow us flexibility in the future to add regex, etc. Then for users that just want to do what they do today, they would do a single route match, with an any_match. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes definitely make sense to have a oneof type to support regex, or any other type of routes. However, since there can only be a 0..1 catch all route, I thought it would make sense to have it as an optional route at the RouteConfiguration level. Otherwise, I'm not sure how to handle if there's multiple any_match routes, do we forward only to the first one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would just do FIFO matching like we do for HTTP. I think what I have proposed is most consistent with what we do already for HTTP FWIW.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would decide on whether you think you would favor performance (hash lookup, less matching options) or flexibility (FIFO). For Redis, favoring performance seems reasonable here, so I'm fine either way. I would just document how matches work if we are going to go for a hashing system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this overnight, and I currently prefer this implementation for a few reasons, let me know if you agree with my reasoning 😄

  1. The semantic for any matches is not used for the path or prefix matching, but for the tap service match predicates. I believe it provides a much more complex boolean logic API than what we really need here.

  2. As for HTTP, a rule with path: / is effectively a catch all. Here, we could do the same with using an any_match parameter, or using a sentinel such as prefix: *, but given that we can only have 0..1 value, I thought it was sane to embed this behavior directly in the protocol and not do any app-level check.

If I were to use an any_match parameter, I think I would change the protocol as such:

message PrefixMatch {
  string prefix = 1 [(validate.rules).string.min_bytes = 1];
  bool case_insensitive = 2;
}

// separate message for catch-all since case_insensitive does not really make sense for this variant.
message CatchAllMatch {
  bool any_match;
}

// TODO: regex match

message Route {
  oneof match {
    PrefixMatch prefix = 1 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
    CatchAllMatch catch_all = 2;
    RegexMatch regex = 3;
  }

  RouteAction route = 4 [(validate.rules).message.required = true, (gogoproto.nullable) = false];
}

Given that the catch-all scenario is sort-of a lack of anything that matches, I think it makes sense to avoid defining a "match" expression for it.

Let me know what you think, I'll try to implement it as such today too to see what the implementation details would look like!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think given the fact that we want to support hash matching and not FIFO, this makes sense. Can you update the docs to make it clear that hashing is being used? This will also make it clear that we cannot support regex in the future (although even in the current implementation I'm not exactly sure how you would handle overlapping prefixes without more work?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, I'm using an ordered set to stored the prefix routes. Overlapping prefixes will be ignored, and it will look for the longest match first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, I would just fully document the expected behavior and I will review the code after!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks great. A few more high level questions/comments. Thank you!

/wait

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config doesn't actually validate this anymore so I think we need an actual error/exception here along with a test that checks it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: = default for consistency below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments here about the match semantics? AFAICT this is an iterative search over an ordered set, with longest match wins? Meaning, per our other discussion this is not going to give O(1) performance, it will still be an O(N) match, right? Can we document that fact in the public docs also?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would return ConnPool::Instance& here instead of the pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const, same next line (try to do const elsewhere if possible)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second read through, it strikes me as a little odd that we have a per route/action connection pool. I think this is probably fine, but do we envision potentially having different routes go to the same cluster? Should we consider actually just having a single ConnPoolSettings that is used globally vs. per forward action? Or maybe it should be per cluster in some other map? Thoughts?

I'm also wondering if the code should be structured which maintains the connection pool map independently from the route table? Thoughts?

@maximebedard
Copy link
Contributor Author

@mattklein123 I had to change a few things, I'm not 100% sure if I'm happy with the current implementation though. I'll try to spike on implementing regex based routing, and key migrations just to see if the abstractions makes sense.

I think this is probably fine, but do we envision potentially having different routes go to the same cluster? Should we consider actually just having a single ConnPoolSettings that is used globally vs. per forward action? Or maybe it should be per cluster in some other map?

Regarding this comment, I went ahead with using a map<cluster_name, upstream> since it seems to be the simplest approach. However, if you have multiple upstreams with different connection pool options, the options are discarded for subsequent actions created for this route. If we think theses options are critical, I could either generate a digest of the cluster name + options and store the result in a map instead. Or create another structure in the API that would maintain the list of cluster + options, and the Action would be the object that bind the two together.

@mattklein123
Copy link
Member

Or create another structure in the API that would maintain the list of cluster + options, and the Action would be the object that bind the two together.

I think this is my preference, or just have a single ConnPoolSettings structure that applies to everything. Do you think people will ever actually configure different settings? WDYT?

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this generally LGTM at a design level modulo the remaining comment, so if you don't mind let's fix that and then I can do the in-depth review and we can get this shipped! Thank you for your patience.

/wait

message PrefixRoutes {
message Route {
// String prefix that must match the beginning of the keys. Envoy will always favor the
// longest matches.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/matches/match

}

// List of prefixes used to separate keys from different workloads to different clusters. Envoy
// will always favor the longest matches first. Prefixes are stored in a Trie and are accessed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/matches/match

//
// prefix_routes:
// routes:
// - prefix: "abc" # this prefix will be overwritten by the next entry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would go ahead and add some way to detect this duplication and fail. I think it would be a better user experience and make this example unecessary and the description above simpler to understand. WDYT?

Maxime Bedard added 2 commits March 21, 2019 10:01
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Maxime Bedard added 2 commits March 22, 2019 10:08
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience while we nailed down the design. This looks really good to me. I have a bunch of small comments but hopefully can get this shipped soon. @HenryYYang PTAL if you have some time.

/wait

bool remove_prefix = 2;

// Upstream cluster to forward the command to.
string cluster = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a min-length validation?


// List of **unique** prefixes used to separate keys from different workloads to different
// clusters. Envoy will always favor the longest match first in case of overlap. A catch-all
// cluster can be used to forward commands to when there is no match. Time complexity of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/to when/when

// cluster: "cluster_a"
// - prefix: "abc"
// cluster: "cluster_b"
// # next 2 lines will break uniqueness constraint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably delete this. I think the error message generated would be obvious.

// # cluster: "cluster_c"
// - prefix: "a"
// cluster: "cluster_d"
//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add further text to the example along the lines of "When using the above routes, the following prefixes would be sent to: ..." (With some examples)

current = current->entries_[c].get();
key++;
}
if (current->value_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might change the behavior for current callers. Should overwrite_existing instead be a param?

* @param key the key used to find.
* @return the value matching the longest prefix based on the key.
*/
Value findPrefix(const char* key) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findLongestPrefix ?


NiceMock<Server::Configuration::MockFactoryContext> context;

EXPECT_THROW(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use EXPECT_THROW_WITH_MESSAGE here. Same elsewhere.

}

if (value != nullptr) {
// TODO: remove_prefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is implemented?

};

typedef std::shared_ptr<Prefix> PrefixPtr;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove new line between members

* @return false when a value already exists for the given key.
*/
void add(const char* key, Value value) {
bool add(const char* key, Value value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some explicit unit tests for the functions in this file.

Maxime Bedard added 3 commits March 22, 2019 14:28
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Awesome work!

@mattklein123
Copy link
Member

@maximebedard sorry needs a master merge. Can you do that we can ship? Thanks!

/wait

Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
@mattklein123 mattklein123 merged commit 046e989 into envoyproxy:master Mar 26, 2019
FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg added a commit to HenryYYang/envoy that referenced this pull request Mar 27, 2019
The change breaks the existing Redis operation, for example redis-cli -p
[WHATEVER] GET 1 crashes Envoy.

This reverts commit 046e989.

Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
mattklein123 pushed a commit that referenced this pull request Mar 27, 2019
The change breaks the existing Redis operation, for example redis-cli -p
[WHATEVER] GET 1 crashes Envoy.

This reverts commit 046e989.

Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
maximebedard pushed a commit to maximebedard/envoy that referenced this pull request Mar 28, 2019
…y#6401)"

This reverts commit bacd89e.

Signed-off-by: Maxime Bedard <maxime.bedard@shopify.com>
@whatvn
Copy link

whatvn commented Jan 11, 2022

can we also make this prefix work with regular expression? For example ".*prefix-A" "prefix-[1-10]"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redis prefixed routing

5 participants