router: allow headers with multiple variables and static data#2409
router: allow headers with multiple variables and static data#2409zuercher merged 11 commits intoenvoyproxy:masterfrom
Conversation
…fixes envoyproxy#2250) Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
| static std::regex params_regex( | ||
| R"EOF(^\(\s*\[\s*)EOF" + string_match + | ||
| R"EOF(\s*(?:,\s*)EOF" + string_match + | ||
| R"EOF(\s*)*\]\s*\))EOF"); |
There was a problem hiding this comment.
I'm not entirely happy with this part. I contend, however, that's exceedingly difficult to correctly parse certain header values without knowing whether or not a particular character is within the JSON part of the expression. For example:
%UPSTREAM_METADATA(["X", "%%(Y)%%"])%%PROTOCOL%
%UPSTREAM_METADATA(["X", "Y"])%%PROTOCOL%
There was a problem hiding this comment.
Yeah, my headache started within about 5s of looking at this. 😉
At the very least, can you add a bunch of concrete examples in comments of what this is doing? I'm guessing that a proper recursive descent parser (or something like that) is what is needed here, but that is beyond me. @jmarantz seems to like this kind of stuff so maybe he has some additional thoughts.
There was a problem hiding this comment.
I'll think about it some more. Maybe I can write a thing that handles the non-parameterized variables and UPSTREAM_METADATA in one place.
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
mattklein123
left a comment
There was a problem hiding this comment.
Cool stuff. Few comments to get started. @jmarantz can you help review this also?
| const EnvoyException* cause = nullptr) { | ||
| std::string reason; | ||
| if (cause != nullptr) { | ||
| reason = fmt::format("{}", cause->what()); |
There was a problem hiding this comment.
nit: You don't really need this format call. You can probably just use the tertiary operator directly inside the function call below.
| static std::regex params_regex( | ||
| R"EOF(^\(\s*\[\s*)EOF" + string_match + | ||
| R"EOF(\s*(?:,\s*)EOF" + string_match + | ||
| R"EOF(\s*)*\]\s*\))EOF"); |
There was a problem hiding this comment.
Yeah, my headache started within about 5s of looking at this. 😉
At the very least, can you add a bunch of concrete examples in comments of what this is doing? I'm guessing that a proper recursive descent parser (or something like that) is what is needed here, but that is beyond me. @jmarantz seems to like this kind of stuff so maybe he has some additional thoughts.
| // parameters it's very difficult to distinguish the sequence ")%" within the params from the | ||
| // sequence ")%" at the end of the params (especially given that another %-expression may | ||
| // immediately follow). | ||
| static std::string string_match = R"EOF("(?:[^"\\[:cntrl:]]|\\["\\bfnrtu])*")EOF"; |
There was a problem hiding this comment.
nit: const here and next line
There was a problem hiding this comment.
This can be a static const char[], right? If you want to leave it a string, can you lazy-init it?
From https://github.com/envoyproxy/envoy/blob/master/STYLE.md :
The Google C++ style guide points out that non-PoD static and global variables are forbidden. This includes types such as std::string. We encourage the use of the advice in the C++ FAQ on the static initialization fiasco for how to best handle this.
There was a problem hiding this comment.
Can you describe more about the string syntax and how the different json fragments are delimited? A regex seems fundamentally unsuitable for recognizing the end of JSON generally, as it doesn't allow for arbitrarily nested state.
| return Envoy::AccessLog::AccessLogFormatUtils::protocolToString(request_info.protocol()); | ||
| }; | ||
| } else if (field_name == "CLIENT_IP" || field_name == "DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT") { | ||
| #define CLIENT_IP_VAR "%CLIENT_IP%" |
There was a problem hiding this comment.
Please use real typed variables. const char[] is probably fine.
|
|
||
| // HeaderFormatter::format | ||
| const std::string format(const Envoy::RequestInfo::RequestInfo& request_info) const override { | ||
| std::ostringstream buf; |
There was a problem hiding this comment.
I would probably just use a raw string here and append. Any reason to use a string stream?
| std::string buf; | ||
| buf.reserve(format_view.size()); | ||
|
|
||
| size_t prev = 0; |
There was a problem hiding this comment.
Can we get a bunch more comments on this code. I personally find it very hard to read this type of logic quickly and comments can help a lot.
…aderParser Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
|
So I removed the regular expressions and the need to handle % in two places by parsing the header format with a state machine. In the end HeaderFormatter is largely unchanged from master (modulo some use of absl::string_view to reduce string copying) and the brains of dealing with % and finding the boundaries of a %VAR(["X","Y","Z"])% is purely the responsibility of HeaderParser. |
| // parameters it's very difficult to distinguish the sequence ")%" within the params from the | ||
| // sequence ")%" at the end of the params (especially given that another %-expression may | ||
| // immediately follow). | ||
| static std::string string_match = R"EOF("(?:[^"\\[:cntrl:]]|\\["\\bfnrtu])*")EOF"; |
There was a problem hiding this comment.
This can be a static const char[], right? If you want to leave it a string, can you lazy-init it?
From https://github.com/envoyproxy/envoy/blob/master/STYLE.md :
The Google C++ style guide points out that non-PoD static and global variables are forbidden. This includes types such as std::string. We encourage the use of the advice in the C++ FAQ on the static initialization fiasco for how to best handle this.
| // parameters it's very difficult to distinguish the sequence ")%" within the params from the | ||
| // sequence ")%" at the end of the params (especially given that another %-expression may | ||
| // immediately follow). | ||
| static std::string string_match = R"EOF("(?:[^"\\[:cntrl:]]|\\["\\bfnrtu])*")EOF"; |
There was a problem hiding this comment.
Can you describe more about the string syntax and how the different json fragments are delimited? A regex seems fundamentally unsuitable for recognizing the end of JSON generally, as it doesn't allow for arbitrarily nested state.
| return nullptr; | ||
| } | ||
|
|
||
| if (var.compare(0, sizeof(PROTOCOL_VAR) - 1, PROTOCOL_VAR) == 0) { |
There was a problem hiding this comment.
Shall we add a STATIC_STRLEN macro to factor out this sizeof(str)-1 pattern?
Moreover, you can use absl::StartsWith(var, PROTOCOL_VAR, STATIC_STRLEN(PROTOCOL_VAR)); (declared in match.h).
| ExpectVariableEnd // expect closing % in %VAR(...)% | ||
| }; | ||
|
|
||
| std::string unescape(absl::string_view sv) { return absl::StrJoin(absl::StrSplit(sv, "%%"), "%"); } |
There was a problem hiding this comment.
It's worth a comment at least why you didn't use absl::StrReplaceAll().
There was a problem hiding this comment.
I'll switch to that function. It's not on their docs website (that I saw), so I didn't realize it existed.
|
|
||
| if (ch == '(') { | ||
| // Variable with arguments, search for start of arg array. | ||
| state = ParserState::ExpectArray; |
There was a problem hiding this comment.
I was expecting you to bump a depth-count here. Scanning forward a bit, it looks like the algorithm is very sensitive to a particular json structure. I think it might actually be just as easy (and maybe very few if any additional lines of code) to have your scanner track the paren depth for () [] {} and the quoting state for " and '. In a way it might be easier to review as the reviewer would just need to take into account basic lexical nesting and not the schema you are tuned to.
Perhaps this is speculative generality but I also wonder if the json structure you expect might change over time requiring the next person to then update this routine.
There was a problem hiding this comment.
The history of using JSON here is that when custom headers only supported the entire header value being custom (vs. what this PR is doing) we thought it would be easier to just use the JSON parser to handle arguments rather than writing an argument parser. In retrospect, I wish I had just handled arguments. FWIW, I'd be willing to toss the JSON business and rewrite this parser to just take quoted arguments (e.g. %UPSTREAM_METADATA("a", "b", "c")%. But if we keep JSON, the upshot is that I don't expect anyone to want anything other than an array. Something more complicated would (IMO) warrant modifying the protobuf definition of HeaderValueOption.
The JSON spec doesn't seem to allow single-quoted strings, so I don't think we need to handle them in any event.
There was a problem hiding this comment.
This comment spawned a bunch of thought from me, but not that much typing. As I'm relatively new to Envoy I still don't have a great sense of what needs to be fast (e.g. runs per request) and what can't be changed (because it's already released and documented).
If this runs per request, I suspect it'd be worth considering dumping JSON parsing and just scan using string_view-related routines. If this only runs at startup, I'm a little less concerned, however I'm also looking at istio/istio#2554 (comment) and #2373 which are all around startup speed. That doesn't mean this would be a problem but I think it's worth understanding it.
The second part of this: is the syntax you need to parse set in stone? If so, and JSON generality is not needed, it'd be nice to drop it in favor of something that could be more easily and quickly parsed.
For the moment I'll continue the review assuming the answers are "the speed does not matter" and "the format cannot be changed".
There was a problem hiding this comment.
Parsing these strings only happens only at configuration time.
The syntax needn't be set in stone, but there's also some user friction related to changing this stuff. It looks to me that all this was added after 1.5 and 1.6 isn't out yet, so if we were going to change it, now is the time. @mattklein123 what do you think?
There was a problem hiding this comment.
I don't have a strong opinion. This seems fine to me. What's the proposed alternative?
There was a problem hiding this comment.
Right now the parser does just enough parsing to find the start/end of % expressions, but needs to know something about JSON to do it. We could just make the parser smart enough to parse the params and ditch the JSON parser. While we're there we could drop the square brackets. If you want to do this I'll put in another PR.
|
|
||
| std::string unescape(absl::string_view sv) { return absl::StrJoin(absl::StrSplit(sv, "%%"), "%"); } | ||
|
|
||
| // Implements a state machine to parse custom headers. Each character of the custom header format |
There was a problem hiding this comment.
very nice and readable code in general. I like this better than the regex. Also I think it can potentially be made more general, as described below.
However I think this parser does deserve some rigorous unit tests :)
There was a problem hiding this comment.
I added enough tests to exercise each case in the switch statement and the branches within them. Is there a particular part you'd like more test cases for?
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
|
ping @jmarantz |
|
Sorry for the delay; will look at this today. |
jmarantz
left a comment
There was a problem hiding this comment.
In the current state, my main request with this code is for rigorous unit-testing of your new parser.
| "UPSTREAM_METADATA([\"namespace\", \"k\", ...]), actual format " | ||
| "UPSTREAM_METADATA{}{}", | ||
| params, reason); | ||
| absl::StrCat(params), reason); |
There was a problem hiding this comment.
You are just trying to convert a string_view to a std::string? Use std::string(params)
| throw EnvoyException(formatUpstreamMetadataParseException(params_str)); | ||
| } | ||
|
|
||
| absl::string_view json = params_str; |
There was a problem hiding this comment.
You could also write this
absl::string_view json = params_str.substr(1, params_str.size() - 2); // trim parens.
| try { | ||
| Json::ObjectSharedPtr parsed_params = | ||
| Json::Factory::loadFromString(StringUtil::subspan(params_str, 1, params_str.size() - 1)); | ||
| Json::ObjectSharedPtr parsed_params = Json::Factory::loadFromString(absl::StrCat(json)); |
| } else { | ||
| throw EnvoyException(fmt::format("field '{}' not supported as custom header", field_name)); | ||
| throw EnvoyException( | ||
| fmt::format("field '{}' not supported as custom header", absl::StrCat(field_name))); |
There was a problem hiding this comment.
std::string(field_name). Actually it'd be nice if fmt::format could be taught somehow about string_view as this will be a common pattern.
| } else if (StringUtil::startsWith(field_name.c_str(), "UPSTREAM_METADATA")) { | ||
| } else if (field_name.find_first_of("UPSTREAM_METADATA") == 0) { | ||
| field_extractor_ = | ||
| parseUpstreamMetadataField(field_name.substr(sizeof("UPSTREAM_METADATA") - 1)); |
There was a problem hiding this comment.
can we have a STATIC_STRLEN macro in common/common/macros.h or similar?
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM with small nits, assuming we have full coverage over all of the error handling, which seems like you do. @jmarantz any other comments from you?
| class CompoundHeaderFormatter : public HeaderFormatter { | ||
| public: | ||
| CompoundHeaderFormatter(std::vector<HeaderFormatterPtr>&& formatters, bool append) | ||
| : formatters_(std::move(formatters)), append_(append){}; |
There was a problem hiding this comment.
super nit: remove trailing ';'
| format)); | ||
| absl::string_view format(header_value_option.header().value()); | ||
| if (format.empty()) { | ||
| return HeaderFormatterPtr{new PlainHeaderFormatter("", append)}; |
| return std::move(formatters[0]); | ||
| } | ||
|
|
||
| return HeaderFormatterPtr{new CompoundHeaderFormatter(std::move(formatters), append)}; |
|
Maybe I missed it, but I was still looking for a unit-test for the new parser. Other than that, LGTM. |
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
|
If really you just want an array of optionally quoted strings, I'd just use some kind of token-splitting (maybe a little more advanced than what's in common/common/utility.cc), rather than something JSON-esque, just to avoid having to document, parse, and test a more complex format. But what you have here looks great if we unit-test it. |
|
Thanks -- that looks great. I'm mystified why that doesn't show up in https://github.com/envoyproxy/envoy/pull/2409/files though. This is probably me not understanding github well enough. Assuming that is in the PR, LGTM! |
|
@jmarantz not all diffs expand by default if they are large. Look towards the bottom of the diff, and you have to manually expand it. |
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Updates the version of Xcode Envoy Mobile is compatible with to the latest stable version. Risk Level: Low. Testing: Local & CI. Docs Changes: Done. Release Notes: Done. Fixes #2100 Signed-off-by: JP Simard <jp@jpsim.com>
Updates the version of Xcode Envoy Mobile is compatible with to the latest stable version. Risk Level: Low. Testing: Local & CI. Docs Changes: Done. Release Notes: Done. Fixes #2100 Signed-off-by: JP Simard <jp@jpsim.com>
Description:
Modifies the parsing of
request_headers_to_addandresponse_headers_to_addto allow multiple variables and to allow static content around variable values. The % delimiters used to denote variables can be escaped by doubling the % sign.Risk Level: Medium
Modifies existing features in a way that would make old configurations invalid (if they have % signs in them).
Testing: Added unit tests for full coverage of header formatters and parser.
Docs Changes: envoyproxy/data-plane-api#424
Release Notes: added
Fixes: #2250
Signed-off-by: Stephan Zuercher stephan@turbinelabs.io