Skip to content

router: allow headers with multiple variables and static data#2409

Merged
zuercher merged 11 commits intoenvoyproxy:masterfrom
turbinelabs:stephan/2250-improved-header-value
Jan 26, 2018
Merged

router: allow headers with multiple variables and static data#2409
zuercher merged 11 commits intoenvoyproxy:masterfrom
turbinelabs:stephan/2250-improved-header-value

Conversation

@zuercher
Copy link
Copy Markdown
Member

Description:
Modifies the parsing of request_headers_to_add and response_headers_to_add to allow multiple variables and to allow static content around variable values. The % delimiters used to denote variables can be escaped by doubling the % sign.

Risk Level: Medium
Modifies existing features in a way that would make old configurations invalid (if they have % signs in them).

Testing: Added unit tests for full coverage of header formatters and parser.

Docs Changes: envoyproxy/data-plane-api#424

Release Notes: added

Fixes: #2250

Signed-off-by: Stephan Zuercher stephan@turbinelabs.io

…fixes envoyproxy#2250)

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
static std::regex params_regex(
R"EOF(^\(\s*\[\s*)EOF" + string_match +
R"EOF(\s*(?:,\s*)EOF" + string_match +
R"EOF(\s*)*\]\s*\))EOF");
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely happy with this part. I contend, however, that's exceedingly difficult to correctly parse certain header values without knowing whether or not a particular character is within the JSON part of the expression. For example:

%UPSTREAM_METADATA(["X", "%%(Y)%%"])%%PROTOCOL%
%UPSTREAM_METADATA(["X", "Y"])%%PROTOCOL%

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my headache started within about 5s of looking at this. 😉

At the very least, can you add a bunch of concrete examples in comments of what this is doing? I'm guessing that a proper recursive descent parser (or something like that) is what is needed here, but that is beyond me. @jmarantz seems to like this kind of stuff so maybe he has some additional thoughts.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about it some more. Maybe I can write a thing that handles the non-parameterized variables and UPSTREAM_METADATA in one place.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff. Few comments to get started. @jmarantz can you help review this also?

const EnvoyException* cause = nullptr) {
std::string reason;
if (cause != nullptr) {
reason = fmt::format("{}", cause->what());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You don't really need this format call. You can probably just use the tertiary operator directly inside the function call below.

static std::regex params_regex(
R"EOF(^\(\s*\[\s*)EOF" + string_match +
R"EOF(\s*(?:,\s*)EOF" + string_match +
R"EOF(\s*)*\]\s*\))EOF");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my headache started within about 5s of looking at this. 😉

At the very least, can you add a bunch of concrete examples in comments of what this is doing? I'm guessing that a proper recursive descent parser (or something like that) is what is needed here, but that is beyond me. @jmarantz seems to like this kind of stuff so maybe he has some additional thoughts.

// parameters it's very difficult to distinguish the sequence ")%" within the params from the
// sequence ")%" at the end of the params (especially given that another %-expression may
// immediately follow).
static std::string string_match = R"EOF("(?:[^"\\[:cntrl:]]|\\["\\bfnrtu])*")EOF";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const here and next line

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a static const char[], right? If you want to leave it a string, can you lazy-init it?

From https://github.com/envoyproxy/envoy/blob/master/STYLE.md :

The Google C++ style guide points out that non-PoD static and global variables are forbidden. This includes types such as std::string. We encourage the use of the advice in the C++ FAQ on the static initialization fiasco for how to best handle this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe more about the string syntax and how the different json fragments are delimited? A regex seems fundamentally unsuitable for recognizing the end of JSON generally, as it doesn't allow for arbitrarily nested state.

return Envoy::AccessLog::AccessLogFormatUtils::protocolToString(request_info.protocol());
};
} else if (field_name == "CLIENT_IP" || field_name == "DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT") {
#define CLIENT_IP_VAR "%CLIENT_IP%"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use real typed variables. const char[] is probably fine.


// HeaderFormatter::format
const std::string format(const Envoy::RequestInfo::RequestInfo& request_info) const override {
std::ostringstream buf;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just use a raw string here and append. Any reason to use a string stream?

std::string buf;
buf.reserve(format_view.size());

size_t prev = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a bunch more comments on this code. I personally find it very hard to read this type of logic quickly and comments can help a lot.

…aderParser

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher
Copy link
Copy Markdown
Member Author

So I removed the regular expressions and the need to handle % in two places by parsing the header format with a state machine. In the end HeaderFormatter is largely unchanged from master (modulo some use of absl::string_view to reduce string copying) and the brains of dealing with % and finding the boundaries of a %VAR(["X","Y","Z"])% is purely the responsibility of HeaderParser.

// parameters it's very difficult to distinguish the sequence ")%" within the params from the
// sequence ")%" at the end of the params (especially given that another %-expression may
// immediately follow).
static std::string string_match = R"EOF("(?:[^"\\[:cntrl:]]|\\["\\bfnrtu])*")EOF";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a static const char[], right? If you want to leave it a string, can you lazy-init it?

From https://github.com/envoyproxy/envoy/blob/master/STYLE.md :

The Google C++ style guide points out that non-PoD static and global variables are forbidden. This includes types such as std::string. We encourage the use of the advice in the C++ FAQ on the static initialization fiasco for how to best handle this.

// parameters it's very difficult to distinguish the sequence ")%" within the params from the
// sequence ")%" at the end of the params (especially given that another %-expression may
// immediately follow).
static std::string string_match = R"EOF("(?:[^"\\[:cntrl:]]|\\["\\bfnrtu])*")EOF";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe more about the string syntax and how the different json fragments are delimited? A regex seems fundamentally unsuitable for recognizing the end of JSON generally, as it doesn't allow for arbitrarily nested state.

return nullptr;
}

if (var.compare(0, sizeof(PROTOCOL_VAR) - 1, PROTOCOL_VAR) == 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a STATIC_STRLEN macro to factor out this sizeof(str)-1 pattern?

Moreover, you can use absl::StartsWith(var, PROTOCOL_VAR, STATIC_STRLEN(PROTOCOL_VAR)); (declared in match.h).

ExpectVariableEnd // expect closing % in %VAR(...)%
};

std::string unescape(absl::string_view sv) { return absl::StrJoin(absl::StrSplit(sv, "%%"), "%"); }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth a comment at least why you didn't use absl::StrReplaceAll().

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch to that function. It's not on their docs website (that I saw), so I didn't realize it existed.


if (ch == '(') {
// Variable with arguments, search for start of arg array.
state = ParserState::ExpectArray;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting you to bump a depth-count here. Scanning forward a bit, it looks like the algorithm is very sensitive to a particular json structure. I think it might actually be just as easy (and maybe very few if any additional lines of code) to have your scanner track the paren depth for () [] {} and the quoting state for " and '. In a way it might be easier to review as the reviewer would just need to take into account basic lexical nesting and not the schema you are tuned to.

Perhaps this is speculative generality but I also wonder if the json structure you expect might change over time requiring the next person to then update this routine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The history of using JSON here is that when custom headers only supported the entire header value being custom (vs. what this PR is doing) we thought it would be easier to just use the JSON parser to handle arguments rather than writing an argument parser. In retrospect, I wish I had just handled arguments. FWIW, I'd be willing to toss the JSON business and rewrite this parser to just take quoted arguments (e.g. %UPSTREAM_METADATA("a", "b", "c")%. But if we keep JSON, the upshot is that I don't expect anyone to want anything other than an array. Something more complicated would (IMO) warrant modifying the protobuf definition of HeaderValueOption.

The JSON spec doesn't seem to allow single-quoted strings, so I don't think we need to handle them in any event.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment spawned a bunch of thought from me, but not that much typing. As I'm relatively new to Envoy I still don't have a great sense of what needs to be fast (e.g. runs per request) and what can't be changed (because it's already released and documented).

If this runs per request, I suspect it'd be worth considering dumping JSON parsing and just scan using string_view-related routines. If this only runs at startup, I'm a little less concerned, however I'm also looking at istio/istio#2554 (comment) and #2373 which are all around startup speed. That doesn't mean this would be a problem but I think it's worth understanding it.

The second part of this: is the syntax you need to parse set in stone? If so, and JSON generality is not needed, it'd be nice to drop it in favor of something that could be more easily and quickly parsed.

For the moment I'll continue the review assuming the answers are "the speed does not matter" and "the format cannot be changed".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing these strings only happens only at configuration time.

The syntax needn't be set in stone, but there's also some user friction related to changing this stuff. It looks to me that all this was added after 1.5 and 1.6 isn't out yet, so if we were going to change it, now is the time. @mattklein123 what do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. This seems fine to me. What's the proposed alternative?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the parser does just enough parsing to find the start/end of % expressions, but needs to know something about JSON to do it. We could just make the parser smart enough to parse the params and ditch the JSON parser. While we're there we could drop the square brackets. If you want to do this I'll put in another PR.


std::string unescape(absl::string_view sv) { return absl::StrJoin(absl::StrSplit(sv, "%%"), "%"); }

// Implements a state machine to parse custom headers. Each character of the custom header format
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice and readable code in general. I like this better than the regex. Also I think it can potentially be made more general, as described below.

However I think this parser does deserve some rigorous unit tests :)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added enough tests to exercise each case in the switch statement and the branches within them. Is there a particular part you'd like more test cases for?

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher
Copy link
Copy Markdown
Member Author

ping @jmarantz

@jmarantz
Copy link
Copy Markdown
Contributor

Sorry for the delay; will look at this today.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, my main request with this code is for rigorous unit-testing of your new parser.

"UPSTREAM_METADATA([\"namespace\", \"k\", ...]), actual format "
"UPSTREAM_METADATA{}{}",
params, reason);
absl::StrCat(params), reason);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are just trying to convert a string_view to a std::string? Use std::string(params)

throw EnvoyException(formatUpstreamMetadataParseException(params_str));
}

absl::string_view json = params_str;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also write this
absl::string_view json = params_str.substr(1, params_str.size() - 2); // trim parens.

try {
Json::ObjectSharedPtr parsed_params =
Json::Factory::loadFromString(StringUtil::subspan(params_str, 1, params_str.size() - 1));
Json::ObjectSharedPtr parsed_params = Json::Factory::loadFromString(absl::StrCat(json));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string(json)

} else {
throw EnvoyException(fmt::format("field '{}' not supported as custom header", field_name));
throw EnvoyException(
fmt::format("field '{}' not supported as custom header", absl::StrCat(field_name)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string(field_name). Actually it'd be nice if fmt::format could be taught somehow about string_view as this will be a common pattern.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #2457.

} else if (StringUtil::startsWith(field_name.c_str(), "UPSTREAM_METADATA")) {
} else if (field_name.find_first_of("UPSTREAM_METADATA") == 0) {
field_extractor_ =
parseUpstreamMetadataField(field_name.substr(sizeof("UPSTREAM_METADATA") - 1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a STATIC_STRLEN macro in common/common/macros.h or similar?

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small nits, assuming we have full coverage over all of the error handling, which seems like you do. @jmarantz any other comments from you?

class CompoundHeaderFormatter : public HeaderFormatter {
public:
CompoundHeaderFormatter(std::vector<HeaderFormatterPtr>&& formatters, bool append)
: formatters_(std::move(formatters)), append_(append){};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: remove trailing ';'

format));
absl::string_view format(header_value_option.header().value());
if (format.empty()) {
return HeaderFormatterPtr{new PlainHeaderFormatter("", append)};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::make_unique

return std::move(formatters[0]);
}

return HeaderFormatterPtr{new CompoundHeaderFormatter(std::move(formatters), append)};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::make_unique

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jan 26, 2018

Maybe I missed it, but I was still looking for a unit-test for the new parser. Other than that, LGTM.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jan 26, 2018

If really you just want an array of optionally quoted strings, I'd just use some kind of token-splitting (maybe a little more advanced than what's in common/common/utility.cc), rather than something JSON-esque, just to avoid having to document, parse, and test a more complex format.

But what you have here looks great if we unit-test it.

@zuercher
Copy link
Copy Markdown
Member Author

@jmarantz
Copy link
Copy Markdown
Contributor

Thanks -- that looks great. I'm mystified why that doesn't show up in https://github.com/envoyproxy/envoy/pull/2409/files though. This is probably me not understanding github well enough. Assuming that is in the PR, LGTM!

@mattklein123
Copy link
Copy Markdown
Member

@jmarantz not all diffs expand by default if they are large. Look towards the bottom of the diff, and you have to manually expand it.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@zuercher zuercher merged commit 22c63af into envoyproxy:master Jan 26, 2018
@zuercher zuercher deleted the stephan/2250-improved-header-value branch January 26, 2018 19:55
Shikugawa pushed a commit to Shikugawa/envoy that referenced this pull request Mar 28, 2020
jpsim added a commit that referenced this pull request Nov 28, 2022
Updates the version of Xcode Envoy Mobile is compatible
with to the latest stable version.

Risk Level: Low.
Testing: Local & CI.
Docs Changes: Done.
Release Notes: Done.
Fixes #2100

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim added a commit that referenced this pull request Nov 29, 2022
Updates the version of Xcode Envoy Mobile is compatible
with to the latest stable version.

Risk Level: Low.
Testing: Local & CI.
Docs Changes: Done.
Release Notes: Done.
Fixes #2100

Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants