Skip to content

access_log, router: add subsecond specifier for START_TIME#3269

Merged
htuch merged 53 commits intoenvoyproxy:masterfrom
dio:start-time
Jun 6, 2018
Merged

access_log, router: add subsecond specifier for START_TIME#3269
htuch merged 53 commits intoenvoyproxy:masterfrom
dio:start-time

Conversation

@dio
Copy link
Member

@dio dio commented May 2, 2018

access_log, router: add subsecond specifier for START_TIME

Description:
This change adds %f, %[1-9]f specifier to get subseconds for START_TIME.
As an example, START_TIME(%s%3f) gets a timestamp in milliseconds.

This also adds START_TIME as one of the supported variables in header formatter.

Risk Level: Low, since this is an optional feature.

Testing: unit and manual tests.

Docs Changes:

  • added subsecond specifier for START_TIME for both access_log and router header formatter.
  • added START_TIME as one of the supported variables in header formatter.

Release Notes:

  • added subsecond specifier for START_TIME for both access_log and router header formatter.
  • added START_TIME as one of the supported variables in header formatter.

Fixes #1966
Fixes #2877

dio added 2 commits May 2, 2018 17:43
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@htuch htuch self-assigned this May 3, 2018
@htuch
Copy link
Member

htuch commented May 4, 2018

@junr03 can you take first pass on this? Thanks.

Copy link
Member

@junr03 junr03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two comments

Http::FilterHeadersStatus Filter::decodeHeaders(Http::HeaderMap& headers, bool end_stream) {
downstream_headers_ = &headers;

if (!config_.start_timestamp_header_.get().empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an explanation comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 4c80a6d.

expectResponseTimerCreate();

Http::TestHeaderMapImpl headers;
HttpTestUtility::addDefaultHeaders(headers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test that the x-request-start header is not present before the call to decodeHeaders

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 3110ce0.

dio added 2 commits May 8, 2018 11:54
#3269 (comment)

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
…e call to decodeHeaders

#3269 (comment)

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@junr03 junr03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dio. @htuch can you take a look?

//
// start_timestamp_header: x-request-start
//
string start_timestamp_header = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the existing user defined headers mechanism for this? E.g. you could configure x-request-start with value %START_TIMESTAMP%?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, +1. That seems better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch thanks for the hint! Yes, I think that's the right direction (I should went to that direction in the first place, since apparently RequestInfo has that useful .startTime() 😔).

I added the corresponding changes using that approach. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cleaner, thanks for suggesting!

dio added 4 commits May 9, 2018 09:10
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks good. Mostly wonder if we can generalize this even more.

namespace and key(s) are specified as a JSON array of strings. Finally, percent symbols in the
parameters **do not** need to be escaped by doubling them.

%START_TIMESTAMP%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have START_TIME in access logs. I wonder, could we just use START_TIME and allow a custom format specifier that is effectively milliseconds since epoch? Then we could use the same code in both places. If we can't do that what about START_TIME_SINCE_EPOCH or something? Though I like the former better. Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 do you think specifically handling since_epoch_ms in start time formatting makes sense?

i.e. the start time formatting is either:

  1. Using std::put_time's fmt string for %START_TIME(%Y/%m/%dT%H:%M:%S%z %s)%, or
  2. Using fmt::format for %START_TIME(since_epoch_ms)%

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think for now we probably can go with START_TIME_SINCE_EPOCH for header formatter, since currently the way AccessLogFormatParser build the formatter is a bit isolated to itself. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat related to #3256. It would be nice to support custom formatters apart from what is currently supported. cc @bplotnick.

I think if we did this we could support time since epoch and some other things and I think this is a better long term direction, though more work. Are you interested in picking up #3256 as well and working on both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 sure let me take a look at the possibilities.

dio added 2 commits May 10, 2018 06:58
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

{":method", "POST"}, {"static-header", "old-value"}, {"x-client-ip", "0.0.0.0"}};

NiceMock<Envoy::RequestInfo::MockRequestInfo> request_info;
time_t start_time_epoch = 1522280158;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const here and below

@dio
Copy link
Member Author

dio commented May 10, 2018

@htuch WDYT of this: #3269 (comment)?

Seems like I need update this PR to fulfil that requirement.

@htuch
Copy link
Member

htuch commented May 10, 2018

@dio sure, if you want to followup on that it'd be nice.

@mattklein123
Copy link
Member

@dio if adding more custom formatters looks too hard we can punt, but I think it's worth investigating just to see how difficult it would be since I think it would be a cleaner solution.

@dio
Copy link
Member Author

dio commented May 11, 2018

@mattklein123 sure. Will give it a try.

dio added 3 commits May 14, 2018 07:21
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio
Copy link
Member Author

dio commented May 14, 2018

@mattklein123 I played around with this a little bit, so far I can have:

  1. "%START_TIME(%since_epoch_ms)%" to get timestamp since epoch in ms (we could have us and ns too).
  2. "%START_TIME(%ms)%" to get the ms component of a start time (we could also extend it to get us and ns component). And yes, it can be combined with the currently strftime supported params.

Do you think this is the right direction?

@mattklein123
Copy link
Member

@dio I think if it looks possible to introduce custom formatters, it's the way to go. It's a much cleaner solution and we can solve multiple issues at once.

@htuch
Copy link
Member

htuch commented May 15, 2018

@dio SGTM. Instread of %ms%, how about following other existing conventions for sub-second granularity, e.g. https://bugs.python.org/issue1982?

@mattklein123
Copy link
Member

@dio as long as @htuch is good with it, let's move forward with the time formatting extensions. Agree we should use existing conventions in other languages if possible. Let us know when this is updated and ready for review. Thank you!!

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
dio added 2 commits June 1, 2018 06:32
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio
Copy link
Member Author

dio commented Jun 1, 2018

@htuch this is updated. When you have time, PTAL. Thanks!

ASSERT(placeholder_index != std::string::npos);
formatted.replace(placeholder_index, width, subsecond);
for (const auto subsecond : subseconds_) {
// TODO(dio): Infer the length of second from parsing step. Currently, it is defaulted to 10.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably needs to be fixed before merging.

SubsecondConstants::get().PLACEHOLDER.substr(0, width));

const std::string part = new_format_string.substr(previous, matched.position() - previous);
const size_t formatted_length = strftime(&buf[0], buf.size(), part.c_str(), &current_tm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the part above now looks good, but I don't think you want to do anything that uses the current time representation here (it seems dangerous, since the formatted time at parse may be very different than the formatted time at request time). Here's my understanding:

  1. When we parse, we are looking at the string without any substitutions, e.g. %s-%3f-asdf-%9f.
  2. When we substitute, we replace the %Nf before doing the strftime. So, we don't need to do anything here that is post-strftime.

The reason you need to do this additional complexity right now is due to how the cached time string works. I think what makes sense is to do the offset computation when you regenerate the cached item, in conjunctino with the strftime.

Sorry I missed this the first couple of rounds, that's the crux of it I think.

Copy link
Member Author

@dio dio Jun 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch I decided to have fromTimeAndPrepareSubsecondOffsets to figure out the subsecond offsets of a format string after it is formatted by strftime. Hence I can use it here.

I'm not super happy about it since it is a bit complex. Want to get your input on this. Thanks!

dio added 4 commits June 3, 2018 05:26
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio
Copy link
Member Author

dio commented Jun 3, 2018

@htuch yet another update. Please take a look when you have time. Thanks!

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to try and clean this up. I agree that it's not too pretty either way :/

std::string str;
};
// A map is used to keep different formatted format strings at a given second.
std::unordered_map<std::string, const Formatted> formatted;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we slowly leak via this map? E.g. if I replace route configuration multiple times and change the format strings, do we accumulate and never clear entries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that is true. We need to find a way to detect when to clear this.

Copy link
Member Author

@dio dio Jun 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to clear it at each second as exhibited in 513d4bf.


if (cached_time.formatted.find(format_string_) == cached_time.formatted.end() ||
cached_time.epoch_time_seconds != epoch_time_seconds) {
time_t current_time = std::chrono::system_clock::to_time_t(time);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

const std::string nanoseconds = fmt::FormatInt(epoch_time_ns.count()).str();
for (size_t i = 0; i < subseconds_.size(); ++i) {
const auto& subsecond = subseconds_.at(i);
const std::string digits = nanoseconds.substr(cached_time.seconds_length, subsecond.width_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be absl::string_view I think to avoid copies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I couldn't do this since it hits stack-use-after-scope.

const std::string last_segment = format_string_.substr(step);
strftime(&buf[0], buf.size(), last_segment.c_str(), &current_tm);
absl::StrAppend(&formatted, &buf[0]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, yeah, this is complicated, but I'm not sure if there's a simpler way. I'm OK with the above if you clean it up and add comments. Alternatively, we could fragment into segments during the initial parse (which I think is what you may have had in an earlier iteration) and just do the append + substitute here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do the pre-computation 96f915e.

dio added 2 commits June 5, 2018 09:30
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio
Copy link
Member Author

dio commented Jun 5, 2018

@htuch I have another set of updates. Now I tried to pre-compute the segments and add clearance for the cached map of formatted. PTAL. 😄

dio added 2 commits June 5, 2018 15:39
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dio thanks for your patience and diligence here, I think what you have now looks great. A bunch of feedback, and I think you should have a number of tests for the parse/format code that explore crazy boundary conditions that might arise.

std::string DateFormatter::fromTime(const SystemTime& time) const {
return fromTime(std::chrono::system_clock::to_time_t(time));
struct CachedTime {
size_t seconds_length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment would be useful here.

struct Formatted {
std::string str;
SubsecondOffsets subsecond_offsets;
std::chrono::seconds epoch_time_seconds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment these fields and struct?

const std::chrono::seconds epoch_time_seconds =
std::chrono::duration_cast<std::chrono::seconds>(epoch_time_ns);

// Remove all the expired cached items.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could push this loop underneath the if (item == cached_time.formatted.end()) scope, to avoid doing this on the hot path. We only need to GC every now and then, once per second will be fine.

// To capture the segment after the last subsecond pattern of a format string. E.g.
// %3f-this-is-the-last-%s-segment-%Y-until-this.
if (step < new_format_string.size()) {
last_segment_ = new_format_string.substr(step);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a dummy zero length subsecond entry to allow this to be treated more uniformly later?

std::array<char, 1024> buf;
std::string formatted;

int32_t previous = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer using signed types where possible, since they describe the allowed range.


class SubsecondConstantValues {
public:
const std::string PLACEHOLDER{"?????????"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't need this to be a constant. You can just do something like std::string(N, '?') later on to generate a string of length N.

: position_(position), width_(width), segment_(segment) {}

const size_t position_;
const size_t width_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on all these fields please.

dio added 4 commits June 6, 2018 17:21
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio
Copy link
Member Author

dio commented Jun 6, 2018

I think you should have a number of tests for the parse/format code that explore crazy boundary conditions that might arise.

We have:

  1. "%START_TIME(segment1:%s.%3f|segment2:%s.%4f|seg3:%s.%6f|%s-%3f-asdf-%9f|.%7f:segm5:%Y)%" that tests segment with different lengths in between subsecond specifiers

  2. "%START_TIME(%|%%%%|%%%%%f|%s%%%%%3f|%1f%%%%%s)%" that tests if a set of shorter segments with negative offsets matters.

If you have another idea please let me know. 🙂

Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two very minor comments and LGTM.


// This computes and saves offset of each subsecond pattern to correct its position after the
// previous string segment is formatted.
const int32_t offset = formatted_length - subsecond.segment_.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be negative?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, e.g. %%%% is formatted as %%.

private:
std::function<std::string(const Envoy::RequestInfo::RequestInfo&)> field_extractor_;
const bool append_;
std::map<std::string, std::vector<AccessLog::FormatterPtr>> start_time_formatters_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unordered_map?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. Why did I put it as a map? 😅.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch updated.

dio added 2 commits June 6, 2018 23:37
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rad, thanks!

@htuch htuch merged commit bc1b39c into envoyproxy:master Jun 6, 2018
@mattklein123
Copy link
Member

Super cool @dio. Awesome work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants