access_log, router: add subsecond specifier for START_TIME #3269
htuch merged 53 commits into envoyproxy:master from dio:start-time
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@junr03 can you take first pass on this? Thanks.
source/common/router/router.cc
Outdated
    Http::FilterHeadersStatus Filter::decodeHeaders(Http::HeaderMap& headers, bool end_stream) {
      downstream_headers_ = &headers;

      if (!config_.start_timestamp_header_.get().empty()) {
test/common/router/router_test.cc
Outdated
    expectResponseTimerCreate();

    Http::TestHeaderMapImpl headers;
    HttpTestUtility::addDefaultHeaders(headers);
test that the x-request-start header is not present before the call to decodeHeaders
#3269 (comment) Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
    //
    // start_timestamp_header: x-request-start
    //
    string start_timestamp_header = 4;
Can we reuse the existing user defined headers mechanism for this? E.g. you could configure x-request-start with value %START_TIMESTAMP%?
Oh yes, +1. That seems better.
@htuch thanks for the hint! Yes, I think that's the right direction (I should have gone in that direction in the first place, since apparently RequestInfo has that useful .startTime() 😔).
I added the corresponding changes using that approach. Thanks!
That's cleaner, thanks for suggesting!
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
mattklein123 left a comment
Approach looks good. Mostly wonder if we can generalize this even more.
    namespace and key(s) are specified as a JSON array of strings. Finally, percent symbols in the
    parameters **do not** need to be escaped by doubling them.

    %START_TIMESTAMP%
We already have START_TIME in access logs. I wonder, could we just use START_TIME and allow a custom format specifier that is effectively milliseconds since epoch? Then we could use the same code in both places. If we can't do that what about START_TIME_SINCE_EPOCH or something? Though I like the former better. Thoughts?
@mattklein123 do you think specifically handling since_epoch_ms in start time formatting makes sense?
i.e. the start time formatting is either:
- Using std::put_time's fmt string for %START_TIME(%Y/%m/%dT%H:%M:%S%z %s)%, or
- Using fmt::format for %START_TIME(since_epoch_ms)%
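To make the two options above concrete, here is a minimal C++ sketch of both styles of output. The helper names `sinceEpochMs` and `putTimeUtc` are hypothetical and are not part of Envoy; the sketch uses `std::to_string` rather than `fmt::format`, assumes a POSIX platform for `gmtime_r`, and assumes `system_clock` counts from the Unix epoch.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: milliseconds since the Unix epoch as a decimal string.
std::string sinceEpochMs(std::chrono::system_clock::time_point time) {
  const auto ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(time.time_since_epoch());
  return std::to_string(ms.count());
}

// Hypothetical helper: calendar formatting in UTC through std::put_time.
std::string putTimeUtc(std::chrono::system_clock::time_point time, const char* format) {
  assert(format != nullptr);
  const std::time_t seconds = std::chrono::system_clock::to_time_t(time);
  std::tm tm_utc{};
  gmtime_r(&seconds, &tm_utc); // POSIX call; an assumption of this sketch.
  std::ostringstream out;
  out << std::put_time(&tm_utc, format);
  return out.str();
}
```

The first helper never goes through a calendar format string at all, which is why it would need its own branch in the formatter if it were exposed as a special `since_epoch_ms` keyword.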
However, I think for now we probably can go with START_TIME_SINCE_EPOCH for header formatter, since currently the way AccessLogFormatParser builds the formatter is a bit isolated to itself. WDYT?
This is somewhat related to #3256. It would be nice to support custom formatters apart from what is currently supported. cc @bplotnick.
I think if we did this we could support time since epoch and some other things and I think this is a better long term direction, though more work. Are you interested in picking up #3256 as well and working on both?
@mattklein123 sure let me take a look at the possibilities.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
    {":method", "POST"}, {"static-header", "old-value"}, {"x-client-ip", "0.0.0.0"}};

    NiceMock<Envoy::RequestInfo::MockRequestInfo> request_info;
    time_t start_time_epoch = 1522280158;
@htuch WDYT of this: #3269 (comment)? Seems like I need to update this PR to fulfil that requirement.
@dio sure, if you want to followup on that it'd be nice.
@dio if adding more custom formatters looks too hard we can punt, but I think it's worth investigating just to see how difficult it would be since I think it would be a cleaner solution.
@mattklein123 sure. Will give it a try.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@mattklein123 I played around with this a little bit, so far I can have:
Do you think this is the right direction?
@dio I think if it looks possible to introduce custom formatters, it's the way to go. It's a much cleaner solution and we can solve multiple issues at once.
@dio SGTM. Instead of
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@htuch this is updated. When you have time, PTAL. Thanks!
source/common/common/utility.cc
Outdated
    ASSERT(placeholder_index != std::string::npos);
    formatted.replace(placeholder_index, width, subsecond);
    for (const auto subsecond : subseconds_) {
      // TODO(dio): Infer the length of second from parsing step. Currently, it is defaulted to 10.
Probably needs to be fixed before merging.
source/common/common/utility.cc
Outdated
    SubsecondConstants::get().PLACEHOLDER.substr(0, width));

    const std::string part = new_format_string.substr(previous, matched.position() - previous);
    const size_t formatted_length = strftime(&buf[0], buf.size(), part.c_str(), &current_tm);
I think the part above now looks good, but I don't think you want to do anything that uses the current time representation here (it seems dangerous, since the formatted time at parse may be very different than the formatted time at request time). Here's my understanding:
- When we parse, we are looking at the string without any substitutions, e.g. %s-%3f-asdf-%9f.
- When we substitute, we replace the %Nf before doing the strftime. So, we don't need to do anything here that is post-strftime.
The reason you need to do this additional complexity right now is due to how the cached time string works. I think what makes sense is to do the offset computation when you regenerate the cached item, in conjunction with the strftime.
Sorry I missed this the first couple of rounds, that's the crux of it I think.
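A minimal sketch of the "substitute %Nf first, then strftime once" order described above. This is not the code from the PR: `substituteSubseconds` and `formatUtc` are hypothetical names, a bare `%f` is assumed to mean nine digits, the time is assumed to be non-negative, and `gmtime_r` assumes a POSIX platform. It also ignores the cached-time concern raised in the comment.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <string>

// Replaces %f and %[1-9]f with leading digits of `nanos` (nine zero-padded digits).
// Every other specifier, including "%%", is copied through for strftime to handle.
std::string substituteSubseconds(const std::string& format, const std::string& nanos) {
  assert(nanos.size() == 9);
  std::string out;
  for (size_t i = 0; i < format.size(); ++i) {
    if (format[i] != '%' || i + 1 >= format.size()) {
      out += format[i];
      continue;
    }
    const char next = format[i + 1];
    if (next == 'f') {
      out += nanos; // Assumption: bare %f means all nine digits.
      i += 1;
    } else if (next >= '1' && next <= '9' && i + 2 < format.size() && format[i + 2] == 'f') {
      out.append(nanos, 0, static_cast<size_t>(next - '0'));
      i += 2;
    } else {
      out += format[i];
      out += next;
      i += 1;
    }
  }
  return out;
}

// Substitutes the sub-second digits, then formats the result with a single strftime call.
std::string formatUtc(const std::string& format, std::chrono::nanoseconds since_epoch) {
  const auto seconds = std::chrono::duration_cast<std::chrono::seconds>(since_epoch);
  std::string nanos = std::to_string((since_epoch - seconds).count());
  nanos.insert(0, 9 - nanos.size(), '0');
  const std::string substituted = substituteSubseconds(format, nanos);
  const std::time_t t = static_cast<std::time_t>(seconds.count());
  std::tm tm_utc{};
  gmtime_r(&t, &tm_utc);
  char buf[1024];
  const size_t n = std::strftime(buf, sizeof(buf), substituted.c_str(), &tm_utc);
  return std::string(buf, n);
}
```

Because the digits contain no `%`, they pass through `strftime` unchanged, so no position bookkeeping is needed after formatting. The cost is that the whole string is reformatted on every call.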
@htuch I decided to have fromTimeAndPrepareSubsecondOffsets to figure out the subsecond offsets of a format string after it is formatted by strftime. Hence I can use it here.
I'm not super happy about it since it is a bit complex. Want to get your input on this. Thanks!
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@htuch yet another update. Please take a look when you have time. Thanks!
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
htuch left a comment
Thanks for taking the time to try and clean this up. I agree that it's not too pretty either way :/
      std::string str;
    };
    // A map is used to keep different formatted format strings at a given second.
    std::unordered_map<std::string, const Formatted> formatted;
Do we slowly leak via this map? E.g. if I replace route configuration multiple times and change the format strings, do we accumulate and never clear entries?
It seems that is true. We need to find a way to detect when to clear this.
I tried to clear it at each second as exhibited in 513d4bf.
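One way to picture the "clear it at each second" idea is the sketch below. It is not the code from commit 513d4bf: `PerSecondCache` and its `render` callback are hypothetical, and the real cache stores more than a string per entry.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical per-second cache of formatted strings, keyed by format string.
class PerSecondCache {
public:
  // Returns the cached value for `format` at `second`, calling `render` on a miss.
  std::string get(const std::string& format, std::chrono::seconds second,
                  const std::function<std::string(const std::string&)>& render) {
    assert(render != nullptr);
    if (second != current_second_) {
      // Entries from an older second are stale anyway, so dropping them all
      // also keeps format strings that are no longer configured from piling up.
      formatted_.clear();
      current_second_ = second;
    }
    auto it = formatted_.find(format);
    if (it == formatted_.end()) {
      it = formatted_.emplace(format, render(format)).first;
    }
    return it->second;
  }

  size_t size() const { return formatted_.size(); }

private:
  std::chrono::seconds current_second_{-1};
  std::unordered_map<std::string, std::string> formatted_;
};
```

With this shape the map can never hold more entries than the number of distinct format strings seen within a single second.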
source/common/common/utility.cc
Outdated
    if (cached_time.formatted.find(format_string_) == cached_time.formatted.end() ||
        cached_time.epoch_time_seconds != epoch_time_seconds) {
      time_t current_time = std::chrono::system_clock::to_time_t(time);
source/common/common/utility.cc
Outdated
    const std::string nanoseconds = fmt::FormatInt(epoch_time_ns.count()).str();
    for (size_t i = 0; i < subseconds_.size(); ++i) {
      const auto& subsecond = subseconds_.at(i);
      const std::string digits = nanoseconds.substr(cached_time.seconds_length, subsecond.width_);
This should be absl::string_view I think to avoid copies.
It seems I couldn't do this since it hits stack-use-after-scope.
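The thread does not say what exactly triggered the stack-use-after-scope report. One plausible cause is binding a view to the temporary that `std::string::substr` returns. The sketch below shows a variant that avoids that, using `std::string_view` (C++17) as a stand-in for `absl::string_view`; `subsecondDigits` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns up to `width` characters starting at `offset`, as a view into `digits`.
// The caller must keep `digits` alive for as long as the returned view is used.
std::string_view subsecondDigits(const std::string& digits, size_t offset, size_t width) {
  assert(offset <= digits.size());
  // Take the view of the owning string first, then slice the view (no copy).
  return std::string_view(digits).substr(offset, width);
}

// By contrast, the following would dangle, because std::string::substr returns a
// temporary std::string that is destroyed at the end of the full expression:
//   std::string_view bad = digits.substr(offset, width);
```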
source/common/common/utility.cc
Outdated
    const std::string last_segment = format_string_.substr(step);
    strftime(&buf[0], buf.size(), last_segment.c_str(), &current_tm);
    absl::StrAppend(&formatted, &buf[0]);
    }
Wow, yeah, this is complicated, but I'm not sure if there's a simpler way. I'm OK with the above if you clean it up and add comments. Alternatively, we could fragment into segments during the initial parse (which I think is what you may have had in an earlier iteration) and just do the append + substitute here.
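The alternative mentioned above (fragment into segments during the initial parse, then append and substitute) could look roughly like the following. This is only a sketch under assumptions: `Segment`, `parseSegments` and `formatSegments` are hypothetical names, a bare `%f` is assumed to mean nine digits, and the trailing zero-width entry is one way to avoid a special case for the last segment.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// One piece of a parsed format: a strftime segment followed by `width` sub-second
// digits. A width of 0 means no sub-second specifier follows the segment.
struct Segment {
  std::string strftime_part;
  size_t width;
};

// Parse step: runs once per format string.
std::vector<Segment> parseSegments(const std::string& format) {
  std::vector<Segment> segments;
  std::string current;
  for (size_t i = 0; i < format.size(); ++i) {
    if (format[i] == '%' && i + 1 < format.size()) {
      const char next = format[i + 1];
      if (next == 'f') {
        segments.push_back({current, 9});
        current.clear();
        i += 1;
        continue;
      }
      if (next >= '1' && next <= '9' && i + 2 < format.size() && format[i + 2] == 'f') {
        segments.push_back({current, static_cast<size_t>(next - '0')});
        current.clear();
        i += 2;
        continue;
      }
      // Keep any other specifier (including "%%") intact for strftime.
      current += format[i];
      current += next;
      i += 1;
      continue;
    }
    current += format[i];
  }
  // Trailing entry with zero width, so the format step treats all segments alike.
  segments.push_back({current, 0});
  return segments;
}

// Format step: strftime each segment, then append the requested digits.
std::string formatSegments(const std::vector<Segment>& segments, const std::tm& when,
                           const std::string& nanos) {
  assert(nanos.size() == 9);
  std::string out;
  char buf[256];
  for (const Segment& segment : segments) {
    if (!segment.strftime_part.empty()) {
      const size_t n = std::strftime(buf, sizeof(buf), segment.strftime_part.c_str(), &when);
      out.append(buf, n);
    }
    out.append(nanos, 0, segment.width);
  }
  return out;
}
```

Since each segment is formatted on its own, there are no offsets to correct afterwards; the strftime output of all segments could still be cached per second and only the digits appended per request.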
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@htuch I have another set of updates. Now I tried to pre-compute the segments and add clearance for the cached map of formatted. PTAL. 😄
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
htuch left a comment
@dio thanks for your patience and diligence here, I think what you have now looks great. A bunch of feedback, and I think you should have a number of tests for the parse/format code that explore crazy boundary conditions that might arise.
    std::string DateFormatter::fromTime(const SystemTime& time) const {
      return fromTime(std::chrono::system_clock::to_time_t(time));
    struct CachedTime {
      size_t seconds_length;
      struct Formatted {
        std::string str;
        SubsecondOffsets subsecond_offsets;
        std::chrono::seconds epoch_time_seconds;
Can you comment these fields and struct?
source/common/common/utility.cc
Outdated
    const std::chrono::seconds epoch_time_seconds =
        std::chrono::duration_cast<std::chrono::seconds>(epoch_time_ns);

    // Remove all the expired cached items.
You could push this loop underneath the if (item == cached_time.formatted.end()) scope, to avoid doing this on the hot path. We only need to GC every now and then, once per second will be fine.
source/common/common/utility.cc
Outdated
    // To capture the segment after the last subsecond pattern of a format string. E.g.
    // %3f-this-is-the-last-%s-segment-%Y-until-this.
    if (step < new_format_string.size()) {
      last_segment_ = new_format_string.substr(step);
Can you create a dummy zero length subsecond entry to allow this to be treated more uniformly later?
source/common/common/utility.cc
Outdated
    std::array<char, 1024> buf;
    std::string formatted;

    int32_t previous = 0;
Nit: prefer using unsigned types where possible, since they describe the allowed range.
source/common/common/utility.cc
Outdated
    class SubsecondConstantValues {
    public:
      const std::string PLACEHOLDER{"?????????"};
You probably don't need this to be a constant. You can just do something like std::string(N, '?') later on to generate a string of length N.
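As a small illustration of that suggestion, the fill constructor of `std::string` builds the placeholder on demand. `makePlaceholder` is a hypothetical name, and the upper bound of nine digits is an assumption based on nanosecond precision.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a placeholder of `width` question marks instead of slicing a shared constant.
std::string makePlaceholder(size_t width) {
  assert(width <= 9); // Assumption: at most nine sub-second digits (nanoseconds).
  return std::string(width, '?');
}
```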
        : position_(position), width_(width), segment_(segment) {}

    const size_t position_;
    const size_t width_;
Comments on all these fields please.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
We have:
If you have another idea please let me know. 🙂
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
htuch left a comment
Two very minor comments and LGTM.
    // This computes and saves offset of each subsecond pattern to correct its position after the
    // previous string segment is formatted.
    const int32_t offset = formatted_length - subsecond.segment_.size();
Yes, e.g. %%%% is formatted as %%.
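The %%%% case can be checked with a few lines of C++. The sketch below mirrors the subtraction in the snippet above but is otherwise hypothetical (`formattedOffset` is not a function from the PR). It shows why the offset is kept in a signed type: a segment can shrink as well as grow once strftime has run.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <string>

// Returns how much a segment grows (positive) or shrinks (negative) after strftime.
int32_t formattedOffset(const std::string& segment, const std::tm& when) {
  assert(!segment.empty());
  char buf[256];
  const size_t formatted_length = std::strftime(buf, sizeof(buf), segment.c_str(), &when);
  return static_cast<int32_t>(formatted_length) - static_cast<int32_t>(segment.size());
}
```

For example, "%%%%" yields "%%" (offset -2), while "%Y" expands from two characters to four for a four-digit year (offset +2).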
    private:
      std::function<std::string(const Envoy::RequestInfo::RequestInfo&)> field_extractor_;
      const bool append_;
      std::map<std::string, std::vector<AccessLog::FormatterPtr>> start_time_formatters_;
Ah, OK. Why did I put it as a map? 😅.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Super cool @dio. Awesome work!
access_log, router: add subsecond specifier for START_TIME

Description:
This change adds %f, %[1-9]f specifier to get subseconds for START_TIME. As an example, START_TIME(%s%3f) gets a timestamp in milliseconds. This also adds START_TIME as one of the supported variables in header formatter.

Risk Level: Low, since this is an optional feature.
Testing: unit and manual tests.
Docs Changes:
START_TIME for both access_log and router header formatter.
START_TIME as one of the supported variables in header formatter.
Release Notes:
START_TIME for both access_log and router header formatter.
START_TIME as one of the supported variables in header formatter.

Fixes #1966
Fixes #2877