Skip to content

[Bandwidth-Limit] Add response trailers for bandwidth delay#18267

Merged
htuch merged 20 commits intoenvoyproxy:mainfrom
zhiyong-gayang:main
Nov 2, 2021
Merged

[Bandwidth-Limit] Add response trailers for bandwidth delay#18267
htuch merged 20 commits intoenvoyproxy:mainfrom
zhiyong-gayang:main

Conversation

@zhiyong-gayang
Copy link
Copy Markdown
Contributor

@zhiyong-gayang zhiyong-gayang commented Sep 27, 2021

Commit Message:

  • Add response trailers for the bandwidth limit filter delays.
  • Add new metric request_enforced and response_enforced.
  • Change the following metrics type from Gauge to Counter (request_incoming_size, response_incoming_size, request_allowed_size, response_allowed_size) to better calculate the network bytes per second.

Additional Description: This is useful when downstream want to understand how much delays that is caused by bandwidth limit filter.

Risk Level: Low
Testing: UT added.
Docs Changes: Updated.
Release Notes: Added.
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: gayang <gayang@microsoft.com>
@repokitteh-read-only
Copy link
Copy Markdown

Hi @garyyang2002, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #18267 was opened by garyyang2002.

see: more, trace.

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @htuch
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #18267 was opened by garyyang2002.

see: more, trace.

Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
@yanavlasov yanavlasov self-assigned this Sep 28, 2021
Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

stats_(generateStats(config.stat_prefix(), scope)),
request_delay_trailer_(config.response_trailer_prefix().empty()
? DefaultRequestDelayTrailer
: Http::LowerCaseString(config.response_trailer_prefix() + "-" +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::StrCat might be a better option here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcat seems can't concatenate two const strings, maybe another way I don't know.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See suggestion below

if (response_latency_) {
const auto& config = getConfig();

auto response_duration = response_latency_.get()->elapsed().count();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are planning to make stats write only, except for stat sinks. This code will end up broken in this case. Can a timestamp be used in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get from the stats will give me a consistent delay value for response trail and the stats.
Make stats write only seems a challenge work, can we read the stats here before you have a concrete plan how to remove the stats read reference.

// request: response_trailer_prefix + "-bandwidth-request-delay-ms"
// response: response_trailer_prefix + "-bandwidth-response-delay-ms"
// If EnableMode is DISABLED or REQUEST or delay time = 0, the trailer will not be set.
string response_trailer_prefix = 6;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to make these trailers optional, such that an operator can enable or disable it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, added property to enable it.

std::unique_ptr<Envoy::Extensions::HttpFilters::Common::StreamRateLimiter> response_limiter_;
Stats::TimespanPtr request_latency_;
Stats::TimespanPtr response_latency_;
uint64_t request_duration_ = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you want to use std::chrono::milliseconds to make units explicit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the source of request_duration_ is uint64, and this variable will be write to response trailer as string, use milliseconds will need additional convert from unit64 to milliseconds, then to string.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source is CompletableTimespan::ellapsed which returns std::milliseconds. I think you should keep the type around until you need to write it into the trailer.

Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much clearer comments, a bunch of RST nits.
/wait

Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM API modulo nits.

response_incoming_size, GAUGE, Size in bytes of incoming response data to bandwidth limiter
response_allowed_size, GAUGE, Size in bytes of outgoing response data from bandwidth limiter
response_incoming_size, Counter, Size in bytes of incoming response data to bandwidth limiter
response_allowed_size, Counter, Size in bytes of outgoing response data from bandwidth limiter
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean to change from gauge to counter here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter seems more reasonable for incoming size and allowed size, which allow us to calculate the incoming and allowed size per second for better monitoring of the bandwidth usage.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a histogram?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is counter use less memory than histogram, and for this case, counter is good enough.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original change looks buggy. I suspect the original intent was to keep track how much data the bandwidth limit filter has currently in its buffers. But I think the call to decrement the gauge when data is removed from the buffer is missing. This may have happened when the stream_rate_limiter.cc was refactored.

I suggest filing an Issue for the gauge value not being tracked correctly and keeping it as gauge. I do not think there is particular value in the counter, the number is almost meaningless.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, changed back to Gauge, and open the issue to track.
#18703

@htuch
Copy link
Copy Markdown
Member

htuch commented Oct 11, 2021

@yanavlasov can you take this review through implementation review? Thanks.

Signed-off-by: gayang <gayang@microsoft.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

Signed-off-by: gayang <gayang@microsoft.com>
Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Comment on lines +40 to +41
: Http::LowerCaseString(config.response_trailer_prefix() + "-" +
DefaultResponseDelayTrailer.get())),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: Http::LowerCaseString(config.response_trailer_prefix() + "-" +
DefaultResponseDelayTrailer.get())),
: Http::LowerCaseString(absl::StrCat(config.response_trailer_prefix(), "-",
DefaultResponseDelayTrailer.get()))),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks.

stats_(generateStats(config.stat_prefix(), scope)),
request_delay_trailer_(config.response_trailer_prefix().empty()
? DefaultRequestDelayTrailer
: Http::LowerCaseString(config.response_trailer_prefix() + "-" +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See suggestion below

std::unique_ptr<Envoy::Extensions::HttpFilters::Common::StreamRateLimiter> response_limiter_;
Stats::TimespanPtr request_latency_;
Stats::TimespanPtr response_latency_;
uint64_t request_duration_ = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source is CompletableTimespan::ellapsed which returns std::milliseconds. I think you should keep the type around until you need to write it into the trailer.

// Adds encoded trailers. May only be called in encodeData when end_stream is set to true.
// If upstream has trailers, addEncodedTrailers won't be called
bool trailer_added = false;
if (end_stream) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also check enableResponseTrailers()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks.

const auto* config = dynamic_cast<const FilterConfig*>(route_config.get());
EXPECT_EQ(config->limit(), 10);
EXPECT_EQ(config->fillInterval().count(), 50);
// default trailers
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add tests for the enable_response_trailers config value, please?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, thanks.

Signed-off-by: gayang <gayang@microsoft.com>
…incoming_size, response_allowed_size back to gauge

Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
Signed-off-by: gayang <gayang@microsoft.com>
@zhiyong-gayang zhiyong-gayang requested a review from htuch November 1, 2021 03:54
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch merged commit 8b8ece8 into envoyproxy:main Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants