
thrift: supported setting max requests per downstream connection #15125

Merged
zuercher merged 9 commits into envoyproxy:main from wbpcode:thrift-connection
Mar 2, 2021

Conversation

@wbpcode
Member

@wbpcode wbpcode commented Feb 20, 2021

Signed-off-by: wbpcode &lt;comems@msn.com&gt;

Commit Message: thrift: supported setting max requests per downstream connection
Additional Description:

Supported setting max requests per downstream connection. By setting max_requests_per_connection, Envoy can actively disconnect the thrift client after it has processed a configured number of requests.
Due to limitations of the thrift protocol itself, we cannot achieve a perfectly clean disconnection.
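For illustration, a minimal sketch of how the new option might appear in a thrift_proxy filter config. Only max_requests_per_connection comes from this PR; the surrounding names (stat prefix, cluster, route match) are invented for the example and should be treated as assumptions:

```yaml
# Hypothetical listener filter snippet; everything except
# max_requests_per_connection is made up for this example.
filter_chains:
- filters:
  - name: envoy.filters.network.thrift_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.thrift_proxy.v3.ThriftProxy
      stat_prefix: thrift_ingress
      route_config:
        routes:
        - match: { method_name: "" }
          route: { cluster: thrift_backend }
      # Close the downstream connection after 1000 requests; unset means no limit.
      max_requests_per_connection: 1000
```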

Check #14560 for more information.

Risk Level: Normal
Testing: Added
Docs Changes: N/A
Release Notes: Added

@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @htuch
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #15125 was opened by wbpcode.

see: more, trace.

@zyfjeff
Member

zyfjeff commented Feb 20, 2021

I feel that this requirement is not universal. Why not do this by removing the listener from the control plane? @wbpcode

@wbpcode
Member Author

wbpcode commented Feb 22, 2021

I feel that this requirement is not universal. Why not do this by removing the listener from the control plane? @wbpcode

That's also an option. However, if there are multiple dubbo services in the cluster, then even when a dedicated listener has been generated for some of them (such as 1.2.3.4:10000), there may still be traffic for other dubbo services that has to be proxied through the wildcard listener 0.0.0.0:10000.

In addition, we have a similar configuration for upstream connections, and I personally don't see any harm in making the same restrictions for downstream connections. 😄

@wbpcode
Member Author

wbpcode commented Feb 22, 2021

@rgs1 Would you mind giving this PR a review or some suggestions? 😃

std::list<ThriftFilters::FilterFactoryCb> filter_factories_;
const bool payload_passthrough_;

uint64_t max_requests_per_connection_ = UINT64_MAX;
Member

nit:

uint64_t max_requests_per_connection_{};

initializing this to zero and skipping the limit check when zero is probably easier (and more common across the code base).

Member Author

Got it.

TimeSource& time_source_;

// The number of requests accumulated on the current connection. A connection is processed by only
// one thread, so there is no need to consider the thread safety of the count.
Member

I think we can drop the 2nd sentence since it's a common assumption that conn managers are per thread.

Member Author

👌


if (parent_.accumulated_requests_ >= parent_.config_.maxRequestsPerConnection()) {
  parent_.read_callbacks_->connection().readDisable(true);
  parent_.requests_overflow_ = true;
Member

can we add a stat for this and increment it here?

Member Author

Got it.

EXPECT_EQ(0U, store_.counter("test.response_invalid_type").value());
EXPECT_EQ(1U, store_.counter("test.response_success").value());
EXPECT_EQ(0U, store_.counter("test.response_error").value());
}
Member

Can we also test:

  • limit set but not reached
  • limit set, reached, cleared and next request goes through

And also test the suggested stat from above is properly incremented in each case.

Member

@rgs1 rgs1 left a comment

Thanks! We are going to use this too, so excited to see this feature.

Could you also add a comment about this new feature in docs/root/configuration/listeners/network_filters/thrift_proxy_filter.rst?

cc: @fishcakez to get additional feedback

@htuch
Member

htuch commented Feb 23, 2021

/lgtm api

wbpcode added 3 commits February 23, 2021 20:35
Signed-off-by: wbpcode <comems@msn.com>
Signed-off-by: wbpcode <comems@msn.com>
Signed-off-by: wbpcode <comems@msn.com>
@wbpcode wbpcode requested a review from rgs1 February 24, 2021 11:04
Signed-off-by: wbpcode <comems@msn.com>
@@ -1,5 +1,6 @@
#pragma once

#include <cstdint>
Member

not needed anymore i think

Member Author

👍 yes, it is not needed. I just forgot to remove it.

TimeSource& time_source_;

// The maximum number of requests remaining to be processed on the current connection.
uint64_t remaining_streams_{};
Member

Can we do this the other way around? Having a counter of seen requests and the limit in a separate var? E.g.:

  uint64_t requests_{};
  uint64_t max_requests_{}; 

If max_requests_ is 0, there are no limits. I'd rather avoid depending on std::numeric_limits<uint64_t>::max().

Member Author

@wbpcode wbpcode Feb 26, 2021

I have no objection. The reason for choosing the current method is that it saves one if check: with your suggestion, we first need to check whether max_requests_ is 0, and then check whether the accumulated request count has reached max_requests_.

Member Author

But it should be ok.

for (size_t i = 0; i < 4; i++) {
  mock_new_connection();
  EXPECT_EQ(5, sendSomeThriftRequest(6));
}
Member

Why is this loop needed?

Member Author

@wbpcode wbpcode Feb 26, 2021

To simulate multiple disconnections and verify that the stat value grows correctly.

Member

@rgs1 rgs1 left a comment

Thanks for the updates, left a few more comments.

Signed-off-by: wbpcode <comems@msn.com>
}

// Return the number of requests actually sent.
uint32_t sendSomeThriftRequest(uint32_t request_number) {
Member

nit: uint32_t sendRequests(uint32_t count) {

EXPECT_EQ(0U, store_.counter("test.response_exception").value());
EXPECT_EQ(0U, store_.counter("test.response_invalid_type").value());
EXPECT_EQ(50U, store_.counter("test.response_success").value());
EXPECT_EQ(0U, store_.counter("test.response_error").value());
Member

should we check here if cx_destroy_local_with_active_rq was incremented? I think it probably is incremented when a close happens after max requests is reached.

Member

@rgs1 rgs1 left a comment

This looks great, thanks! I have one final nit and one question about whether we should check if cx_destroy_local_with_active_rq is incremented after a local connection close. Other than that, LGTM.

(I also added some comments previously and deleted them, since I realized I was wrong)

@rgs1
Member

rgs1 commented Feb 26, 2021

The linux multiarch failure is transient, happened on other PRs too.

@wbpcode
Member Author

wbpcode commented Mar 1, 2021

@rgs1
I think cx_destroy_local_with_active_rq will never be incremented. After we receive a complete request (in finalizeRequest), we increase the accumulated_requests count without waiting for the response to finish. Once the count reaches max_requests, the requests_overflow flag is set and no new requests are accepted. After all accepted requests have completed, we actively close the connection.

The advantage of this is that when the connection is closed, we don't have to worry about requests beyond the limit being reset and introducing idempotency problems, because those requests are never forwarded by the thrift proxy at all.

@rgs1
Member

rgs1 commented Mar 1, 2021

@rgs1
I think cx_destroy_local_with_active_rq will never be incremented. After we receive a complete request (in finalizeRequest), we increase the accumulated_requests count without waiting for the response to finish. Once the count reaches max_requests, the requests_overflow flag is set and no new requests are accepted. After all accepted requests have completed, we actively close the connection.

The advantage of this is that when the connection is closed, we don't have to worry about requests beyond the limit being reset and introducing idempotency problems, because those requests are never forwarded by the thrift proxy at all.

Ok makes sense.

@rgs1
Member

rgs1 commented Mar 1, 2021

You'll want to merge main once #15238 lands.

Member

@rgs1 rgs1 left a comment

Thanks!

@wbpcode
Member Author

wbpcode commented Mar 2, 2021

@rgs1 main merged.

@rgs1
Member

rgs1 commented Mar 2, 2021

@wbpcode LGTM, thanks!

@zuercher over to you for merging.

@zuercher zuercher merged commit 8a2685e into envoyproxy:main Mar 2, 2021
@wbpcode wbpcode deleted the thrift-connection branch April 6, 2021 07:00