Skip to content

ext_auth: add option to measure timeout when check request is created.#12778

Merged
htuch merged 22 commits intoenvoyproxy:masterfrom
yuval-k:auth-timeout
Sep 21, 2020
Merged

ext_auth: add option to measure timeout when check request is created.#12778
htuch merged 22 commits intoenvoyproxy:masterfrom
yuval-k:auth-timeout

Conversation

@yuval-k
Copy link
Contributor

@yuval-k yuval-k commented Aug 21, 2020

Commit Message:
ext_auth: add option to measure timeout when check request is created. In the current default, the timeout timer starts when the request is active. If the request is temporary pending due to the circuit breaker, the overall timeout may exceed the time out configured.

Additional Description:
Currently timeout is handled by setting the RequestOptions on the async client. The async client starts the timeout timer when the request processing starts.
When circuit breakers to the ext authz cluster are configured, a request may be temporary in pending state (if the number of active requests is above the circuit breaker threshold). this period of time is not accounted for with the current timer, which can impact over-all latency.

To address this, I've added a new config option to http ext_authz filter: measure_timeout_on_check_created. When set, the timeout timer will be started when the check request is created.

Risk Level: Low (opt-in)
Testing: Unit tests added
Docs Changes: Inline proto docs
Release Notes: Release notes added.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
@yuval-k yuval-k requested a review from dio as a code owner August 21, 2020 23:50
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #12778 was opened by yuval-k.

see: more, trace.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
@yuval-k
Copy link
Contributor Author

yuval-k commented Aug 22, 2020

/retest
//test/common/http/http2:codec_impl_test failed which seems unrelated

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12778 (comment) was created by @yuval-k.

see: more, trace.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
// difference between the two, if requests to the ext authz server become pending due to circuit
// breaking (i.e. by default, the duration while the request is pending is not counted towards
// the configured timeout). This is ignored for google_grpc service type, as it does not use the envoy
// io facilities.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way of approaching this would be to change the existing behavior (under runtime flag guard) to start the timeout from when the request is created, arguably this is the behavior end users expect. We've had a recent discussion with some of the gRPC folks on some other timeouts that fit this pattern (@markdroth @dfawley).

In your proposal, you add an internal timer; if we use this exclusively, this would avoid having to have end users worry about the difference between Envoy and Google gRPC implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @htuch !

i agree that using the runtime is preferable. the only reason i added it as a config option is for backwards compatibility.

are there any other stake holders in the community that you think might object? or should I go ahead and change this?

should the new behavior (internal timer) be the default, with runtime flag to revert? or the other way around?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote would be runtime flag, default on, since most folks won't be relying on behavior that occurs during circuit breaker activation. That said, it is kind of hard to reason about the ideal behavior here under circuit breaker overload; currently we queue and drop when the queue is too long, we'll be encouraging more drop due to timeouts. @mattklein123 WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there are objections - I will adjust the code to perform the new behavior by default, with runtime flag to turn off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote would be runtime flag, default on, since most folks won't be relying on behavior that occurs during circuit breaker activation. That said, it is kind of hard to reason about the ideal behavior here under circuit breaker overload; currently we queue and drop when the queue is too long, we'll be encouraging more drop due to timeouts. @mattklein123 WDYT?

Sorry. I very much missed this ping and I wished I hadn't. FWIW I would have probably figured out how to make this change in a more core way (potentially via a different optional param fed to the async client), mainly so that it would apply to other side calls also. Since I think ratelimit and other services would face the same problem. We can potentially consider doing this in a follow up.

@htuch htuch self-assigned this Aug 24, 2020
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
@yuval-k
Copy link
Contributor Author

yuval-k commented Sep 10, 2020

@htuch, this is ready for another pass

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, a few comments..
/wait

RawAsyncStreamCallbacks& callbacks,
const Http::AsyncClient::StreamOptions& options) override;
// This does not use the worker thread's dispatcher, so we must return nullptr.
Event::Dispatcher* dispatcher() override { return nullptr; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to plumb the dispatcher from the gRPC client. The filter should be able to obtain this from stream decoder callbacks, see

* @return Event::Dispatcher& the thread local dispatcher for allocating timers, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this by changing the client check (here:

virtual void check(RequestCallbacks& callback,
) and add a dispatcher argument.

I thought it was cleaner to not introduce a dispatcher arg to the check request:

  1. interface remains the same
  2. to my understanding, we still want to use the google client's native timeout? if so, i won't have a way to tell them google and native client apart (i.e. they will both use a dispatcher for timeout)

Would be happy to to change this if you still think it's the better way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It's fine to add to the check args a dispatcher ref. It could also be done at client creation time, but I think we will make this TLS in the near future, so fine to add to check.
  2. Not sure I fully grok how the Google client's native timeout interacts with the choice of how we plumb dispatcher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. sounds good
  2. The auth-client doesn't know if it has a google grpc client or native grpc client. if it gets a dispatcher in the check method args, it will always use it for timeout purposes (as it can't tell the grpc clients apart). If an optional dispatcher is returned from the grpc client, the auth-client will only use it if it exists (i.e. will not use it for google grpc, instead it will set the request options timeout).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see on (2). Two options I see:
A. Avoid using Google gRPC client timeout, just do a single local timeout in ext_authz no matter what.
B. Fix Envoy gRPC client to do a timeout consistent with Google gRPC. I'm fine punting this to a TODO for the future. It seems this would reduce confusion systematically though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good! thank you for the feedback

context.runtime(), context.httpContext(), stats_prefix);
Http::FilterFactoryCb callback;

bool internal_timeout = Runtime::runtimeFeatureEnabled(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const bool and maybe const bool timeout_starts_at_check_creation

client_->cancel();
EXPECT_EQ(timer_destroyed, true);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a full integration test in test/extensions/filters/http/ext_authz/ext_authz_integration_test.cc etc?

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
@yuval-k
Copy link
Contributor Author

yuval-k commented Sep 16, 2020

/retest
looks like disk space error

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.

🐱

Caused by: a #12778 (comment) was created by @yuval-k.

see: more, trace.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
…base class

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, neat. A few comments/questions but seems basically ready to ship.

if (timeoutStartsAtCheckCreation()) {
// TODO(yuval-k): We currently use dispatcher based timeout even if the underlying client is
// google gRPC client, which has it's own timeout mechanism. We may want to change that if
// using the google gRPC client timeout is more efficient.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it's not really going to be an efficiency call, maybe it should really be that one day we will hopefully converge the two implementations of gRPC in Envoy, but that's going to be far off AFAICT. FYI @markdroth

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
@yuval-k
Copy link
Contributor Author

yuval-k commented Sep 17, 2020

i'm seeing weird clang tidy failures:

2020-09-17T18:05:16.6310135Z /usr/lib/gcc/x86_64-linux-gnu/9/include/xmmintrin.h:602:19: error: use of undeclared identifier '__builtin_ia32_cvtsi2ss'; did you mean '__builtin_ia32_cvtsd2ss'? [clang-diagnostic-error]
2020-09-17T18:05:16.6310904Z   return (__m128) __builtin_ia32_cvtsi2ss ((__v4sf) __A, __B);
2020-09-17T18:05:16.6311362Z                   ^~~~~~~~~~~~~~~~~~~~~~~
2020-09-17T18:05:16.6311788Z                   __builtin_ia32_cvtsd2ss
2020-09-17T18:05:16.6312577Z /usr/lib/gcc/x86_64-linux-gnu/9/include/xmmintrin.h:602:19: note: '__builtin_ia32_cvtsd2ss' declared here
2020-09-17T18:05:16.6313074Z 

not sure what I'm doing wrong here :/

full log: https://dev.azure.com/cncf/4684fb3d-0389-4e0b-8251-221942316e06/_apis/build/builds/51528/logs/90

@htuch
Copy link
Member

htuch commented Sep 17, 2020

@yuval-k I think this is happening across CI. CC @lizan and Envoy maintainer on-call @asraa

@yuval-k
Copy link
Contributor Author

yuval-k commented Sep 17, 2020

my usage of EXPECT_LOG_CONTAINS breaks tsan;

tsan error
WARNING: ThreadSanitizer: data race (pid=15)
  Write of size 8 at 0x7fffef749828 by thread T10 (mutexes: read M2779, write M0):
    #0 std::__1::vector, std::__1::allocator >, std::__1::allocator, std::__1::allocator > > >::_ConstructTransaction::~_ConstructTransaction() /opt/llvm/bin/../include/c++/v1/vector:903:19 (ext_authz_integration_test+0x2d7287a)
    #1 void std::__1::vector, std::__1::allocator >, std::__1::allocator, std::__1::allocator > > >::__construct_one_at_end, std::__1::allocator > >(std::__1::basic_string, std::__1::allocator >&&) /opt/llvm/bin/../include/c++/v1/vector:927:3 (ext_authz_integration_test+0x2d7249e)
    #2 std::__1::vector, std::__1::allocator >, std::__1::allocator, std::__1::allocator > > >::push_back(std::__1::basic_string, std::__1::allocator >&&) /opt/llvm/bin/../include/c++/v1/vector:1652:9 (ext_authz_integration_test+0x2d712ae)
    #3 Envoy::LogRecordingSink::log(std::__1::basic_string_view >) /proc/self/cwd/test/test_common/logging.cc:37:13 (ext_authz_integration_test+0x599f723)
    #4 Envoy::Logger::DelegatingLogSink::log(spdlog::details::log_msg const&) /proc/self/cwd/source/common/common/logger.cc:90:12 (ext_authz_integration_test+0x818fde2)
    #5 spdlog::logger::sink_it_(spdlog::details::log_msg const&) /proc/self/cwd/external/com_github_gabime_spdlog/include/spdlog/logger-inl.h:186:23 (ext_authz_integration_test+0x2c8b3ab)
    #6 spdlog::logger::log_it_(spdlog::details::log_msg const&, bool, bool) /proc/self/cwd/external/com_github_gabime_spdlog/include/spdlog/logger-inl.h:170:9 (ext_authz_integration_test+0x2cdb37a)
    #7 void spdlog::logger::log_, unsigned long, unsigned long>(spdlog::source_loc, spdlog::level::level_enum, fmt::v7::basic_string_view const&, unsigned long const&, unsigned long const&) /proc/self/cwd/external/com_github_gabime_spdlog/include/spdlog/logger.h:342:13 (ext_authz_integration_test+0x2e86b18)
    #8 void spdlog::logger::log(spdlog::source_loc, spdlog::level::level_enum, fmt::v7::basic_string_view, unsigned long const&, unsigned long const&) /proc/self/cwd/external/com_github_gabime_spdlog/include/spdlog/logger.h:87:9 (ext_authz_integration_test+0x2e7522a)
    #9 Envoy::Http::Legacy::Http2::ConnectionImpl::onFrameReceived(nghttp2_frame const*) /proc/self/cwd/source/common/http/http2/codec_impl_legacy.cc:656:3 (ext_authz_integration_test+0x5456057)
    #10 Envoy::Http::Legacy::Http2::ConnectionImpl::Http2Callbacks::Http2Callbacks()::$_15::operator()(nghttp2_session*, nghttp2_frame const*, void*) const /proc/self/cwd/source/common/http/http2/codec_impl_legacy.cc:1158:57 (ext_authz_integration_test+0x5463620)
    #11 Envoy::Http::Legacy::Http2::ConnectionImpl::Http2Callbacks::Http2Callbacks()::$_15::__invoke(nghttp2_session*, nghttp2_frame const*, void*) /proc/self/cwd/source/common/http/http2/codec_impl_legacy.cc:1157:19 (ext_authz_integration_test+0x5463598)
    #12 session_call_on_frame_received /b/f/w/external/com_github_nghttp2_nghttp2/lib/nghttp2_session.c:3299:10 (ext_authz_integration_test+0x65127ae)
    #13 nghttp2_session_on_settings_received /b/f/w/external/com_github_nghttp2_nghttp2/lib/nghttp2_session.c:4422:12 (ext_authz_integration_test+0x65137a6)
    #14 session_process_settings_frame /b/f/w/external/com_github_nghttp2_nghttp2/lib/nghttp2_session.c:4588:10 (ext_authz_integration_test+0x652380a)
    #15 nghttp2_session_mem_recv /b/f/w/external/com_github_nghttp2_nghttp2/lib/nghttp2_session.c:6332:12 (ext_authz_integration_test+0x651d088)
    #16 Envoy::Http::Legacy::Http2::ConnectionImpl::innerDispatch(Envoy::Buffer::Instance&) /proc/self/cwd/source/common/http/http2/codec_impl_legacy.cc:574:9 (ext_authz_integration_test+0x54544be)
    #17 Envoy::Http::Legacy::Http2::ServerConnectionImpl::innerDispatch(Envoy::Buffer::Instance&) /proc/self/cwd/source/common/http/http2/codec_impl_legacy.cc:1484:26 (ext_authz_integration_test+0x5461b34)
    #18 Envoy::Http::Legacy::Http2::ServerConnectionImpl::dispatch(Envoy::Buffer::Instance&)::$_25::operator()(Envoy::Buffer::Instance&) const /proc/self/cwd/source/common/http/http2/codec_impl_legacy.cc:1470:60 (ext_authz_integration_test+0x5478d44)
    #19 decltype(std::__1::forward(fp)(std::__1::forward(fp0))) std::__1::__invoke(Envoy::Http::Legacy::Http2::ServerConnectionImpl::dispatch(Envoy::Buffer::Instance&)::$_25&, Envoy::Buffer::Instance&) /opt/llvm/bin/../include/c++/v1/type_traits:3539:1 (ext_authz_integration_test+0x5478c54)
    #20 absl::Status std::__1::__invoke_void_return_wrapper::__call(Envoy::Http::Legacy::Http2::ServerConnectionImpl::dispatch(Envoy::Buffer::Instance&)::$_25&, Envoy::Buffer::Instance&) /opt/llvm/bin/../include/c++/v1/__functional_base:317:16 (ext_authz_integration_test+0x5478b84)
    #21 std::__1::__function::__alloc_func, absl::Status (Envoy::Buffer::Instance&)>::operator()(Envoy::Buffer::Instance&) /opt/llvm/bin/../include/c++/v1/functional:1540:16 (ext_authz_integration_test+0x5478af4)
    #22 std::__1::__function::__func, absl::Status (Envoy::Buffer::Instance&)>::operator()(Envoy::Buffer::Instance&) /opt/llvm/bin/../include/c++/v1/functional:1714:12 (ext_authz_integration_test+0x5476d93)
    #23 std::__1::__function::__value_func::operator()(Envoy::Buffer::Instance&) const /opt/llvm/bin/../include/c++/v1/functional:1867:16 (ext_authz_integration_test+0x4382d5b)
    #24 std::__1::function::operator()(Envoy::Buffer::Instance&) const /opt/llvm/bin/../include/c++/v1/functional:2473:12 (ext_authz_integration_test+0x4382c6f)
    #25 Envoy::Http::Utility::exceptionToStatus(std::__1::function, Envoy::Buffer::Instance&) /proc/self/cwd/source/common/http/utility.cc:41:14 (ext_authz_integration_test+0x5f64aa0)
    #26 Envoy::Http::Legacy::Http2::ServerConnectionImpl::dispatch(Envoy::Buffer::Instance&) /proc/self/cwd/source/common/http/http2/codec_impl_legacy.cc:1469:10 (ext_authz_integration_test+0x54615e2)
    #27 Envoy::Http::ConnectionManagerImpl::onData(Envoy::Buffer::Instance&, bool) /proc/self/cwd/source/common/http/conn_manager_impl.cc:281:35 (ext_authz_integration_test+0x4ec4205)
    #28 Envoy::Network::FilterManagerImpl::onContinueReading(Envoy::Network::FilterManagerImpl::ActiveReadFilter*, Envoy::Network::ReadBufferSource&) /proc/self/cwd/source/common/network/filter_manager_impl.cc:66:48 (ext_authz_integration_test+0x5be5ce8)
    #29 Envoy::Network::FilterManagerImpl::onRead() /proc/self/cwd/source/common/network/filter_manager_impl.cc:76:3 (ext_authz_integration_test+0x5be5fe1)
    #30 Envoy::Network::ConnectionImpl::onRead(unsigned long) /proc/self/cwd/source/common/network/connection_impl.cc:290:19 (ext_authz_integration_test+0x5bba8b2)
    #31 Envoy::Network::ConnectionImpl::onReadReady() /proc/self/cwd/source/common/network/connection_impl.cc:572:5 (ext_authz_integration_test+0x5bc1bba)
    #32 Envoy::Network::ConnectionImpl::onFileEvent(unsigned int) /proc/self/cwd/source/common/network/connection_impl.cc:532:5 (ext_authz_integration_test+0x5bbf38f)
    #33 Envoy::Network::ConnectionImpl::ConnectionImpl(Envoy::Event::Dispatcher&, std::__1::unique_ptr >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6::operator()(unsigned int) const /proc/self/cwd/source/common/network/connection_impl.cc:70:54 (ext_authz_integration_test+0x5bd6707)
    #34 decltype(std::__1::forward >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6&>(fp)(std::__1::forward(fp0))) std::__1::__invoke >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6&, unsigned int>(Envoy::Network::ConnectionImpl::ConnectionImpl(Envoy::Event::Dispatcher&, std::__1::unique_ptr >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6&, unsigned int&&) /opt/llvm/bin/../include/c++/v1/type_traits:3539:1 (ext_authz_integration_test+0x5bd664c)
    #35 void std::__1::__invoke_void_return_wrapper::__call >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6&, unsigned int>(Envoy::Network::ConnectionImpl::ConnectionImpl(Envoy::Event::Dispatcher&, std::__1::unique_ptr >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6&, unsigned int&&) /opt/llvm/bin/../include/c++/v1/__functional_base:348:9 (ext_authz_integration_test+0x5bd657d)
    #36 std::__1::__function::__alloc_func >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6, std::__1::allocator >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6>, void (unsigned int)>::operator()(unsigned int&&) /opt/llvm/bin/../include/c++/v1/functional:1540:16 (ext_authz_integration_test+0x5bd64fd)
    #37 std::__1::__function::__func >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6, std::__1::allocator >&&, std::__1::unique_ptr >&&, Envoy::StreamInfo::StreamInfo&, bool)::$_6>, void (unsigned int)>::operator()(unsigned int&&) /opt/llvm/bin/../include/c++/v1/functional:1714:12 (ext_authz_integration_test+0x5bd47ac)
    #38 std::__1::__function::__value_func::operator()(unsigned int&&) const /opt/llvm/bin/../include/c++/v1/functional:1867:16 (ext_authz_integration_test+0x43c1684)
    #39 std::__1::function::operator()(unsigned int) const /opt/llvm/bin/../include/c++/v1/functional:2473:12 (ext_authz_integration_test+0x43c1578)
    #40 Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb(unsigned int) /proc/self/cwd/source/common/event/file_event_impl.cc:133:3 (ext_authz_integration_test+0x5ba3ea9)
    #41 Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_1::operator()(int, short, void*) const /proc/self/cwd/source/common/event/file_event_impl.cc:106:16 (ext_authz_integration_test+0x5ba419d)
    #42 Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_1::__invoke(int, short, void*) /proc/self/cwd/source/common/event/file_event_impl.cc:90:7 (ext_authz_integration_test+0x5ba3f26)
    #43 event_persist_closure /b/f/w/external/com_github_libevent_libevent/event.c:1645:9 (ext_authz_integration_test+0x810c49f)
    #44 event_process_active_single_queue /b/f/w/external/com_github_libevent_libevent/event.c:1704:4 (ext_authz_integration_test+0x810adec)
    #45 event_process_active /b/f/w/external/com_github_libevent_libevent/event.c:1805:9 (ext_authz_integration_test+0x80ff897)
    #46 event_base_loop /b/f/w/external/com_github_libevent_libevent/event.c:2047:12 (ext_authz_integration_test+0x80fd24f)
    #47 Envoy::Event::LibeventScheduler::run(Envoy::Event::Dispatcher::RunType) /proc/self/cwd/source/common/event/libevent_scheduler.cc:60:3 (ext_authz_integration_test+0x64f685b)
    #48 Envoy::Event::DispatcherImpl::run(Envoy::Event::Dispatcher::RunType) /proc/self/cwd/source/common/event/dispatcher_impl.cc:204:19 (ext_authz_integration_test+0x5b8e671)
    #49 Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&) /proc/self/cwd/source/server/worker_impl.cc:133:16 (ext_authz_integration_test+0x33ac5a5)
    #50 Envoy::Server::WorkerImpl::start(Envoy::Server::GuardDog&)::$_4::operator()() const /proc/self/cwd/source/server/worker_impl.cc:99:38 (ext_authz_integration_test+0x33bac87)
    #51 decltype(std::__1::forward(fp)()) std::__1::__invoke(Envoy::Server::WorkerImpl::start(Envoy::Server::GuardDog&)::$_4&) /opt/llvm/bin/../include/c++/v1/type_traits:3539:1 (ext_authz_integration_test+0x33babc0)
    #52 void std::__1::__invoke_void_return_wrapper::__call(Envoy::Server::WorkerImpl::start(Envoy::Server::GuardDog&)::$_4&) /opt/llvm/bin/../include/c++/v1/__functional_base:348:9 (ext_authz_integration_test+0x33bab20)
    #53 std::__1::__function::__alloc_func, void ()>::operator()() /opt/llvm/bin/../include/c++/v1/functional:1540:16 (ext_authz_integration_test+0x33baac0)
    #54 std::__1::__function::__func, void ()>::operator()() /opt/llvm/bin/../include/c++/v1/functional:1714:12 (ext_authz_integration_test+0x33b8d8f)
    #55 std::__1::__function::__value_func::operator()() const /opt/llvm/bin/../include/c++/v1/functional:1867:16 (ext_authz_integration_test+0x3005896)
    #56 std::__1::function::operator()() const /opt/llvm/bin/../include/c++/v1/functional:2473:12 (ext_authz_integration_test+0x3005528)
    #57 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function, std::__1::optional const&)::'lambda'(void*)::operator()(void*) const /proc/self/cwd/source/common/common/posix/thread_impl.cc:49:11 (ext_authz_integration_test+0x80de94b)
    #58 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function, std::__1::optional const&)::'lambda'(void*)::__invoke(void*) /proc/self/cwd/source/common/common/posix/thread_impl.cc:48:9 (ext_authz_integration_test+0x80de8d8)

Previous read of size 8 at 0x7fffef749828 by main thread:
#0 std::__1::vector<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > > >::empty() const /opt/llvm/bin/../include/c++/v1/vector:662:41 (ext_authz_integration_test+0x2d7be4b)
#1 Envoy::ExtAuthzGrpcIntegrationTest::expectCheckRequestTimedout(bool) /proc/self/cwd/test/extensions/filters/http/ext_authz/ext_authz_integration_test.cc:364:7 (ext_authz_integration_test+0x2c805bd)
#2 Envoy::ExtAuthzGrpcIntegrationTest_CheckTimesOutFromCheckCreation_Test::TestBody() /proc/self/cwd/test/extensions/filters/http/ext_authz/ext_authz_integration_test.cc:632:3 (ext_authz_integration_test+0x2c766dd)
#3 non-virtual thunk to Envoy::ExtAuthzGrpcIntegrationTest_CheckTimesOutFromCheckCreation_Test::TestBody() /proc/self/cwd/test/extensions/filters/http/ext_authz/ext_authz_integration_test.cc (ext_authz_integration_test+0x2c7673f)
#4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::)(), char const) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2433:10 (ext_authz_integration_test+0x82dd5fc)
#5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::)(), char const) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2469:14 (ext_authz_integration_test+0x82c1cae)
#6 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2508:5 (ext_authz_integration_test+0x82a70d1)
#7 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2684:11 (ext_authz_integration_test+0x82a7f73)
#8 testing::TestSuite::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2816:28 (ext_authz_integration_test+0x82a89aa)
#9 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5338:44 (ext_authz_integration_test+0x82b814d)
#10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::)(), char const) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2433:10 (ext_authz_integration_test+0x82e4b7c)
#11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::)(), char const) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2469:14 (ext_authz_integration_test+0x82c59be)
#12 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:4925:10 (ext_authz_integration_test+0x82b7a0b)
#13 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2473:46 (ext_authz_integration_test+0x5ab1a47)
#14 Envoy::TestRunner::RunTests(int, char**) /proc/self/cwd/test/test_runner.cc:155:10 (ext_authz_integration_test+0x5ab0d03)
#15 main /proc/self/cwd/test/main.cc:34:10 (ext_authz_integration_test+0x5aae784)

Location is stack of main thread.

I'm trying to see how to fix this. if anyone else has an idea - feel free to send it my way

@htuch
Copy link
Member

htuch commented Sep 17, 2020

@yuval-k not sure, but maybe the stats approach wouldn't be too much work and be more reliable here.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
@yuval-k
Copy link
Contributor Author

yuval-k commented Sep 17, 2020

adding the timeout stat was a bit more involved than I thought, but i think it's a good idea regardless; thanks @htuch

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
@yuval-k
Copy link
Contributor Author

yuval-k commented Sep 18, 2020

/retest
"Upload fuzz_coverage Report to GCS" failed - I assume this is a flake

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Cannot retry non-completed check: envoy-presubmit (Linux multi-arch docker), please wait.

🐱

Caused by: a #12778 (comment) was created by @yuval-k.

see: more, trace.

@asraa
Copy link
Contributor

asraa commented Sep 18, 2020

Yep, tihs is a flake from catching bugs and failing a test run with the bug, I'm on it now
#13171

@yuval-k
Copy link
Contributor Author

yuval-k commented Sep 18, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12778 (comment) was created by @yuval-k.

see: more, trace.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Thanks for iterating, this looks really solid.

@htuch htuch merged commit dfa85e8 into envoyproxy:master Sep 21, 2020
yuval-k added a commit to yuval-k/envoy that referenced this pull request Nov 12, 2020
… created. (envoyproxy#12778)"

This reverts commit dfa85e8.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
yuval-k added a commit to yuval-k/envoy that referenced this pull request Nov 23, 2020
… created. (envoyproxy#12778)"

This reverts commit dfa85e8.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
mattklein123 pushed a commit that referenced this pull request Nov 24, 2020
… created. (#12778)" (#14152)

This reverts commit dfa85e8.

Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants