Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions source/server/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ envoy_cc_library(
"@envoy_api//envoy/config/bootstrap/v3:pkg_cc_proto",
"@envoy_api//envoy/config/metrics/v3:pkg_cc_proto",
"@envoy_api//envoy/config/trace/v3:pkg_cc_proto",
"@envoy_api//envoy/extensions/watchdog/abort_action/v3alpha:pkg_cc_proto",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be built-in, it shouldn't actually be an extension. Can you move it back into the core code? (Even if it is still registered as an extension.)

Copy link
Contributor Author

@KBaichoo KBaichoo Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify a bit more on what you mean by "move it back into the core core"? For example, would I just have this as an extension that'll register on all platforms?

Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I just mean it shouldn't be in the extensions directory IMO as it's no longer an extension. It's always linked. It's not a big deal and you can do as a follow up if you like.

],
)

Expand Down
26 changes: 25 additions & 1 deletion source/server/configuration_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include "envoy/config/bootstrap/v3/bootstrap.pb.h"
#include "envoy/config/metrics/v3/stats.pb.h"
#include "envoy/config/trace/v3/http_tracer.pb.h"
#include "envoy/extensions/watchdog/abort_action/v3alpha/abort_action.pb.h"
#include "envoy/network/connection.h"
#include "envoy/runtime/runtime.h"
#include "envoy/server/instance.h"
Expand Down Expand Up @@ -171,7 +172,30 @@ WatchdogImpl::WatchdogImpl(const envoy::config::bootstrap::v3::Watchdog& watchdo
multikill_timeout_ =
std::chrono::milliseconds(PROTOBUF_GET_MS_OR_DEFAULT(watchdog, multikill_timeout, 0));
multikill_threshold_ = PROTOBUF_PERCENT_TO_DOUBLE_OR_DEFAULT(watchdog, multikill_threshold, 0.0);
actions_ = watchdog.actions();
auto actions = watchdog.actions();

// Add abort_action if it's available on the given platform and killing is
// enabled since it's more helpful than the default kill action.
#ifndef WIN32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think this part should be platform agnostic. And it will be up to the implementation of this events to handle cross platform differences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, yes, but the Abort Action implementation is currently marked to be excluded from Windows since it doesn't support Windows yet.

See the prior PR: #12860 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing the review on the initial PR, but did we consider on platforms like Windows to enable the extension but fall back to the basic watchdog full process kill/panic behavior (and log the thread id to be killed and maybe more diagnostic info)?

We could use Bazel platform selection instead of ifdefs to build a small platform specific "thread kill" library and that would be localized to the abort action extension (and allow us to iterate further if we are able to support the extension fully on Windows).

Meta question: Not sure if the philosophy of the project is to just disable a feature/extension rather than offer a degraded experience on a different platform

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunjayBhatia yes, if you look at this intermediate commit I was originally just having windows print out unimplemented, and the default panic behavior would kill the process.

Regarding the philosophy, I've seen some other extensions that are also just disabled for windows -- some of which might just arise from lack of resources, but ultimately across platforms there are some differences in capabilities (one example being signals).

Any thoughts @envoyproxy/api-shepherds ?

Copy link
Member

@sunjayBhatia sunjayBhatia Sep 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen some other extensions that are also just disabled for windows

Yeah, the difference there is that the other currently disabled extensions have external dependencies that require a decent amount of work to get compiling on Windows, we haven't gotten around to those yet so we disabled them, ultimately we will get these working with parity to other platforms

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sorry I didn't realize that this is the direction we are going to go. Here is what I would do:

  1. Per my other comment I would move the code back to core, since it's not really an extension anymore.
  2. I would have a platform function to kill a thread.
  3. On Windows just abort/panic the process as it did before. On Linux do what you have now. This should allow the extension to be compiled on all platforms and do something sane.

Does that work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Will close this PR while making those changes,

envoy::extensions::watchdog::abort_action::v3alpha::AbortActionConfig abort_config;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for the reviewers: There's some tension here given that the action AbortAction extension and folks can choose to omit it, but if they do this might break and they'd need to patch it.

Thoughts?

// Wait one second for the aborted thread to abort.
abort_config.mutable_wait_duration()->set_seconds(1);

if (kill_timeout > 0) {
envoy::config::bootstrap::v3::Watchdog::WatchdogAction* abort_action_config = actions.Add();
abort_action_config->set_event(envoy::config::bootstrap::v3::Watchdog::WatchdogAction::KILL);
abort_action_config->mutable_config()->mutable_typed_config()->PackFrom(abort_config);
}

if (multikill_timeout_.count() > 0) {
envoy::config::bootstrap::v3::Watchdog::WatchdogAction* abort_action_config = actions.Add();
abort_action_config->set_event(
envoy::config::bootstrap::v3::Watchdog::WatchdogAction::MULTIKILL);
abort_action_config->mutable_config()->mutable_typed_config()->PackFrom(abort_config);
}
#endif

actions_ = actions;
}

InitialImpl::InitialImpl(const envoy::config::bootstrap::v3::Bootstrap& bootstrap) {
Expand Down
43 changes: 39 additions & 4 deletions test/server/configuration_impl_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -754,8 +754,8 @@ TEST_F(ConfigurationImplTest, KillTimeoutWithoutSkew) {
MainImpl config;
config.initialize(bootstrap, server_, cluster_manager_factory_);

EXPECT_EQ(std::chrono::milliseconds(1000), config.workerWatchdogConfig().killTimeout());
EXPECT_EQ(std::chrono::milliseconds(1000), config.mainThreadWatchdogConfig().killTimeout());
EXPECT_EQ(config.workerWatchdogConfig().killTimeout(), std::chrono::milliseconds(1000));
EXPECT_EQ(config.mainThreadWatchdogConfig().killTimeout(), std::chrono::milliseconds(1000));
}

TEST_F(ConfigurationImplTest, CanSkewsKillTimeout) {
Expand Down Expand Up @@ -793,10 +793,45 @@ TEST_F(ConfigurationImplTest, DoesNotSkewIfKillTimeoutDisabled) {
MainImpl config;
config.initialize(bootstrap, server_, cluster_manager_factory_);

EXPECT_EQ(std::chrono::milliseconds(0), config.mainThreadWatchdogConfig().killTimeout());
EXPECT_EQ(std::chrono::milliseconds(0), config.workerWatchdogConfig().killTimeout());
EXPECT_EQ(config.mainThreadWatchdogConfig().killTimeout(), std::chrono::milliseconds(0));
EXPECT_EQ(config.workerWatchdogConfig().killTimeout(), std::chrono::milliseconds(0));
}

// AbortAction not yet supported on Windows.
#ifndef WIN32
TEST_F(ConfigurationImplTest, ShouldAddsAbortActionIfKillingIsEnabled) {
envoy::config::bootstrap::v3::Bootstrap bootstrap;
MainImpl config;
const std::string kill_json = R"EOF(
{
"watchdog": {
"kill_timeout": "1.0s",
"multikill_timeout" : "1.0s"
},
})EOF";

TestUtility::loadFromJson(kill_json, bootstrap);
config.initialize(bootstrap, server_, cluster_manager_factory_);

// We should have the abort action added to both KILL and MULTIKILL events.
EXPECT_EQ(config.workerWatchdogConfig().actions().size(), 2);
EXPECT_EQ(config.mainThreadWatchdogConfig().actions().size(), 2);
}

TEST_F(ConfigurationImplTest, ShouldNotAddAbortActionIfKillingIsDisabled) {
envoy::config::bootstrap::v3::Bootstrap bootstrap;
MainImpl config;
const std::string killing_disabled_json = R"EOF( { "watchdog": { "miss_timeout": "1.0s" }})EOF";

TestUtility::loadFromJson(killing_disabled_json, bootstrap);
config.initialize(bootstrap, server_, cluster_manager_factory_);

// We should have the abort action added
EXPECT_EQ(config.workerWatchdogConfig().actions().size(), 0);
EXPECT_EQ(config.mainThreadWatchdogConfig().actions().size(), 0);
}
#endif

TEST_F(ConfigurationImplTest, ShouldErrorIfBothWatchdogsAndWatchdogSet) {
const std::string json = R"EOF( { "watchdogs": {}, "watchdog": {}})EOF";

Expand Down