Skip to content

Replace "backward" stack trace with absl::Stacktrace.#5723

Merged
dnoe merged 11 commits intoenvoyproxy:masterfrom
dnoe:replace-backward
Jan 31, 2019
Merged

Replace "backward" stack trace with absl::Stacktrace.#5723
dnoe merged 11 commits intoenvoyproxy:masterfrom
dnoe:replace-backward

Conversation

@dnoe
Copy link
Contributor

@dnoe dnoe commented Jan 25, 2019

Description:

Use the absl::Stacktrace library along with absl::Symbolize to generate
symbolized stack traces without requiring any external tool or post
processing.

Remove the external dependency on "backward" library.

This deprecates stack_decode.py, which is removed. References to the
stack_decode.py process are removed from README.

Risk Level: Low (may break tooling that interpreted stack traces in logs)
Testing: bazel test //test/... , manually examined symbolized backtrace in killed binary.
Docs Changes: Changed bazel/README.md
Release Notes: The stack_decode.py script is removed, stack traces should include decoded names without it when the binary has appropriate debugging symbols.

Signed-off-by: Dan Noé dpn@google.com

Use the absl::Stacktrace library along with absl::Symbolize to generate
symbolized stack traces without requiring any external tool or post
processing.

Remove the external dependency on "backward" library.

This deprecates stack_decode.py, which is removed. References to the
stack_decode.py process are removed from README.

Signed-off-by: Dan Noé <dpn@google.com>
jmarantz
jmarantz previously approved these changes Jan 28, 2019
@jmarantz
Copy link
Contributor

Do you know what's up with CI? Maybe a local test with asan or tsan might help? I didn't see anything in the log.

@htuch
Copy link
Member

htuch commented Jan 28, 2019

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: asan (failed build)
🔨 rebuilding ci/circleci: tsan (failed build)
🔨 rebuilding ci/circleci: release (failed build)
🔨 rebuilding ci/circleci: mac (failed build)

🐱

Caused by: a #5723 (comment) was created by @htuch.

see: more, trace.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, do you have a before/after stack trace to compare?

private:
static const int MAX_STACK_DEPTH = 64;
backward::StackTrace stack_trace_;
static constexpr int MAX_STACK_DEPTH = 64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer moving to the newer Envoy convention of MaxStackDepth while mucking around here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

@dnoe
Copy link
Contributor Author

dnoe commented Jan 28, 2019

Investigating the CI failures.

dnoe added 2 commits January 28, 2019 09:36
stack_depth_ wasn't zero initialized, it should be to make attempts to
log a trace without first capturing a trace safe.

Signed-off-by: Dan Noé <dpn@google.com>
@dnoe
Copy link
Contributor Author

dnoe commented Jan 28, 2019

Here's what they look like out of the box now:

[2019-01-28 14:55:46.142][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:67] Backtrace:
[2019-01-28 14:55:46.146][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #0: Envoy::Backward_Basic_Test::TestBody()
[2019-01-28 14:55:46.150][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #1: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[2019-01-28 14:55:46.154][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #2: testing::internal::HandleExceptionsInMethodIfSupported<>()
[2019-01-28 14:55:46.158][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #3: testing::Test::Run()
[2019-01-28 14:55:46.162][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #4: testing::TestInfo::Run()
[2019-01-28 14:55:46.166][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #5: testing::TestSuite::Run()
[2019-01-28 14:55:46.170][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #6: testing::internal::UnitTestImpl::RunAllTests()
[2019-01-28 14:55:46.174][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #7: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[2019-01-28 14:55:46.178][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #8: testing::internal::HandleExceptionsInMethodIfSupported<>()
[2019-01-28 14:55:46.182][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #9: testing::UnitTest::Run()
[2019-01-28 14:55:46.186][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #10: RUN_ALL_TESTS()
[2019-01-28 14:55:46.190][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #11: Envoy::TestRunner::RunTests()
[2019-01-28 14:55:46.194][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #12: main
[2019-01-28 14:55:46.194][15][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #13: 0x7f47713d72b1

Having looked at this, I'm going to make one more change which is to include the address even if the symbol can be resolved by absl::Symbolize. That gives the maximum flexibility to whoever is trying to debug the stack trace.

@jmarantz
Copy link
Contributor

Can you show the output with a debug build? I'm hoping there will be line numbers.

@dnoe
Copy link
Contributor Author

dnoe commented Jan 28, 2019

Sadly, there are not. Maybe with a future version of absl::Symbolize.

But, with the revised version where I print the decoded symbol and address, you can use addr2line on it and get the full information.

I think this is probably the best compromise between providing useful information in the log out of the box, and providing the ability to get more information as needed later with addr2line.

Signed-off-by: Dan Noé <dpn@google.com>
@dnoe
Copy link
Contributor Author

dnoe commented Jan 28, 2019

/review @jmarantz

bazel/README.md Outdated
than resolved symbols.
than resolved symbols. If the symbols were resolved, the address is also included at
the end of the line. It can be used with a debugging tool like `addr2line` to
get a precise line number if the required debugging info is available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a suggested bazel --run_under=.... snippet describing how we can get line numbers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addr2line isn't a pass through by itself, although I implemented such a functionality into stack_decode.py. If you think it'll actually be useful I can modify stack_decode.py to work with the new format backtraces rather than eliminate it. Then you'll be able to use bazel --run_under= like before.

Alternatively, I can just write instructions for manually invoking addr2line with the right args and the addresses of interest (it just won't be a --run_under=).

I have a mild preference for the latter (and let stack_decode.py die), WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like run_under is easier to use unless addr2line is destructive to lines that are not part of stack traces. What's the reason you prefer to ask users to do this manually?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's revive stack_decode.py. Sorry about the futz here, but I love my line numbers! :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! I didn't think anyone was using it! Happy to fix it up to get the best of both worlds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, revert this doc change? The previous flow still works, yeah? So the additional detail is not strictly necessary.

Copy link
Contributor Author

@dnoe dnoe Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flow is slightly different. Previously, stack_decode.py autodetected the object file name from the log, because backward had this information and we could log it. But we don't have it anymore and the executable file needs to be passed in all circumstances so the usage details of stack_decode.py changed slightly. I'm going to update the doc now that mac CI is confirmed to be passing.

@dnoe
Copy link
Contributor Author

dnoe commented Jan 29, 2019

New output post stack_decode.py:

[2019-01-29 16:34:52.153][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #0: Envoy::Backward_Basic_Test::TestBody() [0x42c477] /proc/self/cwd/test/server/backtrace_test.cc:12
[2019-01-29 16:34:52.157][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #1: testing::internal::HandleSehExceptionsInMethodIfSupported<>() [0xcda96e] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2424
[2019-01-29 16:34:52.161][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #2: testing::internal::HandleExceptionsInMethodIfSupported<>() [0xcc9b7e] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2460
[2019-01-29 16:34:52.165][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #3: testing::Test::Run() [0xcb41e3] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2506
[2019-01-29 16:34:52.169][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #4: testing::TestInfo::Run() [0xcb4df8] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2679
[2019-01-29 16:34:52.173][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #5: testing::TestSuite::Run() [0xcb54d3] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2802
[2019-01-29 16:34:52.177][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #6: testing::internal::UnitTestImpl::RunAllTests() [0xcc293a] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5240
[2019-01-29 16:34:52.181][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #7: testing::internal::HandleSehExceptionsInMethodIfSupported<>() [0xcdd8fe] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2424
[2019-01-29 16:34:52.185][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #8: testing::internal::HandleExceptionsInMethodIfSupported<>() [0xccc234] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2460
[2019-01-29 16:34:52.189][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #9: testing::UnitTest::Run() [0xcc2613] /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:4843
[2019-01-29 16:34:52.193][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #10: RUN_ALL_TESTS() [0x44a991] /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2499
[2019-01-29 16:34:52.197][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #11: Envoy::TestRunner::RunTests() [0x44a3b2] /proc/self/cwd/./test/test_runner.h:49
[2019-01-29 16:34:52.201][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:74] #12: main [0x449a06] /proc/self/cwd/test/main.cc:39
[2019-01-29 16:34:52.201][15][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:76] #13: [0x7fd846a862b1]

Function names come directly from absl::Symbolize, the file names and line numbers are appended by stack_decode.py using addr2line.

Signed-off-by: Dan Noé <dpn@google.com>
@dnoe
Copy link
Contributor Author

dnoe commented Jan 29, 2019

@htuch I'm not sure why /proc/self/cwd stuff is showing up in the filenames in the binary debugging info, but that's what's there. I blame bazel. Thoughts on making the script trim /proc/self/cwd/ if it's at the beginning of the path that comes back from addr2line?

Signed-off-by: Dan Noé <dpn@google.com>
@dnoe
Copy link
Contributor Author

dnoe commented Jan 29, 2019

mac build fails on //test/extensions/filters/listener/proxy_protocol:proxy_protocol_test

I suspect this is a flake.

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: mac (failed build)

🐱

Caused by: a #5723 (comment) was created by @dnoe.

see: more, trace.

Signed-off-by: Dan Noé <dpn@google.com>
@dnoe
Copy link
Contributor Author

dnoe commented Jan 30, 2019

@htuch this is ready for a senior maintainer pass.

jmarantz
jmarantz previously approved these changes Jan 30, 2019
@htuch
Copy link
Member

htuch commented Jan 30, 2019

@dnoe the /proc/self/cwd thing has to do with how Bazel execs tests, it's fine to trim.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but needs master merge and some Python nit picking.

stdout=subprocess.PIPE)
output_stdout, _ = addr2line.communicate(piped_input)
return output_stdout.split("\n")
output_stdout, _ = addr2line.communicate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we could just use subprocess.check_output here now to simplify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, done.

else:
decode_stacktrace_log(sys.stdin)
print "Usage (execute subprocess): stack_decode.py executable_file [additional args]"
print "Usage (read from stdin): stack_decode.py -s executable_file"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: sys.exit(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; will sys.exit(0) in stdin use case.

return file_and_line_number


def trim_from_start(string, trim):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer not using string here, since it can be confused with https://docs.python.org/3/library/string.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the whole thing with re.sub.

def trim_from_start(string, trim):
if string.startswith(trim):
return string[len(trim):]
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no need for else, just fall-thru

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the whole thing with re.sub.

def trim_proc_cwd(file_and_line_number):
# Try the longer trim first, it will leave the string untouched if it doesn't
# match.
file_and_line_number = trim_from_start(file_and_line_number, "/proc/self/cwd/./")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do a regex search and replace with re.sub or the like? Performance is likely fine here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent idea, thanks.

Signed-off-by: Dan Noé <dpn@google.com>
@dnoe
Copy link
Contributor Author

dnoe commented Jan 31, 2019

bazel/README.md also updated at this time to show a stdin invocation example as requested by @jmarantz

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dnoe
Copy link
Contributor Author

dnoe commented Jan 31, 2019

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: tsan (failed build)
🔨 rebuilding ci/circleci: asan (failed build)

🐱

Caused by: a #5723 (comment) was created by @dnoe.

see: more, trace.

@dnoe dnoe merged commit 92881c1 into envoyproxy:master Jan 31, 2019
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 5, 2019
Description:

Use the absl::Stacktrace library along with absl::Symbolize to generate
symbolized stack traces without requiring any external tool or post
processing.

Remove the external dependency on "backward" library.

The stack_decode.py script can still be used to obtain additional information,
notably file and line numbers. See bazel/README.md for revised usage instructions.

Risk Level: Low (may break tooling that interpreted stack traces in logs)
Testing: bazel test //test/... , manually examined symbolized backtrace in killed binary.
Docs Changes: Changed bazel/README.md
Release Notes: Usage details of the stack_decode.py script have changed. Stack traces should include decoded names without it when the binary has appropriate debugging symbols. The script can be used to obtain file and line numbers, see bazel/README.md for details.

Signed-off-by: Dan Noé <dpn@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
@clyang82
Copy link

@dnoe before your changes, If I want to test a new arch, I just added

#elif defined(__s390x__)
    error_pc =  reinterpret_cast<void*>(ucontext->uc_mcontext.psw.addr);

insignal_action.cc

Could you point me where I should add in the new code? I have successfully compiled the envoy in s390x, but failed to run with this error.

[2019-05-23 02:28:13.551][9111][critical][assert] [external/envoy/source/exe/signal_action.cc:72] assert failure: mprotect(altstack_, guard_size_, PROT_NONE) == 0.

Thanks.

def _com_github_bombela_backward():
_repository_impl(
name = "com_github_bombela_backward",
build_file = "@envoy//bazel/external:backward.BUILD",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that https://github.com/envoyproxy/envoy/blob/master/bazel/external/backward.BUILD is still in the codebase. This can be removed now I believe @dnoe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants