-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support pkg/errors Stacktracer interface #303
Comments
I'm generally +1 on supporting anything that's "almost" cannon; for me that heuristic looks like:
In particular, I think the case of errors' Stacktracer a good fit. |
What kind of support were you thinking? Logging a separate field like "errorStackTrace" with the stack trace? I'd prefer not to take a dependency on It's unfortunate that we can't check whether there's a stacktrace we can log without depending on the package. There are discussions around changing this (pkg/errors#79). I think we should wait till there's a 1.0 tagged before taking on the dependency. There's one "trick" to getting stack traces in logs if we really want -- instead of calling |
|
Actually, it looks like we can just cast into the |
<!-- Describe what has changed in this PR --> **What changed?** I upgraded our zap version from v1.24.0 to v1.26.0, which contains support for pkg/errors. See [this issue](uber-go/zap#303) and [this commit](uber-go/zap@5fc2db7). <img width="448" alt="image" src="https://github.com/temporalio/temporal/assets/5942963/7d15d91a-27d3-45f6-9628-30bdcac38771"> <!-- Tell your future self why have you made these changes --> **Why?** Before this change, our logs would only contain the stack trace from where the logger itself was invoked, not from the source of where the error was generated or wrapped. This provided very little useful information. <!-- How have you verified this change? Tested locally? Added a unit test? Checked in staging env? --> **How did you test it?** I ran a custom [build of server](https://gist.github.com/MichaelSnowden/c649dfd1efeb92f10bc72a040a792a8d) which overwrote some deep code in history to return an error. I then set up docker-compose to output logs -> promtail -> loki -> grafana. Then, I queried Grafana to verify that the error log contained an "errorVerbose" field with the stack trace from where my error was generated. As you can see from the below image, the stack trace does appear under this field, and if you turn on JSON parsing and newline escaping, you can both see it rendered correctly, and you can copy-paste the stack trace. <img width="983" alt="image" src="https://github.com/temporalio/temporal/assets/5942963/cb9b0c83-b146-4060-9eac-3ccf9b807657"> <img width="683" alt="image" src="https://github.com/temporalio/temporal/assets/5942963/6fa28f2f-5d04-47ae-b8a3-7512c3dd85e7"> The stack trace from Grafana: ``` oopsie woopsie main.(*faultyShardEngine).StartWorkflowExecution /Users/mikey/src/temporalio/temporal/.scratches/main.go:39 go.temporal.io/server/service/history.(*Handler).StartWorkflowExecution /Users/mikey/src/temporalio/temporal/service/history/handler.go:595 go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1 /Users/mikey/src/temporalio/temporal/api/historyservice/v1/service.pb.go:1300 ... ``` <!-- Assuming the worst case, what can be broken when deploying this change to production? --> **Potential risks** The stack traces are pretty deep because of all our gRPC interceptors. However, we can definitely fix that later if we want by filtering the `pkg/errors.StackTrace`. I'd rather do that in a follow-up after getting support for this initial change first, though. <!-- Is this PR a hotfix candidate or require that a notification be sent to the broader community? (Yes/No) --> **Is hotfix candidate?** No.
Consider whether we should take on a third-party dependency to support
pkg/errors
. If we really think that this package is garnering widespread adoption, we should support its error type and include both stacktraces and causes in log output.Thoughts, @prashantv and @jcorbin?
The text was updated successfully, but these errors were encountered: