fix(sequencer): improve and fix instrumentation #1255
joroshiba left a comment:
In general, we have added some context information on DB writes. I'm unclear on how useful this information is in logging and traces, and I want to make sure we avoid noise. For background: at one point, before we added many of the `skip_all` pieces, logs were so noisy from tracing data that they were unusable.
I think it might also be good to add instrumentation in more places (i.e. the mempool service and the app mempool, which are currently uninstrumented), but these are good general additions.
Agreed. The changes I originally made created strictly less noise: I only replaced verbose `Debug` output with more succinct `Display` output. But I agree that even that change led to noisy logs/tracing, so I went ahead and removed all fields from the tracing instrumentation in the sequencer. We can re-add any as we see fit in the future, but for now I don't see any of them being useful. I may well have missed places where they would provide useful info in log lines, but in that case we can add them to just the log macro rather than the `instrument` one.
The original commit included instrumentation for the mempool service (
Commenting on part of the reasoning behind this PR:
I tend to view OTEL as our primary way of consuming the data (especially as we go to production), so I am worried that we are throwing away information that could be useful for understanding how data flows through our system.
SuperFluffy left a comment:
Approving the instrumentation of the spawned tasks (with suggestions), and tentatively all the `skip_all`s.
I left a comment above regarding what information we should be putting into spans. It might be worth looking at the parent spans (which, AFAIK, are set up through penumbra-tower-trace) and just injecting the requests that trigger the handlers into those parent spans.
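The "spawns" discussed above are async blocks handed to tokio tasks. With `tracing`, a spawned future does not automatically run inside the caller's span; a common way to propagate it is `tokio::spawn(fut.in_current_span())` or `tokio::spawn(fut.instrument(Span::current()))`, both provided by the `tracing::Instrument` trait. The sketch below is only a std-only model of that behaviour, not the sequencer's code: the thread-local `CURRENT_SPAN`, the helpers `in_span`, `current_span`, `spawn_uninstrumented` and `spawn_instrumented`, and the span name are all hypothetical, and OS threads stand in for tokio tasks.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical stand-in for the subscriber's per-thread "current span".
    static CURRENT_SPAN: RefCell<Option<String>> = RefCell::new(None);
}

/// Runs `f` with `span` as the current span on this thread, then restores
/// the previous value (a simplified model of entering a tracing span).
fn in_span<T>(span: Option<String>, f: impl FnOnce() -> T) -> T {
    let prev = CURRENT_SPAN.with(|c| c.replace(span));
    let out = f();
    CURRENT_SPAN.with(|c| *c.borrow_mut() = prev);
    out
}

/// Returns the name of the span that is current on this thread, if any.
fn current_span() -> Option<String> {
    CURRENT_SPAN.with(|c| c.borrow().clone())
}

/// Spawns work without propagating the span: the new thread starts with no
/// current span, so anything it records is detached from the caller.
fn spawn_uninstrumented() -> Option<String> {
    thread::spawn(current_span).join().unwrap()
}

/// Captures the caller's span and re-enters it inside the spawned work,
/// analogous to calling `.in_current_span()` on a future before spawning it.
fn spawn_instrumented() -> Option<String> {
    let parent = current_span();
    thread::spawn(move || in_span(parent, current_span))
        .join()
        .unwrap()
}

fn main() {
    in_span(Some("execute_transaction".to_string()), || {
        // The parent span is visible on the calling thread...
        println!("caller:         {:?}", current_span());
        // ...but not inside work spawned without propagation...
        println!("uninstrumented: {:?}", spawn_uninstrumented());
        // ...unless it is captured and re-entered in the spawned work.
        println!("instrumented:   {:?}", spawn_instrumented());
    });
}
```

In this model, the uninstrumented spawn is the situation the PR describes for `App::execute_transaction`: work that logically belongs to the parent shows up without it.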
* main: (24 commits)
  * chore: update `bytes` and `ics23` crates (#1279)
  * fix(sequencer): improve and fix instrumentation (#1255)
  * feature(charts): hermes chart fixes, bech32 updates, ibc bridge test (#1130)
  * chore(cli): remove unused rollup cli code (#1275)
  * chore(test): use a temporary file to not pollute the workspace (#1269)
  * chore(sequencer): add mempool benchmarks (#1238)
  * fix(bridge-withdrawer)!: fix nonce handling (#1215)
  * feat(cli, bridge-withdrawer)!: share code between cli and service (#1270)
  * feat(cli): add cmd to collect withdrawal events and submit as actions (#1261)
  * fix(core, bridge, sequencer)!: dismabiguate return addresses (#1266)
  * fix(withdrawer): support withdrawer address that differs from bridge address (#1262)
  * (core, sequencer)!: generate serde traits impls for all protocol protobufs (#1260)
  * fix(charts): add resources for sequencer/cometbft (#1254)
  * chore(sequencer)!: add metrics (#1248)
  * fix(sequencer-utils): fixes issue in `parse_blob` tests (#1243)
  * feat(core, proto)!: make bridge unlock memo string (#1244)
  * fix(conductor): don't panic during panic (#1252)
  * feat(core)!: lowerCamelCase for protobuf json mapping (#1250)
  * refactor(bridge-withdrawer)!: refactor startup to a separate subtask and remove balance check from startup (#1190)
  * fix: rollup archive node configurations (#1249)
  * ...
## Summary

Generally improved instrumentation, including a fix for `App::execute_transaction`.

## Background

There were a couple of async blocks spawned in tokio tasks which were not instrumented, resulting in misleading tracing data for `App::execute_transaction`. While investigating this, I discovered several instances of tracing fields using `Debug` output, and also a few functions which seemed to me like they would benefit from being instrumented.

## Changes

- Applied the parent tracing span to the two spawned tasks.
- ~Replaced many `Debug` fields with `Display` ones. These all arose because the arguments were not skipped via `skip_all`, so I replaced all instances of `skip(...)` with `skip_all`, meaning any fields to be included have to be listed explicitly.~
- Removed all fields from instrumentation to cut down on tracing/log noise.
- Added instrumentation to the transaction-checking functions and the mempool.

**Note:** ~I tried to largely preserve the fields which were previously being instrumented, just changing them from `Debug` to `Display`. A few had no `Display` impl, so I excluded them. I think that almost all of the fields could be omitted in a follow-up PR, since I don't think we get much benefit from including things like addresses or balances in instrumentation, and that only serves to clutter the output. Some of these fields are potentially duplicated in the call chain, so even restricting their inclusion to the relevant top-level function would help.~

## Testing

Manually ran some tests with instrumentation enabled and eyeballed the output.
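For context on the `Debug` versus `Display` point: in `tracing` macros, a `?value` field is recorded with its `Debug` implementation and a `%value` field with its `Display` implementation, and `#[instrument]` records every argument that is not skipped (and is not a primitive value type) via `Debug`. The sketch below uses a hypothetical 20-byte `Address` type, not the sequencer's real address type, to show how much longer a derived `Debug` form can be than a hex `Display` form.

```rust
use std::fmt;

/// Hypothetical 20-byte address, standing in for a real address type.
/// The derived `Debug` prints every byte as a decimal integer.
#[derive(Debug)]
struct Address([u8; 20]);

/// A compact `Display` form: the bytes as lowercase hex.
impl fmt::Display for Address {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in &self.0 {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}

fn main() {
    let addr = Address([0xab; 20]);
    let debug = format!("{addr:?}"); // roughly what a `?addr` field contains
    let display = format!("{addr}"); // roughly what a `%addr` field contains
    println!("Debug   ({} chars): {debug}", debug.len());
    println!("Display ({} chars): {display}", display.len());
}
```

The PR ultimately removed such fields from the instrumentation altogether, so this only illustrates why the intermediate `Display` change reduced noise.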