Skip to content

Comments

fix(sequencer): improve and fix instrumentation#1255

Merged
Fraser999 merged 4 commits intomainfrom
fraser/improve-sequencer-tracing
Jul 17, 2024
Merged

fix(sequencer): improve and fix instrumentation#1255
Fraser999 merged 4 commits intomainfrom
fraser/improve-sequencer-tracing

Conversation

@Fraser999
Copy link
Contributor

@Fraser999 Fraser999 commented Jul 9, 2024

Summary

Generally improved instrumentation, including a fix for App::execute_transaction.

Background

There were a couple of async blocks spawned in tokio tasks which were not instrumented, resulting in misleading tracing data for App::execute_transaction. While investigating this, I discovered several instances of tracing fields using Debug output, and also a few functions which seemed to me like they would benefit from being instrumented.

Changes

  • Applied the parent tracing span to the two spawned tasks.
  • Replaced many Debug fields with Display ones. They were all down to not being skipped via skip_all, so I replaced all instances of skip(...) with skip_all meaning any fields to be included have to be explicitly listed.
  • Removed all fields from instrumentation to cut down on tracing/log noise.
  • Added instrumentation to the transaction-checking functions and mempool.

Note: I tried to largely preserve the fields which were previously being instrumented, just changing them from Debug to Display. A very few had no Display impl, so I excluded them. I think that almost all of the fields could be omitted in a follow-up PR, since I don't think we get much benefit from including things like addresses or balances in instrumentation, and that only serves to clutter the output. Potentially some of these fields are duplicated in the call chain, so even restricting their inclusion to the relevant top-level function would help.

Testing

Manually ran some tests with instrumentation enabled and eyeballed the output.

@Fraser999 Fraser999 requested a review from a team as a code owner July 9, 2024 23:51
@Fraser999 Fraser999 requested a review from SuperFluffy July 9, 2024 23:51
@github-actions github-actions bot added the sequencer pertaining to the astria-sequencer crate label Jul 9, 2024
Copy link
Member

@joroshiba joroshiba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we have added some context information on db writes. I'm unclear on how useful this information is in logging and traces. Want to make sure we avoid noise. Background here of st one point before adding many of the "skip all" pieces logs where so noisy from tracing data they were unusable.

I think it might also be good to add instrumentation in more places (ie mempool service, and app mempool uninstrumented) but good general additions.

@Fraser999
Copy link
Contributor Author

In general we have added some context information on db writes. I'm unclear on how useful this information is in logging and traces. Want to make sure we avoid noise. Background here of st one point before adding many of the "skip all" pieces logs where so noisy from tracing data they were unusable.

Agreed. The changes I originally made created strictly less noise - I only replaced verbose debug output with more succinct display output. But I agree that even that change led to noisy logs/tracing, so I went ahead and removed all fields from the tracing instrumentation in sequencer. We can re-add any as we see fit in the future, but for now I don't see any of them being useful. I might well have missed places where they would provide useful info in log lines, but in that case we can add them to just the log macro rather than the instrument one.

I think it might also be good to add instrumentation in more places (ie mempool service, and app mempool uninstrumented) but good general additions.

The original commit included instrumentation for the mempool service (handle_check_tx and all the checks were instrumented), but I've now added instrumentation for the public mempool methods.

@SuperFluffy
Copy link
Contributor

Commenting on part of the reasoning behind this PR:

Removed all fields from instrumentation to cut down on tracing/log noise.
The point of contention here is what we view as our primary way of viewing tracing: if OTEL is the primary, then we might want to add even more information in the span. If the tracing-subsciber fmt pretty-printer is the primary, then one needs to be mindful of how cluttered the output becomes.

I tend to view OTEL as our primary way of consuming the data (especially as we go to production), whereas tracing-subscriber is mainly a debugging tool.

I am worried that we throw away information that might be potentially useful to understand how data flows through our system.

Copy link
Contributor

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval for instrumenting the spawns (with suggestions) and tentatively on all the skip_alls.

I left a comment above re what information we should be putting into spans. It might be fine to have a look at the parent spans (which AFAIK are set up through penumbra-tower-trace), and just inject the requests that trigger the handlers into the parent spans.

@Fraser999 Fraser999 enabled auto-merge July 17, 2024 15:47
@Fraser999 Fraser999 added this pull request to the merge queue Jul 17, 2024
Merged via the queue into main with commit 45a6415 Jul 17, 2024
@Fraser999 Fraser999 deleted the fraser/improve-sequencer-tracing branch July 17, 2024 15:59
steezeburger added a commit that referenced this pull request Jul 19, 2024
* main: (24 commits)
  chore: update `bytes` and `ics23` crates (#1279)
  fix(sequencer): improve and fix instrumentation (#1255)
  feature(charts): hermes chart fixes, bech32 updates, ibc bridge test (#1130)
  chore(cli): remove unused rollup cli code (#1275)
  chore(test): use a temporary file to not pollute the workspace (#1269)
  chore(sequencer): add mempool benchmarks (#1238)
  fix(bridge-withdrawer)!: fix nonce handling (#1215)
  feat(cli, bridge-withdrawer)!: share code between cli and service (#1270)
  feat(cli): add cmd to collect withdrawal events and submit as actions (#1261)
  fix(core, bridge, sequencer)!: dismabiguate return addresses (#1266)
  fix(withdrawer): support withdrawer address that differs from bridge address   (#1262)
  (core, sequencer)!: generate serde traits impls for all protocol protobufs (#1260)
  fix(charts): add resources for sequencer/cometbft (#1254)
  chore(sequencer)!: add metrics (#1248)
  fix(sequencer-utils): fixes issue in `parse_blob` tests (#1243)
  feat(core, proto)!: make bridge unlock memo string (#1244)
  fix(conductor): don't panic during panic (#1252)
  feat(core)!: lowerCamelCase for protobuf json mapping (#1250)
  refactor(bridge-withdrawer)!: refactor startup to a separate subtask and remove balance check from startup (#1190)
  fix: rollup archive node configurations (#1249)
  ...
bharath-123 pushed a commit that referenced this pull request Jul 25, 2024
## Summary
Generally improved instrumentation, including a fix for
`App::execute_transaction`.

## Background
There were a couple of async blocks spawned in tokio tasks which were
not instrumented, resulting in misleading tracing data for
`App::execute_transaction`. While investigating this, I discovered
several instances of tracing fields using `Debug` output, and also a few
functions which seemed to me like they would benefit from being
instrumented.

## Changes
- Applied the parent tracing span to the two spawned tasks.
- ~Replaced many `Debug` fields with `Display` ones. They were all down
to not being skipped via `skip_all`, so I replaced all instances of
`skip(...)` with `skip_all` meaning any fields to be included have to be
explicitly listed.~
- Removed all fields from instrumentation to cut down on tracing/log
noise.
- Added instrumentation to the transaction-checking functions and
mempool.

**Note:** ~I tried to largely preserve the fields which were previously
being instrumented, just changing them from `Debug` to `Display`. A very
few had no `Display` impl, so I excluded them. I think that almost all
of the fields could be omitted in a follow-up PR, since I don't think we
get much benefit from including things like addresses or balances in
instrumentation, and that only serves to clutter the output. Potentially
some of these fields are duplicated in the call chain, so even
restricting their inclusion to the relevant top-level function would
help.~

## Testing
Manually ran some tests with instrumentation enabled and eyeballed the
output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sequencer pertaining to the astria-sequencer crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants