fix(sequencer): improve and fix instrumentation by Fraser999 · Pull Request #1255 · astriaorg/astria

Fraser999 · 2024-07-09T23:51:27Z

Summary

Generally improved instrumentation, including a fix for App::execute_transaction.

Background

There were a couple of async blocks spawned in tokio tasks which were not instrumented, resulting in misleading tracing data for App::execute_transaction. While investigating this, I discovered several instances of tracing fields using Debug output, and also a few functions which seemed to me like they would benefit from being instrumented.

Changes

Applied the parent tracing span to the two spawned tasks.
Replaced many Debug fields with Display ones. They were all down to not being skipped via skip_all, so I replaced all instances of skip(...) with skip_all meaning any fields to be included have to be explicitly listed.
Removed all fields from instrumentation to cut down on tracing/log noise.
Added instrumentation to the transaction-checking functions and mempool.

Note: I tried to largely preserve the fields which were previously being instrumented, just changing them from Debug to Display. A very few had no Display impl, so I excluded them. I think that almost all of the fields could be omitted in a follow-up PR, since I don't think we get much benefit from including things like addresses or balances in instrumentation, and that only serves to clutter the output. Potentially some of these fields are duplicated in the call chain, so even restricting their inclusion to the relevant top-level function would help.

Testing

Manually ran some tests with instrumentation enabled and eyeballed the output.

joroshiba

In general we have added some context information on db writes. I'm unclear on how useful this information is in logging and traces. Want to make sure we avoid noise. Background here of st one point before adding many of the "skip all" pieces logs where so noisy from tracing data they were unusable.

I think it might also be good to add instrumentation in more places (ie mempool service, and app mempool uninstrumented) but good general additions.

crates/astria-sequencer/src/authority/state_ext.rs

crates/astria-sequencer/src/bridge/state_ext.rs

Fraser999 · 2024-07-10T22:49:15Z

In general we have added some context information on db writes. I'm unclear on how useful this information is in logging and traces. Want to make sure we avoid noise. Background here of st one point before adding many of the "skip all" pieces logs where so noisy from tracing data they were unusable.

Agreed. The changes I originally made created strictly less noise - I only replaced verbose debug output with more succinct display output. But I agree that even that change led to noisy logs/tracing, so I went ahead and removed all fields from the tracing instrumentation in sequencer. We can re-add any as we see fit in the future, but for now I don't see any of them being useful. I might well have missed places where they would provide useful info in log lines, but in that case we can add them to just the log macro rather than the instrument one.

I think it might also be good to add instrumentation in more places (ie mempool service, and app mempool uninstrumented) but good general additions.

The original commit included instrumentation for the mempool service (handle_check_tx and all the checks were instrumented), but I've now added instrumentation for the public mempool methods.

SuperFluffy · 2024-07-16T13:17:07Z

Commenting on part of the reasoning behind this PR:

Removed all fields from instrumentation to cut down on tracing/log noise.
The point of contention here is what we view as our primary way of viewing tracing: if OTEL is the primary, then we might want to add even more information in the span. If the tracing-subsciber fmt pretty-printer is the primary, then one needs to be mindful of how cluttered the output becomes.

I tend to view OTEL as our primary way of consuming the data (especially as we go to production), whereas tracing-subscriber is mainly a debugging tool.

I am worried that we throw away information that might be potentially useful to understand how data flows through our system.

SuperFluffy

Approval for instrumenting the spawns (with suggestions) and tentatively on all the skip_alls.

I left a comment above re what information we should be putting into spans. It might be fine to have a look at the parent spans (which AFAIK are set up through penumbra-tower-trace), and just inject the requests that trigger the handlers into the parent spans.

crates/astria-sequencer/src/app/mod.rs

…encer-tracing

* main: (24 commits) chore: update `bytes` and `ics23` crates (#1279) fix(sequencer): improve and fix instrumentation (#1255) feature(charts): hermes chart fixes, bech32 updates, ibc bridge test (#1130) chore(cli): remove unused rollup cli code (#1275) chore(test): use a temporary file to not pollute the workspace (#1269) chore(sequencer): add mempool benchmarks (#1238) fix(bridge-withdrawer)!: fix nonce handling (#1215) feat(cli, bridge-withdrawer)!: share code between cli and service (#1270) feat(cli): add cmd to collect withdrawal events and submit as actions (#1261) fix(core, bridge, sequencer)!: dismabiguate return addresses (#1266) fix(withdrawer): support withdrawer address that differs from bridge address (#1262) (core, sequencer)!: generate serde traits impls for all protocol protobufs (#1260) fix(charts): add resources for sequencer/cometbft (#1254) chore(sequencer)!: add metrics (#1248) fix(sequencer-utils): fixes issue in `parse_blob` tests (#1243) feat(core, proto)!: make bridge unlock memo string (#1244) fix(conductor): don't panic during panic (#1252) feat(core)!: lowerCamelCase for protobuf json mapping (#1250) refactor(bridge-withdrawer)!: refactor startup to a separate subtask and remove balance check from startup (#1190) fix: rollup archive node configurations (#1249) ...

## Summary Generally improved instrumentation, including a fix for `App::execute_transaction`. ## Background There were a couple of async blocks spawned in tokio tasks which were not instrumented, resulting in misleading tracing data for `App::execute_transaction`. While investigating this, I discovered several instances of tracing fields using `Debug` output, and also a few functions which seemed to me like they would benefit from being instrumented. ## Changes - Applied the parent tracing span to the two spawned tasks. - ~Replaced many `Debug` fields with `Display` ones. They were all down to not being skipped via `skip_all`, so I replaced all instances of `skip(...)` with `skip_all` meaning any fields to be included have to be explicitly listed.~ - Removed all fields from instrumentation to cut down on tracing/log noise. - Added instrumentation to the transaction-checking functions and mempool. **Note:** ~I tried to largely preserve the fields which were previously being instrumented, just changing them from `Debug` to `Display`. A very few had no `Display` impl, so I excluded them. I think that almost all of the fields could be omitted in a follow-up PR, since I don't think we get much benefit from including things like addresses or balances in instrumentation, and that only serves to clutter the output. Potentially some of these fields are duplicated in the call chain, so even restricting their inclusion to the relevant top-level function would help.~ ## Testing Manually ran some tests with instrumentation enabled and eyeballed the output.

improve instrumentation including fix for App::execute_transaction

d9479e9

Fraser999 requested a review from a team as a code owner July 9, 2024 23:51

Fraser999 requested a review from SuperFluffy July 9, 2024 23:51

github-actions bot added the sequencer pertaining to the astria-sequencer crate label Jul 9, 2024

Fraser999 mentioned this pull request Jul 9, 2024

Investigate sequencer block production slowdown when under stress #1246

Closed

joroshiba reviewed Jul 10, 2024

View reviewed changes

crates/astria-sequencer/src/authority/state_ext.rs Outdated Show resolved Hide resolved

crates/astria-sequencer/src/bridge/state_ext.rs Outdated Show resolved Hide resolved

remove fields from tracing instrumentation in sequencer

1ceff06

SuperFluffy approved these changes Jul 16, 2024

View reviewed changes

crates/astria-sequencer/src/app/mod.rs Outdated Show resolved Hide resolved

crates/astria-sequencer/src/app/mod.rs Outdated Show resolved Hide resolved

Fraser999 added 2 commits July 17, 2024 16:41

Merge remote-tracking branch 'upstream/main' into fraser/improve-sequ…

afd34c9

…encer-tracing

change to use in_current_span

45c90ab

Fraser999 enabled auto-merge July 17, 2024 15:47

Fraser999 added this pull request to the merge queue Jul 17, 2024

Merged via the queue into main with commit 45a6415 Jul 17, 2024

Fraser999 deleted the fraser/improve-sequencer-tracing branch July 17, 2024 15:59

Fraser999 mentioned this pull request Jul 17, 2024

feat(sequencer): integrate connect oracle and vote extension logic #1236

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

fix(sequencer): improve and fix instrumentation#1255

fix(sequencer): improve and fix instrumentation#1255
Fraser999 merged 4 commits intomainfrom
fraser/improve-sequencer-tracing

Fraser999 commented Jul 9, 2024 •

edited

Loading

Uh oh!

joroshiba left a comment

Uh oh!

Uh oh!

Uh oh!

Fraser999 commented Jul 10, 2024

Uh oh!

SuperFluffy commented Jul 16, 2024

Uh oh!

SuperFluffy left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

Fraser999 commented Jul 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Changes

Testing

Uh oh!

joroshiba left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Fraser999 commented Jul 10, 2024

Uh oh!

SuperFluffy commented Jul 16, 2024

Uh oh!

SuperFluffy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fraser999 commented Jul 9, 2024 •

edited

Loading