-
-
Notifications
You must be signed in to change notification settings - Fork 411
fix: refactor blobs and data columns metrics #8198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| (blockInput.blockData.fork === ForkName.deneb || blockInput.blockData.fork === ForkName.electra) | ||
| ) { | ||
| const {blobsSource} = blockInput.blockData; | ||
| this.metrics?.importBlock.blobsBySource.inc({blobsSource}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to seenGossipBlockInput.ts.
| validationTime: register.histogram({ | ||
| name: "lodestar_gossip_data_column_gossip_validate_time", | ||
| help: "Time elapsed for data column validation", | ||
| buckets: [0.05, 0.1, 0.2, 0.5, 1, 1.5, 2, 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point overlaps with the specs' beacon_data_column_sidecar_gossip_verification_seconds metric. Probably, should be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's already defined in the spec should prefer using those metrics if they track the same
| help: "Time elapsed between block slot time and the time data column sidecar reconstructed", | ||
| buckets: [2, 4, 6, 8, 10, 12], | ||
| gossipDataColumn: { | ||
| recvToValidation: register.histogram({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be added to the PeerDAS dashboard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason not to visualize it?
| blobs: { | ||
| bySource: register.gauge<{source: BlobsSource}>({ | ||
| name: "lodestar_blobs_by_source", | ||
| help: "Number of received blobs by source", | ||
| labelNames: ["source"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved out from importBlock section - blobsBySource.
|
Are blob metrics from |
nflaig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blobs metrics seem incomplete and data column likely as well (need to double check), at least for blobs we might also get them from engine or req/resp, not only gossip and api.
you can search for EventType.blobSidecar, also see changes in #7967
| } as BlockInputBlobs; | ||
|
|
||
| blockInputs.push(getBlockInput.availableData(config, block.data, blockSource, blockData)); | ||
| metrics?.blobs.bySource.inc({source: blobsSource}, allBlobSidecars.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function also called from ByRoot.
|
|
||
| let allDataColumnsSidecars: fulu.DataColumnSidecar[]; | ||
| logger?.debug("allDataColumnsSidecars partialDownload", { | ||
| let allDataColumnSidecars: fulu.DataColumnSidecar[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed for consistency, has been used twice in this module, as allDataColumnSidecars and allDataColumnsSidecars. Probably, this change should be moved into other PR.
| // for e.g. a blockInput that might be awaiting blobs promise fullfillment in | ||
| // verifyBlocksDataAvailability | ||
| cachedData.blobsCache.set(blobSidecar.index, blobSidecar); | ||
| metrics?.blobs.bySource.inc({source: BlobsSource.engine}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to count here: all the blobs we've got from EL or those from EL which are not cached yet. Same for data columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EL which are not cached yet.
otherwise we would get incorrect numbers, we care about from where we've received the data first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on check cachedData.blobsCache.has(engineReqIdentifiers[j].index) === false it seems we already count this correctly, ie. we only increment if we've received a blob from EL which we haven't received yet from gossip or other sources
| }, | ||
| gossipDataColumn: { | ||
| recvToValidation: register.histogram({ | ||
| name: "lodestar_gossip_data_column_received_to_gossip_validate", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit is missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the metric name might also suggest that it tracks time between receiving until before validation, not including validation time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The [start]To[end] metric naming convention is a standard practice for measuring the total latency of a process. It is used to define the full time elapsed from the moment a process starts until the moment it completes. Adding unit makes sense.
| help: "Time elapsed between data column received and data column validation", | ||
| buckets: [0.05, 0.1, 0.2, 0.5, 1, 1.5, 2, 4], | ||
| }), | ||
| validationTime: register.histogram({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any difference between recvToValidation and validationTime? we shouldn't be doing a lot of work besides deserialzation which is pretty fast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference. Deleted validationTime in favor of verificationTime from the metrics specs.
| validationTime: register.histogram({ | ||
| name: "lodestar_gossip_data_column_gossip_validate_time", | ||
| help: "Time elapsed for data column validation", | ||
| buckets: [0.05, 0.1, 0.2, 0.5, 1, 1.5, 2, 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's already defined in the spec should prefer using those metrics if they track the same
| help: "Time elapsed between block slot time and the time data column sidecar reconstructed", | ||
| buckets: [2, 4, 6, 8, 10, 12], | ||
| gossipDataColumn: { | ||
| recvToValidation: register.histogram({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason not to visualize it?
| // for e.g. a blockInput that might be awaiting blobs promise fullfillment in | ||
| // verifyBlocksDataAvailability | ||
| cachedData.blobsCache.set(blobSidecar.index, blobSidecar); | ||
| metrics?.blobs.bySource.inc({source: BlobsSource.engine}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on check cachedData.blobsCache.has(engineReqIdentifiers[j].index) === false it seems we already count this correctly, ie. we only increment if we've received a blob from EL which we haven't received yet from gossip or other sources
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## unstable #8198 +/- ##
=========================================
Coverage 54.24% 54.25%
=========================================
Files 847 847
Lines 63829 63839 +10
Branches 4833 4837 +4
=========================================
+ Hits 34626 34636 +10
Misses 29126 29126
Partials 77 77 🚀 New features to boost your workflow:
|
Performance Report🚀🚀 Significant benchmark improvement detected
Full benchmark results
|
|
@KatyaRyazantseva are you planning to continue working on this? needs to be updated against latest unstable since we did quite a lot of changes in block input refactor |
If the block input refactor has finished, then yes, I am going to finish this PR. Will merge the updates. |
all changes have been merged, seems like we submitted the grafana dashboard changes already to unstable in #8443 |
Motivation
This PR aims to solve inconsistencies and tech debt associated with the metrics for blobs and columns, as outlined in #8174.
Description
The metrics have been refactored for better consistency between blobs and data columns.
Steps to test or reproduce
Blobs metrics are not widely used on dashboards. For some data column metrics, use the PeerDAS dashboard.