feat: send state metrics to xatu #3
Closed
weiihann wants to merge 419 commits into
Conversation
Force-pushed 469728a to 137c8c3
Force-pushed 137c8c3 to 6447d7a
The `plucky` and `oracular` series have reached end of life. That's why Launchpad isn't building them anymore: https://launchpad.net/~ethereum/+archive/ubuntu/ethereum/+packages.
We didn't upgrade to 1.25, so this jumps over one version. I want to upgrade all builds to Go 1.26 soon, but let's start with the Docker build to get a sense of any possible issues.
…33900) The endianness was wrong, which means that the code chunks were stored in the wrong location in the tree.
Fix the flaky test found in https://ci.appveyor.com/project/ethereum/go-ethereum/builds/53601688/job/af5ccvufpm9usq39: 1. Increase the timeout from 3+1s to 15s and use a timer instead of sleep (in the CI environment, syncing the 1024 blocks may need more time). 2. Add `synced.Load()` to ensure the full async chain has finished. Signed-off-by: Delweng <delweng@gmail.com>
With this, we are dropping support for protocol version eth/68. The only supported version is eth/69 now. The p2p receipt encoding logic can be simplified a lot, and processing of receipts during sync gets a little faster because we now transform the network encoding into the database encoding directly, without decoding the receipts first. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
I noticed that some autonomous agents have a tendency to commit binaries if asked to create a PR.
Fixes priceheap comparison in some edge cases. --------- Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
This PR introduces a threshold (relative to current market base fees) below which we suppress the diffusion of low-fee transactions. Once base fees go down, and if the transactions were not evicted in the meantime, we release these transactions. The PR also updates the bucketing logic to be more sensitive, removing the extra logarithm. The blobpool description is also updated to reflect the new behavior. EIP-7918 changed the maximum blob fee decrease that can happen in a slot; the PR also updates the fee jump calculation to reflect this. --------- Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
…ethereum#33908) The payload rebuild loop resets the timer with the full Recommit duration after generateWork returns, making the actual interval generateWork_elapsed + Recommit instead of Recommit alone. Since fillTransactions uses Recommit (2s) as its timeout ceiling, the effective rebuild interval can reach ~4s under heavy blob workloads — only 1–2 rebuilds in a 6s half-slot window instead of the intended 3. Fix by subtracting elapsed time from the timer reset. ### Before this fix ``` t=0s timer fires, generateWork starts t=2s fillTransactions times out, timer.Reset(2s) t=4s second rebuild starts t=6s CL calls getPayload — gets the t=2s result (1 effective rebuild) ``` ### After ``` t=0s timer fires, generateWork starts t=2s fillTransactions times out, timer.Reset(2s - 2s = 0) t=2s second rebuild starts immediately t=4s timer.Reset(0), third rebuild starts t=6s CL calls getPayload — gets the t=4s result (3 effective rebuilds) ```
We got a report that after v1.17.0 a geth-teku node starts to time out on engine_getBlobsV2 after around 3h of operation. The culprit seems to be our optional http2 service, which Teku attempts first. The exact cause of the timeout is still unclear. This PR is more of a workaround than a proper fix until we figure out the underlying issue. But I don't expect http2 to particularly benefit engine API throughput and latency, so it should be fine to disable it for now.
…reum#33946) ethereum#33916 + cmd/keeper go mod tidy --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In `buildPayload()`, the background goroutine uses a `select` to wait on the recommit timer, the stop channel, and the end timer. When both `timer.C` and `payload.stop` are ready simultaneously, Go's `select` picks a case non-deterministically. This means the loop can enter the `timer.C` case and perform an unnecessary `generateWork` call even after the payload has been resolved. Add a non-blocking check of `payload.stop` at the top of the `timer.C` case to exit immediately when the payload has already been delivered.
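The pattern described above can be sketched with placeholder channels; a minimal model (not the real `buildPayload` code) of re-checking the stop channel non-blockingly inside the timer case:

```go
package main

import (
	"fmt"
	"time"
)

// loop models the select described above: when timer.C and stop are both
// ready, Go picks a case at random, so the timer case first does a
// non-blocking check of stop and bails out before doing any work.
func loop(stop chan struct{}, work func()) {
	timer := time.NewTimer(10 * time.Millisecond)
	defer timer.Stop()
	for {
		select {
		case <-timer.C:
			select {
			case <-stop:
				return // payload already delivered; skip the rebuild
			default:
			}
			work()
			timer.Reset(10 * time.Millisecond)
		case <-stop:
			return
		}
	}
}

func main() {
	stop := make(chan struct{})
	close(stop) // payload resolved before the timer ever fires
	calls := 0
	loop(stop, func() { calls++ })
	fmt.Println(calls) // 0: no unnecessary generateWork-style call
}
```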
Return the Amsterdam instruction set from `LookupInstructionSet` when `IsAmsterdam` is true, so Amsterdam rules no longer fall through to the Osaka jump table. --------- Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
…reum#33869) For bal-devnet-3 we need to update the EIP-8024 implementation to the latest spec changes: ethereum/EIPs#11306 > Note: I deleted tests not specified in the EIP because maintaining them through EIP changes is too error-prone.
Pebble maintains a batch pool to recycle batch objects. Unfortunately, a batch object must be explicitly returned via the `batch.Close` function. This PR extends the batch interface by adding the close function and also invokes batch.Close in some critical code paths. Memory allocation must be measured before merging this change. Furthermore, it is an open question whether we should apply batch.Close as much as possible in every invocation.
…eum#33593) Implements https://eips.ethereum.org/EIPS/eip-7778 --------- Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Mainnet was already overriding --cache to 4096. This PR just makes this the default.
…thereum#33927) The BatchSpanProcessor queue size was incorrectly set to DefaultMaxExportBatchSize (512) instead of DefaultMaxQueueSize (2048). I noticed the issue on bloatnet when analyzing the block building traces. During a particular run, the miner was including 1000 transactions in a single block. When telemetry is enabled, the miner creates a span for each transaction added to the block. With the queue capped at 512, spans were silently dropped when production outpaced the span export, resulting in incomplete traces with orphaned spans. While this doesn't eliminate the possibility of drops under extreme load, using the correct default restores the 4x buffer between queue capacity and export batch size that the SDK was designed around.
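The drop behavior described above can be modeled with a toy bounded queue; this is an illustration of why the queue must be larger than a single block's span burst, not the OpenTelemetry SDK itself (the real processor also drains the queue concurrently):

```go
package main

import "fmt"

// dropped mimics a span processor's non-blocking enqueue with no exporter
// draining: spans that do not fit in the bounded queue are silently lost.
func dropped(spans, queueSize int) int {
	q := make(chan int, queueSize)
	drops := 0
	for i := 0; i < spans; i++ {
		select {
		case q <- i:
		default:
			drops++ // queue full: span silently dropped
		}
	}
	return drops
}

func main() {
	// A 1000-transaction block overflows a queue capped at the export
	// batch size (512), but fits the intended default of 2048.
	fmt.Println(dropped(1000, 512))  // 488
	fmt.Println(dropped(1000, 2048)) // 0
}
```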
This fixes a theoretical overflow condition if an account has an impossibly high nonce.
Removes the appveyor.yml since we moved to github runners. --------- Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com> Co-authored-by: Felix Lange <fjl@twurst.com>
Passing `--dev=false` currently still enters the dev-mode startup path because a couple of branches check whether the flag was set, not its boolean value. This switches those branches to use `ctx.Bool`, so explicit false does not start dev mode or emit a dev genesis, while `--dev` keeps its existing behavior.
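The distinction above can be shown with a toy flag model; this is an illustrative sketch of "was the flag passed" versus "what is its boolean value" (mirroring urfave/cli's `ctx.IsSet` vs `ctx.Bool`), not the actual geth code:

```go
package main

import "fmt"

// boolFlag carries both pieces of state a CLI library tracks for a flag.
type boolFlag struct {
	wasSet bool // did the user pass the flag at all?
	value  bool // what boolean value did it resolve to?
}

// devModeBuggy models the old check: any mention of --dev, including
// --dev=false, enables dev mode.
func devModeBuggy(f boolFlag) bool { return f.wasSet }

// devModeFixed models the new check: only a true value enables dev mode.
func devModeFixed(f boolFlag) bool { return f.value }

func main() {
	explicitFalse := boolFlag{wasSet: true, value: false} // --dev=false
	fmt.Println(devModeBuggy(explicitFalse)) // true: dev mode wrongly starts
	fmt.Println(devModeFixed(explicitFalse)) // false: dev mode stays off
}
```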
Changes core.Message to use Uint256 which is faster --------- Co-authored-by: Gary Rong <garyrong0905@gmail.com>
…ereum#34827) Every tracer that implements Stop/GetResult held a `reason error` field that is written by Stop (called from the trace-timeout watchdog goroutine in api.go) and read by GetResult (called by the RPC handler main goroutine). These accesses were unsynchronized.
In the --create path, execFunc returns gasLeft as the second return value, but the rest of the code treats this value as "gas used" (printed as such, and compared in timedExec). This makes gas reporting incorrect and can cause benchmark consistency checks to fail.
This is a refactoring PR that wraps all pre/post-execution system calls into exported functions, eliminating the duplicated system calls across the codebase. A few things are left unchanged but are worth highlighting: - ChainMaker is left unchanged; a significant rewrite would be required - BeaconRoot in the header should be non-nil if Cancun is enabled --------- Co-authored-by: jwasinger <j-wasinger@hotmail.com>
…m#34939) Fixes the regression caught by https://hive.ethpandaops.io/#/test/generic/1778481210-e59b7465e1d04f7ed1b0200838584b16?testnumber=137. engine.AssembleBlock explicitly expects withdrawals to be non-nil for pre-Shanghai blocks as opposed to FinaliseAndAssemble which stripped off the withdrawal.
In b2843a1, metrics check len(res) == len(hashes) but res is pre-allocated with make(), so length is always equal. Partial hit metric never fires. Count non-nil elements instead. --------- Co-authored-by: Bosul Mun <bsbs8645@snu.ac.kr>
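The fix above can be illustrated with simplified placeholder types; since `res` comes from `make`, its length always equals `len(hashes)`, so the hit count must come from the non-nil entries:

```go
package main

import "fmt"

// hits counts entries that were actually resolved, instead of comparing
// lengths that are equal by construction.
func hits(res [][]byte) int {
	n := 0
	for _, r := range res {
		if r != nil {
			n++
		}
	}
	return n
}

func main() {
	hashes := []string{"a", "b", "c"}
	res := make([][]byte, len(hashes)) // pre-allocated: lengths always match
	res[0] = []byte{1}                 // only one hash actually resolved
	fmt.Println(len(res) == len(hashes)) // true: the old check never fires
	fmt.Println(hits(res))               // 1: partial hit is now measurable
}
```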
This PR introduces a separate transaction pool type for the sparse blobpool. In the sparse blobpool, the PooledTransactions message delivers transactions without blobs; partial or full cells are downloaded via the Cells message. The blobpool no longer stores transactions with complete sidecars; it stores transactions without blobs, along with the corresponding cells. Because of this, a dedicated type distinct from types.Transaction is required. This PR introduces a type called `BlobTxForPool` that stores each sidecar field independently, in order to bypass the assumption that a sidecar always exists as a complete unit. Reintroducing the conversion queue was considered, but was ultimately omitted because type conversion should be sufficiently fast. With the sparse blobpool, blob -> cell computation takes about ~13ms per blob. Not sure whether this is fast enough, but otherwise we can add the conversion queue later on the sparse blobpool branch.
1. Should use !reflect.DeepEqual.
2. big.NewInt(0).SetBits([]big.Word{}) works around DeepEqual when the big.Int is zero, since unpack returns a []big.Word{}.
Passing `--v2=false` currently still selects the v2 binding generator because the command checks whether the flag was set. This switches generation to use the boolean flag value, so explicit false continues to generate legacy bindings while `--v2` keeps selecting v2.
Force-pushed 6447d7a to 5e97196
This PR introduces OnGasChangeV2 tracing hook, as the pre-requisite for landing EIP-8037. --------- Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
This PR extends the journal to track the pre-transaction values of mutated balances, nonces, and code. At the end of the transaction, these values are used to filter out no-op changes, such as balance transitions from a -> b -> a. These changes are excluded from the block-level access list. Additionally, there is a dedicated `bal.ConstructionBlockAccessList` object for gathering the state reads and writes within the current transaction. These state writes are keyed by the block access-list index. --------- Co-authored-by: jwasinger <j-wasinger@hotmail.com>
## Summary The `--rpc.telemetry.sample-ratio` flag declares `Value: 1.0` and `geth --help` advertises `(default: 1)`. In practice, however, omitting the flag produces a sample ratio of `0`, causing `sdktrace.TraceIDRatioBased(0)` to drop 100% of spans. Users who enable `--rpc.telemetry` see the `OpenTelemetry trace export enabled` log line and a clean startup, but no traces ever leave the process. The root cause is the interaction between two pieces of code: 1. `cmd/utils/flags.go:setOpenTelemetry` (added in ethereum#34062) only copies the flag value when `ctx.IsSet(...)` returns true: ```go if ctx.IsSet(RPCTelemetrySampleRatioFlag.Name) { tcfg.SampleRatio = ctx.Float64(RPCTelemetrySampleRatioFlag.Name) } ``` That is the right pattern for "don't clobber a config-file value with the CLI default," but it implies that something else must initialise the field when neither source sets it. 2. `node/defaults.go:DefaultConfig` never initialises `OpenTelemetry.SampleRatio`, leaving it at the float64 zero value. The result for the common CLI-only user (no TOML config) is `SampleRatio = 0` → every span is silently dropped, despite the documented default of 1. ## Change Seed `OpenTelemetry: OpenTelemetryConfig{SampleRatio: 1.0}` in `node.DefaultConfig` so the documented default matches runtime behavior and the `ctx.IsSet` guard in `setOpenTelemetry` continues to do what it was designed to do.
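The interaction described in the summary can be modeled minimally; the types below are hypothetical stand-ins showing why an `IsSet`-guarded copy requires the struct default itself to be seeded:

```go
package main

import "fmt"

// otelConfig stands in for the node config holding the sample ratio.
type otelConfig struct{ SampleRatio float64 }

// apply mirrors the ctx.IsSet guard: the CLI value is only copied when
// the flag was actually passed, so it never clobbers a config-file value.
func apply(cfg *otelConfig, flagSet bool, flagVal float64) {
	if flagSet {
		cfg.SampleRatio = flagVal
	}
}

func main() {
	unseeded := otelConfig{} // old DefaultConfig: float64 zero value
	apply(&unseeded, false, 1.0)
	fmt.Println(unseeded.SampleRatio) // 0: every span silently dropped

	seeded := otelConfig{SampleRatio: 1.0} // fixed DefaultConfig
	apply(&seeded, false, 1.0)
	fmt.Println(seeded.SampleRatio) // 1: documented default holds
}
```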
Adds a new CLI flag --state.size-tracking-depth to control how many recent block state sizes are tracked in memory. The default is set to 10000 blocks (previously hardcoded at 128). This allows users to tune memory usage vs historical depth based on their monitoring needs. Setting to 0 uses the default value. Memory impact per block tracked: ~292 bytes (map entry + heap entry)
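The memory figure above works out to roughly 2.9 MB at the new default; a back-of-envelope check using the stated per-block cost:

```go
package main

import "fmt"

func main() {
	// ~292 bytes per tracked block (map entry + heap entry, per the PR),
	// times the new default depth of 10000 blocks.
	const perBlockBytes, blocks = 292, 10000
	total := perBlockBytes * blocks
	fmt.Println(total) // 2920000 bytes, i.e. ~2.9 MB
}
```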
…relation This change updates the state tracking mechanism to include the block hash alongside the block number and state root when calculating and publishing size statistics. This allows for more precise correlation between state size changes and specific blocks, improving observability and debugging capabilities for state growth.
count and bytes
feat: send state metrics to xatu
feat: change module and add network name
The state size tracer previously emitted a single signed delta per category (account, storage, contract code, account trie nodes, storage trie nodes). That representation lost information because a block-level update appeared the same as no activity once writes and deletes cancelled.

The tracer now emits per-block writes and deletes counts and bytes separately, nested under top-level "writes" and "deletes" keys in the "State metrics" JSON log. An update is accounted as BOTH a write of the new value AND a delete of the prev value, so consumers can recover the net delta as (writes - deletes). The four symmetric categories use this three-arm switch (create / update / delete). Contract code remains write-only with hash dedup (reliable ref-counting would be needed for deletions); its delete counters stay 0. The mpt-depth side of the log is untouched.

Consumer side: xatu's sentry-logs Vector pipeline normalises the new shape and ClickHouse derives the original delta fields via MATERIALIZED columns.
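The three-arm accounting described above can be sketched as follows; names and types are illustrative placeholders, showing how an update contributes to both counters so the net delta stays recoverable:

```go
package main

import "fmt"

// counters accumulates per-block write/delete counts and bytes for one
// category (e.g. storage slots).
type counters struct {
	writeCount, writeBytes   int
	deleteCount, deleteBytes int
}

// record applies the three-arm switch: create -> write, delete -> delete,
// update -> BOTH a write of the new value and a delete of the prev value.
func (c *counters) record(prev, cur []byte) {
	switch {
	case prev == nil && cur != nil: // create
		c.writeCount++
		c.writeBytes += len(cur)
	case prev != nil && cur == nil: // delete
		c.deleteCount++
		c.deleteBytes += len(prev)
	case prev != nil && cur != nil: // update
		c.writeCount++
		c.writeBytes += len(cur)
		c.deleteCount++
		c.deleteBytes += len(prev)
	}
}

func main() {
	var c counters
	c.record(nil, make([]byte, 10))              // create: +10 bytes written
	c.record(make([]byte, 10), make([]byte, 10)) // update: write 10, delete 10
	c.record(make([]byte, 5), nil)               // delete: 5 bytes removed
	// Consumers recover the old signed delta as writes - deletes.
	fmt.Println(c.writeBytes - c.deleteBytes) // 5
}
```

Note how the update in the middle contributes to both sides, so it is visible in the per-block activity even though its net delta is zero.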
Force-pushed 5e97196 to f5efc52
Author
Closing in favour of a fresh PR built on a rebased base. The state-size infra in bump-state-size-tracker conflicted heavily with current ethereum/go-ethereum master (414 commits behind, pre-dating PR ethereum#33490, which introduced the OnStateUpdate hook our tracer depends on). The remaining useful commits from bump-state-size-tracker (configurable depth + block-hash tracking) have been integrated into a new branch on weiihann/go-ethereum and a successor PR will follow.