Anchor Pre-PR: unstable, #7783, #7761, #7016
#7788
Merged
Conversation
Closes: - sigp#7363 - Change default state cache size back to 128. - Make state pruning properly LRU rather than MRU after skipping the cull-exempt states.
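A minimal sketch of LRU pruning that skips cull-exempt entries, assuming a simplified cache; the names (`CacheEntry`, `cull_exempt`, `prune_to`) are illustrative and not Lighthouse's actual state cache API.

```rust
use std::collections::BTreeMap;

struct CacheEntry {
    last_access: u64,  // monotonically increasing access counter
    cull_exempt: bool, // e.g. the head state or finalized state
}

fn prune_to(cache: &mut BTreeMap<u64, CacheEntry>, target_len: usize) {
    while cache.len() > target_len {
        // Evict the *least* recently used non-exempt entry, not the most recent one.
        let victim = cache
            .iter()
            .filter(|(_, e)| !e.cull_exempt)
            .min_by_key(|(_, e)| e.last_access)
            .map(|(k, _)| *k);
        match victim {
            Some(key) => {
                cache.remove(&key);
            }
            None => break, // everything left is cull-exempt
        }
    }
}
```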
Bump all required version numbers to `v7.0.1` to prepare for the next release.
* Add additional mergify rules to automate triaging. * Update mergify config.
* Fix mergify infinite loop. * Update rule for `ready-for-review` label. * More fixes to prevent the infinite loop.
… - to reduce noise and avoid updating stale PRs. (sigp#7494)
…igp#7505) Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](ethereum/execution-apis#630). Added some tests to cover basic fetch blob scenarios.
…umns (sigp#7493) sigp#7461 and partly sigp#6439. Desired behaviour after receiving an `engine_getBlobs` response:
1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing); if we mark them as observed but don't publish them, we could end up blocking gossip propagation.
2. Blobs are marked as observed _either_ when:
   * They are received from gossip and forwarded to the network.
   * They are published by the node.

Current behaviour:
- ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
- ❗ After importing EL blobs and before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.

This PR:
1. Performs gossip verification on data columns computed from the EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too.
2. Adds additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` call but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification, so we can immediately propagate the `ACCEPT` result and forward it to the network without re-processing it (see the sketch below).

**UPDATE:** I've left blobs out of the second change mentioned above, as the likelihood and impact are very low and we haven't seen it much, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
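An illustrative sketch of the second change, using made-up types (`AvailabilityCache`, `ObservedColumns`, `GossipVerdict`) rather than Lighthouse's actual API: a gossip data column that is already in the availability cache but not yet "observed" has already passed verification, so it is forwarded without being re-processed.

```rust
use std::collections::HashSet;

type ColumnId = (u64, [u8; 32]); // (column index, block root)

struct AvailabilityCache {
    imported_from_el: HashSet<ColumnId>,
}

struct ObservedColumns {
    seen_on_gossip: HashSet<ColumnId>,
}

enum GossipVerdict {
    AcceptAndForwardOnly, // propagate, but skip re-import
    Process,              // run full gossip verification + import
    Ignore,               // already observed: duplicate
}

fn handle_gossip_column(
    id: ColumnId,
    cache: &AvailabilityCache,
    observed: &mut ObservedColumns,
) -> GossipVerdict {
    if observed.seen_on_gossip.contains(&id) {
        return GossipVerdict::Ignore;
    }
    // Mark as observed now that it is being forwarded to the network.
    observed.seen_on_gossip.insert(id);
    if cache.imported_from_el.contains(&id) {
        // Already verified when fetched from the EL: forward without re-processing.
        GossipVerdict::AcceptAndForwardOnly
    } else {
        GossipVerdict::Process
    }
}
```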
Partly addresses: - sigp#7379 Handle attestation validation errors from `get_attesting_indices` to prevent an error log, downscore the peer, and reject the message.
…at's passing CI. Remove noisy comments.
sigp#6875 - Enabled the linter in rate-limiter and fixed errors. - Changed the type of `Quota::max_tokens` from `u64` to `NonZeroU64` because `max_tokens` cannot be zero. - Added a test to ensure that a large value for `tokens`, which would otherwise cause an overflow, is handled properly (see the sketch below).
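A hedged sketch of the `Quota` change: `max_tokens` as `NonZeroU64` so zero is unrepresentable, and saturating arithmetic so a huge `tokens` value cannot overflow. Field and method names here are illustrative, not the rate-limiter's exact API.

```rust
use std::num::NonZeroU64;
use std::time::Duration;

struct Quota {
    replenish_all_every: Duration,
    max_tokens: NonZeroU64,
}

impl Quota {
    /// Time needed to replenish `tokens` tokens; saturates instead of overflowing.
    fn replenish_time(&self, tokens: u64) -> Duration {
        let nanos = self.replenish_all_every.as_nanos() as u64;
        let per_token = nanos / self.max_tokens.get();
        Duration::from_nanos(per_token.saturating_mul(tokens))
    }
}

fn main() {
    let quota = Quota {
        replenish_all_every: Duration::from_secs(10),
        max_tokens: NonZeroU64::new(128).expect("non-zero"),
    };
    // A very large request no longer overflows; it just saturates.
    let _ = quota.replenish_time(u64::MAX);
}
```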
Update mergify to prevent incorrectly firing rule to update `ready-for-review` label
…igp#7332) Remove deprecated database migrations prior to v22 along with v22 migration specific code.
sigp#7466 Expanded the margin from 100ms to 500ms.
…to import (sigp#7533) Addresses a regression recently introduced when we started gossip verifying data columns from EL blobs:
```
failures:
    network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 90 filtered out; finished in 16.60s

stderr ───
thread 'network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import' panicked at beacon_node/network/src/network_beacon_processor/tests.rs:829:10:
should put data columns into availability cache: Unexpected("empty columns")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
https://github.com/sigp/lighthouse/actions/runs/15309278812/job/43082341868?pr=7521 If an empty `Vec` is passed to the DA checker, it causes an unexpected error. This PR addresses it by not passing an empty `Vec` for processing, and not spawning a publish task (see the sketch below).
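A minimal sketch of the fix, with hypothetical names (`publish_and_import_columns`, the helper functions): bail out before handing an empty `Vec` to the DA checker or spawning a publish task.

```rust
struct DataColumn; // placeholder for the real sidecar type

fn publish_and_import_columns(columns: Vec<DataColumn>) {
    if columns.is_empty() {
        // Everything was already seen on gossip; nothing to import or publish.
        return;
    }
    spawn_publish_task(&columns);
    put_columns_into_da_checker(columns);
}

fn spawn_publish_task(_columns: &[DataColumn]) { /* publish to gossip */ }
fn put_columns_into_da_checker(_columns: Vec<DataColumn>) { /* import into the DA checker */ }
```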
The simulator works but always emits the following message:
```
$ cargo run --release --bin simulator basic-sim
...
...
Failed to initialize dependency logging: attempted to set a logger after the logging system was already initialized
...
...
```
This PR removes the initialization with `env_logger`. (Update) With sigp#7433 merged, the libp2p/discv5 logs are saved in separate files and respect the `RUST_LOG` env var for log level configuration.
Add `console-subscriber` feature for debugging tokio async tasks. Supersedes sigp#7420 to work with `unstable`. Usage:
- Build Lighthouse with `RUSTFLAGS=--cfg tokio_unstable` and `--features console-subscriber`, e.g.:
```
RUSTFLAGS="--cfg tokio_unstable --remap-path-prefix=$(pwd)=." FEATURES=console-subscriber make
```
- Run the Lighthouse binary.
- Install `tokio-console` and run it in a terminal.
Added Assertoor tests to the local-testnet CI. - The assertoor logs are included in the `logs-local-testnet` that is uploaded to GitHub Artifacts. - Use `start_local_testnet.sh` so that we can also easily run the test locally.
) Getting this error on a non-PeerDAS network:
```
May 29 13:30:13.484 ERROR Error fetching or processing blobs from EL error: BlobProcessingError(AvailabilityCheck(Unexpected("empty blobs"))), block_root: 0x98aa3927056d453614fefbc79eb1f9865666d1f119d0e8aa9e6f4d02aa9395d9
```
It appears we're passing an empty `Vec` to the DA checker because all blobs were already seen on gossip and filtered out; this causes an `AvailabilityCheckError::Unexpected("empty blobs")`. I've added equivalent unit tests for `getBlobsV1` to cover all the scenarios we test in `getBlobsV2`. This would have caught the bug if I had added it earlier. It also caught another bug which could trigger duplicate block import. Thanks Santito for reporting this! 🙏
[EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892) sigp#7467
The `console-subscriber` feature was added in sigp#7529. However, the names of the running tasks are blank. Set the task name using `tokio::task::Builder`, which is available when the `tokio_unstable` cfg is enabled (see the sketch below).
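A sketch of spawning a named task with `tokio::task::Builder`, which is only compiled in when building with `RUSTFLAGS="--cfg tokio_unstable"`. The task name and body here are illustrative, not the ones used in Lighthouse.

```rust
#[tokio::main]
async fn main() {
    #[cfg(tokio_unstable)]
    {
        let handle = tokio::task::Builder::new()
            .name("beacon_processor_worker") // this name shows up in tokio-console
            .spawn(async {
                // task body
            })
            .expect("failed to spawn named task");
        let _ = handle.await;
    }

    // Without tokio_unstable, fall back to an unnamed task.
    #[cfg(not(tokio_unstable))]
    tokio::spawn(async { /* task body */ }).await.unwrap();
}
```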
Lighthouse currently requires checkpoint sync to be performed against a supernode in a PeerDAS network, as only supernodes can serve blobs. This PR lifts that requirement, enabling Lighthouse to checkpoint sync from either a fullnode or a supernode (see sigp#6837 (comment)). Missing data columns for the checkpoint block aren't a big issue, and we should be able to easily implement backfill for them once we have the logic to backfill data columns.
Update to EF tests v1.6.0-alpha.0
sigp#7518 breaks the key generation process in `validator_manager/test_vectors/generate.py` after updating to `ethstaker-deposit-cli`. This PR updates the key generation process; I tested it and successfully generated the deposit data JSON files.
This PR adds the following sync tests to the CI workflow - triggered when a PR is labeled `syncing` - to ensure we have some e2e coverage of basic sync scenarios:
- [x] checkpoint sync to a live network (covers range and backfill sync for the _current_ fork)
- [x] checkpoint sync to a running devnet (covers range and backfill sync for the _next_ fork)

It seems to work fine on GitHub-hosted runners, but if performance becomes an issue we could switch to self-hosted runners for the Sepolia sync test (standard CPU runners have 4 CPUs and 16 GB RAM - I think that _should_ be enough on Sepolia / devnet networks). The following tests have been **removed** from this PR and moved to a separate issue (sigp#7550):
- [x] genesis sync on a local devnet (covers current and next fork)
- [x] brief shutdown and restart (covers lookup sync)
- [x] longer shutdown and restart (covers range sync)

I'm hoping to keep the maintenance effort for these e2e tests to a minimum - hopefully in the longer term we can have some generic e2e tests that work for all clients so the maintenance effort can be spread across teams.
### Latest test run
https://github.com/sigp/lighthouse/actions/runs/15411744248
#### Logs are available as artifacts.
Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:
- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hits a processor error `UnknownParent`, hitting this line https://github.com/sigp/lighthouse/blob/bf955c7543dac8911a6f6c334b5b3ca4ef728d9c/beacon_node/network/src/sync/block_lookups/mod.rs#L686-L688
- `search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist.

Now we have a lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) that is awaiting a parent lookup that doesn't exist: hence it is stuck.

### Impact
This bug can affect mainnet as well as PeerDAS devnets. It may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck-lookup prune routine deletes it. By that time the root will have been cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup.

This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache

Ensure that we never create (or leave existing) a lookup that references a non-existing parent. I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created, so if `search_parent_of_child` returns `false` we now return `LookupRequestError::Failed` instead of `LookupResult::Pending`. As a bonus I have added more logging and reason strings to the errors.
Update `c-kzg` from `v1` to `v2`. My motivation here is that `alloy-consensus` now uses `c-kzg` `v2`, and this results in a conflict when using Lighthouse in combination with the latest alloy. I also tried disabling alloy's kzg-related feature, but the conflict persisted. See here for the alloy update to `c-kzg v2`: alloy-rs/alloy#2240 Error:
```
error: failed to select a version for `c-kzg`.
...
versions that meet the requirements `^1` are: 1.0.3, 1.0.2, 1.0.0

the package `c-kzg` links to the native library `ckzg`, but it conflicts with a previous package which links to `ckzg` as well:
package `c-kzg v2.1.0`
... which satisfies dependency `c-kzg = "^2.1"` of package `alloy-consensus v0.13.0`
... which satisfies dependency `alloy-consensus = "^0.13.0"` of package ...
...
```
- Upgrade `alloy-consensus` to `0.14.0` and disable all default features
- Upgrade `c-kzg` to `v2.1.0`
- Upgrade `alloy-primitives` to `1.0.0`
- Adapt the code to the new `c-kzg` API
- There is now `NO_PRECOMPUTE`; my understanding from https://github.com/ethereum/c-kzg-4844/pull/545/files is that we should use `0` here because `new_from_trusted_setup_no_precomp` does not precompute, but maybe that is misleading. Everywhere else I used `RECOMMENDED_PRECOMP_WIDTH` because `8` matches the recommendation.
- `BYTES_PER_G1_POINT` and `BYTES_PER_G2_POINT` are no longer public in `c-kzg`
- I adapted two tests that check the `Attestation` bitfield size, but I could not pinpoint what has changed and why it is now 8 bytes less. I would be happy about any hint, and to know whether this is correct. I found a related PR here: sigp#6915
- Use the same field names in JSON as `c-kzg` and `rust_eth_kzg` do for `g1_monomial`, `g1_lagrange`, and `g2_monomial`
Reintroduce the `--logfile` flag with a deprecation warning so that it doesn't prevent nodes from starting. This is considered preferable to breaking node startup in order to force users to fix the flag, even though it means the `--logfile` flag is now completely ineffective. The flag was initially removed in: - sigp#6339
…ve inbound requests (sigp#7663) Lighthouse is currently logging a lot of errors in the `RPC` behaviour whenever a response is received for a `request_id` that no longer exists in `active_inbound_requests`. This is likely due to a data race or timing issue (e.g., the peer disconnecting before the response is handled). This PR addresses that by removing the error logging from the RPC layer. Instead, `RPC::send_response` now simply returns an `Err`, shifting the responsibility to the main service. The main service can then determine whether the peer is still connected and only log an error if the peer remains connected (see the sketch below). Thanks @ackintosh for helping debug!
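A hedged sketch of the new control flow; the types and method shapes here are illustrative, not Lighthouse's actual RPC API. The RPC layer stops logging and just returns an error, and the caller decides whether the failure is worth an error log based on whether the peer is still connected.

```rust
struct PeerId(u64);

enum RpcError {
    UnknownRequestId,
}

struct Rpc;

impl Rpc {
    fn send_response(&mut self, _peer: &PeerId, request_known: bool) -> Result<(), RpcError> {
        if !request_known {
            // No logging here any more; let the caller decide what to do.
            return Err(RpcError::UnknownRequestId);
        }
        Ok(())
    }
}

fn on_send_response(rpc: &mut Rpc, peer: &PeerId, still_connected: bool) {
    if let Err(RpcError::UnknownRequestId) = rpc.send_response(peer, false) {
        if still_connected {
            eprintln!("ERROR: response for unknown request_id from a connected peer");
        } else {
            // Peer already disconnected: expected race, nothing to log loudly.
        }
    }
}
```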
The current data column KZG verification buckets are not giving us useful info, as the upper bound is too low: most of the numbers for batch verification land above 70ms, so we don't know how much time it really takes. This PR improves the buckets based on the numbers we got from testing. Exponential buckets seem like a good candidate here, given we're expecting to increase the blob count with a similar approach (possibly 2x each fork if it goes well); a sketch is below.
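A sketch of exponential histogram buckets using the `prometheus` crate (Lighthouse's metrics are a thin wrapper over it); the metric name and the start/factor/count values here are illustrative, not the exact numbers chosen in this PR.

```rust
use prometheus::{exponential_buckets, HistogramOpts, HistogramVec};

fn kzg_verification_histogram() -> prometheus::Result<HistogramVec> {
    // e.g. 10ms, 20ms, 40ms, ... doubling, so the buckets stay useful as blob
    // counts (and therefore batch verification times) grow each fork.
    let buckets = exponential_buckets(0.010, 2.0, 10)?;
    HistogramVec::new(
        HistogramOpts::new(
            "data_column_kzg_verification_seconds",
            "Time taken to batch-verify data column KZG proofs",
        )
        .buckets(buckets),
        &["type"],
    )
}
```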
Closes sigp#7467. This PR primarily addresses [the P2P changes](ethereum/EIPs#9840) in [fusaka-devnet-2](https://fusaka-devnet-2.ethpandaops.io/). Specifically:
* [the new `nfd` parameter added to the `ENR`](ethereum/EIPs#9840)
* [the modified `compute_fork_digest()` changes for every BPO fork](ethereum/EIPs#9840)

90% of this PR was absolutely hacked together as fast as I could during the Berlinterop while running between Glamsterdam debates. Luckily, it seems to work, but I was unable to be as careful in avoiding bugs as I usually am. I've cleaned up the things *I remember* wanting to come back and have a closer look at, but I'm still working on this. Progress:
* [x] get it working on `fusaka-devnet-2`
* [ ] [*optional* disconnect from peers with incorrect `nfd` at the fork boundary](ethereum/consensus-specs#4407) - can be addressed in a future PR if necessary
* [x] first pass clean-up
* [x] fix up all the broken tests
* [x] final self-review
* [x] more thorough review from people more familiar with affected code
Which issue # does this PR address? Closes sigp#7604. Improvements to range sync, including:
1. Contain column requests only to peers that are part of the `SyncingChain`
2. Attribute the fault to the correct peer and downscore them if they don't return the data columns for the request
3. Improve sync performance by retrying only the failed columns from other peers instead of failing the entire batch
4. Use the `earliest_available_slot` to make requests to peers that claim to have the epoch. Note: if no `earliest_available_slot` info is available, fall back to the previous logic, i.e. assume the peer has everything backfilled up to the WS checkpoint / DA boundary

Tested this on fusaka-devnet-2 with a full node and a supernode, and the recovery logic seems to work well. Also tested this a little on mainnet. Need to do more testing and possibly add some unit tests.
…ok (sigp#7679) The main change is adding a guide on partially reconstructing historic states to the FAQ. Other changes: - Update the database schema info - Delete the Homebrew issue as it has been solved in Homebrew/homebrew-core#225877 - Update the default gas limit: [7cbf7f1](sigp@7cbf7f1) - Update the binary installation page [8076ca7](sigp@8076ca7), as Lighthouse has shipped aarch64 Apple binaries since v7.1.0
N/A Lighthouse BN HTTP endpoints would return a server error pre-genesis on `validator/duties/attester` and `validator/prepare_beacon_proposer` because `slot_clock.now()` returns `None` pre-genesis. The Prysm VC depends on these endpoints pre-genesis and was having issues interoperating with the Lighthouse BN for this reason. The proposer duties endpoint explicitly handles the pre-genesis case here https://github.com/sigp/lighthouse/blob/538067f1ff9840d44e3c2ea60581e18aba8c4143/beacon_node/http_api/src/proposer_duties.rs#L23-L28 and I see no reason why we can't make the other endpoints flexible enough to work pre-genesis. This PR handles the pre-genesis case on the attester and `prepare_beacon_proposer` endpoints as well (see the sketch below). Thanks for raising this, @james-prysm.
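An illustrative sketch of tolerating the pre-genesis case: when the slot clock returns `None` (before genesis), fall back to the genesis slot instead of returning a server error. `SlotClock` here is a stand-in trait, not Lighthouse's real one.

```rust
type Slot = u64;

trait SlotClock {
    /// Returns `None` before genesis.
    fn now(&self) -> Option<Slot>;
    fn genesis_slot(&self) -> Slot;
}

fn current_slot_or_genesis<C: SlotClock>(clock: &C) -> Slot {
    // Pre-genesis, behave as if we are at the genesis slot so duty endpoints
    // can still answer instead of erroring.
    clock.now().unwrap_or_else(|| clock.genesis_slot())
}
```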
Closes sigp#6855. Add PeerDAS broadcast validation tests and fix a small bug where `sampling_columns_indices` is `None` (indicating that we've already sampled the necessary columns) and `process_gossip_data_columns` gets called.
N/A When building an ENR on startup, we weren't using the value in the custody context. This resulted in the ENR value being updated when the CGC updates and the change being persisted, but then being set back to the default on restart. This PR takes the value explicitly from the custody context.
N/A Serializes the `blob_schedule` in ascending order to match other clients. This is needed to keep the output of the `eth/v1/config/spec` HTTP endpoint consistent across clients (see the sketch below). cc @barnabasbusa
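A minimal sketch of serializing the blob schedule in ascending epoch order; the struct and field names are illustrative, not the actual spec types. Sorting before serialization keeps the `eth/v1/config/spec` output stable and consistent with other clients.

```rust
use serde::Serialize;

#[derive(Serialize, Clone)]
struct BlobScheduleEntry {
    epoch: u64,
    max_blobs_per_block: u64,
}

fn serialize_blob_schedule(mut schedule: Vec<BlobScheduleEntry>) -> serde_json::Result<String> {
    // Ascending epoch order, regardless of how the entries were configured.
    schedule.sort_by_key(|entry| entry.epoch);
    serde_json::to_string(&schedule)
}
```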
Although we're working on jemalloc profiler support in sigp#7746, heaptrack seems to be producing more sensible results. This PR adds a heaptrack profile and a heaptrack feature so that we no longer need to patch the code in order to use heaptrack. This may prove complementary to jemalloc profiling, so I think there is no harm in having both.
… from the EL (sigp#7713) sigp#7700 As described in the title, the EL already performs KZG verification on all blobs when they enter the mempool, so it's redundant to perform extra validation on blobs returned from the EL. This PR removes: - KZG verification for both blobs and data columns during block production - KZG verification for data columns after the fetch-engine-blobs call. I have not done this for blobs because it requires extra changes to check the observed cache, and it doesn't feel like a worthy optimisation given the number of blobs per block. This PR does not yet remove KZG verification on the block publishing path.
sigp#7647 Introduces a new record in the blobs DB, `DataColumnCustodyInfo`. When `DataColumnCustodyInfo` exists in the DB, this indicates that a recent CGC change has occurred and/or that a custody backfill sync is currently in progress (custody backfill will be added in a separate PR). When a CGC change has occurred, `earliest_available_slot` will be equal to the slot at which the CGC change occurred. During custody backfill sync, `earliest_available_slot` should be updated incrementally as it progresses. ~~Note that if `advertise_false_custody_group_count` is enabled we do not add a `DataColumnCustodyInfo` record in the db as that would affect the status v2 response.~~ (See comment sigp#7648 (comment)) ~~If `DataColumnCustodyInfo` doesn't exist in the db this indicates that we have fulfilled our custody requirements up to the DA window.~~ (It now always exists, and the slot will be set to `None` once backfill is complete.) StatusV2 now uses `DataColumnCustodyInfo` to calculate the `earliest_available_slot` if a `DataColumnCustodyInfo` record exists in the DB; if it's `None`, then we return the `oldest_block_slot` (see the sketch below).
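A sketch of the new record as described above; the field names are a guess based on the description, not the exact on-disk schema. `earliest_available_slot == None` means custody backfill has completed.

```rust
use serde::{Deserialize, Serialize};

type Slot = u64;

#[derive(Serialize, Deserialize)]
struct DataColumnCustodyInfo {
    /// Slot of the most recent CGC change, updated as custody backfill
    /// progresses; `None` once backfill is complete.
    earliest_available_slot: Option<Slot>,
}

fn status_v2_earliest_slot(info: &DataColumnCustodyInfo, oldest_block_slot: Slot) -> Slot {
    // StatusV2: use the custody info if it carries a slot, else fall back to
    // the oldest block slot in the database.
    info.earliest_available_slot.unwrap_or(oldest_block_slot)
}
```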
sigp#7234 Removes the `Arc<Mutex<_>>` which was used to store and manage span data and replaces it with the built-in `Extension` for managing span-specific data. This also avoids an `unwrap` which was used when acquiring the lock over the mutex'd span data.
Small tweak to `Delayed head block` logging to make it more representative of actual issues. Previously we used the total import delay to determine whether a block was late, but this includes the time taken for IO (and now hdiff computation), which happens _after_ the block is made attestable. This PR changes the logic to use the attestable delay (where possible), falling back to the previous value if the block doesn't have one, e.g. if it didn't meet the conditions to make it into the attestable cache (see the sketch below).
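A hedged sketch of the delay selection; the struct and field names are illustrative, not Lighthouse's block timing types. Prefer the attestable delay and fall back to the total import delay when the block never made it into the attestable cache.

```rust
use std::time::Duration;

struct BlockTimings {
    attestable_delay: Option<Duration>,
    import_delay: Duration,
}

fn delayed_head_logging_delay(timings: &BlockTimings) -> Duration {
    // IO / hdiff work that happens after the block becomes attestable should
    // not count towards "Delayed head block" warnings.
    timings.attestable_delay.unwrap_or(timings.import_delay)
}
```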
Peer sampling has been completely removed from the spec. This PR removes our partial implementation from the codebase. ethereum/consensus-specs#4393
Which issue # does this PR address? Puts the `arbitrary` crate behind a feature flag in the `types` crate.
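A sketch of the usual pattern for gating `arbitrary` behind a Cargo feature; the struct, derives, and exact feature wiring in the `types` crate may differ. The derive is only compiled when the `arbitrary` feature is enabled (assuming `arbitrary = { version = "1", optional = true }` and a feature `arbitrary = ["dep:arbitrary"]` in Cargo.toml).

```rust
// Only derive `Arbitrary` when the optional feature is turned on, so downstream
// users who don't fuzz avoid compiling the `arbitrary` dependency entirely.
#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
#[derive(Clone, Debug)]
pub struct Checkpoint {
    pub epoch: u64,
    pub root: [u8; 32],
}
```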
Another one of these Anchor Pre-PRs. No review, only merge.
Please do not squash :)