chore(kona): pull last changes from kona#18802
Merged
sebastianst merged 31 commits into develop on Jan 16, 2026
…aled payload (op-rs/kona#3172)

Adds unit tests for the cases where the attributes builder returns a critical, reset, or temporary error while building an unsealed payload.

Part of op-rs/kona#3070

…ona#3194)

## Description

CI usage has been high. This reduces it by running expensive jobs only in the merge queue.
The timeout meant brief RPC outages caused kona execution to hang forever because kona-host exited without providing the required preimage. kona-host should continue retrying for as long as it takes to retrieve the preimage.
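The retry-without-deadline behaviour described above can be sketched as follows. This is a minimal, synchronous illustration and not the kona-host code: the function name `retry_until_ok`, the closure-based fetcher, and the capped exponential backoff are all assumptions made for the example.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `fetch` until it succeeds and never gives up.
/// Returns the fetched value together with the number of attempts made.
pub fn retry_until_ok<T, E>(
    mut fetch: impl FnMut() -> Result<T, E>,
    initial_delay: Duration,
    max_delay: Duration,
) -> (T, u32) {
    let mut delay = initial_delay;
    let mut attempts = 0u32;
    loop {
        attempts = attempts.saturating_add(1);
        match fetch() {
            Ok(value) => return (value, attempts),
            Err(_) => {
                // No overall timeout: a brief outage only delays the result.
                sleep(delay);
                // Assumed policy: double the delay, capped at `max_delay`.
                delay = delay.saturating_mul(2).min(max_delay);
            }
        }
    }
}
```

The point of the design is that the caller waiting on the preimage only ever observes a successful result, however long the outage lasts.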
This patch fixes an oversight in the kona-interop configuration where it accepted an optional set of L1 genesis paths. There's no good use case for configuring a host with more than one L1 genesis, as interop can only occur across chains that share an identical L1.

### Testing

With this change, the interop action tests now work, as the program is already compatible with the op-challenger:

```
export KONA_HOST_PATH=target/debug/kona-host
cd tests/optimism/op-e2e/actions/interop && gotestsum -- -run TestInteropFaultProofs
```

I'll follow up with another change to re-enable the interop action tests in GitHub CI.
# Description
1. Add the DNS module to SwarmBuilder to support DNS-based multiaddresses at
the transport layer.
2. Fix a bug in DialConnections where a failed dial left the peer on the
dial list, so the peer could never be re-dialed.
Previously, Swarm had no DNS support, which resulted in the following
error. With DNS added, peering now works with DNS-based
multiaddresses:
```
2025-12-10T18:49:47.003184Z DEBUG gossip: Outgoing connection error: Transport([(/dns4/kona-net-0-kona-reth-f-sequencer-2-p2p.primary.infra.dev.oplabs.cloud/tcp/9003/p2p/16Uiu2HAm3e6LBYw9JK5rcyE5rCANd2ZF5i53qAoCsaEbpvJgR6Uu, MultiaddrNotSupported(/dns4/kona-net-0-kona-reth-f-sequencer-2-p2p.primary.infra.dev.oplabs.cloud/tcp/9003/p2p/16Uiu2HAm3e6LBYw9JK5rcyE5rCANd2ZF5i53qAoCsaEbpvJgR6Uu))])
```
# Bug Summary: Stuck Dial Attempts Preventing Peer Connections
### Symptoms
- Kona nodes only connecting to 3 out of 8 peers in the network
- Discovery successfully finding 145 peers in the routing table
- Prometheus metrics showing 842+ `kona_node_dial_peer_error{type="already_dialing"}` errors, and growing
Missing cleanup in SwarmEvent::OutgoingConnectionError handler
When a dial attempt fails asynchronously (network timeout, connection
refused, DNS resolution failure, etc.), the peer ID was never removed
from the current_dials HashSet in the connection gater. This caused the
peer to be permanently stuck in "dialing" state.
### Code Flow:
1. Node discovers peers via discv5 (145 peers found)
2. Node attempts to dial discovered peers
3. Dial attempt starts → peer added to current_dials HashSet
4. If dial succeeds → ConnectionEstablished event → peer stays in
current_dials (OK, protected by "already connected" check)
5. If dial fails → OutgoingConnectionError event → BUG: peer NOT removed
from current_dials
6. Node tries to redial the failed peer later
7. can_dial() check fails with DialError::AlreadyDialing because peer is
still in current_dials

Peer can never be retried, despite the peer_redialing: 500 configuration.
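The flow above can be reproduced in a few lines. The sketch below is a simplified stand-in for the connection gater, not the kona implementation: peer IDs are plain strings, and the method names `dialing` and `remove_dial` are hypothetical. Only `current_dials`, `can_dial()` and `DialError::AlreadyDialing` come from the description.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum DialError {
    AlreadyDialing,
}

/// Simplified stand-in for the connection gater.
#[derive(Default)]
pub struct ConnectionGater {
    current_dials: HashSet<String>,
}

impl ConnectionGater {
    /// Refuses to dial a peer that is still tracked as being dialed.
    pub fn can_dial(&self, peer: &str) -> Result<(), DialError> {
        if self.current_dials.contains(peer) {
            return Err(DialError::AlreadyDialing);
        }
        Ok(())
    }

    /// Hypothetical name: records that a dial attempt has started.
    pub fn dialing(&mut self, peer: &str) {
        self.current_dials.insert(peer.to_string());
    }

    /// Hypothetical name: the cleanup that was missing. It has to run when an
    /// outgoing connection error is reported, otherwise the peer stays stuck.
    pub fn remove_dial(&mut self, peer: &str) {
        self.current_dials.remove(peer);
    }
}
```

Without the `remove_dial` call on failure, every later `can_dial` for that peer returns `AlreadyDialing`, which matches the growing error counter in the symptoms.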
### Before Fix:
- Only 3/8 peers connected (37.5% connectivity)
- Failed peers blacklisted forever after first attempt
- Network partition risk in production
- Peer redial configuration (peer_redialing: 500) effectively useless
### After Fix:
- Failed dial attempts can be retried according to peer_redialing config
- Should achieve full mesh connectivity (7/7 peers, excluding self)
- Proper network resilience against transient failures
### Discovery Process
1. Started investigating PMS dashboard showing 6 vs RPC showing 3 peers
2. Found PMS was exporting duplicate metric series (unrelated issue, fixed
with max instead of sum)
3. Confirmed node was actually only connected to 3 peers via RPC
(opp2p_peers, opp2p_peerStats)
4. Discovered discv5 was working (145 peers in table) but gossip
connections failing
5. Examined dial error metrics and found 842 "already_dialing" errors
6. Traced through connection gater and gossip driver code
7. Identified missing cleanup in OutgoingConnectionError event handler
### Testing Recommendations
- Monitor `kona_node_dial_peer_error{type="already_dialing"}` - should stop
increasing
- Monitor `kona_node_swarm_peer_count` - should increase from 3 towards 7
- Check opp2p_peerStats RPC after 5-10 minutes - should show 7 connected
peers
- Verify PMS dashboard shows correct peer counts with updated query
Fixes action testing of the kona-interop program and re-enables that test in GitHub CI. Fixes op-rs/kona#3010, #18613
Closes #3163 Adds a scheduled workflow to build nightly docker images for `kona-node`, `kona-host`, and `kona-supervisor`. Runs daily at 2 AM UTC, builds for both amd64 and arm64. Images are pushed to ghcr.io with `nightly` and `nightly-YYYY-MM-DD` tags. Can also be triggered manually via workflow_dispatch.
## Description Announcement for moving `kona` to the monorepo
Adds `HostError` type and migrates core modules from `anyhow` to `thiserror`.

Migrated:
- KV stores
- Backend utils
- Precompile execution
- Local inputs

Part of #2950. Some complicated parts are left and will be finished in the next PR.

---------

Co-authored-by: einar-oplabs <oplabs@einar.io>
Co-authored-by: einar-oplabs <einar@oplabs.co>

…a#3197)

## Description

Adds support for DNS names as input for the p2p listener IPs. Useful in Docker contexts.

…3146)

## Summary

- Added `shutdown_signal()` function to listen for SIGTERM and SIGINT
- Spawned a task in `RollupNode::start()` that triggers the existing `CancellationToken` when OS signals are received
- All actors already support graceful shutdown via `CancellationToken`; this just adds the missing OS signal handling

## Changes

- Modified `crates/node/service/src/service/node.rs`
- Cross-platform support: SIGTERM on Unix, Ctrl+C (SIGINT) on all platforms

Closes #3091
Signed-off-by: Yashvardhan Kukreja <yashvardhan@oplabs.co>

The help text suggested providing a path, but at runtime the flag expected the JWT secret value itself. The env var `KONA_NODE_L2_ENGINE_JWT_PATH` expects a plain-text value and `KONA_NODE_L2_ENGINE_AUTH` expects a path, whereas it should be the other way around, from my point of view.

For example, `KONA_NODE_L2_ENGINE_JWT_PATH=/etc/kona-node/jwt-secret.txt` doesn't work, but `KONA_NODE_L2_ENGINE_JWT_PATH=sdakd123891u2390sbs` does work.

Error received:

```
error: invalid value '/etc/kona-node/jwt-secret.txt' for '--l2-engine-jwt-encoded <L2_ENGINE_JWT_ENCODED>': JWT key is expected to have a length of 64 digits. 29 digits key provided
```

This PR fixes that.

---------

Signed-off-by: Yashvardhan Kukreja <yashvardhan@oplabs.co>
op-rs/kona#3147)

Closes #3127

Previously, batch validation failures could only be diagnosed by parsing log messages, which is fragile and not programmatically testable. This change enables:

- Precise programmatic testing of batch validation logic
- Better error messages with structured drop reasons
- Improved debugging experience for node operators

## Changes

- `BatchDropReason` enum with variants for all drop scenarios:
  - Timestamp related: `FutureTimestampHolocene`, `PastTimestampPreHolocene`
  - Parent/origin: `ParentHashMismatch`, `IncludedTooLate`, `EpochTooOld`, `EpochTooFarInFuture`, `EpochHashMismatch`
  - Sequencer drift: `TimestampBeforeL1Origin`, `SequencerDriftOverflow`, `SequencerDriftExceeded`, `SequencerDriftNotAdoptedNextOrigin`
  - Transaction validation: `EmptyTransaction`, `DepositTransaction`, `Eip7702PreIsthmus`, `NonEmptyTransitionBlock`
  - Span batch specific: `SpanBatchPreDelta`, `SpanBatchNoNewBlocksPreHolocene`, `SpanBatchMisalignedTimestamp`, `SpanBatchNotOverlappedExactly`
  - Overlap validation: `L1OriginBeforeSafeHead`, `MissingL1Origin`, `OverlappedTxCountMismatch`, `OverlappedTxMismatch`, `L2BlockInfoExtractionFailed`, `OverlappedL1OriginMismatch`
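To illustrate why a structured drop reason is easier to test than a log line, here is a reduced sketch. Only the two variant names are taken from the list above. The `Outcome` type, the `check_batch` function and its two simplified rules are hypothetical and do not reproduce the real validation logic.

```rust
/// Two of the variants listed above; the real enum has many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatchDropReason {
    ParentHashMismatch,
    EpochTooOld,
}

/// Hypothetical result type that carries the reason for a drop.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Accept,
    Drop(BatchDropReason),
}

/// Hypothetical, heavily simplified check used only for illustration.
pub fn check_batch(
    batch_parent_hash: [u8; 32],
    safe_head_hash: [u8; 32],
    batch_epoch_num: u64,
    safe_head_epoch_num: u64,
) -> Outcome {
    if batch_parent_hash != safe_head_hash {
        return Outcome::Drop(BatchDropReason::ParentHashMismatch);
    }
    if batch_epoch_num < safe_head_epoch_num {
        return Outcome::Drop(BatchDropReason::EpochTooOld);
    }
    Outcome::Accept
}
```

A test can then assert on the exact variant, for example that a stale epoch yields `Outcome::Drop(BatchDropReason::EpochTooOld)`, instead of matching on log text.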
…ow (op-rs/kona#3189) Removes unnecessary intermediate `Vec` allocation in the peer banning logic.
Fixed a copy-paste error in the OpStackEnr decoder where the error message for version decoding incorrectly said "could not decode chain id" instead of "could not decode version". Small fix but makes debugging easier when ENR parsing fails.
…p-rs/kona#3206)

This PR consolidates the `EngineActor` inbound channels into a single channel. See [this design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-actor-simplification.md#3-and-4-actor-dependencies-are-unclear) for more info on motivation.

This PR also:

- Creates clients for all callers of `EngineActor`, making interactions simple, more generic, safe (e.g. DerivationActor's engine client does not expose functions to perform block building), and mockable
- Makes dependent actors generic over their engine client to decouple implementation and make mocking easy

Closes #3220
Closes #3221
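The client pattern described above might look roughly like this. It is a synchronous sketch built on `std::sync::mpsc` and is not the actual kona code, which is asynchronous. Every name here (`EngineRequest`, `DerivationEngineClient`, `ChannelEngineClient`, `MockEngineClient`, `Derivation`) is hypothetical.

```rust
use std::cell::RefCell;
use std::sync::mpsc::Sender;

/// All engine requests share one channel as variants of one enum.
#[derive(Debug, PartialEq, Eq)]
pub enum EngineRequest {
    ProcessDerivedAttributes(u64),
    BuildBlock(u64),
}

/// Narrow view handed to derivation: no block-building method is exposed.
pub trait DerivationEngineClient {
    fn process_derived_attributes(&self, block_number: u64);
}

/// Client backed by the single engine channel.
pub struct ChannelEngineClient {
    tx: Sender<EngineRequest>,
}

impl ChannelEngineClient {
    pub fn new(tx: Sender<EngineRequest>) -> Self {
        Self { tx }
    }
}

impl DerivationEngineClient for ChannelEngineClient {
    fn process_derived_attributes(&self, block_number: u64) {
        // A send error only means the engine side has already shut down.
        let _ = self.tx.send(EngineRequest::ProcessDerivedAttributes(block_number));
    }
}

/// Mock that records calls instead of sending them anywhere.
#[derive(Default)]
pub struct MockEngineClient {
    pub seen: RefCell<Vec<u64>>,
}

impl DerivationEngineClient for MockEngineClient {
    fn process_derived_attributes(&self, block_number: u64) {
        self.seen.borrow_mut().push(block_number);
    }
}

/// The dependent side is generic over its client, so tests can pass the mock.
pub struct Derivation<C: DerivationEngineClient> {
    pub engine: C,
}

impl<C: DerivationEngineClient> Derivation<C> {
    pub fn new(engine: C) -> Self {
        Self { engine }
    }

    pub fn on_derived(&self, block_number: u64) {
        self.engine.process_derived_attributes(block_number);
    }
}
```

Because the trait exposes only what derivation needs, a caller holding a `DerivationEngineClient` cannot request block building even though `BuildBlock` travels over the same channel.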
# Summary

This pull request improves the reliability and observability of the secret key loading and generation logic in `crates/utilities/cli/src/secrets.rs`. The main changes add more robust error handling and logging, making it easier to diagnose problems related to secret key files.

# Motivation

When running kona-node in Kubernetes, users may encounter a new peer ID on every pod restart without any indication of what went wrong. This typically happens when:

- The secret key file contains trailing \r\n or whitespace (common with ConfigMaps)
- No --p2p.priv.path or --p2p.priv.raw is configured

Previously, these failures were silently swallowed and an ephemeral keypair was generated. Users had no way to know their configuration was broken.

# Key improvements:

**Error handling and diagnostics:**

* Added detailed error logging when decoding a secret key from file fails, including the file path, error details, and content length. This helps identify issues like trailing whitespace or invalid characters in the secret key file.
* Added error logging when failing to access the secret key file, capturing the file path and error message for easier troubleshooting.

**Success logging and observability:**

* Added info-level logs when a P2P keypair is successfully loaded from a file or generated and saved, including the file path and peer ID. This provides visibility into key management operations. [[1]](diffhunk://#diff-dd17c74f0a1e7da8e226ddac381785c824cdc251539708daa0247fa699471fa4L29-R47) [[2]](diffhunk://#diff-dd17c74f0a1e7da8e226ddac381785c824cdc251539708daa0247fa699471fa4L39-R78)

**Minor code improvements:**

* Trimmed whitespace from secret key file contents before decoding to prevent common user errors.
* Minor refactor to ensure consistent use of references and variable names.
### Previous Behavior with mounting a key-pair

```
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 32s ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:14:47
╰─❯ k logs kona-node-0 | head -n 300 | grep local_peer
2026-01-08T21:14:49.296588Z INFO libp2p_swarm: local_peer_id=16Uiu2HAmBRR5NXam4nhJt7jCuadCs7E1UMhNhfGvBbanwdYkQ63d
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:15:36
╰─❯ k delete pod kona-node-0
pod "kona-node-0" deleted
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:16:17
╰─❯ k logs kona-node-0 | head -n 300 | grep local_peer
2026-01-08T21:16:14.397652Z INFO libp2p_swarm: local_peer_id=16Uiu2HAmJvtkQMZBce6bY4KPr4akERUYtiB8ExaKWyktPosBPi9v
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 ✔ PIPE|0|0 ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:16:30
╰─❯ k delete pod kona-node-0
pod "kona-node-0" deleted
%
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 32s ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:19:40
╰─❯ k logs kona-node-0 | head -n 300 | grep local_peer
2026-01-08T21:19:41.859204Z INFO libp2p_swarm: local_peer_id=16Uiu2HAmUPnaVBKxFAJLVFYDzvRWzb5J8nKTsqzdRW69tBNwsPVY
```

### New Behavior

```
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:24:12
╰─❯ k logs kona-node-0 | head -n 300 | grep -e local_peer -e p2p
2026-01-08T21:23:14.165638Z INFO p2p::secrets: Successfully loaded P2P keypair from file path=/etc/kona-node/p2p-node-key.txt peer_id=16Uiu2HAmDmgXXZZxhPgJXDYtrSHXa4a72gzXyocymdW2Th6fHPuS
2026-01-08T21:23:14.303887Z INFO libp2p_swarm: local_peer_id=16Uiu2HAmDmgXXZZxhPgJXDYtrSHXa4a72gzXyocymdW2Th6fHPuS
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 ✔ PIPE|0|0 ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:24:14
╰─❯ k delete pod kona-node-0
pod "kona-node-0" deleted
╭─ ~/workspace/ethereum-optimism/k8s master *10 !15 ?1 31s ○ oplabs-dev-client-secondary/op-mainnet-kona-geth-a-rpc-1 16:26:24
╰─❯ k logs kona-node-0 | head -n 300 | grep -e local_peer -e p2p
2026-01-08T21:26:25.965885Z INFO p2p::secrets: Successfully loaded P2P keypair from file path=/etc/kona-node/p2p-node-key.txt peer_id=16Uiu2HAmDmgXXZZxhPgJXDYtrSHXa4a72gzXyocymdW2Th6fHPuS
2026-01-08T21:26:26.091000Z INFO libp2p_swarm: local_peer_id=16Uiu2HAmDmgXXZZxhPgJXDYtrSHXa4a72gzXyocymdW2Th6fHPuS
```
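The trimming fix for secret key files can be sketched as below. This is not the code in `secrets.rs`. It assumes the file holds a 32-byte key as 64 hex characters, and the names `parse_secret_key` and `KeyParseError` are hypothetical. The errors carry the raw content length, in the spirit of the diagnostics described in the key improvements.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum KeyParseError {
    /// The trimmed content is not 64 characters long.
    BadLength { content_len: usize },
    /// The trimmed content contains a non-hex character.
    BadHex { content_len: usize },
}

/// Hypothetical parser: trims surrounding whitespace (including a trailing
/// "\r\n" from a ConfigMap), then decodes 64 hex characters into 32 bytes.
pub fn parse_secret_key(contents: &str) -> Result<[u8; 32], KeyParseError> {
    let content_len = contents.len();
    let hex = contents.trim();
    if hex.len() != 64 {
        return Err(KeyParseError::BadLength { content_len });
    }
    // Checking every byte first also guarantees ASCII, so slicing is safe.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(KeyParseError::BadHex { content_len });
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
            .map_err(|_| KeyParseError::BadHex { content_len })?;
    }
    Ok(key)
}
```

Reporting the untrimmed length makes an off-by-two from a stray `\r\n` visible in the error itself.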
## Description Fixes cargo deny checks in CI
…idates only necessary blobs, and does not fallback to sidecars endpoint (op-rs/kona#3211) Closes #3136 --------- Co-authored-by: theo <80177219+theochap@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
In preparation for the kona light CL ([Design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md)):

> To cleanly separate the responsibility, the logic for selecting the finalized target is moved from the EngineActor to the DerivationActor. The EngineActor still executes finalization through FinalizeTask, but it no longer determines which block becomes finalized; it applies the finalized head selected by the DerivationActor. This change enables externally determined safe/finalized information to integrate cleanly.

Wires in L1Watcher's l1_finalized info to the DerivationActor. L2Finalizer moved from EngineActor to DerivationActor, and DerivationActor uses l1_finalized info to send the finalization signal to EngineActor via ProcessFinalizedL2BlockRequest.

Locally validated that [`TestL2FinalizedSync`](https://github.com/op-rs/kona/blob/3063d64b3604fe1b7dc190f2c6f473d418831c45/tests/node/common/sync_test.go#L49) passes, removing the skip.

```
┌──────────────────┐                ┌──────────────────┐
│  L1WatcherActor  │                │  L1WatcherActor  │
└────────┬─────────┘                └────────┬─────────┘
         │ l1_head_updates                   │ l1_finalized_updates (NEW)
         │                                   │
         ▼                                   ▼
┌────────────────────────────────────────────────────────────┐
│                      DerivationActor                       │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ L2Finalizer                                          │  │
│  │   awaiting_finalization: L1BlockNum -> L2BlockNum    │  │
│  └──────────────────────────────────────────────────────┘  │
└────────┬───────────────────────────────────┬───────────────┘
         │ ProcessDerivedL2Attributes        │ ProcessFinalizedL2Block (NEW)
         │                                   │ (block_number)
         ▼                                   ▼
┌────────────────────────────────────────────────────────────┐
│                        EngineActor                         │
│                                                            │
│  ConsolidateTask                     FinalizeTask          │
└────────────────────────────────────────────────────────────┘
```

---------

Co-authored-by: theo <80177219+theochap@users.noreply.github.com>
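The `awaiting_finalization: L1BlockNum -> L2BlockNum` map in the diagram suggests a bookkeeping structure along these lines. This is a sketch under assumptions, not the kona `L2Finalizer`: block numbers are plain `u64`, and the method names `enqueue` and `try_finalize` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Tracks, per L1 block, the highest L2 block derived from it.
#[derive(Default)]
pub struct L2Finalizer {
    awaiting_finalization: BTreeMap<u64, u64>,
}

impl L2Finalizer {
    /// Hypothetical name: records that `l2_block` was derived from `l1_block`.
    pub fn enqueue(&mut self, l1_block: u64, l2_block: u64) {
        let entry = self.awaiting_finalization.entry(l1_block).or_insert(l2_block);
        *entry = (*entry).max(l2_block);
    }

    /// Hypothetical name: on a new finalized L1 block, returns the highest L2
    /// block whose L1 source is now finalized and forgets those entries.
    pub fn try_finalize(&mut self, l1_finalized: u64) -> Option<u64> {
        // `split_off(&k)` returns the entries with key >= k, so everything
        // above `l1_finalized` keeps waiting.
        let still_waiting = match l1_finalized.checked_add(1) {
            Some(next) => self.awaiting_finalization.split_off(&next),
            None => BTreeMap::new(),
        };
        let finalized = std::mem::replace(&mut self.awaiting_finalization, still_waiting);
        finalized.values().copied().max()
    }
}
```

In this sketch the value returned by `try_finalize` plays the role of the `block_number` sent with ProcessFinalizedL2Block.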
…nsitions (op-rs/kona#3253)

Closes: #3248

This formalizes, in a single place, logic that was implied by a number of booleans and checks in various parts of DerivationActor.

Note: if this state machine is implemented incorrectly, it will cause kona-node to fail at runtime. That said, I can confirm that the following tests have passed:

- `just test-e2e-sysgo node node/reorgs large-kona-sequencer`
- `just test-e2e-sysgo node node/restart large-kona-sequencer`
- `just test-e2e-sysgo node node/common large-kona-sequencer`
- `just acceptance-tests kona-node op-reth`
~Base branch: op-rs/kona#3242
~May conflict with op-rs/kona#3229, op-rs/kona#3253

Implement the kona light CL ([Design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md)):

- [DerivationActor - Target Determination](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#derivationactor---target-determination)
- [EngineActor - Fork Choice Update](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#engineactor---fork-choice-update)

```mermaid
flowchart TB
  subgraph A ["Normal Mode (Derivation)"]
    direction TB
    subgraph A0 ["Rollup Node Service"]
      direction TB
      A_Derivation["DerivationActor<br/>(L1->L2 derivation)"]
      A_Engine["EngineActor"]
      A_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"]
    end
    A_L1[(L1 RPC)]
    A_EL[(Execution Layer)]
    A_L1 -->|L1 info| A_Derivation
    A_UnsafeSrc -->|unsafe| A_Engine
    A_Derivation -->|"safe(attr)/finalized"| A_Engine
    A_Engine -->|engine API| A_EL
  end
  subgraph B ["Light CL Mode"]
    direction TB
    subgraph B0 ["Rollup Node Service"]
      direction TB
      B_DerivationX[["DerivationActor<br/>(NEW: Poll external syncStatus)"]]
      B_Engine["EngineActor"]
      B_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"]
    end
    B_L1[(L1 RPC)]
    B_Ext[(External CL RPC<br/>optimism_syncStatus)]
    B_EL[(Execution Layer)]
    %% Connections
    B_Ext -->|safe/finalized/currentL1| B_DerivationX
    B_L1 -->|canonical L1 check| B_DerivationX
    B_DerivationX -->|"safe(blockInfo)/finalized (validated)"| B_Engine
    B_UnsafeSrc -->|unsafe| B_Engine
    %% Visual indicator for disabled actor
    B_Engine -->|engine API| B_EL
  end
```

### Testing

#### Acceptance Tests

Running guidelines detailed at op-rs/kona#3199:

- [x] `TestFollowL2_Safe_Finalized_CurrentL1`
- [x] `TestFollowL2_WithoutCLP2P`
- [ ] `TestFollowL2_ReorgRecovery` (blocked by [kona: Check L2 reorg due to L1 reorg](#18676))

Injecting CurrentL1 is blocked by [kona: Revise SyncStatus CurrentL1 Selection](#18673)

#### Local Sync Tests

Validated by syncing op-sepolia between the kona-node light CL and the sync tester: the initial EL sync finished successfully and every safety level progressed until it reached its tip.

#### Devnet Tests

Commit op-rs/kona@0b36fdd is baked into `us-docker.pkg.dev/oplabs-tools-artifacts/dev-images/kona-node:0b36fdd-light-cl` and deployed at the `changwan-0` devnet:

- As a verifier: `changwan-0-kona-geth-f-rpc-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-rpc-3)
- As a sequencer: `changwan-0-kona-geth-f-sequencer-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-sequencer-3)
  - As a standby | leader

Observed progression of all {unsafe, safe, finalized} heads when running as a kona-node light CL.
The `L1BlockInfo___` structs contain overlapping fields. This branch
factors out the non-deprecated fields into base structs, e.g.
`L1BlockInfoBedrockBase`, and embeds the base as a field in `L1BlockInfoBedrock`:
```
pub struct L1BlockInfoBedrock {
    #[serde(flatten)]
    base: L1BlockInfoBedrockBase,
    /// The fee overhead for L1 data. Deprecated in ecotone.
    pub l1_fee_overhead: U256,
    /// The fee scalar for L1 data. Deprecated in ecotone.
    pub l1_fee_scalar: U256,
}
```
The purpose is reuse (think OOP inheritance) instead of repetition. As a
side-effect this increases encapsulation.
It establishes a partial order and a chain of fully embedded structs:
L1BlockInfoBedrockBase < L1BlockInfoEcotoneBase < L1BlockInfoIsthmus <
L1BlockInfoJovian
Further
L1BlockInfoBedrockBase < L1BlockInfoBedrock
and
L1BlockInfoEcotoneBase < L1BlockInfoEcotone
This is deemed necessary to get around deprecated fields in
`L1BlockInfoBedrock` and `L1BlockInfoEcotone`.
To hide the implementation details, constructors have been added and
destructuring is discouraged.
There is no single way to do this in Rust, but this is one way. A
similar approach is used in
[`op-alloy`](https://github.com/alloy-rs/op-alloy/blob/main/crates/rpc-types/src/transaction.rs).
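A reduced sketch of the embedding pattern with a constructor and accessors follows. It is illustrative only: the base struct here holds just two fields, `u64` stands in for `U256`, the serde attribute is omitted, and the constructor signature and accessor names are assumptions rather than the actual API.

```rust
/// Non-deprecated fields shared by later hardfork structs (reduced subset).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct L1BlockInfoBedrockBase {
    pub number: u64,
    pub time: u64,
}

/// Bedrock-specific struct that embeds the base and adds deprecated fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct L1BlockInfoBedrock {
    base: L1BlockInfoBedrockBase,
    /// The fee overhead for L1 data. Deprecated in ecotone.
    pub l1_fee_overhead: u64,
    /// The fee scalar for L1 data. Deprecated in ecotone.
    pub l1_fee_scalar: u64,
}

impl L1BlockInfoBedrock {
    /// Hypothetical constructor: callers never build the private `base` directly.
    pub fn new(number: u64, time: u64, l1_fee_overhead: u64, l1_fee_scalar: u64) -> Self {
        Self {
            base: L1BlockInfoBedrockBase { number, time },
            l1_fee_overhead,
            l1_fee_scalar,
        }
    }

    /// Hypothetical accessor that forwards to the embedded base.
    pub fn number(&self) -> u64 {
        self.base.number
    }

    /// Hypothetical accessor that forwards to the embedded base.
    pub fn time(&self) -> u64 {
        self.base.time
    }
}
```

Keeping `base` private is what discourages destructuring: code outside the defining module has to go through the constructor and accessors.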
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #18802 +/- ##
===========================================
- Coverage 76.8% 76.7% -0.2%
===========================================
Files 558 383 -175
Lines 52993 42856 -10137
===========================================
- Hits 40747 32892 -7855
+ Misses 12102 9964 -2138
+ Partials 144 0 -144
sebastianst
approved these changes
Jan 15, 2026
Member
sebastianst
left a comment
Final update LGTM! Thanks 🦀
04b9ece to
290e52f
Compare
sebastianst
approved these changes
Jan 15, 2026
Member
sebastianst
left a comment
Still looking good, will merge tomorrow morning while enabling merge commits
Description
Pull the kona diff until op-rs/kona@3c02e71
Command used:
First big PR was #18754