l2geth: Sync from Backend Queue #2296
Conversation
To facilitate failover recovery without causing a chainsplit.
🦋 Changeset detected. Latest commit: f061820. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Codecov Report
```
@@           Coverage Diff            @@
##           develop    #2296   +/-  ##
========================================
  Coverage    80.14%   80.14%
========================================
  Files           77       77
  Lines         2458     2458
  Branches       450      450
========================================
  Hits          1970     1970
  Misses         488      488
```
```go
if err := s.txLogger.Publish(s.ctx, encodedTx.Bytes()); err != nil {
	pubTxDropCounter.Inc(1)
	log.Error("Failed to publish transaction to log", "msg", err)
	return fmt.Errorf("internal error: transaction logging failed")
```
I don't love the idea of returning an error here because that means we tightly couple Google's uptime with our ability to accept transactions. Not really sure of a good solution.
Yeah, I don't quite like it either, but it's the best option. With the way infra is set up right now, I don't think this will worsen the failure rate of transaction submissions; i.e. the sequencer is more likely to fail than Google PubSub, and going by actual historical incidents, PubSub is even less likely to fail.
My biggest concern with this approach is the additional latency overhead per tx for logging. The median time to publish a transaction is about 35ms. Not bad right now, but it's something to watch out for if the tx rate increases via gas fee reductions or by having more users.
If we use this logic in two places as per my other comment, we should wrap this in a helper method in rcfg/pub.
I think you'll need to add similar TX logging functionality to the RPC handler for `eth_sendRawTransaction`.
```go
p.topic.ResumePublish(messageOrderingKey)
result := p.topic.Publish(ctx, &pmsg)
```
Good catch. `Publish` by itself is, but not when preceded by the `ResumePublish` call.
@mslipper eth_sendRawTransaction goes through the same code path. So we only need to log txs in one location.
The BackendQueue can be used by verifiers to sync transactions from Google PubSub
```go
	return s.applyTransaction(tx)
}
```

```go
type QueuedTransactionMeta struct {
```
Removed a bunch of required fields from types.Transaction. The subscriber will assert that the necessary fields are set.
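One way the trimmed-down type and the subscriber-side assertion could fit together is sketched below. This is hypothetical: the field set and `validate` helper are illustrative, not the actual `QueuedTransactionMeta` definition; pointer fields let the subscriber distinguish "absent" from "zero".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math/big"
)

// QueuedTransactionMeta sketch: required fields are pointers (or nilable
// slices) so the subscriber can assert they were actually set.
type QueuedTransactionMeta struct {
	L1BlockNumber  *big.Int `json:"l1BlockNumber"`
	L1Timestamp    *uint64  `json:"l1Timestamp"`
	RawTransaction []byte   `json:"rawTransaction"`
}

// validate is the subscriber-side assertion that required fields are present.
func (m *QueuedTransactionMeta) validate() error {
	if m.L1BlockNumber == nil || m.L1Timestamp == nil || m.RawTransaction == nil {
		return errors.New("queued transaction meta missing required field")
	}
	return nil
}

func main() {
	var m QueuedTransactionMeta
	json.Unmarshal([]byte(`{"l1BlockNumber":0}`), &m)
	fmt.Println(m.validate() != nil) // timestamp and raw tx absent: rejected
}
```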
```go
func (q QueueOrigin) MarshalJSON() ([]byte, error) {
	switch q {
	case QueueOriginSequencer:
		return []byte("\"sequencer\""), nil
```
Could you get rid of the need for the escapes by using a backtick string?
certainly. thanks for pointing that out.
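For reference, the raw (backtick) string literal version would look like this; the `QueueOrigin` definition here is a minimal illustrative stand-in for the real type:

```go
package main

import "fmt"

// QueueOrigin sketch showing the reviewer's suggestion: a raw string
// literal avoids escaping the embedded double quotes.
type QueueOrigin uint8

const QueueOriginSequencer QueueOrigin = 0

func (q QueueOrigin) MarshalJSON() ([]byte, error) {
	switch q {
	case QueueOriginSequencer:
		return []byte(`"sequencer"`), nil
	}
	return nil, fmt.Errorf("unknown queue origin: %d", q)
}

func main() {
	b, _ := QueueOriginSequencer.MarshalJSON()
	fmt.Println(string(b))
}
```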
```go
	return
}
```

```go
if err := s.applyTransactionToTip(&tx); err != nil {
```
Do we have an ordering guarantee? And what about a delivery guarantee?
Yeah. Ordering is enforced by PubSub via external configuration; a transaction's ordering relates to the block that contains it. Transaction publishing by the sequencer is itself ordered, since a SendTx fails if we're unable to log the transaction in PubSub.
Delivery guarantees are also handled by PubSub via external configuration. On our infra, published messages remain available for at least several days, which is enough time for a failover. Messages may be delivered more than once, so we check the db to see if we've already applied the transaction; if so, we acknowledge the message (advancing the subscription queue offset) and move on.
We can assert in code that infra is set up correctly and that the PubSub subscription has message ordering enabled. But this requires more GCP permissions than are needed to subscribe to messages.
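The at-least-once handling described above can be sketched as follows, with in-memory stand-ins for the chain-database lookup and the PubSub ack (the `handle` helper and `applied` map are illustrative, not the actual sync-service code):

```go
package main

import "fmt"

// applied stands in for the chain database lookup: have we already
// applied this transaction? (in-memory map for the sketch)
var applied = map[string]bool{}

// handle models the at-least-once subscriber logic: a redelivered
// message is acknowledged without being re-applied.
// Returns true if the tx was newly applied.
func handle(txHash string, ack func()) bool {
	if applied[txHash] {
		ack() // already applied: just advance the subscription offset
		return false
	}
	applied[txHash] = true // apply to tip, then record
	ack()
	return true
}

func main() {
	acks := 0
	ack := func() { acks++ }
	fmt.Println(handle("0xabc", ack)) // first delivery: applied
	fmt.Println(handle("0xabc", ack)) // redelivery: skipped but acked
	fmt.Println(acks)
}
```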
```diff
 // requirement of the remote server being up.
-if service.enable {
+// If we're syncing from the Queue, then we can skip all this and rely on L2 published transactions
+if service.enable && service.backend != BackendQueue {
```
Yay or nay? Side effect of this is that IsSyncing always returns false when we're syncing from the Queue. IsSyncing is only used to prevent transactions from being submitted to the node. But only verifiers will use the BackendQueue, so I don't think this would be an issue.
I think this is OK - I don't expect any replica operators to use BackendQueue.
Approval is conditioned on fixing the …
```go
tx := mockTx()
tx.GetMeta().RawTransaction = nil
tests := map[string][]byte{
	//"good txmeta": []byte("{\"l1BlockNumber\":0,\"l1Timestamp\":1647549225,\"l1MessageSender\":\"0x1487ef4dd5b0ca7610b85964371c1d8ab7c468eb\",\"queueOrigin\":\"sequencer\",\"index\":0,\"queueIndex\":0,\"rawTransaction\":\"34CAgJQrz3UmBr9M0373farCJhgNfaGiVICCAACAgIA=\"}"),
```
If it's uncommented then the test case fails. It's not meant to be tested. The comment is only a reference transaction log I used to construct the other failure scenarios.
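The table-driven failure scenarios might be sketched like this; the `meta` type and `reject` helper are hypothetical, since the real test unmarshals into the actual tx meta type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// meta is a minimal illustrative payload type for the sketch.
type meta struct {
	QueueOrigin    *string `json:"queueOrigin"`
	RawTransaction []byte  `json:"rawTransaction"`
}

// reject reports whether a serialized payload should be refused:
// either it fails to parse or a required field is absent.
func reject(payload []byte) bool {
	var m meta
	if err := json.Unmarshal(payload, &m); err != nil {
		return true
	}
	return m.QueueOrigin == nil || m.RawTransaction == nil
}

func main() {
	tests := map[string][]byte{
		"not json":       []byte("{"),
		"missing origin": []byte(`{"rawTransaction":"NA=="}`),
		"missing raw tx": []byte(`{"queueOrigin":"sequencer"}`),
	}
	for name, payload := range tests {
		fmt.Println(name, reject(payload))
	}
}
```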
Description