
improve(BundleDataClient,SpokePoolClient): Log about duplicate events and getLatestProposedBundleData should rarely load data from scratch #884

Open · wants to merge 19 commits into master
Conversation

@nicholaspai (Member) commented Feb 10, 2025

This PR will make the relayer faster when the current bundle proposal is pending but not published to Arweave and more resilient to errors thrown during bundle reconstruction from scratch.

  • We want to know when these events happen, but we now believe duplicates are possible with the Indexed spoke pool client, so we shouldn't throw. Instead, we'll log them in the SpokePoolClient.
  • The relayer reconstructs bundle data to compute the latest running balances in order to choose a repayment chain. We don't want the relayer to ever have to reconstruct data from scratch; instead, it should always load it from Arweave. This also avoids any safety errors thrown by the BundleDataClient when it detects a duplicate event. Keeping those errors in place keeps the BundleDataClient safe when loadDataFromScratch is called by the proposer and disputer.
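The duplicate check discussed throughout this PR keys on the (transactionHash, logIndex) pair. A minimal TypeScript sketch of that idea, with illustrative types and helper names (not actual SDK APIs):

```typescript
// Illustrative event shape; the real SpokePoolClient events carry more fields.
interface EventLike {
  transactionHash: string;
  logIndex: number;
}

// True if `event` already exists in `existing` by (transactionHash, logIndex).
function isDuplicate(existing: EventLike[], event: EventLike): boolean {
  return existing.some(
    (e) => e.transactionHash === event.transactionHash && e.logIndex === event.logIndex
  );
}

// Keep the first occurrence of each (transactionHash, logIndex) pair.
function dedupe<T extends EventLike>(events: T[]): T[] {
  const seen = new Set<string>();
  return events.filter((e) => {
    const key = `${e.transactionHash}:${e.logIndex}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

This mirrors the "ignore the event and keep in-memory state correct by filtering out duplicates" approach the author argues for below.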

@nicholaspai nicholaspai requested review from bmzig and pxrl February 10, 2025 17:07
@nicholaspai nicholaspai changed the title improve(BundleDataClient): Log about duplicate destination chain events improve(BundleDataClient,SpokePoolClient): Log about duplicate events Feb 10, 2025
return e.transactionHash === deposit.transactionHash && e.logIndex === deposit.logIndex;
})
) {
this.logger.warn({
Contributor:
If this is ever logged, this basically means that there is a bug somewhere. I think this should be an error.

Member Author (nicholaspai):

this might take down the fast relayer though, no?

Contributor:

I think the relayers are the only bots which do not stop on errors, since they error a lot when txns revert in simulation. @pxrl could probably confirm that I'm not mistaken, though.

Member Author (nicholaspai):

I'm not sure we should crash here. I'd rather just ignore the event and try our best to keep the events stored in memory correct by filtering out duplicates

Contributor (@bmzig, Feb 10, 2025):

Ideally, this should never trigger, though. I'd argue that if this is logged, then we introduced a bug somewhere and should look into recent commits. Right now, we know that this would occasionally log in the relayer, but if this happened somewhere else in the future, we'd probably want to look into it. Warn logs just seem to be drowned out by the many other warn logs we emit elsewhere.

Member Author (nicholaspai):

Hm, so if we crashed here, this would bring down the fast relayer, which seems to handle duplicate events OK today. Are we OK with that? Yes, it'd help us debug, but it'd also be disruptive to the production service. WDYT?

Contributor:

If the fast relayer crashes on errors, then I agree, we should not log an error and go with what you currently have. I don't think the fast relayer crashes on logged errors, though. Am I wrong?

Contributor:

Errors result in a page being emitted and they're very visible in Slack, but the bot should continue to run.

Warnings are also reasonably visible in Slack but don't result in anyone getting paged.

Depending on how frequently this gets emitted, it could be a blessing or a curse. It'd be nice to avoid behaviour where it triggered a lot of rapid alerts. Given that it's in the SpokePoolClient update() function, I think it'd probably alert once and then disappear, so I think error should be OK.

@nicholaspai nicholaspai changed the title improve(BundleDataClient,SpokePoolClient): Log about duplicate events improve(BundleDataClient,SpokePoolClient): Log about duplicate events and getLatestProposedBundleData should never load data from scratch Feb 10, 2025
@nicholaspai nicholaspai requested a review from bmzig February 10, 2025 18:01
(d) => d.transactionHash === deposit.transactionHash && d.logIndex === deposit.logIndex
)
) {
throw new Error("Duplicate deposit in bundleDeposits");
Contributor:
Can we add a log message here to identify the relevant deposits? That will speed up debugging time significantly if we're able to isolate them.

let n = hubPoolClient.hasPendingProposal() ? 1 : 2;

// eslint-disable-next-line no-constant-condition
while (true) {
Contributor (@pxrl, Feb 10, 2025):

Should we maybe make this conditional on n < something? It'd suck to somehow end up in an infinite loop here.

Suggested change:
- while (true) {
+ while (n++ < 4) {

Member Author (nicholaspai):

WDYT of faebdb8, where we fall back to just loading from scratch?
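The shape the thread converges on — a bounded lookback over recently published bundles with a final fall-back to reconstructing from scratch — can be sketched as follows. The loader callbacks and types here are stand-ins for illustration, not the SDK's actual APIs:

```typescript
// Illustrative bundle-data shape.
type BundleData = { bundleId: string };

// Try up to `maxLookback` recent bundles on Arweave, starting from the
// nth-most-recently-validated bundle; if none are published, reconstruct.
function getLatestBundleDataWithFallback(
  startN: number, // 1 if a proposal is pending, else 2 (see the diff above)
  maxLookback: number,
  loadFromArweave: (nthLatestBundle: number) => BundleData | undefined,
  loadFromScratch: () => BundleData
): BundleData {
  // A bounded loop instead of `while (true)`, so a gap in published
  // Arweave data can never spin forever.
  for (let n = startN; n < startN + maxLookback; n++) {
    const data = loadFromArweave(n);
    if (data !== undefined) {
      return data;
    }
  }
  // Nothing published within the lookback window; fall back to scratch.
  return loadFromScratch();
}
```

This keeps the reviewer's bounded-loop safety property while preserving the author's "rarely, not never, load from scratch" behavior.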

Comment on lines 782 to 787
if (
this.fills[fill.originChainId] !== undefined &&
this.fills[fill.originChainId].some(
(f) => f.transactionHash === fill.transactionHash && f.logIndex === fill.logIndex
)
) {
Contributor:

Prettier probably complains about this formatting, but it should be possible to drop the first check and simplify the statements.

Suggested change:
- if (
-   this.fills[fill.originChainId] !== undefined &&
-   this.fills[fill.originChainId].some(
-     (f) => f.transactionHash === fill.transactionHash && f.logIndex === fill.logIndex
-   )
- ) {
+ const duplicate = this.fills[fill.originChainId]?.find(
+   (f) => f.transactionHash === fill.transactionHash && f.logIndex === fill.logIndex
+ );
+ if (duplicate) {

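The suggestion above relies on optional chaining to drop the explicit `!== undefined` guard. A self-contained sketch of the simplified check, with an illustrative `Fill` type rather than the SDK's real one:

```typescript
// Illustrative fill shape; the real type carries many more fields.
interface Fill {
  originChainId: number;
  transactionHash: string;
  logIndex: number;
}

// `?.` short-circuits to undefined when no fills exist for the chain,
// so a separate existence check is unnecessary.
function findDuplicateFill(
  fills: Record<number, Fill[] | undefined>,
  fill: Fill
): Fill | undefined {
  return fills[fill.originChainId]?.find(
    (f) => f.transactionHash === fill.transactionHash && f.logIndex === fill.logIndex
  );
}
```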
@nicholaspai nicholaspai requested a review from pxrl February 10, 2025 21:51
@nicholaspai nicholaspai changed the title improve(BundleDataClient,SpokePoolClient): Log about duplicate events and getLatestProposedBundleData should never load data from scratch improve(BundleDataClient,SpokePoolClient): Log about duplicate events and getLatestProposedBundleData should rarely load data from scratch Feb 10, 2025
Member Author (nicholaspai):

@bmzig @pxrl if we do experience duplicate events, I can imagine the logs would get very noisy and distracting. WDYT about emitting one error-level log per update() run? 7545cf2
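One way to picture "one error-level log per update() run" is to accumulate duplicates during the run and emit a single summary at the end. This is a hypothetical sketch of that pattern; the class name and shapes are illustrative, not what the linked commit actually contains:

```typescript
// Illustrative duplicate-event record.
interface DupEvent {
  transactionHash: string;
  logIndex: number;
}

class DuplicateEventTracker {
  private duplicates: DupEvent[] = [];

  // Called each time a duplicate is detected during update().
  record(e: DupEvent): void {
    this.duplicates.push(e);
  }

  // Called once at the end of update(); returns the single summary
  // that would be logged at error level, or undefined if none occurred.
  flush(): { count: number; events: DupEvent[] } | undefined {
    if (this.duplicates.length === 0) return undefined;
    const summary = { count: this.duplicates.length, events: this.duplicates };
    this.duplicates = [];
    return summary;
  }
}
```

Batching like this keeps the error visible (one page per run) without the rapid-alert noise pxrl warns about above.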

@nicholaspai nicholaspai requested a review from pxrl February 11, 2025 00:11
// Check if bundle data exists on arweave, otherwise fallback to last published bundle data. If the
// first bundle block range we are trying is the pending proposal, then we'll grab the most recently
// validated bundle, otherwise we'll grab the second most recently validated bundle.
let n = hubPoolClient.hasPendingProposal() ? 1 : 2;
Contributor:

Why is it OK to do this? This function now won't necessarily return the latest proposed bundle data, and will instead likely return an older bundle's Arweave data. Wouldn't this mean that calculating the "latest" pool rebalance root isn't actually going to calculate the latest root, but instead some older one (unless the latest data has been published)?

Member Author (nicholaspai):

Maybe we should rename this function to getLatestProposedRootWithArweaveData, because if you look at how this function is used, it's used by the InventoryClient here to get the best approximation of the latest running balances in order to compute which chains are "over-allocated" here.

So, when a bundle is first proposed but doesn't have its bundle data published to Arweave, there is really little negative consequence for the InventoryClient. The InventoryClient will use an outdated running balance until the Arweave bundle data is published. If we assume the Arweave data gets published once per challenge period, then at worst the "latest" pool rebalance root will be one bundle behind. In reality, it only lags by ~15 minutes until the bundle data is very likely published to Arweave.

2b6a8dd

@nicholaspai nicholaspai requested a review from bmzig February 11, 2025 15:21
nicholaspai added a commit to across-protocol/relayer that referenced this pull request Feb 11, 2025
 Improves relayer and monitor performance by loading more data from arweave optimistically

Depends on across-protocol/sdk#884