Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some cometindex robustness improvements #5095

Merged
merged 3 commits into from
Feb 17, 2025

Conversation

cronokirby
Copy link
Contributor

@cronokirby cronokirby commented Feb 15, 2025

Describe your changes

This tweaks the indexing logic in cometindex to add a bit of extra robustness against indexing the same block twice. The current logic is to generate batches of events, which are then sent over to logical threads for each app view, the idea being to only read each batch once, avoiding duplicate database fetching work from each thread, while allowing parallel processing. One potential race condition which might have been the cause of some oddities we've seen around duplicate processing is that each thread would process the entire batch as long as the next height it needed to index was somewhere inside of that batch. In practice, if the threads were in sync, this would be fine, but if for whatever reason they got desynced (e.g. one app view crashes because there's a bug or some weird data thing, then we patch the bug and restart pindexer, or cometindex dies when some indices have committed a batch, but not others, which can totally happen because some indices are faster than other), it's possible that this might lead to some threads processing some events twice. This adds some logic to truncate the events in each thread so that only the blocks that particular thread needs are indexed.

I also did a pass over on the rest of the indexing logic, and added a comment on a particularly tricky section, justifying its correctness.

CI should be sufficient to test this. We shouldn't observe any difference in behavior.

Checklist before requesting a review

  • I have added guiding text to explain how a reviewer should test these changes.

  • If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason:

    indexing only

@cronokirby cronokirby changed the title Cronokirby/cometindex robustness Some cometindex robustness improvements Feb 15, 2025
@cronokirby
Copy link
Contributor Author

I also found a bug in the previous code causing events to be marked as being in the wrong block

@conorsch
Copy link
Contributor

Failing CI was due to testnet protocol mismatch; resolved by #5093, so this must be rebased for CI to pass. Doing so now.

@conorsch conorsch self-requested a review February 17, 2025 16:34
@conorsch conorsch force-pushed the cronokirby/cometindex-robustness branch from 2056b8d to 8e2035d Compare February 17, 2025 16:34
@conorsch conorsch mentioned this pull request Feb 17, 2025
2 tasks
@conorsch
Copy link
Contributor

In practice, if the threads were in sync, this would be fine, but if for whatever reason they got desynced (e.g. one app view crashes because there's a bug or some weird data thing, then we patch the bug and restart pindexer, or cometindex dies when some indices have committed a batch, but not others, which can totally happen because some indices are faster than other),

Not only am I certain I've encountered this bug before, but I even saw it again this morning when updating the mainnet indexing pipeline. Big improvement here, thanks for taking the time!

@conorsch conorsch merged commit eb79827 into main Feb 17, 2025
10 checks passed
@conorsch conorsch deleted the cronokirby/cometindex-robustness branch February 17, 2025 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants