Some cometindex robustness improvements #5095
Merged
Describe your changes
This tweaks the indexing logic in cometindex to add a bit of extra robustness against indexing the same block twice. The current logic generates batches of events, which are then sent to a logical thread for each app view. The idea is to read each batch only once, avoiding duplicate database fetches from each thread while still allowing parallel processing.
One potential race condition, which might explain some of the oddities we've seen around duplicate processing, is that each thread would process the entire batch as long as the next height it needed to index was somewhere inside that batch. In practice, if the threads were in sync, this would be fine. But if they got desynced for whatever reason, it's possible that some threads would process some events twice. This can happen if, e.g., one app view crashes because of a bug or some weird data and we then patch the bug and restart pindexer, or if cometindex dies after some indices have committed a batch but not others, which can totally happen because some indices are faster than others.
This adds some logic to truncate the events in each thread so that only the blocks that particular thread needs are indexed.
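For reviewers who want a picture of the truncation idea, here is a minimal sketch. It is not the code in this PR: the `Event` type, its fields, and the `truncate_batch` function are hypothetical names made up for illustration, and the sketch assumes each batch is sorted by ascending block height.

```rust
/// Hypothetical event type for illustration; the real cometindex types differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    /// Height of the block this event was emitted in.
    pub height: u64,
    /// Placeholder for the event payload.
    pub kind: String,
}

/// Returns only the part of `batch` that a given app view still needs.
///
/// `next_height` is the first height this app view has not yet indexed.
/// Assumption: `batch` is sorted by ascending height, so every event below
/// `next_height` sits at the front of the slice and can be skipped.
pub fn truncate_batch(batch: &[Event], next_height: u64) -> &[Event] {
    // Index of the first event whose height is >= next_height.
    let start = batch.partition_point(|e| e.height < next_height);
    &batch[start..]
}
```

With a batch covering heights 10 through 14, an app view whose next height is 10 would get the whole batch, while one that had already committed through height 12 would get only the events at heights 13 and 14. An app view that is already past the batch would get an empty slice and index nothing.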
I also did a pass over the rest of the indexing logic and added a comment on a particularly tricky section, justifying its correctness.
CI should be sufficient to test this. We shouldn't observe any difference in behavior.
Checklist before requesting a review
I have added guiding text to explain how a reviewer should test these changes.
If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason: