
Add DLQ for reprocessing of failed blocks #61

Merged

Conversation

@adammilnesmith (Contributor) commented Jul 10, 2024

Adds a DLQ so that block gaps and failed blocks can be reprocessed, as described in the internal design doc.

Incidental bug fixes:

  • Fixes a memory leak in collectedBlocks that would have existed when SkipFailedBlocks was enabled
  • Makes an assertion conditional so that it only applies when we aren't skipping failed blocks; previously, if we skipped the last block that would have gone into a batch, we panicked.
  • Fixes a rare hang during an attempted clean shutdown that could occur if the context closes while we are writing to the channel
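
Editorial note: to make the DLQ flow concrete, below is a minimal sketch of what the reprocessing side could look like. The `BlockDLQ` interface, `NextBlock`, the fetch callback, and the retry policy are illustrative assumptions, not this repository's actual API.

```go
// Illustrative sketch only; names and signatures are assumptions.
package example

import (
	"context"
	"time"
)

// BlockDLQ is a hypothetical view of the dead-letter queue: failed block
// numbers go in with a retry count and come back out when due for a retry.
type BlockDLQ interface {
	AddBlock(blockNumber int64, retries int)
	NextBlock() (blockNumber int64, retries int, ok bool)
}

// reprocessLoop periodically drains the DLQ, refetches each block, and
// requeues it with an incremented retry count if it fails again.
func reprocessLoop(ctx context.Context, dlq BlockDLQ, fetch func(context.Context, int64) error) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for {
				blockNumber, retries, ok := dlq.NextBlock()
				if !ok {
					break // nothing is due for a retry right now
				}
				if err := fetch(ctx, blockNumber); err != nil {
					dlq.AddBlock(blockNumber, retries+1) // back to the DLQ for another attempt
				}
			}
		}
	}
}
```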


@adammilnesmith force-pushed the 07-10-add_dlq_for_reprocessing_of_failed_blocks branch 4 times, most recently from 9fa28e3 to 623069c, on July 11, 2024 16:01
@adammilnesmith force-pushed the 07-10-add_dlq_for_reprocessing_of_failed_blocks branch from 623069c to 4401b67 on July 11, 2024 16:11
```go
@@ -91,6 +91,9 @@ func (i *ingester) trySendBlockBatch(
	for block, ok := collectedBlocks[nextBlockToSend]; ok; block, ok = collectedBlocks[nextBlockToSend] {
		// Skip Failed block if we're configured to skip Failed blocks
		if i.cfg.SkipFailedBlocks && block.Errored() {
			i.log.Error("SendBlocks: RPCBlock has an error, requeueing...", "block", block.BlockNumber, "error", block.Error)
			i.dlq.AddBlock(block.BlockNumber, 0)
			delete(collectedBlocks, nextBlockToSend)
```
@adammilnesmith (Contributor, author) commented on Jul 11, 2024

bug fix: This delete should also fix a memory leak in collectedBlocks that would have existed when SkipFailedBlocks was enabled

```diff
@@ -110,7 +113,7 @@ func (i *ingester) trySendBlockBatch(
 
 	// Send the batch
 	lastBlockNumber := blockBatch[len(blockBatch)-1].BlockNumber
-	if lastBlockNumber != nextBlockToSend-1 {
+	if !i.cfg.SkipFailedBlocks && lastBlockNumber != nextBlockToSend-1 {
 		panic("unexpected last block number")
```
@adammilnesmith (Contributor, author) commented on Jul 11, 2024

bug fix: This assertion only holds when we aren't skipping failed blocks; previously, if we skipped the last block that would have gone into a batch, we panicked.

@helanto (Contributor) left a comment

Good job 🥇 Mostly some questions

Review threads (all resolved):
  • client/duneapi/models.go
  • client/duneapi/client.go
  • ingester/dlq.go (6 threads, outdated)
  • ingester/ingester.go (outdated)
  • ingester/mainloop.go (outdated)
@adammilnesmith force-pushed the 07-10-add_dlq_for_reprocessing_of_failed_blocks branch 6 times, most recently from 9760102 to a90ff9b, on July 12, 2024 10:21
@helanto (Contributor) left a comment

LGTM! My only comment is that it doesn't make much sense to add the type parameter T in the DLQ.

Review thread: lib/dlq/dlq.go (resolved)
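
For readers without access to lib/dlq/dlq.go, here is a minimal sketch of a generic DLQ along the lines being discussed; the field names, method set, and backoff policy are assumptions for illustration, not the actual implementation. If the queue only ever holds block numbers, dropping the type parameter and storing int64 directly (as suggested above) is the simpler choice.

```go
// Illustrative sketch only; not the repository's lib/dlq implementation.
package dlq

import (
	"sync"
	"time"
)

// Item wraps a value of type T with retry bookkeeping.
type Item[T any] struct {
	Value   T
	Retries int
	NextTry time.Time
}

// DLQ is a minimal dead-letter queue: callers add failed items and later
// pull back the ones whose retry time has arrived.
type DLQ[T any] struct {
	mu    sync.Mutex
	items []Item[T]
}

// Add enqueues a value with the given retry count; the next attempt is
// delayed by a capped exponential backoff (an assumed policy).
func (q *DLQ[T]) Add(value T, retries int) {
	q.mu.Lock()
	defer q.mu.Unlock()
	backoff := time.Second << uint(retries)
	if backoff <= 0 || backoff > time.Minute {
		backoff = time.Minute
	}
	q.items = append(q.items, Item[T]{Value: value, Retries: retries, NextTry: time.Now().Add(backoff)})
}

// Next pops an item whose retry time has passed, if any.
func (q *DLQ[T]) Next() (Item[T], bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for i, it := range q.items {
		if time.Now().After(it.NextTry) {
			q.items = append(q.items[:i], q.items[i+1:]...)
			return it, true
		}
	}
	return Item[T]{}, false
}
```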
@adammilnesmith force-pushed the 07-10-add_dlq_for_reprocessing_of_failed_blocks branch from a90ff9b to a09b211 on July 12, 2024 15:15
"elapsed", time.Since(startTime),
"error", err,
)
if !i.cfg.SkipFailedBlocks {
return err
}
blocks <- models.RPCBlock{BlockNumber: blockNumber, Error: err}
select {
@adammilnesmith (Contributor, author) commented

bug fix: this should avoid a rare hang during an attempted clean shutdown if the context closes when we are writing to the channel
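
For context, the fix described here is the standard context-aware channel send: instead of blocking unconditionally on the channel write, the write is raced against ctx.Done(). A sketch of that pattern follows; the helper name and exact fields are assumptions, not the PR's precise code.

```go
// Sketch only: a channel send that cannot hang during shutdown.
// Assumes "context" and this repo's models package are imported; the
// RPCBlock fields mirror those used in the diff above.
func sendOrAbort(ctx context.Context, blocks chan<- models.RPCBlock, blockNumber int64, err error) error {
	select {
	case blocks <- models.RPCBlock{BlockNumber: blockNumber, Error: err}:
		// Handed off for downstream handling (e.g. requeueing to the DLQ).
		return nil
	case <-ctx.Done():
		// The context closed while we were waiting to write; return instead
		// of blocking forever during a clean shutdown.
		return ctx.Err()
	}
}
```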

@adammilnesmith marked this pull request as ready for review on July 12, 2024 15:32
@helanto (Contributor) left a comment

LFG 🔥

Review thread: ingester/mainloop.go (resolved)
@adammilnesmith merged commit e491b73 into main on Jul 12, 2024
1 check passed
@adammilnesmith deleted the 07-10-add_dlq_for_reprocessing_of_failed_blocks branch on July 12, 2024 15:55
@vegarsti (Member) commented on Aug 5, 2024

Nice bug fixes!

@vegarsti (Member) commented on Aug 5, 2024

Awesome work, well done! It's not clear to me why we need to duplicate so much of the sending code, though -- would love to hear more about that when you're back

@adammilnesmith (Contributor, author) commented:

> Awesome work, well done! It's not clear to me why we need to duplicate so much of the sending code, though -- would love to hear more about that when you're back

We will be able to dedup some of this but the short version is: for the DLQ we have wrapping types and sending without batches or ordering.
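
Editorial aside: to illustrate the reply above, the DLQ path might plausibly differ from the main path along these lines; the wrapper type and send helper here are hypothetical, not the repo's actual code.

```go
// Hypothetical wrapper: the DLQ path carries retry metadata alongside the
// block, which the main (batched, ordered) send path does not need.
type dlqBlock struct {
	Block   models.RPCBlock
	Retries int
}

// Hypothetical DLQ send: recovered blocks are sent one at a time, in
// whatever order they become available, so the batching and
// "next expected block number" bookkeeping of the main path does not apply.
// Assumes "context" and this repo's models package are imported.
func sendFromDLQ(ctx context.Context, send func(context.Context, models.RPCBlock) error, b dlqBlock) error {
	return send(ctx, b.Block)
}
```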

@vegarsti (Member) commented on Aug 5, 2024

> Awesome work, well done! It's not clear to me why we need to duplicate so much of the sending code, though -- would love to hear more about that when you're back

> We will be able to dedup some of this but the short version is: for the DLQ we have wrapping types and sending without batches or ordering.

Great, thanks!
