Add retry loop to block handler. #823

Merged
merged 7 commits into from
Jan 20, 2022

Conversation

winder
Contributor

@winder winder commented Jan 14, 2022

Summary

If the Postgres database handles are exhausted due to many pending queries, block import will fail. With the existing code, that causes Indexer to terminate.

This PR adds a retry loop so that the process eventually recovers from this sort of transient failure.

Test Plan

New unit test.

@tolikzinovyev
Contributor

How about creating 2 dedicated connections for block import? (AddBlock() needs 2).

@winder
Contributor Author

winder commented Jan 14, 2022

> How about creating 2 dedicated connections for block import? (AddBlock() needs 2).

I think this is still useful as a basic safety net, but your idea is even better.

Another possible issue is that Aurora replication (presumably) needs its own connection handle(s). One suspicion is that this is why our mobile wallet deployments fall behind from time to time. To resolve that, we might need to make the pgx pool size configurable. It seems like ConnectConfig has plenty of options for doing that.

@tolikzinovyev
Contributor

I'm just not sure ignoring an error is a good solution.

Not following about aurora replication.

@winder
Contributor Author

winder commented Jan 14, 2022

> I'm just not sure ignoring an error is a good solution.

It doesn't ignore the error. The error is logged, and it's up to the service runner to decide whether or not to ignore it.

> Not following about aurora replication.

I can follow up on this separately.

@codecov-commenter

codecov-commenter commented Jan 14, 2022

Codecov Report

Merging #823 (3e4bbe0) into develop (59e181f) will decrease coverage by 1.67%.
The diff coverage is 70.00%.

@@             Coverage Diff             @@
##           develop     #823      +/-   ##
===========================================
- Coverage    59.17%   57.49%   -1.68%     
===========================================
  Files           32       35       +3     
  Lines         4108     4327     +219     
===========================================
+ Hits          2431     2488      +57     
- Misses        1379     1539     +160     
- Partials       298      300       +2     

| Impacted Files | Coverage Δ |
|---|---|
| cmd/algorand-indexer/daemon.go | 25.43% <70.00%> (ø) |
| cmd/algorand-indexer/import.go | 12.50% <0.00%> (ø) |
| cmd/algorand-indexer/main.go | 29.21% <0.00%> (ø) |


@winder winder marked this pull request as ready for review January 18, 2022 17:26
@winder
Contributor Author

winder commented Jan 18, 2022

@tolikzinovyev I created some follow-up tickets to reduce the chance that this will happen:
#826
#827

shiqizng
shiqizng previously approved these changes Jan 19, 2022
Contributor

@shiqizng shiqizng left a comment

looks fine.

@winder winder dismissed stale reviews from shiqizng and AlgoStephenAkiki via 3e4bbe0 January 19, 2022 17:50
Contributor

@tolikzinovyev tolikzinovyev left a comment

Looks good!

@winder winder merged commit 28b5e17 into develop Jan 20, 2022
@winder winder deleted the will/do-not-terminate branch January 20, 2022 16:43
5 participants