Event delivery retry stop working after some time #1667

Open
achiarenza opened this issue Jun 16, 2023 · 4 comments

Comments

@achiarenza

I'm experiencing strange behaviour when a delivery attempt fails.

Convoy correctly handles the retry mechanism (exponential backoff) that is set up for the endpoint, but after a number of retries the job seems to stop working and the last scheduled attempt is never picked up. The result is an event delivery stuck in the retry state with a "next attempt" date and time that is in the past.

image
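
One way to spot the symptom described above is to look for deliveries that are still waiting for a retry although their scheduled time has already passed. The following is only a minimal sketch: the `Delivery` record, its field names, and the `"Retry"` status string are hypothetical and chosen for illustration; they are not taken from Convoy's schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Delivery:
    # Hypothetical record; field names are illustrative, not Convoy's schema.
    id: str
    status: str                # assumed values: "Retry", "Success", "Failure"
    next_attempt_at: datetime  # timezone-aware scheduled time of the next attempt


def stuck_retries(deliveries, now=None, grace=timedelta(minutes=1)):
    """Return deliveries still in "Retry" whose next attempt is overdue by more than `grace`."""
    now = now or datetime.now(timezone.utc)
    return [
        d for d in deliveries
        if d.status == "Retry" and d.next_attempt_at + grace < now
    ]


if __name__ == "__main__":
    now = datetime(2023, 6, 16, 12, 0, tzinfo=timezone.utc)
    sample = [
        Delivery("a", "Retry", now - timedelta(minutes=30)),  # overdue: stuck
        Delivery("b", "Retry", now + timedelta(minutes=5)),   # still scheduled
        Delivery("c", "Success", now - timedelta(hours=1)),   # already delivered
    ]
    print([d.id for d in stuck_retries(sample, now=now)])
```

The `grace` window avoids flagging a delivery that the worker is just about to pick up.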

I'm currently using Convoy v23.06.1 but the same happened with the previous version.

@jirevwe
Collaborator

jirevwe commented Jun 16, 2023

Hey @achiarenza 👋🏿

Hmm, this might be a bug with the exponential backoff. I'll take a look at it.

Can you please help me with the steps to reproduce this?

@achiarenza
Author

achiarenza commented Jun 16, 2023

Hello @jirevwe, for sure!

I'm currently using the docker compose file the repo provides to spin up Convoy, inside an Ubuntu 20.04.6 box.

Docker is version 23.0.5, build bc4487a.

Project settings are configured as you can see in the screenshot:
image

All the other configuration values are left as default.

While trying to fix the issue, I scaled the Docker worker service up to more than one instance with docker compose up --scale worker=2 -d, but the problem persisted.

Let me know if you need any other info.

@jirevwe
Collaborator

jirevwe commented Jun 16, 2023

Thanks for the info.

The exponential back-off strategy uses the values from the table below (in milliseconds), which go from 10 seconds to 15 minutes. All subsequent retries after the 7th retry will be about 15 minutes apart.

10000  // 10 seconds
30000  // 30 seconds
60000  // 1 minute
180000 // 3 minutes
300000 // 5 minutes
600000 // 10 minutes
900000 // 15 minutes

This might make a strategy with a 20-retry limit take between 3 and 4 hours to reach the failure state. Can you please share the worker logs, so I can debug further?
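
To make the timing concrete, the schedule above can be sketched as a small lookup plus a sum. This is an illustration only, not Convoy's implementation: the function names are hypothetical, and it assumes each retry waits exactly the table value, with no jitter and negligible request time. Under those assumptions, 20 retries add up to 13,780 seconds, a little under four hours.

```python
from datetime import timedelta

# Delay table from the comment above, in milliseconds.
BACKOFF_MS = [10_000, 30_000, 60_000, 180_000, 300_000, 600_000, 900_000]


def retry_delay_ms(retry_number: int) -> int:
    """Delay before the given retry (1-based); retries past the table reuse the last value."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    index = min(retry_number, len(BACKOFF_MS)) - 1
    return BACKOFF_MS[index]


def total_backoff(retry_limit: int) -> timedelta:
    """Sum of the waits for retries 1..retry_limit (assumes no jitter or request time)."""
    total_ms = sum(retry_delay_ms(n) for n in range(1, retry_limit + 1))
    return timedelta(milliseconds=total_ms)


if __name__ == "__main__":
    print(total_backoff(20))  # 3:49:40
    print(total_backoff(10))  # 1:19:40
    print(total_backoff(5))   # 0:09:40
```

With a retry limit of 5 to 10, the same arithmetic gives roughly 10 to 80 minutes, which makes a stuck retry much quicker to observe.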

In the meantime, can you re-test it with a smaller retry limit (about 5 to 10)? I can't seem to reproduce this.

@achiarenza
Author

Here is the full Docker log, with some info redacted: worker.log
