Interval of update loop varies and can slow down on slow backend #53

Closed
ties opened this issue Oct 31, 2021 · 3 comments

@ties
Collaborator

ties commented Oct 31, 2021

As a stayrtr operator, I want stayrtr to keep fetching updates even if the backend system is slow or unresponsive.

If I want updates every 10 minutes and an update takes 5 minutes, I want the next update to start 10 minutes after the previous one started, not 15 minutes after it started (10 minutes after the previous one finished).

Context

When running stayrtr over a slow connection (4G was not cooperating) I noticed that the update loop does not have a set interval but a set delay. If the SLURM or JSON responses are slow, each iteration of the loop takes (much) longer.

Root cause

Handling slow responses is a hard problem. It ends up being a tradeoff between the liveness of the whole system and getting all information.

For example, in my rpki-client wrapper I found that some repositories were so slow that they prevented me from updating on time. I decided to add a utility to time out and abort fetching from slow repos. There I decided that finishing an update was more important than having all information.

Desired behaviour

first of all:

  • exponential backoff on errors
  • have basic metrics for HTTP behaviour. We have part of this, but the last successful response per URL, response size, duration, and status code should be tracked. Some metrics can also be moved: RefreshStatusCode etc. could be tracked from the HTTP util.
  • make both updates (SLURM + VRP JSON) asynchronous; they can be performed in parallel.

then:

  • abort the connection if retrieving the response takes longer than [timelimit]
  • schedule updates at a set interval: "an update happens every interval", not "an update happens one interval after the previous update finishes"
@ties
Collaborator Author

ties commented Oct 31, 2021

It could also be that starting the next update one interval after the previous one finished is the desired behaviour. In that case this issue can be closed (and I'll make a separate issue for the HTTP metrics part).

@randomthingsandstuff
Contributor

randomthingsandstuff commented Jan 31, 2022

I agree with your view on the matter and noticed this working on VRP expiry stuff.

That whole refresh/VRP expiry piece (in #15) needs to be broken out to accomplish this and test it properly. So I should be able to address this as part of that work.

When I push them, we should discuss the default timer values.

@randomthingsandstuff randomthingsandstuff self-assigned this Jan 31, 2022
benjojo added a commit that referenced this issue Jan 25, 2023
Previously if you had a very slow backend, the refresh timer for a reload
would only start after the current refresh has finished.

Now the timer for the next refresh starts when the timer for the current one fires.

This helps avoid the client being torpedoed by very slow backends

Tag: #53
@benjojo
Collaborator

benjojo commented Jan 26, 2023

I split two of the subpoints into their own tickets, since they are worth their own investigations for now.

But the update loop now happens consistently, even if the backend is slow.

And VRP+SLURM updates are now done in parallel.

@benjojo benjojo closed this as completed Jan 26, 2023