Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect against web/api 5xx responses #2258

Closed
agjohnson opened this issue Jun 7, 2016 · 4 comments
Closed

Protect against web/api 5xx responses #2258

agjohnson opened this issue Jun 7, 2016 · 4 comments
Assignees
Labels
Accepted Accepted issue on our roadmap Improvement Minor improvement to code
Milestone

Comments

@agjohnson
Copy link
Contributor

When a build is attempted, but a web/api instance throws a 5xx response, a number of strange behaviors can result:

  • Builds will get stuck in triggered state
  • Builds will fail randomly when updating the api fails
  • 5xx responses will be returned from the web servers to users

Part of the resolution to this may be defensive protection around intermittent 5xx responses. This may take some operation changes as well though, as our load balancer should really assume a 5xx response is enough to dislodge the server from the lb group.

Raised from #2255

@agjohnson agjohnson added Improvement Minor improvement to code Operations Operations or server issue labels Jun 7, 2016
@humitos
Copy link
Member

humitos commented Aug 15, 2018

The first one from the list is solved at #3312

I'm not sure to understand the resolution for this. If the request from the builder fails when hitting the API with a 5xx, what we can do?

@ericholscher
Copy link
Member

I believe we should be able to have the build HTTP calls do a smart retry backoff. eg:

  • Try call
  • If it fails, try again
  • If that fails, try again in 5s
  • If that fails, try again in 10s
  • If that fails, raise an exception

@humitos humitos added this to the Build stability milestone Nov 6, 2018
@humitos
Copy link
Member

humitos commented Nov 6, 2018

If that's the way to go, I suppose we could modify our slumber client API class to do this magic. It would be IO blocking, but I suppose that would be fine.

@agjohnson labeled as Operations so he is maybe thinking in another thing.

@agjohnson
Copy link
Contributor Author

agjohnson commented Nov 10, 2018

Yeah, my original take on this is that if a server is overloaded, we should be detecting this and limiting traffic to the server. It's probably a sign that our web servers are overloaded.

Looks like this is still an issue after Azure move: https://sentry.io/read-the-docs/readthedocs-org/issues/712687013/

Retrying is probably a good first place to be defensive. 👍 Requests might even do this natively.

@agjohnson agjohnson added Accepted Accepted issue on our roadmap and removed Operations Operations or server issue labels Nov 10, 2018
@agjohnson agjohnson modified the milestones: Build stability, 2.11 Nov 10, 2018
@humitos humitos self-assigned this Nov 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap Improvement Minor improvement to code
Projects
None yet
Development

No branches or pull requests

3 participants