retry federation requests on 5xx errors #8915
Comments
I think this is more of a thinko that we should just fix. I think we should just extend: synapse/synapse/http/matrixfederationclient.py Lines 523 to 528 in 30fba62
Though I wonder if we should not retry quite so often on 500 errors, as they should(?) be more likely due to a bad request, but that isn't always true.
a bad request should cause a 400 error :-p
In fact I would say the very thing that distinguishes a 500 from a 400 is that a 500 says "please try the same thing again, you might have more luck" (whereas a 400 says "stop doing that please")
Yeah, that's fair
Hi! I would like to take up this issue.
503 backing off for 10 minutes is also a bit harsh:
My server was happily talking to fosdem.org for a long while, but then fosdem.org was restarted, which led to a single 503 - the backoff is a bit harsh here.
Why is the backoff a flat 10 minutes, anyways? Why not exponential backoff - from 10 seconds - that caps at around 15 minutes?
It is exponential. It starts at 10 minutes because it assumes (incorrectly, per this issue) that by that point you've already tried a few times. See also #5406 (comment) for somewhere I've written about this in the past.
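For illustration, the schedule proposed in the question above (start at 10 seconds, cap at around 15 minutes) could be sketched as follows. This is a minimal sketch, not Synapse's actual backoff code: the function name `backoff_schedule`, the doubling multiplier, and the number of attempts are all assumptions chosen for the example.

```python
from typing import List


def backoff_schedule(
    initial_s: float = 10.0,      # first delay, per the suggestion in the thread
    cap_s: float = 15 * 60.0,     # upper bound of roughly 15 minutes
    multiplier: float = 2.0,      # assumed growth factor (not stated in the thread)
    attempts: int = 8,            # assumed number of retries to list
) -> List[float]:
    """Return the delay (in seconds) to wait before each retry attempt."""
    delays = []
    delay = initial_s
    for _ in range(attempts):
        # Never wait longer than the cap, however many failures came before.
        delays.append(min(delay, cap_s))
        delay *= multiplier
    return delays


if __name__ == "__main__":
    print(backoff_schedule())
```

With these assumed defaults, a single failure (such as one 503 during a restart) costs a 10 second wait, and only repeated failures reach the cap.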
federationhttpclient has a bunch of logic to retry requests, but it only fires for connection failures, 429 responses, and (optionally) DNS lookup errors. Why don't we retry on 5xx errors? (For example: we sometimes see 520 errors from matrix.org. Arguably that's indicative of a problem, but I think we could be robust to it.)
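The change being asked for can be sketched as a retry predicate. This is a hedged illustration only, not the code in matrixfederationclient.py: the function `should_retry` and its parameters (`connection_failed`, `dns_failed`, `retry_dns`) are hypothetical names invented for this example. It keeps the existing retry conditions described above and adds the 5xx range, while leaving other 4xx responses as non-retryable.

```python
from typing import Optional


def should_retry(
    status_code: Optional[int],
    connection_failed: bool = False,
    dns_failed: bool = False,
    retry_dns: bool = True,  # hypothetical flag for the "optionally" in the issue
) -> bool:
    """Decide whether a failed federation request is worth retrying."""
    if connection_failed:
        return True
    if dns_failed:
        # DNS lookup errors are only retried when the caller opts in.
        return retry_dns
    if status_code is None:
        return False
    if status_code == 429:
        # Rate limited: the same request may succeed later.
        return True
    # Proposed extension: treat any 5xx (500, 503, 520, ...) as transient.
    return 500 <= status_code < 600


if __name__ == "__main__":
    for code in (400, 429, 500, 503, 520):
        print(code, should_retry(code))
```

This follows the distinction drawn in the comments: a 5xx suggests trying the same thing again, whereas a 400 means the request itself should not be repeated.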