
Avoid getting stuck in a loop #2244

Merged: 1 commit merged into psf:master from bug/2231 on Sep 25, 2014

Conversation
Conversation

sigmavirus24
Contributor

This prevents a case where we make a request to URL A, which 301s to B which
would then 301 back to A. Alternatively, for less simple schemes, this will
also prevent us from getting stuck in a loop, e.g., it will prevent the
following from causing an endless loop:

    A -> B -> C -> D -> E -> F --
    ^                             \
    |                             /
    ---<------------<----------<-

Fixes #2231.

I tested this by cloning httpbin and hard-coding a permanent redirect loop from /relative-redirect/1 to /relative-redirect/2 so that we could trigger this. This pull request fixes it.
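
For illustration, a minimal sketch of the idea, assuming it lives where Session.send consults the redirect cache (so request and self.redirect_cache are in scope; this is a sketch, not the exact merged diff):

    # Remember every URL we resolve through the redirect cache, and stop
    # consulting the cache the moment a URL repeats, so the request falls
    # through to the network (and the normal redirect limit) instead of
    # spinning forever.
    checked_urls = set()
    while request.url in self.redirect_cache:
        checked_urls.add(request.url)
        new_url = self.redirect_cache.get(request.url)
        if new_url in checked_urls:
            break
        request.url = new_url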

@Lukasa
Member

Lukasa commented Sep 23, 2014

This no longer gets us stuck in a loop; instead it causes us to actually hit the URLs that we're being forcibly redirected to, starting with the end of the redirect chain.

Is this what we want? Or do we want to throw an exception, e.g. RedirectLoopError?

@sigmavirus24
Contributor Author

I don't think we should be adding a new exception for this. This will preserve the previous behavior of having a Max Redirects exception thrown. It's no worse than the behavior we had before the redirect cache was introduced.


@sigmavirus24
Contributor Author

But yes, I suspect we could subclass RedirectLoopError from TooManyRedirects and raise that, so as to save the calls to the network that would otherwise cause us to exceed the max number of redirects. I'm not opposed to that. I'd like to hear other opinions, though, from @fcosantos and @RuudBurger (and anyone else who has one).

@kennethreitz thoughts on saving people from hitting the network more than necessary when we can detect an endless redirect loop?
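
For concreteness, the hypothetical subclass would be tiny; RedirectLoopError is only the name floated above, not an existing requests exception:

    from requests.exceptions import TooManyRedirects

    class RedirectLoopError(TooManyRedirects):
        """Raised when a loop is detected in the redirect cache.

        Subclassing TooManyRedirects keeps existing
        'except TooManyRedirects' handlers working unchanged.
        """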

@sigmavirus24
Contributor Author

I should also have mentioned that this saves us from loops like this too:

    A -> B -> C -> D -> E -> F --
        ^                         \
        |                         /
        ------------<----------<-

Or any other case that essentially results in an endless loop.

@RuudBurger

The RedirectLoopError seems like a good idea.

In my case, the URL works (and redirects properly) in the browser, so maybe there is another issue in how the cache works.
I can provide you with a test URL if you have an email address I can send it to, as it is a URL containing an API key.

@sigmavirus24
Contributor Author

@RuudBurger both @Lukasa and I have our emails available on our GitHub profiles. If you'd rather use PGP, you can find my PGP key (and associated email address) by searching https://pgp.mit.edu/ for my real name.

@Lukasa
Member

Lukasa commented Sep 23, 2014

I see absolutely no reason why permanent redirects in the redirect cache should not count against the max redirects limit. They are still redirects, we're just not hitting the wire to do them. That behaviour will also fix our infinite redirects problem.
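
A sketch of that behaviour, assuming the counting happens wherever the cache is consulted (hypothetical placement; request, self.redirect_cache, and self.max_redirects as on a Session):

    from requests.exceptions import TooManyRedirects

    # Cache-resolved hops count against max_redirects, just like wire hops.
    hops = 0
    while request.url in self.redirect_cache:
        hops += 1
        if hops > self.max_redirects:
            raise TooManyRedirects(
                'Exceeded %s redirects.' % self.max_redirects)
        request.url = self.redirect_cache[request.url]

Note that this also bounds a cyclic cache: the counter exceeds the limit after at most max_redirects cached hops.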

@fcosantos

In my opinion an exception needs to be raised, and a RedirectLoopError is always going to be a TooManyRedirects in the end; if you want to subclass, that is fine, but maybe unnecessary.

I like @sigmavirus24's set() solution; maybe you want to pack it a bit more:

    from requests.exceptions import TooManyRedirects

    checked_urls = set()
    while request.url in self.redirect_cache:
        checked_urls.add(request.url)
        request.url = self.redirect_cache.get(request.url)
        # A repeated URL means the cache contains a cycle.
        if request.url in checked_urls:
            raise TooManyRedirects('Redirect loop detected in the redirect cache')

@sigmavirus24
Contributor Author

> RedirectLoopError is always going to be a TooManyRedirects at the end

@fcosantos I don't quite understand what you mean. Could you clarify this for me?

@Lukasa
Member

Lukasa commented Sep 24, 2014

I think the key point is that there's no need for a RedirectLoopError, we can just have TooManyRedirects. I still think permanent redirects should count against the redirect limit though, with a loop being a special case where we immediately know that we'll hit TooManyRedirects.

@sigmavirus24
Contributor Author

@Lukasa I can think of a couple of ways to make it influence the max_redirects count.

  1. Move the logic into Session#resolve_redirects. This unfortunately would mean actually using the network for the first (possibly permanent) redirect.
  2. Keep a count, alongside the set, of how many times we go through that loop, and add a new optional parameter to Session#resolve_redirects along the lines of redirects_already_followed=0. We can then initialize i in Session#resolve_redirects with that value, which will affect the maximum number of redirects possible (including those resolved via the cache).

Option 2 (sketched below) seems the most practical, but I just loathe adding more arguments (that could confuse a user) to a public API like this.
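
A simplified sketch of option 2 (the real Session#resolve_redirects signature is much longer; **kwargs stands in for it, and everything but the counting is elided):

    from requests.exceptions import TooManyRedirects

    def resolve_redirects(self, resp, req, redirects_already_followed=0,
                          **kwargs):
        # Seed the counter with the hops the redirect cache has already
        # consumed, so cached redirects count against max_redirects too.
        i = redirects_already_followed
        while resp.is_redirect:
            if i >= self.max_redirects:
                raise TooManyRedirects(
                    'Exceeded %s redirects.' % self.max_redirects)
            i += 1
            # ... build and send the next request over the network ...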

Also, I think there's value in using a subclass of TooManyRedirects. I can see an instance where this might cause confusion because of a case like @RuudBurger's. In a browser, the URL might very well work just fine, but because of oddities in how requests is used, the loop is caused by the redirect cache. I think giving users a way to disambiguate where the error is actually coming from is very useful (and a better experience).

@Lukasa
Member

Lukasa commented Sep 24, 2014

I wonder if my concern about the redirect cache leading to irreproducible behaviour is just my specific problem. I just realised that the other thing this redirect cache changes is the behaviour of Response.history, which is now not guaranteed to be the same for each request (the redirect cache doesn't populate it).

@sigmavirus24
Contributor Author

@Lukasa good point. I'm not sure we can actually reconstruct the history accurately unless we also cache responses in the redirect cache. In other words, we'd have a cache something like:

{'http://example.com/': ('http://www.example.com', <Response [301]>)}

And a big problem with that would be timestamps in headers and such (e.g., cookies). All of which makes me wonder exactly how good an idea it is to keep the redirect cache around.
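
To make the idea concrete, rebuilding history from such a cache might look like this (rebuild_history is a hypothetical helper; cache maps a URL to a (target URL, cached 301 response) tuple as above):

    def rebuild_history(cache, url):
        """Walk the cache, collecting cached redirect responses."""
        history = []
        seen = set()
        # Guard against cyclic caches, the very problem this PR fixes.
        while url in cache and url not in seen:
            seen.add(url)
            url, resp = cache[url]
            history.append(resp)
        return url, history

The collected responses would still carry the headers (e.g., Date, Set-Cookie) from whenever they were first fetched, which is exactly the timestamp problem.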

@Lukasa
Member

Lukasa commented Sep 24, 2014

I don't know that we should necessarily throw out the redirect cache, but we should at the very least document the hell out of how it is going to behave.

kennethreitz added a commit that referenced this pull request on Sep 25, 2014: "Avoid getting stuck in a loop"
kennethreitz merged commit de76fbe into psf:master on Sep 25, 2014
@kennethreitz
Contributor

+1

@sigmavirus24
Contributor Author

Uh... this wasn't exactly ready to merge. (Not that it breaks anything, but we were discussing alternative solutions.)

cazzerson referenced this pull request in cazzerson/social-feed-manager on Oct 3, 2014
sigmavirus24 deleted the bug/2231 branch on June 6, 2016
github-actions bot locked as resolved and limited conversation to collaborators on Sep 8, 2021