Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linkcheck: support ignored-URIs for redirects #13127

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

jayaddison
Copy link
Contributor

Feature or Bugfix

  • Bugfix

Purpose

  • Server-issued HTTP redirects may refer the linkcheck client to URLs that the operator has configured as ignored using the linkcheck_ignore config setting. This changeset handles those redirections by returning an ignore result immediately, without following the redirect.

Detail

  • Override the requests.Session.get_redirect_target method so that we can raise a (Sphinx-internal) exception if we encounter a redirect to a configured ignore-URL.

Relates

# do not follow redirections that match ignored URI patterns
if resp.is_redirect:
location = resp.headers['location']
dest = urljoin(resp.url, location)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd draw attention to this line in particular as possibly-dubious.

Servers often respond with relative Location redirects; but I think we want to match against more-fully-qualified URI paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

sphinx/util/requests.py Outdated Show resolved Hide resolved
sphinx/util/requests.py Outdated Show resolved Hide resolved
Copy link
Contributor

@francoisfreitag francoisfreitag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Thanks for making this update :)

'uri': f'http://{address}/',
'info': f'ignored redirect: http://{address}/redirected',
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding a test verifying that when the response Location is an absolute URL going to a different server (so it overrides the urljoin base entirely), it is indeed excluded.

@francoisfreitag
Copy link
Contributor

I’m wondering if the change should be considered breaking? An intermediary redirect might match a pattern from linkcheck_ignore and the link would stop being verified silently. A bit far-fetched 🤷

@jayaddison
Copy link
Contributor Author

Thanks @francoisfreitag for the code review! I agree with adding test coverage for a cross-domain redirect case, and I'll do that soon. I'm less sure about whether we should consider it a breaking change - so far we haven't stated anything about the handling of linkcheck_ignore in relation to redirects; but I do agree the change could affect some use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support linkcheck_ignore in link redirection
2 participants