Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return 404 instead of 400 responses for obviously-invalid URLs #1434

Open
robertknight opened this issue Oct 7, 2024 · 1 comment
Open

Comments

@robertknight
Copy link
Member

robertknight commented Oct 7, 2024

Requests for "obviously invalid" URLs like https://via.hypothes.is/wp-admin return 400 responses instead of 404. This is inconvenient because we cannot easily filter out such responses in eg. New Relic metrics which monitor the overall error rate of the service.

We have encountered situations when a bot hits a large number of URLs like this in a short window of time, typically looking for vulnerabilities in common PHP packages. This triggered an alarm that fires when 80%+ of the service's requests are failing for a period of time (10-15 minutes).

The reason for the 400 here is that /wp-admin matches the general route for proxying websites which treats the part after the initial / as a URL, where the protocol is optional. CheckmateClient.check_url fails to parse wp-admin as a public URL and raises BadURL, which results in a 400 response.

For context, see https://hypothes-is.slack.com/archives/C074BUPEG/p1728300410941439?thread_ts=1728292002.576029&cid=C074BUPEG.

New Relic alert: https://one.newrelic.com/alerts/issue?account=1385283&duration=259200000&state=e0b2c426-026d-27ee-4aa8-b0894fb965d1

@robertknight
Copy link
Member Author

robertknight commented Oct 7, 2024

Some other options:

  • Modify alert conditions to exclude all errors of type BadURL
  • Modify alert conditions to exclude errors with status 400. This might be a problem as the 400 status is a general bad request status that is potentially used in other contexts and we do want to be notified if the volume increases

An advantage of making these requests return a 404 in Via is that it matches how other services would respond to the same scenario, where eg. /wp-admin would not match any routes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant