
Change robots to deny by default with allowlist #10292

Merged: aduth merged 6 commits into main from aduth-crawling on Mar 25, 2024
Conversation

@aduth (Contributor) commented Mar 22, 2024

🛠 Summary of changes

Updates robots.txt to disallow all crawling by default, with a few explicit exceptions. Also removes the now-redundant robots <meta> tag.

Related Slack discussion: https://gsa-tts.slack.com/archives/C0NGESUN5/p1711054927329839

Why?

  • The previous robots.txt was incomplete and allowed crawling of more pages than we would expect
  • Since very few pages are expected to be crawlable, it's easier to list them explicitly
  • With nearly all pages blocked from crawling, it's no longer necessary to also include a <meta> directive preventing indexing (related resource)

Open questions:

  • Trailing slash? Technically both /en/ and /en are valid. We tend to link to the version without the trailing slash, and it seems reasonable to allow crawling of only one version of each page, not both.
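On the trailing-slash question: in robots.txt pattern matching, a trailing `$` anchors a rule to the end of the URL, so `Allow: /en$` permits /en but not /en/. A simplified sketch of that matching behavior (the method name here is hypothetical, and real crawlers implement the full spec, including `*` wildcards):

```ruby
# Simplified robots.txt rule matching, ignoring "*" wildcards.
# A trailing "$" anchors the rule to the end of the URL path,
# which is how "Allow: /en$" can permit /en without also
# permitting /en/ or /en/anything.
def rule_matches?(rule, path)
  if rule.end_with?('$')
    path == rule.chomp('$') # exact match only
  else
    path.start_with?(rule)  # prefix match
  end
end

rule_matches?('/en$', '/en')    # => true
rule_matches?('/en$', '/en/')   # => false
rule_matches?('/', '/anything') # => true (why "Disallow: /" blocks everything)
```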

📜 Testing Plan

  1. Visit http://localhost:3000/robots.txt
  2. Verify entries match expected crawlable routes

aduth added 3 commits March 22, 2024 14:07

changelog: Bug Fixes, Robots, Improve consistency of robots.txt crawling directives

    Disallow: /
    Allow: /$
    Allow: /es$
    Allow: /fr$
Contributor

Contributor Author

Yeah, you're probably right, we could do this with a custom route + controller and it'd probably be better, and it would also let us use URL route helpers as well.

Contributor

are you suggesting making that a live file served by a Rails controller, or an ERB that we write out to public/ as a build step?

Contributor Author

> are you suggesting making that a live file served by a Rails controller, or an ERB that we write out to public/ as a build step?

I was thinking a controller, and implemented a first pass in 3ee1ae8, though I do like the idea of a static file since it wouldn't be something we expect to change. Then again, we probably don't get much traffic to this so hopefully it's not a big deal either way?
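A controller-backed robots.txt might look roughly like the sketch below. This is an illustrative guess, not the code in 3ee1ae8: the allowlist, method name, and render call are all assumptions.

```ruby
# Hypothetical sketch of a deny-by-default robots.txt built in Ruby.
# The allowlisted locale roots and the "$" end anchors mirror the
# directives discussed in this PR; all names here are illustrative.
ALLOWED_PATHS = ['/', '/es', '/fr'].freeze

def robots_body(allowed_paths = ALLOWED_PATHS)
  lines = ['User-agent: *', 'Disallow: /']
  lines += allowed_paths.map { |path| "Allow: #{path}$" }
  lines.join("\n") << "\n"
end

# In a Rails controller action this could be served as plain text, e.g.:
#   render plain: robots_body, content_type: 'text/plain'
puts robots_body
```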

@aduth aduth marked this pull request as draft March 22, 2024 18:18
@zachmargolis (Contributor) left a comment

LGTM but where's the robots_controller_spec.rb?

@aduth (Contributor Author) commented Mar 25, 2024

> but where's the robots_controller_spec.rb?

I found it hiding behind the "Ready for review" button 😂

Added in 2b15d49

@aduth aduth marked this pull request as ready for review March 25, 2024 13:23
@aduth aduth merged commit dfb19f6 into main Mar 25, 2024
@aduth aduth deleted the aduth-crawling branch March 25, 2024 14:21