Implement canonical URLs and redirects (if possible) #440
It is interesting that the URLs with .html and without both resolve, because I don't see two files when I build a lesson. Is that a GitHub thing? Just noting that canonical URLs for the whole lesson came up in #481 as a building block to link episodes/chapters to the lesson in the metadata.
I did not think about this, but yes, this is absolutely a GitHub thing, and it runs into the boundaries of my knowledge of networking.
Take for example the beta phase preview of the lessons (deployed on AWS): https://preview.carpentries.org/instructor-training/02-practice-learning.html (works)
$ curl -I https://preview.carpentries.org/instructor-training/02-practice-learning.html
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 62331
Connection: keep-alive
Date: Thu, 29 Jun 2023 13:27:47 GMT
Last-Modified: Tue, 27 Jun 2023 00:16:47 GMT
ETag: "2fab9dad8bdfa9df0a1753d25a4bb2cf"
Server: AmazonS3
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 d6cbeccd9a6d25b691d204399bf8b728.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SFO5-P2
X-Amz-Cf-Id: ftzMBTuQ4JvGacNZZ70dw3ZYTFJHJ0wyhmSI49x5uIiMKkhtgTi1ZQ==
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Referrer-Policy: strict-origin-when-cross-origin
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Vary: Origin
$ curl -I https://preview.carpentries.org/instructor-training/02-practice-learning
HTTP/1.1 403 Forbidden
Connection: keep-alive
x-amz-error-code: AccessDenied
x-amz-error-message: Access Denied
Date: Thu, 29 Jun 2023 13:27:49 GMT
Server: AmazonS3
X-Cache: Error from cloudfront
Via: 1.1 94be61e339880d0097634de6934f7710.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SFO5-P2
X-Amz-Cf-Id: zaSAKzoVYtmJPsIR2xmodwiUhDMtAhDlC5bzSMo8ixBR6iiWLPaDaA==
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Referrer-Policy: strict-origin-when-cross-origin
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Vary: Origin
When I look at the pages on GitHub, there is no difference between the pages; not even a redirect:
$ curl -I https://carpentries.github.io/sandpaper-docs/episodes.html
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 90902
Server: GitHub.com
Content-Type: text/html; charset=utf-8
permissions-policy: interest-cohort=()
Last-Modified: Tue, 27 Jun 2023 00:26:06 GMT
Access-Control-Allow-Origin: *
ETag: "649a2c9e-16316"
expires: Thu, 29 Jun 2023 13:36:04 GMT
Cache-Control: max-age=600
x-proxy-cache: MISS
X-GitHub-Request-Id: 6B52:9B94:500FB1:5EF38F:649D866C
Accept-Ranges: bytes
Date: Thu, 29 Jun 2023 13:28:41 GMT
Via: 1.1 varnish
Age: 157
X-Served-By: cache-pdx12332-PDX
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1688045321.324659,VS0,VE1
Vary: Accept-Encoding
X-Fastly-Request-ID: 9a03fb665121bdd4a2d53f703447089dce0becdc
$ curl -I https://carpentries.github.io/sandpaper-docs/episodes
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 90902
Server: GitHub.com
Content-Type: text/html; charset=utf-8
permissions-policy: interest-cohort=()
Last-Modified: Tue, 27 Jun 2023 00:26:06 GMT
Access-Control-Allow-Origin: *
ETag: "649a2c9e-16316"
expires: Thu, 29 Jun 2023 13:35:58 GMT
Cache-Control: max-age=600
x-proxy-cache: MISS
X-GitHub-Request-Id: 5FE4:84EB:50548F:5F37A8:649D8665
Accept-Ranges: bytes
Date: Thu, 29 Jun 2023 13:28:44 GMT
Via: 1.1 varnish
Age: 166
X-Served-By: cache-pdx12331-PDX
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1688045324.075636,VS0,VE1
Vary: Accept-Encoding
X-Fastly-Request-ID: ff6c4cda35fb5b4299f55d7f949a79ecaad846f3
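The "no difference" observation above can also be checked mechanically by comparing the validators in the two responses. This is only a minimal sketch, assuming the header text has already been captured (for example from `curl -I`); `parse_headers` and `same_entity` are hypothetical helper names, and matching `ETag` plus `Content-Length` is a heuristic, not proof that the bytes are identical.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Parse `curl -I` style output into a dict keyed by lower-cased header name.

    The first line (the HTTP status line) is skipped. If a header repeats
    (e.g. two `Vary` lines), the last value wins in this simple sketch.
    """
    headers: dict[str, str] = {}
    for line in raw.strip().splitlines()[1:]:
        name, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            headers[name.strip().lower()] = value.strip()
    return headers


def same_entity(a: dict[str, str], b: dict[str, str]) -> bool:
    """Heuristic: treat two responses as the same file if ETag and Content-Length match."""
    keys = ("etag", "content-length")
    return all(k in a and k in b and a[k] == b[k] for k in keys)


# Header values copied from the two GitHub Pages responses above.
with_html = parse_headers(
    'HTTP/1.1 200 OK\nContent-Length: 90902\nETag: "649a2c9e-16316"\n'
)
without_html = parse_headers(
    'HTTP/1.1 200 OK\nContent-Length: 90902\nETag: "649a2c9e-16316"\n'
)
print(same_entity(with_html, without_html))
```

Both GitHub responses carry the same `ETag` and `Content-Length`, which is consistent with a single file being served under two URLs.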
Thanks for doing this research. I feel that The Workbench should not rely on this GitHub feature and should instead use the .html URLs as canonical. I noticed the variants while working on carpentries/lesson-development-training#209. To signal which URL is canonical, you could (or perhaps should) use RFC 6596 (the canonical link relation).
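To make the RFC 6596 suggestion concrete, here is a rough sketch of how a page's `<link rel="canonical">` tag could be produced from any of its URL variants. It is not how The Workbench does (or will do) this; `canonical_url` and `canonical_link_tag` are hypothetical names, and the rule "extensionless file path gets `.html`" is an assumption taken from the proposal above.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map a page URL variant to the `.html` form (assumed canonical here).

    Query strings and fragments are dropped. Directory-style URLs (ending in
    `/`) are returned unchanged apart from that, since this sketch only
    handles file pages.
    """
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        return urlunsplit(parts._replace(query="", fragment=""))
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        path += ".html"  # e.g. /episodes -> /episodes.html
    return urlunsplit(parts._replace(path=path, query="", fragment=""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element that would go in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag("https://carpentries.github.io/sandpaper-docs/episodes"))
```

With a tag like this in the page head, both the `.html` and the extensionless URL would declare the same canonical address, whichever one the visitor used.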
The idea of canonical URLs was initially brought up in #43, but it never moved beyond discussion.
Basically, if someone wants to visit https://carpentries.github.io/sandpaper-docs/episodes.html, they can do so with two links:
but if they use https://carpentries.github.io/sandpaper-docs/episodes/, or https://carpentries.github.io/sandpaper-docs/episodes/index.html then they get a 404.
This is because the first two links point to a file, but the last two links point to a folder, and analytics will see all of them as different unless we establish a canonical URL.
{pkgdown} has implemented redirects, but I am not sure how they will work for this because we want a redirect that exists inside of a folder with the same name as the file.
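For discussion, the "redirect inside a folder with the same name as the file" idea could look roughly like the sketch below: for every `page.html` in the built site, write a stub `page/index.html` that redirects to `../page.html` via a meta refresh. This is not {pkgdown}'s implementation, the function names are hypothetical, and whether GitHub Pages still resolves the extensionless `/page` URL the same way once a `page/` folder exists is an open question that this sketch does not answer.

```python
from pathlib import Path


def redirect_html(target: str) -> str:
    """Return a minimal HTML page that redirects to `target` via meta refresh."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={target}">\n'
        f'</head><body><a href="{target}">Moved here</a></body></html>\n'
    )


def write_folder_redirects(site_dir: Path) -> list[Path]:
    """For each top-level `name.html`, create `name/index.html` pointing back to it.

    `index.html` itself is skipped, and existing folders are never overwritten.
    Returns the list of stub files that were written.
    """
    written: list[Path] = []
    for page in sorted(site_dir.glob("*.html")):
        if page.name == "index.html":
            continue
        folder = site_dir / page.stem
        if folder.exists():
            continue  # do not clobber real content
        folder.mkdir()
        stub = folder / "index.html"
        stub.write_text(redirect_html(f"../{page.name}"), encoding="utf-8")
        written.append(stub)
    return written
```

Calling `write_folder_redirects(Path("site"))` on a build folder (the folder name is an assumption) would make `episodes/` and `episodes/index.html` land on `episodes.html` instead of a 404.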