Define hosts' public suffix and registrable domain. #391
Changes from all commits
e679f89
cbf9063
6ea048d
cc03e7b
bd35e7e
2be718c
8828de3
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -272,6 +272,124 @@ for further processing. | |
U+0020 SPACE, U+0023 (#), U+0025 (%), U+002F (/), U+003A (:), U+003F (?), U+0040 (@), U+005B ([), | ||
U+005C (\), or U+005D (]). | ||
|
||
<p>A <a for=/>host</a>'s <dfn for=host export>public suffix</dfn> is the portion of a | ||
<a for=/>host</a> which is included on the <cite>Public Suffix List</cite>. To obtain | ||
<var>host</var>'s <a for=host>public suffix</a>, run these steps: [[!PSL]] | ||
|
||
<ol> | ||
<li><p>If <var>host</var> is not a <a>domain</a>, then return null. | ||
|
||
<li><p>Return the <a for=host>public suffix</a> obtained by executing the | ||
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List on | ||
<var>host</var>. [[!PSL]] | |
</ol> | ||
|
||
<p>A <a for=/>host</a>'s <dfn for=host export>registrable domain</dfn> is a <a>domain</a> formed by | ||
the most specific public suffix, along with the domain label immediately preceding it, if any. To | |
obtain <var>host</var>'s <a for=host>registrable domain</a>, run these steps: | ||
|
||
<ol> | ||
<li><p>If <var>host</var>'s <a for=host>public suffix</a> is null or <var>host</var>'s | ||
<a for=host>public suffix</a> <a for=host>equals</a> <var>host</var>, then return null. | ||
|
||
<li><p>Return the <a for=host>registrable domain</a> obtained by executing the | ||
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List on | ||
<var>host</var>. [[!PSL]] | ||
</ol> | ||
|
||
<div class=example id=example-host-psl> | ||
<table> | ||
<tr> | ||
<th>Host input | ||
<th>Public suffix | ||
<th>Registrable domain | ||
<tr> | ||
<td><code>com</code> | ||
<td><code>com</code> | ||
<td><i>null</i> | ||
<tr> | ||
<td><code>example.com</code> | ||
<td><code>com</code> | ||
<td><code>example.com</code> | ||
<tr> | ||
<td><code>www.example.com</code> | ||
<td><code>com</code> | ||
<td><code>example.com</code> | ||
<tr> | ||
<td><code>sub.www.example.com</code> | ||
<td><code>com</code> | ||
<td><code>example.com</code> | ||
<tr> | ||
<td><code>EXAMPLE.COM</code> | ||
<td><code>com</code> | ||
<td><code>example.com</code> | ||
<tr> | ||
<td><code>github.io</code> | ||
<td><code>github.io</code> | ||
<td><i>null</i> | ||
<tr> | ||
Isn't this row duplicated? The previous one looks the same. |
||
<td><code>whatwg.github.io</code> | ||
<td><code>github.io</code> | ||
<td><code>whatwg.github.io</code> | ||
<tr> | ||
<td><code>إختبار</code> | ||
Same as above. And also applies below. |
||
<td><code>xn--kgbechtv</code> | |
<td><i>null</i> | ||
<tr> | ||
<td><code>example.إختبار</code> | ||
<td><code>xn--kgbechtv</code> | |
<td><code>example.xn--kgbechtv</code> | |
So one of the things is the PSL doesn't specify whether it returns a U-Label or an A-Label (that's left to the implementation). I'm curious about the documentation here for the A-Label: is this an expectation of the contract? That is, are you trying to show that either U-Label or A-Label can be returned regardless of U-Label or A-Label input, or are you trying to state that A-Labels should be the consistent return?
Currently we don't rely on this anywhere (assuming it's consistent to be one or the other, is that at least required?), but A-label seems preferable as that'd be consistent with how the platform exposes URLs and origins overall. I suspect this will only matter if we add an API, but it really depends on whether PSL dependencies keep getting added or not. |
||
<tr> | ||
<td><code>sub.example.إختبار</code> | ||
<td><code>xn--kgbechtv</code> | |
<td><code>example.xn--kgbechtv</code> | |
</table> | ||
</div> | ||
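The two algorithms and the example table above can be approximated with a short Python sketch. This is only an illustration under stated assumptions, not the Public Suffix List's actual algorithm: `SAMPLE_RULES` is a hypothetical two-entry stand-in for the real list, wildcard and exception rules are ignored, and the input is assumed to be an already-parsed domain (lowercase, A-label form), so non-domain hosts such as IP addresses are out of scope. The function names are invented for this sketch.

```python
from typing import Optional

# Hypothetical, tiny stand-in for the Public Suffix List.
# Only plain rules are modelled; wildcard ("*.") and exception ("!")
# rules from the real list format are deliberately left out.
SAMPLE_RULES = {"com", "github.io"}


def public_suffix(domain: str) -> str:
    """Return the longest suffix of `domain` found in SAMPLE_RULES.

    Assumes `domain` is an already-parsed domain. If no rule matches,
    the implicit default rule "*" applies and the last label is used.
    """
    labels = domain.split(".")
    # Candidates are tried from longest to shortest, so the first hit
    # is the most specific matching rule.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SAMPLE_RULES:
            return candidate
    return labels[-1]


def registrable_domain(domain: str) -> Optional[str]:
    """Return the public suffix plus one preceding label, or None."""
    suffix = public_suffix(domain)
    if suffix == domain:
        # The host is itself a public suffix: nothing is registrable.
        return None
    suffix_label_count = suffix.count(".") + 1
    labels = domain.split(".")
    # Keep the suffix labels plus the single label in front of them.
    return ".".join(labels[-(suffix_label_count + 1):])


if __name__ == "__main__":
    for host in ("com", "example.com", "sub.www.example.com",
                 "github.io", "whatwg.github.io"):
        print(host, public_suffix(host), registrable_domain(host))
```

A set lookup per candidate suffix keeps the sketch short; a real implementation would load the full list and honor its wildcard and exception rules.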
|
||
<p>Two <a for=/>hosts</a>, <var>A</var> and <var>B</var>, are said to be | |
<dfn for=host export>same site</dfn> with each other if either of the following statements is true: | |
|
||
<ul class=brief> | ||
<li><p><var>A</var> <a for=host>equals</a> <var>B</var> and <var>A</var>'s | ||
<a for=host>registrable domain</a> is non-null. | ||
|
||
<li><p><var>A</var>'s <a for=host>registrable domain</a> is <var>B</var>'s | ||
<a for=host>registrable domain</a> and is non-null. | ||
</ul> | ||
|
||
<div class=example id=example-same-site> | ||
<p>Assuming that <code>suffix.example</code> is a <a for=host>public suffix</a> and that | ||
<code>example.com</code> is not: | ||
|
||
<ul> | ||
<li><p><code>example.com</code>, <code>sub.example.com</code>, <code>other.example.com</code>, | ||
<code>sub.sub.example.com</code>, and <code>sub.other.example.com</code> are all <a>same site</a> | ||
with each other (and themselves), as their <a for=host>registrable domains</a> are | ||
<code>example.com</code>. | ||
|
||
<li><p><code>registrable.suffix.example</code>, <code>sub.registrable.suffix.example</code>, | ||
<code>other.registrable.suffix.example</code>, <code>sub.sub.registrable.suffix.example</code>, | ||
and <code>sub.other.registrable.suffix.example</code> are all <a>same site</a> with each other | ||
(and themselves), as their <a for=host>registrable domains</a> are | ||
<code>registrable.suffix.example</code>. | ||
|
||
<li><p><code>example.com</code> and <code>registrable.suffix.example</code> are not | ||
<a>same site</a> with each other, as their <a for=host>registrable domains</a> differ. | ||
|
||
<li><p><code>suffix.example</code> is not <a>same site</a> with <code>suffix.example</code>, as | ||
it is a <a for=host>public suffix</a>, and therefore has a null | ||
<a for=host>registrable domain</a>. | ||
</ul> | ||
</div> | ||
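The same site check and the example above can be sketched in Python as follows. This is a minimal illustration that mirrors the two conditions as drafted in this diff. `toy_registrable_domain` and `SAMPLE_SUFFIXES` are hypothetical helpers that hard-code the example's assumption that `suffix.example` is a public suffix; hosts whose suffix is not in the toy set simply get a null registrable domain, which is a simplification.

```python
from typing import Callable, Optional

# Hypothetical suffix set matching the assumptions of the example:
# "suffix.example" is a public suffix, "example.com" is not.
SAMPLE_SUFFIXES = {"com", "suffix.example"}


def toy_registrable_domain(domain: str) -> Optional[str]:
    """Toy lookup: public suffix plus one preceding label, else None."""
    labels = domain.split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SAMPLE_SUFFIXES:
            # i == 0 means the whole host is a public suffix.
            return None if i == 0 else ".".join(labels[i - 1:])
    # Simplification: unknown suffixes yield no registrable domain.
    return None


def same_site(
    a: str,
    b: str,
    registrable_domain: Callable[[str], Optional[str]] = toy_registrable_domain,
) -> bool:
    """Apply the two conditions from the definition, in order."""
    reg_a = registrable_domain(a)
    reg_b = registrable_domain(b)
    # Condition 1: A equals B and A's registrable domain is non-null.
    if a == b and reg_a is not None:
        return True
    # Condition 2: both registrable domains are equal and non-null.
    return reg_a is not None and reg_a == reg_b


if __name__ == "__main__":
    print(same_site("example.com", "sub.other.example.com"))
    print(same_site("example.com", "registrable.suffix.example"))
    print(same_site("suffix.example", "suffix.example"))
```

Passing the registrable-domain lookup as a parameter keeps the comparison logic separate from whichever copy of the Public Suffix List a client happens to ship.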
|
||
<p class=warning>Specifications should avoid depending on "<a for=host>public suffix</a>", | ||
"<a for=host>registrable domain</a>", and "<a>same site</a>". The Public Suffix List will diverge | |
from client to client, and cannot be relied upon to provide a hard security boundary. Specifications | |
which ignore this advice are encouraged to carefully consider whether URLs' schemes ought to be | ||
incorporated into any decision made based upon whether or not two <a for=/>hosts</a> are | ||
<a>same site</a>. HTML's <a>same origin-domain</a> concept is a reasonable example of this | ||
consideration in practice. | ||
|
||
|
||
<h3 id=idna>IDNA</h3> | ||
|
||
|
This is not a host, but input to the host parser.
I think it's helpful to point out that no matter how folks spell the URL, it's going to be normalized. Perhaps shifting this table to include a URL rather than a host would make that point, especially for the punycode bits?
I think it's fine to just list hosts, but we should label it "host input" or some such, to not confuse it with host as a concept, which is already parsed and normalized.