Status: CG-DRAFT
Title: Scroll To Text Fragment
-ED: wicg.github.io/ScrollToTextFragment/draftspec.html
+ED: wicg.github.io/ScrollToTextFragment/index.html
Shortname: scroll-to-text
Level: 1
Editor: Nick Burris, Google https://www.google.com, nburris@chromium.org
@@ -41,7 +41,7 @@ the receiver loses the context of the page.
This section is non-normative
-A [=valid text directive=] is specified in the fragment directive (see
+A [=text fragment directive=] is specified in the [=fragment directive=] (see
[[#the-fragment-directive]]) with the following format:
#:~:text=[prefix-,]textStart[,textEnd][,-suffix]
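As an illustration of the format above, a URL generator could assemble the directive as sketched below. This is a minimal sketch, not part of the spec: the helper names `encode_term` and `build_text_directive` are hypothetical, and `urllib.parse.quote` is used as a deliberately conservative encoder (it escapes more characters than the grammar strictly requires). Because `quote` never escapes "-", the sketch escapes it by hand so that it cannot be confused with the prefix and suffix markers.

```python
from urllib.parse import quote


def encode_term(term: str) -> str:
    """Percent-encode one term so "&", "," and "-" cannot clash with the syntax."""
    # quote() leaves "-" untouched, so it is replaced explicitly afterwards.
    return quote(term, safe="").replace("-", "%2D")


def build_text_directive(text_start: str, text_end: str = "",
                         prefix: str = "", suffix: str = "") -> str:
    """Assemble text=[prefix-,]textStart[,textEnd][,-suffix]."""
    parts = []
    if prefix:
        parts.append(encode_term(prefix) + "-")
    parts.append(encode_term(text_start))
    if text_end:
        parts.append(encode_term(text_end))
    if suffix:
        parts.append("-" + encode_term(suffix))
    return "text=" + ",".join(parts)
```

The result would be appended after the `:~:` delimiter, for example `https://example.com#:~:` followed by `build_text_directive("foo")`.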
@@ -106,12 +106,12 @@ example" in "here is an example text".
## The Fragment Directive ## {#the-fragment-directive}
To avoid compatibility issues with usage of existing URL fragments, this spec
-introduces the fragment directive. The fragment directive is a portion
+introduces the [=fragment directive=]. The [=fragment directive=] is a portion
of the URL fragment delimited by the code sequence :~:. It is
reserved for UA instructions, such as text=, and is stripped from the URL
during loading so that author scripts can't directly interact with it.
-The fragment-directive is a mechanism for URLs to specify instructions meant
+The [=fragment directive=] is a mechanism for URLs to specify instructions meant
for the UA rather than the document. It's meant to avoid direct interaction with
author script so that future UA instructions can be added without fear of
introducing breaking changes to existing content. Potential examples could be:
@@ -123,8 +123,9 @@ To the definition of a
URL record, add:
-A URL's fragment-directive is either null or an ASCII string holding data used
-by the UA to process the resource. It is initially null
+A [[URL#concept-url|URL]]'s fragment
+directive is either null or an ASCII string holding data used by the UA to
+process the resource. It is initially null.
The fragment directive delimiter is the string ":~:", that is the
@@ -134,50 +135,55 @@ The fragment directive is the part of the URL fragment that follows
the [=fragment directive delimiter=].
- The fragment-directive is part of the URL fragment. This means it must always
- appear after a U+0023 (#) code point in a URL.
+ The [=fragment directive=] is part of the URL fragment. This means it must
+ always appear after a U+0023 (#) code point in a URL.
- To add a fragment-directive to a URL like https://example.com, a fragment
+ To add a [=fragment directive=] to a URL like https://example.com, a fragment
must first be appended to the URL: https://example.com#:~:text=foo.
Amend the
-basic URL parser steps to parse fragment directives in a URL:
-
- - In step 11 of this algorithm, amend the fragment state case:
- - In the inner switch on c, in the Otherwise case, add a step after
- step 2:
- - If c is U+003A (:) and remaining begins with the two
- consecutive code points U+007E (~) and U+003A (:), set state to
- fragment-directive state. Increase pointer by the
- length of the [=fragment directive delimiter=] minus 1.
+basic URL parser steps to parse the [=fragment directive=] in a URL:
+
+ - In step 11 of this algorithm, amend the [[URL#fragment-state|fragment
+ state]] case:
+ - In the inner switch on [[URL#c|c]], in the Otherwise case, add a step
+ after step 2:
+ - If [[URL#c|c]] is U+003A (:) and
+ remaining
+ begins with the two consecutive code points U+007E (~) and U+003A
+ (:), set state to [=fragment directive state=]. Increment
+ pointer by the length of the [=fragment directive
+ delimiter=] minus 1.
- Step 3 (now step 4 after the above change) must begin with "Otherwise,"
- - In step 11 of this algorithm, add a new fragment-directive state
+ - In step 11 of this algorithm, add a new [=fragment directive state=]
case with the following steps:
- fragment-directive state:
- - Switching on c:
+ fragment directive state:
+ - Switching on [[URL#c|c]]:
- The EOF code point: Do nothing
- U+0000 NULL: Validation error
- Otherwise:
- 1. If c is not a URL code point and not U+0025 (%), validation
- error.
- 2. If c is U+0025 (%) and remaining does not start with
- two ASCII hex digits, validation error.
- 3. UTF-8 percent encode c using the fragment percent-encode set
- and append the result to url’s fragment-directive.
+ 1. If [[URL#c|c]] is not a URL code point and not U+0025 (%),
+ validation error.
+ 2. If [[URL#c|c]] is U+0025 (%) and
+ remaining
+ does not start with two ASCII hex digits, validation error.
+ 3. UTF-8 percent encode [[URL#c|c]] using the fragment
+ percent-encode set and append the result to [=URL's fragment
+ directive=].
- These changes make a URL's fragment end at the [=fragment directive delimiter=].
- The [=fragment directive=] includes all characters that follow, but not including,
- the delimiter.
+ These changes make a URL's fragment end at the [=fragment directive
+ delimiter=]. The [=fragment directive=] includes all characters that follow,
+ but not including, the delimiter.
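The net effect of the parser amendment can be approximated outside a real URL parser. The sketch below is only an approximation under stated assumptions: it works on the raw URL string, skips validation and percent-encoding entirely, and the function name `split_fragment` is hypothetical. It shows how the fragment ends at the delimiter and how the directive stays null when no delimiter is present.

```python
from typing import Optional, Tuple

FRAGMENT_DIRECTIVE_DELIMITER = ":~:"


def split_fragment(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (fragment, fragment directive); None stands in for null."""
    if "#" not in url:
        # No fragment at all, so there can be no fragment directive either.
        return None, None
    fragment = url.split("#", 1)[1]
    before, found, after = fragment.partition(FRAGMENT_DIRECTIVE_DELIMITER)
    if not found:
        return fragment, None
    # The fragment ends at the delimiter; the directive is everything after it.
    return before, after
```

Applied to `https://example.org/#test:~:text=foo`, this yields the fragment "test" and the directive "text=foo".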
https://example.org/#test:~:text=foo will be parsed such that
-the fragment is the string "test" and the fragment-directive is the string
+the fragment is the string "test" and the [=fragment directive=] is the string
"text=foo".
@@ -186,10 +192,12 @@ the fragment is the string "test" and the fragment-directive is the string
Amend the URL serializer
steps by inserting a step after step 7:
-8. If the exclude fragment flag is unset and url's fragment-directive is
- non-null:
- 1. If url's fragment is null, append U+0023 (#) to output.
- 2. Append ":~:", followed by url's fragment-directive, to output.
+8. If the exclude fragment flag is unset and [=URL's fragment
+ directive=] is non-null:
+ 1. If [[URL#concept-url-fragment|url's fragment]] is null, append U+0023 (#)
+ to output.
+ 2. Append ":~:", followed by [=URL's fragment directive=], to
+ output.
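The inserted serializer step can be sketched as follows. This is an illustrative sketch only: `base` is a hypothetical stand-in for the output produced by the earlier serializer steps, the exclude fragment flag is assumed to be unset, and `None` models a null fragment or fragment directive.

```python
from typing import Optional


def serialize_with_directive(base: str, fragment: Optional[str],
                             directive: Optional[str]) -> str:
    """Append the fragment, then the fragment directive, to a serialized URL."""
    output = base
    # Existing serializer behaviour: a non-null fragment is appended after "#".
    if fragment is not None:
        output += "#" + fragment
    # New step 8: a non-null directive always follows a "#".
    if directive is not None:
        if fragment is None:
            output += "#"
        output += ":~:" + directive
    return output
```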
### Processing the fragment directive ### {#processing-the-fragment-directive}
@@ -197,31 +205,34 @@ To the definition of
Document, add:
-Each document has an associated fragment directive.
+Each document has an associated fragment
+directive.
Amend the
create and initialize a Document object steps to store and remove the
-fragment directive from the a Document's URL.
+[=fragment directive=] from the Document's [[DOM#concept-document-url|URL]].
Replace steps 7 and 8 of this algorithm with:
7. Let url be null
-8. If request is non-null, then set url to request's
- current URL.
-9. Otherwise, set url to response's URL.
-10. Set document's fragment-directive be url's
- fragment-directive. (Note: this is stored on the document but not
- web-exposed)
-11. Set url's fragment-directive to null.
-12. Set the document's url to be url.
+8. If request is non-null, then set url to request's
+ [[FETCH#concept-request-current-url|current URL]].
+9. Otherwise, set url to response's
+ [[FETCH#concept-response-url|URL]].
+10. Set [=Document's fragment directive=] to [=URL's fragment directive=].
+ (Note: this is stored on the document but not web-exposed)
+11. Set [=URL's fragment directive=] to null.
+12. Set document's [[DOM#concept-document-url|URL]] to be url.
### Fragment directive grammar ### {#fragment-directive-grammar}
A valid fragment directive is a sequence of characters that appears
in the [=fragment directive=] that matches the production:
-
FragmentDirective ::=
[=TextDirective=] ("&" [=TextDirective=])*
+
FragmentDirective ::=
+
[=TextDirective=] ("&" [=TextDirective=])*
@@ -231,109 +242,158 @@ multiple indicated strings in the page, but this also allows for future
directive types to be added and combined.
-A valid text directive is one such directive, that matches the
-production:
+The text fragment directive is one such [=fragment directive=] that
+enables specifying a piece of text on the page, that matches the production:
A [=TextMatchChar=] may be any
URL code point that
-is not explicitly used in the [=TextDirective=] syntax, that is "&", "-", and ",",
-which must be percent-encoded.
+is not explicitly used in the [=TextDirective=] syntax, that is "&", "-", and
+",", which must be percent-encoded.
-Care must be taken when implementing text fragment directive so that it
+Care must be taken when implementing the [=text fragment directive=] so that it
cannot be used to exfiltrate information across origins. Scripts can navigate
-a page to a cross-origin URL with a text fragment directive. If a malicious
+a page to a cross-origin URL with a [=text fragment directive=]. If a malicious
actor can determine that a victim page scrolled after such a navigation, they
can infer the existence of any text on the page.
In addition, the user's privacy should be ensured even from the destination
origin. Although scripts on that page can already learn a lot about a user's
-actions, a text fragment directive can still contain sensitive information. For
-this reason, this specification provides no way for a page to extract the
+actions, a [=text fragment directive=] can still contain sensitive information.
+For this reason, this specification provides no way for a page to extract the
content of the text fragment anchor. User agents must not expose this
information to the page.
A user visiting a page listing dozens of medical conditions may have gotten
- there via a link with a text fragment directive containing a specific
+ there via a link with a [=text fragment directive=] containing a specific
condition. This information must not be shared with the page.
+### Search Timing
+
+A naive implementation of the text search algorithm could allow information
+exfiltration based on runtime duration differences between a matching and a
+non-matching query. If an attacker were to find a way to synchronously navigate
+to a [=text fragment directive=]-invoking URL, they would be able to determine
+the existence of a text snippet by measuring how long the navigation call takes.
+
+
+ The restrictions in [[#should-allow-text-fragment]] should prevent this
+ specific case; in particular, the no-same-document-navigation restriction.
+ However, these restrictions are provided as multiple layers of defence.
+
+
+For this reason, the implementation must ensure the runtime of
+[[#navigating-to-text-fragment]] steps does not differ based on whether a match
+has been successfully found.
+
+This specification does not specify exactly how a UA achieves this as there are
+multiple solutions with differing tradeoffs. For example, a UA may
+continue to walk the tree even after a match is found in
+[[#find-a-target-text]]. Alternatively, it may schedule an
+asynchronous task to find and set the indicated part of the document.
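The first option, continuing the walk after a match, might look like the sketch below. It rests on simplifying assumptions: the document is modelled as a plain list of text-node strings, matching is a plain substring search rather than the word-bounded search defined in this spec, and the name `find_first_match` is hypothetical. It shows only the idea of not exiting early; it does not by itself guarantee constant running time.

```python
from typing import List, Optional, Tuple


def find_first_match(text_nodes: List[str],
                     query: str) -> Tuple[Optional[Tuple[int, int]], int]:
    """Return ((node index, offset) of the first match or None, nodes visited)."""
    first_match = None
    visited = 0
    for index, text in enumerate(text_nodes):
        visited += 1
        offset = text.find(query)
        # Record only the first hit, but keep walking so that the amount of
        # tree traversal does not depend on whether a match exists.
        if offset != -1 and first_match is None:
            first_match = (index, offset)
    return first_match, visited
```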
+
### Should Allow Text Fragment ### {#should-allow-text-fragment}
This algorithm has input window, is user triggered and returns a
-boolean indicating whether a text fragment directive should be allowed to
+boolean indicating whether a [=text fragment directive=] should be allowed to
invoke.
1. If any of the following conditions are true, return false.
- * window's parent field is non-null.
- * window's opener field is non-null.
- * The document of the previous entry in window's browsing context's session history is equal to window's document.
-
That is, this is the result of a same document navigation
- * is user triggered is false.
+ * window's
+
+ parent field is non-null.
+ * window's
+
+ opener field is non-null.
+ * The Document of the
+ [[HTML#latest-entry|latest entry]] in window's
+ [[HTML#browsing-context|browsing context]]'s
+ [[HTML#session-history|session history]] is equal to window's
+ document.
+
+ That is, this is the result of a same document navigation
+
+ * is user triggered is false.
2. Otherwise, return true.
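The check above reduces to a simple predicate. In the sketch below, the `WindowState` class and its field names are hypothetical stand-ins for the window's parent field, its opener field, and the session history comparison; a real UA would read these from its own browsing context state.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    has_parent: bool                   # window's parent field is non-null
    has_opener: bool                   # window's opener field is non-null
    is_same_document_navigation: bool  # history entry's Document equals window's


def should_allow_text_fragment(window: WindowState,
                               is_user_triggered: bool) -> bool:
    """Return False if any blocking condition holds, True otherwise."""
    if (window.has_parent
            or window.has_opener
            or window.is_same_document_navigation
            or not is_user_triggered):
        return False
    return True
```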
## Navigating to a Text Fragment ## {#navigating-to-text-fragment}
The scroll to text specification proposes an amendment to
-[[html#scroll-to-fragid]]. In summary, if a text fragment directive is present
-and a match is found in the page, the text fragment takes precedent over the
-element fragment as the indicated part of the document.
+[[html#scroll-to-fragid]]. In summary, if a [=text fragment directive=] is
+present and a match is found in the page, the text fragment takes precedence
+over the element fragment as the indicated part of the document.
Add the following steps to the beginning of the processing model for The
-indicated part of the document.
+href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#the-indicated-part-of-the-document">
+The indicated part of the document.
-1. Let fragment directive be the document URL's
- fragment directive.
+1. Let fragment directive string be the document [=URL's fragment
+ directive=].
2. Let is user activated be true if the current navigation was triggered by
- user activation
-
- TODO: This might need an additional flag somewhere to track the user
- activation triggering
-
+ href="https://html.spec.whatwg.org/#triggered-by-user-activation"> triggered
+ by user activation
+
+ TODO: This might need an additional flag somewhere to track the user
+ activation triggering
+
3. If the result of [[#should-allow-text-fragment]] with the window of the
- document's browsing context and is user activated is true then:
- 1. If [[#find-a-target-text]] with fragment directive returns
- non-null, then the return value is the indicated part of the document;
- return.
+ document's browsing context and is user activated is true then:
+ 1. If [[#find-a-target-text]] with fragment directive string
+ returns non-null, then the return value is the indicated part of the
+ document; return.
### Find a target text ### {#find-a-target-text}
-To find the target text for a given string fragment directive, the
-user agent must run these steps:
-1. If fragment directive does not begin with the string "text=",
+To find the target text for a given string fragment directive input,
+the user agent must run these steps:
+1. If fragment directive input does not begin with the string "text=",
then return null.
-2. Let raw target text be the substring of fragment directive
- starting at index 5.
+2. Let raw target text be the substring of fragment directive
+ input starting at index 5.
- This is the remainder of the fragment directive following, but not
- including, the "text=" prefix.
+ This is the remainder of the fragment directive input following,
+ but not including, the "text=" prefix.
3. If raw target text is the empty string, return null.
-4. Let tokens be a list of strings that is the result of splitting the
- string raw target text on commas.
+4. Let tokens be a [[INFRA#list|list]] of strings that is the result of
+ [[INFRA#split-on-commas|splitting]] raw target text on commas.
5. Let prefix and suffix and textEnd be the empty
string.
@@ -344,29 +404,31 @@ user agent must run these steps:
7. If the last character of potential prefix is U+002D (-), then:
1. Set prefix to the result of removing the last character from
potential prefix.
- 2. Remove the first item of the list tokens.
+ 2. [[INFRA#list-remove|Remove]] the first item of the list tokens.
8. Let potential suffix be the last item of tokens.
9. If the first character of potential suffix is U+002D (-), then:
1. Set suffix to the result of removing the first character from
potential suffix.
- 2. Remove the last item of the list tokens.
-10. Assert: tokens has size 1 or tokens has size 2.
+ 2. [[INFRA#list-remove|Remove]] the last item of the list tokens.
+10. Assert: tokens has [[INFRA#list-size|size]] 1 or tokens
+ has [[INFRA#list-size|size]] 2.
Once the prefix and suffix are removed from tokens, tokens may either
contain one item (textStart) or two items (textStart and textEnd).
11. Let textStart be the first item of tokens.
-12. If tokens has size 2, then let textEnd be the last item of
- tokens.
+12. If tokens has [[INFRA#list-size|size]] 2, then let textEnd
+ be the last item of tokens.
The strings prefix, textStart, textEnd, and suffix now contain the
text directive parameters as defined in [[#syntax]].
13. Let walker be a
- TreeWalker equal to
- Document.createTreeWalker().
-14. Let position be a position variable that indicates a text offset in
- in walker.currentNode.innerText.
+ [[DOM#treewalker|TreeWalker]] equal to
+ [[DOM#dom-document-createtreewalker|Document.createTreeWalker()]].
+14. Let position be a [[INFRA#string-position-variable|position
+ variable]] that indicates a text offset in
+ walker.currentNode.innerText.
15. If textEnd is the empty string, then:
1. Let match position be the result of [[#find-match-with-context]]
with input walker walker, search position position,
@@ -399,9 +461,8 @@ This algorithm has input walker, search position, prefix, query, and
suffix and returns a text position that is the start of the match.
-The input walker is a
-TreeWalker reference, not
-a copy, i.e. any modifications are performed on the caller's instance of
+The input walker is a [[DOM#treewalker|TreeWalker]] reference, not a
+copy, i.e. any modifications are performed on the caller's instance of
walker.
@@ -416,14 +477,16 @@ a copy, i.e. any modifications are performed on the caller's instance of
text from search position with [=current
locale=].
2. If search position is null, then break.
- 3. Advance search position past any whitespace.
+ 3. [[INFRA#skip-ascii-whitespace|Skip ASCII whitespace]] on
+ search position.
4. If search position is at the end of text, then:
1. Perform [[#advance-walker-to-text]] on walker.
2. If walker.currentNode is null, then return null.
3. Set text to walker.currentNode.innerText.
4. Set search position to the beginning of
text.
- 5. Advance search position past any whitespace.
+ 5. [[INFRA#skip-ascii-whitespace|Skip ASCII whitespace]] on
+ search position.
5. If the result of [[#next-word-bounded-instance]] of
query in text from search position
with [=current locale=] does not start at search
@@ -438,21 +501,23 @@ a copy, i.e. any modifications are performed on the caller's instance of
instance of query.
3. If search position is null, then break.
- 4. Let potential match position be a position variable equal to
+ 4. Let potential match position be a
+ [[INFRA#string-position-variable|position variable]] equal to
search position minus the length of query.
5. If suffix is the empty string, then return potential
match position.
- 6. Advance search position past any whitespace.
+ 6. [[INFRA#skip-ascii-whitespace|Skip ASCII whitespace]] on
+ search position.
7. If search position is at the end of text, then:
- 1. Let suffix_walker be a
- TreeWalker
+ 1. Let suffix_walker be a [[DOM#treewalker|TreeWalker]]
that is a copy of walker.
2. Perform [[#advance-walker-to-text]] on suffix_walker.
3. If suffix_walker.currentNode is null, then return null.
4. Set text to
suffix_walker.currentNode.innerText.
5. Set search position to the beginning of text.
- 6. Advance search position past any whitespace.
+ 6. [[INFRA#skip-ascii-whitespace|Skip ASCII whitespace]] on
+ search position.
8. If the result of [[#next-word-bounded-instance]] of suffix
in text from search position with [=current
locale=] starts at search position, then return
@@ -466,16 +531,16 @@ of the currentNode.
### Advance a TreeWalker to the next text node ### {#advance-walker-to-text}
-The input walker is a
-TreeWalker reference, not
-a copy, i.e. any modifications are performed on the caller's instance of
+The input walker is a [[DOM#treewalker|TreeWalker]] reference, not a
+copy, i.e. any modifications are performed on the caller's instance of
walker.
1. While the input walker.currentNode is not null and
walker.currentNode is not a text node:
1. Advance the current node by calling
- walker.nextNode()
+
+ walker.nextNode()
### Find the next word bounded instance ### {#next-word-bounded-instance}
@@ -508,9 +573,9 @@ API for word boundary matching.
href="http://www.unicode.org/reports/tr29/#Word_Boundaries">Unicode
text segmentation annex. The
- Default Word Boundary Specification defines a default set of what
- constitutes a word boundary, but as the specification mentions, a
- more sophisticated algorithm should be used based on the
+ Default Word Boundary Specification defines a default set of
+ what constitutes a word boundary, but as the specification mentions,
+ a more sophisticated algorithm should be used based on the
locale.
@@ -529,8 +594,8 @@ API for word boundary matching.
## Indicating The Text Match ## {#indicating-the-text-match}
In addition to scrolling the text fragment into view as part of the Try
-To Scroll To The Fragment steps, the UA should visually indicate the
+href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#try-to-scroll-to-the-fragment">
+Try To Scroll To The Fragment steps, the UA should visually indicate the
matched text in some way such that the user is made aware of the text match.
The UA should provide to the user some method of dismissing the match, such
@@ -547,8 +612,8 @@ The UA must not visually indicate any provided context terms.
## Feature Detectability ## {#feature-detectability}
For feature detectability, we propose adding a new FragmentDirective interface
-that is exposed via window.location.fragmentDirective if the UA supports the
-feature.
+that is exposed via window.location.fragmentDirective if the UA
+supports the feature.
This section contains recommendations for UAs automatically generating URLs
-with text fragment directives. These recommendations aren't normative but are
-provided to ensure generated URLs result in maximally stable and usable URLs.
+with a [=text fragment directive=]. These recommendations aren't normative but
+are provided to ensure that generated URLs are maximally stable and usable.
-## Prefer Exact Matching To Range-based
+## Prefer Exact Matching To Range-based ## {#prefer-exact-matching-to-range-based}
The match text can be provided either as an exact string "text=foo%20bar%20baz"
or as a range "text=foo,bar".
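To make the two forms concrete, the sketch below splits a directive into its parameters, loosely following the token steps in [[#find-a-target-text]]. It is a sketch under assumptions, not the normative algorithm: the names `TextDirective` and `parse_text_directive` are hypothetical, no percent-decoding is performed, and malformed input returns `None` where the spec uses an assertion. An exact match leaves `text_end` empty, while a range fills it.

```python
from typing import NamedTuple, Optional


class TextDirective(NamedTuple):
    prefix: str
    text_start: str
    text_end: str
    suffix: str


def parse_text_directive(directive: str) -> Optional[TextDirective]:
    """Split "text=[prefix-,]textStart[,textEnd][,-suffix]" into its parts."""
    if not directive.startswith("text="):
        return None
    raw_target_text = directive[len("text="):]
    if raw_target_text == "":
        return None
    # Split on commas, stripping ASCII whitespace around each token.
    tokens = [t.strip(" \t\n\f\r") for t in raw_target_text.split(",")]
    prefix = suffix = text_end = ""
    # A first token ending in "-" is the prefix context term.
    if tokens[0].endswith("-"):
        prefix = tokens.pop(0)[:-1]
    # A last token starting with "-" is the suffix context term.
    if tokens and tokens[-1].startswith("-"):
        suffix = tokens.pop()[1:]
    # What remains must be textStart, optionally followed by textEnd.
    if len(tokens) not in (1, 2):
        return None
    text_start = tokens[0]
    if len(tokens) == 2:
        text_end = tokens[1]
    return TextDirective(prefix, text_start, text_end, suffix)
```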
@@ -625,14 +691,14 @@ as a range-based match.
TODO: Can we determine the above limit in some more objective way?
-## Use Context Only When Necessary
+## Use Context Only When Necessary ## {#use-context-only-when-necessary}
-Context terms allow the text fragment directive to disambiguate text snippets
-on a page. However, their use can make the URL more brittle in some cases.
-Often, the desired string will start or end at an element boundary. The context
-will therefore exist in an adjacent element. Changes to the page structure
-could invalidate the text fragment directive since the context and match text
-may no longer appear to be adjacent.
+Context terms allow the [=text fragment directive=] to disambiguate text
+snippets on a page. However, their use can make the URL more brittle in some
+cases. Often, the desired string will start or end at an element boundary. The
+context will therefore exist in an adjacent element. Changes to the page
+structure could invalidate the [=text fragment directive=] since the context and
+match text may no longer appear to be adjacent.
Suppose we wish to craft a URL for the following text:
@@ -642,7 +708,7 @@ may no longer appear to be adjacent.
<div class="content">Text to quote</div>
- We could craft the text fragment directive as follows:
+ We could craft the [=text fragment directive=] as follows:
text=HEADER-,Text%20to%20quote
@@ -666,14 +732,14 @@ true:
TODO: Determine the numeric limit above in a more objective way
-## Determine If Fragment Id Is Needed
+## Determine If Fragment Id Is Needed ## {#determine-if-fragment-id-is-needed}
-When the UA navigates to a URL containing a text fragment directive, it will
+When the UA navigates to a URL containing a [=text fragment directive=], it will
fallback to scrolling into view a regular element-id based fragment if it
exists and the text fragment isn't found.
This can be useful to provide a fallback, in case the text in the document
-changes, invalidating the text fragment directive.
+changes, invalidating the [=text fragment directive=].
Suppose we wish to craft a URL to
diff --git a/index.html b/index.html
index 9af595a..7a5c94f 100644
--- a/index.html
+++ b/index.html
@@ -1214,7 +1214,7 @@
-
+