Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should the processing model care about validity? #221

Closed
annevk opened this issue Mar 27, 2023 · 6 comments · Fixed by #247
Closed

Should the processing model care about validity? #221

annevk opened this issue Mar 27, 2023 · 6 comments · Fixed by #247

Comments

@annevk
Copy link
Collaborator

annevk commented Mar 27, 2023

It currently has

If fragment directive input is not a valid fragment directive, then return an empty list.

but can that actually happen? It seems better if we don't need these additional grammar-based validation requirements.

@bokand
Copy link
Collaborator

bokand commented Mar 28, 2023

Hmm, are you suggesting to remove the grammar definitions and rely on the parsing and processing steps to handle errors?

That makes sense to me, though personally I'd find the grammar useful so readers can tell at a glance how to construct valid text directives (or other fragment directives); maybe keep that as a non-normative reference?

@annevk
Copy link
Collaborator Author

annevk commented Mar 28, 2023

Yeah, I think a dual nature makes sense. One to guide producers and the other to guide consumers. URL and the HTML syntax also have this dual nature.

@bokand
Copy link
Collaborator

bokand commented Mar 28, 2023

One additional point, I see that for FragmentDirective it doesn't matter since an invalid string will just be an UnknownDirective. But for TextDirectives we do use the grammar to discard invalid cases (e.g. a URL includes a fifth, invalid, term after suffix). That seems convenient to have a grammar? But I'll see if it's easy to just do algorithmically.

@annevk
Copy link
Collaborator Author

annevk commented Mar 28, 2023

Usually the problem is that implementers don't use grammars in their implementation so you get subtle differences. I've found a lot of these over the years in HTTP header parsers.

They also had quite a bit of complexity when a string iterator can do the job as well.

@bokand
Copy link
Collaborator

bokand commented Mar 28, 2023

Yeah, that's true, but then I have a bit of a meta question about spec writing :) ...I recently came across this note in parallelism:

Algorithms in standards are to be easy to understand and are not necessarily great for battery life or performance.

How do we balance this vs. writing steps that match implementation? Blink also doesn't use a grammar for this case but this seemed like the best way to convey the requirements in spec; implementors can validate however they choose but the spec wins in case of any deviations (does HTML follow something similar to C++'s as-if rule?).

Is this typically a judgement call, with preference to match implementation?

As another example, the core find a range algorithm in this spec no longer mechanically matches what Blink does (though the output should be the same). Blink rearranged the steps to enable asynchronous execution but it's much harder to understand than the spec steps. Presumably that's reasonable (and common?)

@annevk
Copy link
Collaborator Author

annevk commented Mar 29, 2023

Yeah, see https://infra.spec.whatwg.org/#algorithm-conformance for a description of that rule in standards land. (I had not seen the C++ rule before today though and there may well be differences.)

In my experience implementers prefer a set of steps over a grammar. And a set of steps tends to lead towards better tests and more interoperable implementations. But yes, ultimately it's a bit of a judgment call and people might have different preferences.

Now we already have algorithms for splitting on code points and such written down so I think for the case at hand here building on that would be very straightforward.

As for "find a range", I think it works because the invocation is rather hand-wavy. Though I have not studied <a>.click() and looking at the various events across implementations to determine if there are web-exposed differences.

bokand added a commit to bokand/ScrollToTextFragment that referenced this issue Nov 30, 2023
This commit overhauls the parsing steps to avoid using the EBNF grammar
for validity, instead specifying that imperatively. It also moves
parsing to happen earlier in the process so that we pass around parsed
Text Directive objects.

Also makes the steps more precise, referring to infra types and
correctly decoding the strings.

Fixes WICG#221
Fixes WICG#230
bokand added a commit to bokand/ScrollToTextFragment that referenced this issue Dec 13, 2023
This commit overhauls the parsing steps to avoid using the EBNF grammar
for validity, instead specifying that imperatively. It also moves
parsing to happen earlier in the process so that we pass around parsed
Text Directive objects.

Also makes the steps more precise, referring to infra types and
correctly decoding the strings.

Fixes WICG#221
Fixes WICG#230
bokand added a commit that referenced this issue Dec 13, 2023
* Specify parsing imperatively

This commit overhauls the parsing steps to avoid using the EBNF grammar
for validity, instead specifying that imperatively. It also moves
parsing to happen earlier in the process so that we pass around parsed
Text Directive objects.

Also makes the steps more precise, referring to infra types and
correctly decoding the strings.

Fixes #221
Fixes #230

* Fix and make grammar non-normative

The grammar is now provided solely as a convenience so this makes the
section non-normative. Also fixes it so that UnknownDirective doesn't
subsume TextDirective.

Fixes #220
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants