Should the processing model care about validity? #221

annevk · 2023-03-27T12:02:15Z

It currently has

If fragment directive input is not a valid fragment directive, then return an empty list.

but can that actually happen? It seems better if we don't need these additional grammar-based validation requirements.

bokand · 2023-03-28T16:24:41Z

Hmm, are you suggesting to remove the grammar definitions and rely on the parsing and processing steps to handle errors?

That makes sense to me, though personally I'd find the grammar useful so readers can tell at a glance how to construct valid text directives (or other fragment directives); maybe keep that as a non-normative reference?

annevk · 2023-03-28T16:40:57Z

Yeah, I think a dual nature makes sense. One to guide producers and the other to guide consumers. URL and the HTML syntax also have this dual nature.

bokand · 2023-03-28T16:44:07Z

One additional point, I see that for FragmentDirective it doesn't matter since an invalid string will just be an UnknownDirective. But for TextDirectives we do use the grammar to discard invalid cases (e.g. a URL includes a fifth, invalid, term after suffix). That seems convenient to have a grammar? But I'll see if it's easy to just do algorithmically.

annevk · 2023-03-28T16:50:34Z

Usually the problem is that implementers don't use grammars in their implementation so you get subtle differences. I've found a lot of these over the years in HTTP header parsers.

They also had quite a bit of complexity when a string iterator can do the job as well.

bokand · 2023-03-28T17:10:59Z

Yeah, that's true, but then I have a bit of a meta question about spec writing :) ...I recently came across this note in parallelism:

Algorithms in standards are to be easy to understand and are not necessarily great for battery life or performance.

How do we balance this vs. writing steps that match implementation? Blink also doesn't use a grammar for this case but this seemed like the best way to convey the requirements in spec; implementors can validate however they choose but the spec wins in case of any deviations (does HTML follow something similar to C++'s as-if rule?).

Is this typically a judgement call, with preference to match implementation?

As another example, the core find a range algorithm in this spec no longer mechanically matches what Blink does (though the output should be the same). Blink rearranged the steps to enable asynchronous execution but it's much harder to understand than the spec steps. Presumably that's reasonable (and common?)

annevk · 2023-03-29T07:24:48Z

Yeah, see https://infra.spec.whatwg.org/#algorithm-conformance for a description of that rule in standards land. (I had not seen the C++ rule before today though and there may well be differences.)

In my experience implementers prefer a set of steps over a grammar. And a set of steps tends to lead towards better tests and more interoperable implementations. But yes, ultimately it's a bit of a judgment call and people might have different preferences.

Now we already have algorithms for splitting on code points and such written down so I think for the case at hand here building on that would be very straightforward.

As for "find a range", I think it works because the invocation is rather hand-wavy. Though I have not studied <a>.click() and looking at the various events across implementations to determine if there are web-exposed differences.

This commit overhauls the parsing steps to avoid using the EBNF grammar for validity, instead specifying that imperatively. It also moves parsing to happen earlier in the process so that we pass around parsed Text Directive objects. Also makes the steps more precise, referring to infra types and correctly decoding the strings. Fixes WICG#221 Fixes WICG#230

* Specify parsing imperatively This commit overhauls the parsing steps to avoid using the EBNF grammar for validity, instead specifying that imperatively. It also moves parsing to happen earlier in the process so that we pass around parsed Text Directive objects. Also makes the steps more precise, referring to infra types and correctly decoding the strings. Fixes #221 Fixes #230 * Fix and make grammar non-normative The grammar is now provided solely as a convenience so this makes the section non-normative. Also fixes it so that UnknownDirective doesn't subsume TextDirective. Fixes #220

annevk mentioned this issue Mar 28, 2023

Disambiguate UnknownDirective grammar #222

Merged

bokand added the spec issue label Mar 28, 2023

bokand mentioned this issue Dec 13, 2023

[Spec] Overhaul directive parsing #247

Merged

bokand closed this as completed in #247 Dec 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should the processing model care about validity? #221

Should the processing model care about validity? #221

annevk commented Mar 27, 2023

bokand commented Mar 28, 2023

annevk commented Mar 28, 2023

bokand commented Mar 28, 2023

annevk commented Mar 28, 2023

bokand commented Mar 28, 2023

annevk commented Mar 29, 2023

Should the processing model care about validity? #221

Should the processing model care about validity? #221

Comments

annevk commented Mar 27, 2023

bokand commented Mar 28, 2023

annevk commented Mar 28, 2023

bokand commented Mar 28, 2023

annevk commented Mar 28, 2023

bokand commented Mar 28, 2023

annevk commented Mar 29, 2023