enh(parser) new highlight() API #3053

joshgoebel · 2021-03-17T17:57:47Z

Resolves #2277.

Changes

Implements the new highlight(code, options = {}) API
language and ignoreIllegals are options now (vs params)
continuation is for internal use only and hence not supported any longer

Compatibility:

v10.7 supports BOTH old and new APIs fully.
v11 will support both as well (to ease migrations) but drop continuation support from the old API.
v12 will drop old API support entirely.

Checklist

Added markup tests, or they don't apply here because...
Updated the changelog at CHANGES.md

allejo

Do we want to output a deprecation message for this?

…s into new_highlight_api

paladox · 2021-04-07T23:23:35Z

I'm wondering if we could add support for continuation again? We use it for the gerrit project specifically here https://github.com/GerritCodeReview/gerrit/blob/master/polygerrit-ui/app/elements/diff/gr-syntax-layer/gr-syntax-layer.ts#L445.

joshgoebel · 2021-04-08T00:38:27Z

This has been removed for many different reasons.

Line by line driving is fundamentally broken. Grammars rules are not guaranteed to break cleanly at line boundaries - and some most definitely do not.
It was added to our documentation in 2019 that this is not intended to be used to for line-by-line highlighting (93badf7) [as soon as I found out people were doing that].
Technically it's leaking parser internals.
It's not intended to "drive the parser" (line by line or otherwise).
It is a detail of our internal sublanguage implementation - one that's likely to change in the future.

Many of the things we'd like to investigate in v11 may require deep internal changes. We might switch from an iterative parser to a completely recursive parser. There is rethinking sublanguages (#1263), a potential new back-tracking parser (#1140), and more. Just because a line has been processed may not even mean it's processed. We might hit a rule mismatch a few lines later and then rewind and try again with an entirely different matching strategy.

To allow us to iterate as freely as possible without worrying about breaking changes that means our internals need to be fully internal. As such continuation is gone... and _top becomes private API. No state in, no state out.

To do line-by-line content 100% safely and correctly one should instead post-process the final output (which can be done with ~15 lines of JS or so and a small stack). See my stackoverflow answer. There are other options with the private Emitter API, but use at your own risk. Post-processing the final text output is the easy and most stable choice and what most implementors should do.

joshgoebel · 2021-04-08T00:46:36Z

Someone should really write a line by line plugin using after:highlight that wraps this up nicely and then we could just point everyone to that.

Sample ~ implementation: https://github.com/hexojs/hexo-util/pull/246/files#diff-23bd09b492fb22d0b078517e838293bfa0031ece18b7bfba1adf706d5b9592d0R126

One could easily alter it to return an array or whatever, etc.

paladox · 2021-04-08T03:13:18Z

@joshgoebel I guess it should be:

result = this.hljs.highlight(
        baseLine,
        {
            language: this.baseLanguage,
            ignoreIllegals: true
        }
      );

result.value = result.value.split('\n').map(line => {
     const prepend = tokenStack.map(token => `<span class="${token}">`).join('');
     const matches = line.matchAll(/(<span class="(.*?)">|<\/span>)/g);
     for (const match of matches) {
       if (match[0] === '</span>') tokenStack.shift();
       else tokenStack.unshift(match[2]);
     }
     const append = '</span>'.repeat(tokenStack.length);
     return prepend + line + append;
   }).join('\n');
this.baseRanges.push(this._rangesFromString(result.value, rangesCache));

?

paladox · 2021-04-08T03:26:15Z

So this https://gerrit-review.googlesource.com/c/gerrit/+/302602 ?

joshgoebel · 2021-04-08T05:57:31Z

You tell me...? I don't completely follow larger context of what you are doing exactly or what baseLine and revisionLine are... if you want access to the individual lines quickly you probably shouldn't be rejoining but rather just leave it as an array so you could slice/splice, etc... the code was just an example of how easy it is to "fix up" the broken tags after splitting on \n.

joshgoebel · 2021-04-08T06:09:43Z

@paladox On an unrelated note: https://github.com/gerrit-review/gerrit/blob/master/polygerrit-ui/app/elements/diff/gr-syntax-layer/gr-syntax-layer.js#L349

If these are still actual issues today I'd love to see issues filed and perhaps we can fix them so you don't have to deal with them downstream.

(chore) new highlight() API

86510e3

joshgoebel requested a review from allejo March 17, 2021 17:58

joshgoebel changed the title ~~(chore) new highlight() API~~ (end) new highlight() API Mar 17, 2021

joshgoebel changed the title ~~(end) new highlight() API~~ enh(parser) new highlight() API Mar 17, 2021

update README example

329a623

joshgoebel self-assigned this Mar 18, 2021

joshgoebel added this to the 10.7 milestone Mar 18, 2021

Base automatically changed from master to main March 18, 2021 17:26

allejo approved these changes Mar 19, 2021

View reviewed changes

joshgoebel added 7 commits March 20, 2021 00:45

deprecation notice and fix the correct function name

da1063d

remove logging

b954e52

continuation may not be passed via option

887ce9c

remove from api

d46d9a7

Merge branch 'main' into new_highlight_api

6b184d2

make sure someone does not pass as 4th argument

b1a71da

Merge branch 'new_highlight_api' of github.com:joshgoebel/highlight.j…

bde065c

…s into new_highlight_api

joshgoebel merged commit bfb5a59 into highlightjs:main Mar 20, 2021

joshgoebel deleted the new_highlight_api branch April 8, 2021 06:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enh(parser) new highlight() API #3053

enh(parser) new highlight() API #3053

joshgoebel commented Mar 17, 2021 •

edited

Loading

allejo left a comment

paladox commented Apr 7, 2021

joshgoebel commented Apr 8, 2021 •

edited

Loading

joshgoebel commented Apr 8, 2021 •

edited

Loading

paladox commented Apr 8, 2021 •

edited

Loading

paladox commented Apr 8, 2021

joshgoebel commented Apr 8, 2021 •

edited

Loading

joshgoebel commented Apr 8, 2021

enh(parser) new highlight() API #3053

enh(parser) new highlight() API #3053

Conversation

joshgoebel commented Mar 17, 2021 • edited Loading

Changes

Checklist

allejo left a comment

Choose a reason for hiding this comment

paladox commented Apr 7, 2021

joshgoebel commented Apr 8, 2021 • edited Loading

joshgoebel commented Apr 8, 2021 • edited Loading

paladox commented Apr 8, 2021 • edited Loading

paladox commented Apr 8, 2021

joshgoebel commented Apr 8, 2021 • edited Loading

joshgoebel commented Apr 8, 2021

joshgoebel commented Mar 17, 2021 •

edited

Loading

joshgoebel commented Apr 8, 2021 •

edited

Loading

joshgoebel commented Apr 8, 2021 •

edited

Loading

paladox commented Apr 8, 2021 •

edited

Loading

joshgoebel commented Apr 8, 2021 •

edited

Loading