Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update code to work with new embed syntax #124

Merged
merged 18 commits into from
Dec 9, 2017
Merged

Update code to work with new embed syntax #124

merged 18 commits into from
Dec 9, 2017

Conversation

Keats
Copy link
Contributor

@Keats Keats commented Nov 14, 2017

Fix #120

Not optimal at all since I don't know how much we care about perf in the parsing and I chose the way that (I think) would need the least code changes.

I'm skipping the Latex syntax as a recent commit broke the use with Syntect (Doesn't work with sublimehq/Packages@666b869#diff-40f615a2e7ecb9138b1bc5826bed9074R1522), cc @keith-hall if you know what's wrong, the YAML syntax on that commit looks a bit odd to me.

Also added pretty_assertions for dev deps, I can remove it if you want but it's really useful to have differences in assert_eq! be diffed and highlighted.

@keith-hall
Copy link
Collaborator

I'm quite surprised the LaTeX PR with that ugly commit got merged, there were some other PRs on the Packages repo with similar hard to read style that were reviewed and the authors of those were asked to change it to a more common, readable format. I don't know YAML syntax very well, but I believe the author of the commit in question was being lazy and didn't want to create a named context, so used that hacky workaround instead - in my experience, ST otherwise doesn't allow pushing an array containing an anonymous context and a named context at once, so this hack tricks ST into accepting it.

I'm guessing that LaTeX.sublime-syntax fails to parse in syntect with ParseSyntaxError::TypeMismatch, in which case we may unfortunately have to modify syntect's yaml_load to handle such odd YAML.


About your actual solution, it's pretty similar to what I was thinking about, nice one! (my Rust-foo wasn't good enough for me to get a PR in before you :) )
However, I can foresee one small problem at the moment, and that is when the escape pattern consumes characters and there are multiple contexts to pop out of - at the moment, I think it will end up only popping out of one.

The solution, I believe, will be to push a context similar to this, whereby the with_prototype will just use a lookahead on the escape pattern to pop out of all the contexts:

- match: {{escape}}
  captures:
    {{escape_captures}}
  pop: true
- match: ''
  push:
    - meta_content_scope: {{embed_scope}}
    - include: {{embed}}
  with_prototype:
    - match: (?={{escape}})
      pop: true

My test syntax for this was going to be something like:

%YAML 1.2
---
name: Embed_Escape Used by tests in src/parsing/parser.rs
scope: source.embed-test
contexts:
  main:
    - match: '"'
      scope: meta.attribute-with-value.style.html string.quoted.double punctuation.definition.string.begin.html
      embed: embedded_context
      embed_scope: meta.attribute-with-value.style.html source.css
      escape: '"'
      escape_captures:
        0: meta.attribute-with-value.style.html string.quoted.double punctuation.definition.string.end.html
    - match: '(>)\s*'
      captures:
        1: meta.tag.style.begin.html punctuation.definition.tag.end.html
      embed: embedded_context
      embed_scope: source.css.embedded.html
      escape: (?i)(?=</style)
    - match: foobar
      scope: top-level.test

  embedded_context:
    - match: a
      push: # prove that multiple context levels can be "escape"d
        - match: b
          push:
            - match: c
              push:
                - match: 'test'
                  scope: test.embedded

with the following test case in parsing/parser.rs:

    #[test]
    fn can_parse_issue120() {
        let ps = SyntaxSet::load_from_folder("testdata/Packages").unwrap();
        let syntax = ps.find_syntax_by_name("Embed_Escape Used by tests in src/parsing/parser.rs").unwrap();

        let line1 = "\"abctest\" foobar";
        let expect1 = [
            "<meta.attribute-with-value.style.html>, <string.quoted.double>, <punctuation.definition.string.begin.html>",
            "<meta.attribute-with-value.style.html>, <source.css>",
            "<meta.attribute-with-value.style.html>, <string.quoted.double>, <punctuation.definition.string.end.html>",
            "<source.css.embedded.html>, <test.embedded>",
            "<top-level.test>",
        ];
        expect_scope_stacks_with_syntax(&line1, &expect1, syntax.to_owned());

        let line2 = ">abctest</style>foobar";
        let expect2 = [
            "<meta.tag.style.begin.html>, <punctuation.definition.tag.end.html>",
            "<source.css.embedded.html>, <test.embedded>",
            "<top-level.test>",
        ];
        expect_scope_stacks_with_syntax(&line2, &expect2, syntax.to_owned());
    }

One other thought I had, is that, in ST, embed without escape is an error, so maybe we want the same semantics in syntect too? (ParseSyntaxError::MissingMandatoryKey("escape"))

@keith-hall
Copy link
Collaborator

Actually, thinking about this more, it should probably take this sort of pattern instead, to take into account the backrefs properly:

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
scope: source.example-embed
contexts:
  main:
    # embed style (from)
    - match: (`+)(javascript|js)\s*$
      captures:
        1: markup.raw.code-fence.javascript.markdown punctuation.definition.raw.code-fence.begin.markdown
        2: markup.raw.code-fence.javascript.markdown constant.other.language-name.markdown
      embed: scope:source.js
      embed_scope: markup.raw.code-fence.javascript.markdown source.js
      escape: '^[ ]{0,3}(\1)\s*$'
      escape_captures:
        0: markup.raw.code-fence.javascript.markdown
        1: punctuation.definition.raw.code-fence.end.markdown

    # with_prototype style (to)
    - match: (`+)(javascript|js)\s*$
      captures:
        1: markup.raw.code-fence.javascript.markdown punctuation.definition.raw.code-fence.begin.markdown
        2: markup.raw.code-fence.javascript.markdown constant.other.language-name.markdown
      push: [pop_js, embed_js]
      with_prototype:
        - match: (?=^[ ]{0,3}(\1)\s*)
          pop: true

  pop_js:
    - match: '^[ ]{0,3}(\1)\s*'
      captures:
        0: markup.raw.code-fence.javascript.markdown
        1: punctuation.definition.raw.code-fence.end.markdown
      pop: true

  embed_js:
    - meta_content_scope: markup.raw.code-fence.javascript.markdown source.js
    - include: scope:source.js

Although obviously we probably have the advantage of not needing to name contexts from syntect when creating contexts inline.

@Keats
Copy link
Contributor Author

Keats commented Nov 15, 2017

Does the testcase need to change as well?

@keith-hall
Copy link
Collaborator

Good question, I believe we could get away with just changing two lines in my earlier sublime-syntax example - from - match: '"' to - match: (") and from escape: '"' to escape: '\1'

@est31 est31 mentioned this pull request Nov 19, 2017
@trishume
Copy link
Owner

Thanks for doing this! I'm pretty busy until December so I'm not going to casually review this before its done, but when you think it's ready let me know and I'll review it.

I'll probably end up bundling #125, this PR, and maybe one or two other minor breaking changes into a 2.0.0 release early December.

@Keats
Copy link
Contributor Author

Keats commented Nov 20, 2017

@keith-hall is helping with the sublime syntax related issues he mentioned

@Keats Keats changed the title [wip] Update code to work with new embed syntax Update code to work with new embed syntax Nov 30, 2017
@Keats
Copy link
Contributor Author

Keats commented Nov 30, 2017

Thanks to @keith-hall work, it's almost there. Some tests in syntest failing still but it's mostly ok otherwise

@@ -451,8 +451,26 @@ impl ParseState {
MatchOperation::None => return false,
};
for (i, r) in ctx_refs.iter().enumerate() {
let proto = if i == ctx_refs.len() - 1 {
pat.with_prototype.clone()
let proto = if i == ctx_refs.len() - 1 { // https://forum.sublimetext.com/t/dev-build-3111/19240/17
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to fully review this yet (I will soon), but I just want to mention that this big deeply nested block looks scary. The link to the forum post is definitely helpful, but if there's some way you could make this less scary that would be great, or at least comment it more heavily. Maybe explain how it is implementing the behaviour described in that post in terms of syntect's data structures.

...as the contexts named aren't shown, and instead the entire match patterns etc. are output, the debug info from the parse state is too large to be useful, so I removed this aspect of it
@trishume trishume mentioned this pull request Dec 1, 2017
4 tasks
@Keats
Copy link
Contributor Author

Keats commented Dec 8, 2017

@trishume I believe this is ready for review, unless @keith-hall wants to add his fix for #127 in that PR

Copy link
Owner

@trishume trishume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preliminary review: looks good except for the with_prototype capture handling.

But don't fix that yet since I'm going to take a hard look at how with_prototypes work and try and figure out a better way. If I seem to have forgotten about my commitment to doing this though feel free to ping me.

@@ -33,6 +33,7 @@ serde_json = "1.0"
rayon = "0.8.0"
regex = "0.2.1"
getopts = "0.2"
pretty_assertions = "0.4.0"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve 👍

@@ -621,6 +665,118 @@ mod tests {
}
}

#[test]
fn can_parse_embed_as_with_prototypes() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this test. Very clearly shows what the rewrite does.

1: meta.tag.style.begin.html punctuation.definition.tag.end.html
push:
- [{ match: '(?i)(?=</style)', pop: true }]
- [{ meta_content_scope: 'source.css.embedded.html'}, { match: '', pop: true }]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My knowledge of Sublime syntaxes is quite rusty (heh), but what's the purpose of this context? Why is the meta_content_scope not just in the scope above?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may just be a leftover from how I originally thought it would work, and I didn't spot it when I was cleaning up...

- [{ meta_content_scope: 'source.css.embedded.html'}, { match: '', pop: true }]
- scope:source.css
with_prototype:
- match: (?=(?i)(?=</style))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double-lookahead does exactly the same as the single lookahead right? It's just that the escape regex might not be a lookahead and in order to emulate it correctly we have to ensure this one is a lookahead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct 👍

// even have had any capture groups which would previously have caused
// a panic)
if match_pat.has_captures {
match_pat.regex = Some(match_pat.compile_with_refs(regions, line));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesn't work properly if there's recursion that allows the same with_context to be pushed twice on the stack, when the top one is popped it will still have the old captures compiled into it. This may never happen in practice with a with_prototype that has captures but it's possible in principle.

I'm also looking at this code again and thinking that I might have really screwed up the implementation of with_prototype and that I can redo it completely to be cleaner.

Don't try and fix this yet, I want to take a crack at simplifying a bunch of the related code, which may just outdate whatever fix you try.

@@ -2,7 +2,7 @@
<%
Class TestClass2 Public Sub TestSub () Response.Write("wow") End Sub End Class
'^^^^^ meta.class.asp meta.class.identifier.asp storage.type.asp
' ^ meta.class.asp meta.class.body.asp meta.class.asp meta.class.identifier.asp
' ^ meta.class.asp meta.class.identifier.asp
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I copied this from the Sublime syntax test, did the Sublime one change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trishume
Copy link
Owner

trishume commented Dec 9, 2017

I just posted https://github.com/Keats/syntect/pull/2 which fixes the issue with the captures logic while also folding it into the general case.

Once you merge that into your branch and @keith-hall finishes investigating whether pushing two inline contexts is really necessary to emulate embed then I'll look at merging.

@trishume
Copy link
Owner

trishume commented Dec 9, 2017

I benchmarked it and performance doesn't seem to have regressed, in fact I don't know if it's favourable testing conditions or improvements in the Nightly compiler but now highlighting jQuery takes 550ms instead of 600 and something.

@keith-hall do you know if the regression in syntax tests is just because the updated syntaxes are hitting existing bugs more than they were before, or if we broke something?

Because C# now has tons of failing tests when it only had a few before, and Bash and Make are now slightly broken where they weren't before.

@keith-hall
Copy link
Collaborator

I'm almost certain its the updated syntaxes that are causing it to hit more bugs than before - its definitely true of C#.

@trishume
Copy link
Owner

trishume commented Dec 9, 2017

Okay I'm going to merge this now then. Thanks so much to both of you for the great work!

I might hold off on releasing 2.0 for a week or so to see if I have time to look around for a few more backwards-incompatible changes I can lump in. If anyone wants a release sooner rather than later I can probably do that.

@trishume trishume merged commit af810cd into trishume:master Dec 9, 2017
@est31
Copy link
Contributor

est31 commented Dec 9, 2017

@trishume sounds fine by me!

@est31
Copy link
Contributor

est31 commented Jan 2, 2018

@trishume do you have time to do a 2.0 release now? That would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants