-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update code to work with new embed syntax #124
Conversation
I'm quite surprised the LaTeX PR with that ugly commit got merged, there were some other PRs on the Packages repo with similar hard to read style that were reviewed and the authors of those were asked to change it to a more common, readable format. I don't know YAML syntax very well, but I believe the author of the commit in question was being lazy and didn't want to create a named context, so used that hacky workaround instead - in my experience, ST otherwise doesn't allow pushing an array containing an anonymous context and a named context at once, so this hack tricks ST into accepting it. I'm guessing that About your actual solution, it's pretty similar to what I was thinking about, nice one! (my Rust-foo wasn't good enough for me to get a PR in before you :) ) The solution, I believe, will be to push a context similar to this, whereby the - match: {{escape}}
captures:
{{escape_captures}}
pop: true
- match: ''
push:
- meta_content_scope: {{embed_scope}}
- include: {{embed}}
with_prototype:
- match: (?={{escape}})
pop: true My test syntax for this was going to be something like: %YAML 1.2
---
name: Embed_Escape Used by tests in src/parsing/parser.rs
scope: source.embed-test
contexts:
main:
- match: '"'
scope: meta.attribute-with-value.style.html string.quoted.double punctuation.definition.string.begin.html
embed: embedded_context
embed_scope: meta.attribute-with-value.style.html source.css
escape: '"'
escape_captures:
0: meta.attribute-with-value.style.html string.quoted.double punctuation.definition.string.end.html
- match: '(>)\s*'
captures:
1: meta.tag.style.begin.html punctuation.definition.tag.end.html
embed: embedded_context
embed_scope: source.css.embedded.html
escape: (?i)(?=</style)
- match: foobar
scope: top-level.test
embedded_context:
- match: a
push: # prove that multiple context levels can be "escape"d
- match: b
push:
- match: c
push:
- match: 'test'
scope: test.embedded with the following test case in #[test]
fn can_parse_issue120() {
let ps = SyntaxSet::load_from_folder("testdata/Packages").unwrap();
let syntax = ps.find_syntax_by_name("Embed_Escape Used by tests in src/parsing/parser.rs").unwrap();
let line1 = "\"abctest\" foobar";
let expect1 = [
"<meta.attribute-with-value.style.html>, <string.quoted.double>, <punctuation.definition.string.begin.html>",
"<meta.attribute-with-value.style.html>, <source.css>",
"<meta.attribute-with-value.style.html>, <string.quoted.double>, <punctuation.definition.string.end.html>",
"<source.css.embedded.html>, <test.embedded>",
"<top-level.test>",
];
expect_scope_stacks_with_syntax(&line1, &expect1, syntax.to_owned());
let line2 = ">abctest</style>foobar";
let expect2 = [
"<meta.tag.style.begin.html>, <punctuation.definition.tag.end.html>",
"<source.css.embedded.html>, <test.embedded>",
"<top-level.test>",
];
expect_scope_stacks_with_syntax(&line2, &expect2, syntax.to_owned());
} One other thought I had, is that, in ST, |
Actually, thinking about this more, it should probably take this sort of pattern instead, to take into account the backrefs properly: %YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
scope: source.example-embed
contexts:
main:
# embed style (from)
- match: (`+)(javascript|js)\s*$
captures:
1: markup.raw.code-fence.javascript.markdown punctuation.definition.raw.code-fence.begin.markdown
2: markup.raw.code-fence.javascript.markdown constant.other.language-name.markdown
embed: scope:source.js
embed_scope: markup.raw.code-fence.javascript.markdown source.js
escape: '^[ ]{0,3}(\1)\s*$'
escape_captures:
0: markup.raw.code-fence.javascript.markdown
1: punctuation.definition.raw.code-fence.end.markdown
# with_prototype style (to)
- match: (`+)(javascript|js)\s*$
captures:
1: markup.raw.code-fence.javascript.markdown punctuation.definition.raw.code-fence.begin.markdown
2: markup.raw.code-fence.javascript.markdown constant.other.language-name.markdown
push: [pop_js, embed_js]
with_prototype:
- match: (?=^[ ]{0,3}(\1)\s*)
pop: true
pop_js:
- match: '^[ ]{0,3}(\1)\s*'
captures:
0: markup.raw.code-fence.javascript.markdown
1: punctuation.definition.raw.code-fence.end.markdown
pop: true
embed_js:
- meta_content_scope: markup.raw.code-fence.javascript.markdown source.js
- include: scope:source.js Although obviously we probably have the advantage of not needing to name contexts from syntect when creating contexts inline. |
Does the testcase need to change as well? |
Good question, I believe we could get away with just changing two lines in my earlier |
Thanks for doing this! I'm pretty busy until December so I'm not going to casually review this before its done, but when you think it's ready let me know and I'll review it. I'll probably end up bundling #125, this PR, and maybe one or two other minor breaking changes into a 2.0.0 release early December. |
@keith-hall is helping with the sublime syntax related issues he mentioned |
allow immediate child patterns of with_prototype to use backrefs
Thanks to @keith-hall work, it's almost there. Some tests in |
src/parsing/parser.rs
Outdated
@@ -451,8 +451,26 @@ impl ParseState { | |||
MatchOperation::None => return false, | |||
}; | |||
for (i, r) in ctx_refs.iter().enumerate() { | |||
let proto = if i == ctx_refs.len() - 1 { | |||
pat.with_prototype.clone() | |||
let proto = if i == ctx_refs.len() - 1 { // https://forum.sublimetext.com/t/dev-build-3111/19240/17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have time to fully review this yet (I will soon), but I just want to mention that this big deeply nested block looks scary. The link to the forum post is definitely helpful, but if there's some way you could make this less scary that would be great, or at least comment it more heavily. Maybe explain how it is implementing the behaviour described in that post in terms of syntect's data structures.
...as the contexts named aren't shown, and instead the entire match patterns etc. are output, the debug info from the parse state is too large to be useful, so I removed this aspect of it
@trishume I believe this is ready for review, unless @keith-hall wants to add his fix for #127 in that PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preliminary review: looks good except for the with_prototype
capture handling.
But don't fix that yet since I'm going to take a hard look at how with_prototype
s work and try and figure out a better way. If I seem to have forgotten about my commitment to doing this though feel free to ping me.
@@ -33,6 +33,7 @@ serde_json = "1.0" | |||
rayon = "0.8.0" | |||
regex = "0.2.1" | |||
getopts = "0.2" | |||
pretty_assertions = "0.4.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve 👍
@@ -621,6 +665,118 @@ mod tests { | |||
} | |||
} | |||
|
|||
#[test] | |||
fn can_parse_embed_as_with_prototypes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this test. Very clearly shows what the rewrite does.
src/parsing/yaml_load.rs
Outdated
1: meta.tag.style.begin.html punctuation.definition.tag.end.html | ||
push: | ||
- [{ match: '(?i)(?=</style)', pop: true }] | ||
- [{ meta_content_scope: 'source.css.embedded.html'}, { match: '', pop: true }] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My knowledge of Sublime syntaxes is quite rusty (heh), but what's the purpose of this context? Why is the meta_content_scope
not just in the scope above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may just be a leftover from how I originally thought it would work, and I didn't spot it when I was cleaning up...
src/parsing/yaml_load.rs
Outdated
- [{ meta_content_scope: 'source.css.embedded.html'}, { match: '', pop: true }] | ||
- scope:source.css | ||
with_prototype: | ||
- match: (?=(?i)(?=</style)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double-lookahead does exactly the same as the single lookahead right? It's just that the escape
regex might not be a lookahead and in order to emulate it correctly we have to ensure this one is a lookahead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct 👍
src/parsing/parser.rs
Outdated
// even have had any capture groups which would previously have caused | ||
// a panic) | ||
if match_pat.has_captures { | ||
match_pat.regex = Some(match_pat.compile_with_refs(regions, line)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't work properly if there's recursion that allows the same with_context
to be pushed twice on the stack, when the top one is popped it will still have the old captures compiled into it. This may never happen in practice with a with_prototype
that has captures but it's possible in principle.
I'm also looking at this code again and thinking that I might have really screwed up the implementation of with_prototype
and that I can redo it completely to be cleaner.
Don't try and fix this yet, I want to take a crack at simplifying a bunch of the related code, which may just outdate whatever fix you try.
@@ -2,7 +2,7 @@ | |||
<% | |||
Class TestClass2 Public Sub TestSub () Response.Write("wow") End Sub End Class | |||
'^^^^^ meta.class.asp meta.class.identifier.asp storage.type.asp | |||
' ^ meta.class.asp meta.class.body.asp meta.class.asp meta.class.identifier.asp | |||
' ^ meta.class.asp meta.class.identifier.asp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I copied this from the Sublime syntax test, did the Sublime one change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just posted https://github.com/Keats/syntect/pull/2 which fixes the issue with the captures logic while also folding it into the general case. Once you merge that into your branch and @keith-hall finishes investigating whether pushing two inline contexts is really necessary to emulate |
Clean up and fix with_prototype captures logic
I benchmarked it and performance doesn't seem to have regressed, in fact I don't know if it's favourable testing conditions or improvements in the Nightly compiler but now highlighting jQuery takes 550ms instead of 600 and something. @keith-hall do you know if the regression in syntax tests is just because the updated syntaxes are hitting existing bugs more than they were before, or if we broke something? Because C# now has tons of failing tests when it only had a few before, and Bash and Make are now slightly broken where they weren't before. |
I'm almost certain its the updated syntaxes that are causing it to hit more bugs than before - its definitely true of C#. |
Okay I'm going to merge this now then. Thanks so much to both of you for the great work! I might hold off on releasing 2.0 for a week or so to see if I have time to look around for a few more backwards-incompatible changes I can lump in. If anyone wants a release sooner rather than later I can probably do that. |
@trishume sounds fine by me! |
@trishume do you have time to do a 2.0 release now? That would be great. |
Fix #120
Not optimal at all since I don't know how much we care about perf in the parsing and I chose the way that (I think) would need the least code changes.
I'm skipping the Latex syntax as a recent commit broke the use with Syntect (Doesn't work with sublimehq/Packages@666b869#diff-40f615a2e7ecb9138b1bc5826bed9074R1522), cc @keith-hall if you know what's wrong, the YAML syntax on that commit looks a bit odd to me.
Also added
pretty_assertions
for dev deps, I can remove it if you want but it's really useful to have differences inassert_eq!
be diffed and highlighted.