Syntax highlighting splits unicode characters #5069
Comments
🤔 I think this really is a problem with the grammar and not with the way grammars are being processed: the old Lean TextMate grammar we used to use, run through the exact same parsing/processing, works as expected:
... versus ...
We switched grammars in #4546. |
Thanks for doing that test. What performs the conversion between the various types of grammar file? Could it be a bug that applies only to |
There are two but in this case it's an internal library called PrettyLights heavily based off the Textmate grammar processing. It's not open source because of various licensing requirements. The other is the grammar processor in this repo which produces the JSON files used by PrettyLights in production. These files are attached to each release.
It's possible, but I don't think so, as ultimately we convert all grammars to JSON.
I've taken a look at the history of the new grammar and I can see it switched from TextMate to JSON in this commit. Using that JSON file produces the same syntax highlighting as the old TextMate grammar, as can be seen here, so I think this confirms this is definitely an issue with the grammar itself. I'm not very good at writing grammars, but I know @Alhadis is quite the dab hand; he may be able to spot where things are going wrong in the current version of the grammar. |
Thanks for the further investigation. I think you're right that the
{
"name": "Lean",
"scopeName": "source.lean",
"patterns": [
{
"name": "storage.type.lean",
"match": "\\b(Prop|Type|Sort)\\b"
}
]
}
https://gist.github.com/eric-wieser/c5a9efea2581d65fda99ec2816177fde |
Okay, this is weird. leanprover/vscode-lean@deb64b0 appears to be the commit that broke the syntax highlighting, according to the Lightshow results:
However, that doesn't make sense, because the commit in question only added a single keyword to an unrelated pattern:
- "match": "\\b(Prop|Type)\\b",
+ "match": "\\b(Prop|Type|Sort)\\b",
I agree with @eric-wieser; I don't believe the grammar file is to blame. |
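A rough way to sanity-check the claim that the changed pattern is unrelated to `⟨`: the sketch below runs the old and new patterns through Python's `re` module. This is only an approximation, since PrettyLights does not use Python's regex engine, and the sample line is made up for illustration.

```python
import re

# Old and new versions of the pattern from the commit under discussion.
# Python's `re` is a stand-in here, NOT the engine PrettyLights uses.
OLD_PATTERN = re.compile(r"\b(Prop|Type)\b")
NEW_PATTERN = re.compile(r"\b(Prop|Type|Sort)\b")

# Hypothetical sample line containing the Unicode angle brackets.
sample = "def f : Sort u := ⟨Prop, Type⟩"

old_matches = OLD_PATTERN.findall(sample)
new_matches = NEW_PATTERN.findall(sample)

# The only difference is the extra keyword; neither pattern ever
# consumes a bracket character.
only_ascii = all(m.isascii() for m in new_matches)

print(old_matches)
print(new_matches)
print(only_ascii)
```

Under these assumptions, the new pattern differs only by also matching `Sort`, which is consistent with the grammar change itself not touching the brackets.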
@Alhadis is spot on - I found that if I removed either |
🤔 Interesting. Good work peeps. I've reached out to the internal maintainers of the syntax highlighter to get some 👀 on this. |
Did the internal maintainers of the "PrettyLights" syntax highlighter make any progress on this? |
No, and they're not likely to in the near future, as work is prioritized on the replacement for PrettyLights, which uses Tree-sitter-based grammars. There are no plans at the moment to allow Linguist to supply Tree-sitter-based grammars, but this might become a possibility in the future. |
Closing as "won't fix" as there is no more funding for the ancient PrettyLights highlighter, so this will never be fixed for TextMate-based grammars. |
Is there an alternative format to TextMate-based grammars that uses the new system that we could contribute? |
@eric-wieser Yes, but it requires external tooling to generate a several-megabyte C file from a weird-looking dialect of Scheme (that you need to track with version control…). However… I'm looking through Lean's grammar now, and I notice you're not tokenising the Unicode brackets in
--- syntaxes/lean.json 2022-12-15 12:22:52.000000000 +1100
+++ grammars/lean.json 2022-12-15 12:36:21.000000000 +1100
@@ -76,11 +76,18 @@
},
{ "match": "\\b(?<!\\.)(variable|variables|parameter|parameters|constants)(?!\\.)\\b",
"name": "keyword.other.lean"
},
+ { "include": "#brackets" },
{ "include": "#expressions" }
],
"repository": {
+ "brackets": {
+ "patterns": [
+ {"match": "⟨", "name": "punctuation.definition.bracket.angle.begin.lean"},
+ {"match": "⟩", "name": "punctuation.definition.bracket.angle.end.lean"}
+ ]
+ },
"expressions": {
"patterns": [
{ "match": "\\b(Prop|Type|Sort)\\b", "name": "storage.type.lean" },
{ "match": "\\b(sorry)\\b", "name": "invalid.illegal.lean" },
You can and should scope all significant punctuation like brackets, operators, separators (etc) as |
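As a minimal sketch of what the proposed `brackets` rule does, the following mimics the two patterns from the diff using Python's `re` module. The `tokenize_brackets` helper and the sample input are hypothetical, and this is not how PrettyLights or any TextMate engine is actually implemented; it only shows each bracket being claimed as one whole token.

```python
import re
from typing import List, Tuple

# Stand-in for the "brackets" repository entry from the diff above.
# Scope names are copied from the diff; everything else is illustrative.
BRACKET_SCOPES = [
    (re.compile("⟨"), "punctuation.definition.bracket.angle.begin.lean"),
    (re.compile("⟩"), "punctuation.definition.bracket.angle.end.lean"),
]


def tokenize_brackets(line: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, scope) for every bracket, in character offsets."""
    tokens = []
    for pattern, scope in BRACKET_SCOPES:
        for match in pattern.finditer(line):
            tokens.append((match.start(), match.end(), scope))
    return sorted(tokens)


tokens = tokenize_brackets("⟨a, b⟩")
for start, end, scope in tokens:
    print(start, end, scope)
```

Because each bracket is matched explicitly as a single character, a highlighter following these rules has a token boundary on either side of the codepoint rather than anywhere inside it.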
Preliminary Steps
Please confirm you have...
Problem Description
Syntax highlighting is splitting a unicode codepoint into two garbled halves:
https://github.com/leanprover-community/mathlib/blob/d4477fa7f79beea1058f72fc3741c88a1832d9a1/src/group_theory/ore_localization.lean#L29
To eliminate browser interference, you can reproduce the issue with
which prints
\xe2\x9f\xa8 is the UTF-8 encoding of ⟨, which has somehow ended up with a span tag right in the middle of it.
The hypothesis over at https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/github.20syntax.20highlighting is that some kind of chunking is going on in the file.
An even simpler reproduction is https://gist.github.com/eric-wieser/caef77bc87edc0feae06bd91b0d241f2/756f85e2f06618ef2b7261e7ec3fca0aa0d73e2f:
While I know that the "How Linguist Works" page says that highlighting issues belong in upstream repos, this looks like a highlighting issue with how the grammar files themselves are processed.
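To make the byte-level symptom concrete, here is a small sketch of what happens when a tag is inserted at a byte offset that falls inside the three-byte encoding of `⟨`. The `wrap_bytes` and `splits_codepoint` helpers are hypothetical and only simulate the observed output; they say nothing about how GitHub's highlighter is actually written.

```python
# The bracket from the report and its UTF-8 encoding.
encoded = "⟨".encode("utf-8")  # b"\xe2\x9f\xa8"


def splits_codepoint(data: bytes, offset: int) -> bool:
    """Return True if cutting `data` at byte `offset` lands inside a codepoint."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return 0 < offset < len(data) and (data[offset] & 0xC0) == 0x80


def wrap_bytes(data: bytes, start: int, end: int) -> bytes:
    """Wrap data[start:end] in a span, treating offsets as raw byte offsets."""
    return data[:start] + b"<span>" + data[start:end] + b"</span>" + data[end:]


# Simulate a span that ends after the first byte of the bracket.
broken = wrap_bytes(encoded, 0, 1)

try:
    broken.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False

print(encoded)
print(broken)
print(decodes_cleanly)
```

A check like `splits_codepoint` is one way a processor working in byte offsets could detect that a proposed boundary is unsafe before emitting markup.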
URL of the affected repository:
As above, https://github.com/leanprover-community/mathlib/blob/d4477fa7f79beea1058f72fc3741c88a1832d9a1/src/group_theory/ore_localization.lean#L29
Last modified on: