Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOML syntax highliting breaks with strings longer than 1024 characters #6402

Closed
ItsIgnacioPortal opened this issue May 4, 2023 · 3 comments
Labels

Comments

@ItsIgnacioPortal
Copy link

ItsIgnacioPortal commented May 4, 2023

Describe the bug

The toml syntax-highliting breaks in github if the line is 1024 (or more) characters long. Example: Line 1022 in this file: https://github.com/ItsIgnacioPortal/testrepo/blob/c638d9977ac4b9f1b2e1f8ed282a344167705d71/biglines.toml#L1022

Expected behaviour

That line 1022 be highlighted in the same color as line 1021 was.

Related discussion

N/A

Additional notes

The line in which the syntax-highlighter broke, is 1024 characters long (1024 = 2^10). This doesn't seem like a coincidence.
In this repository I've included a python script that can generate toml files with long strings, so that this issue can be tested more easily.

This might be related to #6227

@lildude
Copy link
Member

lildude commented May 4, 2023

The line in which the syntax-highlighter broke, is 1024 characters long (1024 = 2^10). This doesn't seem like a coincidence.

It's not. This is a deliberate limitation of the old prettylights syntax highlighting engine which provides the highlighting for files that use Textmate-compatible grammars, like TOML. This was implemented waaaay back in 2014 for performance reasons as very few people will realistically read such long lines on GitHub so there's little point in consuming resources highlighting them.

This limit isn't going to change as the old highlighting engineer is now in maintenance mode as we slowly migrate to a new tree-sitter based highlighter. I also doubt many people realistically read such long lines on GitHub beyond a cursory glance.

This might be related to #6227

Almost. That is likely to be hitting the other deliberate decision of limiting the stack depth to 256 which should be very rarely hit and when it is, it's almost certainly due to an error in the grammar.

Closing as will not fix.

@lildude lildude closed this as not planned Won't fix, can't repro, duplicate, stale May 4, 2023
@ItsIgnacioPortal
Copy link
Author

very few people will realistically read such long lines on GitHub so there's little point in consuming resources highlighting them.

I opened this issue because this caused a code review to fail. Politiwatch/privacyspy#154 (comment)

In privacyspy, we use long strings in our TOML files, as these are complete extracts from privacy policies, and legal documents don't tend to be short.

In the code review that I linked, the reviewer thought that because the TOML string wasn't highlighted, the string had invalid syntax, so he asked for that to be revised again. I then had to manually check that this was actually an issue with github.

This limit isn't going to change as the old highlighting engineer is now in maintenance mode as we slowly migrate to a new tree-sitter based highlighter.

That's precisely why this should be fixed. This is not a new feature, it's a bug which was caused by an undocumented behavior. And it's blocking a key functionality of Github: code reviews.

I can't imagine that increasing a limit set in 2014 almost 10 years later would be very computationally expensive.

If the fix is as simple as increasing a number, then this shouldn't take long to fix.

But if the limit truly cannot be increased, then a visible warning should be displayed that the syntax-highlighting failed, if possible with an explanation of why, and a link to its related discussion/issue. This limit should also be documented somewhere where users having the same issue will easily find it, to avoid unnecesary spending in support-peoples salary, as they'll be the first ones who will have to deal with the confused users, victims of this undocumented behavior. One example of an easily findable page title could be: "Github Syntax Highlighting Docs". There, these hidden limits would be explained. This issue would remain open until those changes are implemented, as to avoid duplicate issues in the future.

What do you think @lildude ?

@lildude
Copy link
Member

lildude commented May 5, 2023

That's precisely why this should be fixed. This is not a new feature, it's a bug which was caused by an undocumented behavior. And it's blocking a key functionality of Github: code reviews.

This isn't a bug. It's a deliberately implementation decision.

If the fix is as simple as increasing a number, then this shouldn't take long to fix.

And there's the rub. You're assuming it's a simple fix. Removing such a limitation may not be trivial and may require way more work elsewhere within the main GitHub code base which is working on the knowledge of this limitation. We won't find this out until this change is made and tested, however the old highlighter is no longer getting funding so no active development, even for actual bugs, is happening.

I've raised an issue with the team that "supports" the old highlighter, but it's not likely to see any action as is the case with the issue I opened for #5069 which is an actual bug. This behaviour has also been like this for nearly 10 years without issue, until now 😁.

That said, the other solution is to change your code and introduce a line break in these long lines. I see several other fields in that PR contain line breaks.

But if the limit truly cannot be increased, then a visible warning should be displayed that the syntax-highlighting failed, if possible with an explanation of why, and a link to its related discussion/issue. This limit should also be documented somewhere where users having the same issue will easily find it, to avoid unnecesary spending in support-peoples salary, as they'll be the first ones who will have to deal with the confused users, victims of this undocumented behavior. One example of an easily findable page title could be: "Github Syntax Highlighting Docs". There, these hidden limits would be explained.

This also isn't likely to happen for this old highlighter as it's on its way out and would require funding for someone to trawl through the code and find and document all of these limitations. I'm not part of the team that maintains the syntax highlighting engines, but I believe all these limitations will be gone once we've completed the move to the new engine which uses tree-sitter grammars.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants