File size limits? #6227

Closed
OneCDOnly opened this issue Dec 27, 2022 · 7 comments
@OneCDOnly

Hi guys. 😄

I'm having a minor problem with GitHub, rather than this great utility, and I think it's being caused by the size of one of my files. However, I've been unable to locate the size limits for GitHub syntax highlighting. Does anyone know what they are?

More detail: I have an unusually large BASH script (presently just over 214kiB) that is not displayed with syntax highlighting when viewed on the GitHub site. I assume GitHub refuses to format it due to the filesize.

Smaller BASH script files are displayed correctly (with syntax highlighting). It's just this large script that isn't.

I assume it's being detected correctly. But, just in case, I included a specific override in my .gitattributes that reads as follows:

*.sh linguist-language=bash

... but this hasn't changed GitHub's display of this file.

Can anyone advise? Thank you.

@OneCDOnly OneCDOnly added the Bug label Dec 27, 2022
@Alhadis
Collaborator

Alhadis commented Dec 29, 2022

presently just over 214kiB

This shouldn't be causing any issues. Do you have a link to the affected file in question?

However, I've been unable to locate the size limits for GitHub syntax highlighting. Does anyone know what they are?

Linguist imposes a hardcoded limit of 1 MB, although this isn't documented anywhere publicly, to my knowledge.
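As a quick sanity check on the numbers above, here is a minimal sketch comparing the reported script size with the limit. It assumes "1 MB" means 1024 * 1024 bytes, which the thread does not confirm, and the helper names are made up for illustration.

```python
# Assumption: "1 MB" means 1024 * 1024 bytes; the thread does not say which
# definition of a megabyte the limit uses.
ASSUMED_LIMIT_BYTES = 1024 * 1024


def kib_to_bytes(kib: float) -> int:
    """Convert kibibytes (1 KiB = 1024 bytes) to whole bytes."""
    return int(kib * 1024)


def exceeds_limit(size_bytes: int, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> bool:
    """Return True when a file of size_bytes is larger than the limit."""
    return size_bytes > limit_bytes


script_size = kib_to_bytes(214)    # the size reported in this issue
print(script_size)                 # 219136
print(exceeds_limit(script_size))  # False
```

Under that assumption a 214 KiB script is roughly a fifth of the limit, which is consistent with the size alone not explaining the missing highlighting.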

@Alhadis Alhadis changed the title from "file size limits?" to "File size limits?" Dec 29, 2022
@OneCDOnly
Author

@Alhadis
Collaborator

Alhadis commented Dec 29, 2022

🤔 This might be a limit imposed by Tree-Sitter, which handles syntax highlighting for a selection of languages instead of the grammars used by Linguist. If I change your script's hashbang to #!/usr/bin/env coffee so that it's interpreted as CoffeeScript instead of Bash, the code receives imperfect (yet noticeable) highlighting powered by Linguist's vendor/grammars/language-coffee-script module.

@lildude Could you clarify if this is a Tree-Sitter issue?

@lildude
Member

lildude commented Jan 3, 2023

@lildude Could you clarify if this is a Tree-Sitter issue?

This isn't likely to be a tree-sitter issue, as we're still using the grammar from this repo for Bash highlighting, which brings me to your prior sentence:

If I change your script's hashbang to #!/usr/bin/env coffee so that it's interpreted as CoffeeScript instead of Bash, the code receives imperfect (yet noticeable) highlighting powered by Linguist's vendor/grammars/language-coffee-script module.

This confirms this isn't a tree-sitter issue and hints towards a grammar issue.

After a lot of tinkering this afternoon, I've managed to whittle the file down to this gist, which works. The syntax highlighting fails as soon as I add an f to the very last line.

It's then fixed if I remove:

  1. the f in the if above it
  2. the { at the top of this function
  3. any one of the remaining functions

This suggests to me there is some parsing or grammar error somewhere that is causing this.

Unfortunately, this is never going to be fixed in the current grammar, as it's the one from Atom, and if this is somehow tickling a bug in Prettylights, it won't be fixed there either. So the only solutions we have are:

  1. Identify if and what in the grammar is causing this and update and use a fork of the grammar (tough to test)
  2. Find another grammar that doesn't experience the issue (tough to test)
  3. Wait for GitHub to switch to the Treesitter grammar for shell scripts (we could be waiting a while if the upstream grammar isn't complete)
  4. Mangle the script so it no longer tickles this issue (quickest to test as Gists show the issue too)
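
The whittling-down described above can in principle be automated. The following is a rough sketch of a greedy line reducer, not the procedure actually used in this thread. `still_fails` is a hypothetical predicate supplied by the caller; the thread only shows the failure by viewing the file on GitHub or in a Gist, so the predicate in the example is a stand-in.

```python
def reduce_lines(lines, still_fails):
    """Greedily drop chunks of lines while still_fails(lines) stays True.

    still_fails is a caller-supplied (hypothetical) check that returns True
    when the remaining lines still trigger the problem.
    """
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate  # the chunk was irrelevant, keep it removed
            else:
                i += chunk         # the chunk is needed, move past it
        chunk //= 2
    return lines


# Stand-in predicate: the "problem" needs both marker lines to be present.
def needs_both(ls):
    return "A" in ls and "B" in ls


print(reduce_lines(["x", "A", "y", "z", "B", "w"], needs_both))  # ['A', 'B']
```

The result is only minimal with respect to single-line removals, but that is usually enough to expose which constructs interact, as with the `f`, the `{`, and the remaining functions listed above.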

@Alhadis
Collaborator

Alhadis commented Jan 3, 2023

This confirms this isn't a tree-sitter issue and hints towards a grammar issue.

I'm still using Atom, and have its Tree-Sitter grammars disabled. The TextMate grammar used for highlighting shell-scripts (the same one used on GitHub) works perfectly fine with the file that @OneCDOnly linked.

So, I suspect it may have something to do with PrettyLights instead…

@lildude
Member

lildude commented Jan 4, 2023

So, I suspect it may have something to do with PrettyLights instead…

I think so too, though probably only because of an issue with the grammar. I think this is Prettylights protecting itself from consuming too much memory when a grammar produces a very large stack of nested elements that are never popped. We've seen this in the past: some grammars produced massive stacks on large files, resulting in Prettylights consuming huge amounts of memory and subsequently being killed.

Prettylights now limits the stack depth to 256 which should be more than enough for very large files with correctly behaving grammars, whilst also keeping resources under control. In the event the stack grows to this point, we bail and return the unhighlighted file.
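
To illustrate the idea, here is a toy sketch of a depth-limited highlighter. It is not Prettylights' implementation: the push/pop rules, the names, and the choice to bail once the depth exceeds the limit (rather than when it reaches it) are all assumptions made for the example. Only the figure of 256 and the "return the unhighlighted file" behaviour come from the description above.

```python
MAX_STACK_DEPTH = 256  # the limit quoted above


class StackTooDeep(Exception):
    """Raised when the scope stack grows past the allowed depth."""


def highlight(lines, max_depth=MAX_STACK_DEPTH):
    """Toy highlighter: '{' pushes a scope, '}' pops one.

    Returns (highlighted, output_lines). If the stack exceeds max_depth,
    it gives up and returns the input lines unchanged.
    """
    stack = []
    out = []
    try:
        for line in lines:
            for ch in line:
                if ch == "{":
                    stack.append("block")
                    if len(stack) > max_depth:
                        raise StackTooDeep
                elif ch == "}" and stack:
                    stack.pop()
            # Stand-in for real markup: annotate each line with its depth.
            out.append(f"[depth={len(stack)}] {line}")
    except StackTooDeep:
        return False, list(lines)
    return True, out


# Scopes that are opened and closed never get deep, however long the file is.
ok, _ = highlight(["f() {", "}"] * 1000)
# Scopes that are never popped hit the limit and come back unhighlighted.
bailed_ok, bailed_out = highlight(["{"] * 300)
print(ok, bailed_ok)  # True False
```

In this sketch the file length does not matter, only whether scopes are closed again, which matches the point that a correctly behaving grammar should stay well under the limit.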

Such a problem isn't likely to affect Atom, as it has far more resources available to it. It doesn't need to bail in this way and can consume all the memory it needs until it reaches the end of the file, at which point it finally frees the resources.

@lildude
Member

lildude commented Jun 6, 2023

Closing as this is likely to be a grammar issue which Prettylights is protecting itself from.

@lildude lildude closed this as completed Jun 6, 2023
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024