Disabling tree-sitter on big files #338

Open
kirawi opened this issue Jun 21, 2021 · 19 comments · Fixed by #7028
Labels
A-helix-term Area: Helix term improvements C-enhancement Category: Improvements

Comments

@kirawi
Member

kirawi commented Jun 21, 2021

Opening large files (e.g. >100 MB) that have an associated syntax-highlighting grammar leads to high memory usage and long load times: tree-sitter/tree-sitter#222

Ideally opening such a file would prompt the user with disabling syntax highlighting for the file.

@kirawi kirawi added the C-enhancement Category: Improvements label Jun 21, 2021
@kirawi kirawi changed the title Use syntect for big files Handling big files Jun 21, 2021
@archseer
Member

> Refactor tree-sitter-highlight to work like the atom one, recomputing partial tree updates.

This is already done, the base layer is incremental, and injections are recomputed.

I personally don't think we should bother highlighting above a certain very large file size.

@kirawi
Copy link
Member Author

kirawi commented Jun 22, 2021

I agree, but some people (like me) like syntax highlighting on big files even if it isn't that useful. That's why I'd like this to be a plugin in the future. I might do it myself if I'm not too busy. For the prompt, I'm not sure how much it would depend on tui, since we're switching to termwiz.

@kirawi
Member Author

kirawi commented Jun 22, 2021

I'll take this issue on (prompt to disable syntax highlighting on big files).
Unassigning myself.

@kirawi kirawi changed the title Handling big files Disabling tree-sitter on big files Jun 22, 2021
@sudormrfbin
Member

Wouldn't we have to use syntect anyway at some point in the future, to provide highlighting for languages without a tree-sitter grammar? Installing a plugin just to get syntax highlighting for some filetypes seems annoying.

@kirawi
Member Author

kirawi commented Jun 22, 2021

Maybe. syntect seems fairly lightweight, and it shares a lot of dependencies with Helix. However, it might be a problem because it's regex-based, and we're already trying to move away from the standard Regex library.

@kirawi kirawi added A-helix-term Area: Helix term improvements E-easy Call for participation: Experience needed to fix: Easy / not much labels Aug 19, 2021
@kirawi kirawi added E-good-first-issue Call for participation: Issues suitable for new contributors and removed E-easy Call for participation: Experience needed to fix: Easy / not much labels Oct 15, 2021
@Aloso
Contributor

Aloso commented Feb 1, 2022

@kirawi syntect uses the oniguruma regex engine by default, though it can be configured to use fancy-regex instead. If by "standard Regex library" you mean the regex crate, why are you trying to move away from it? cargo tree shows that regex is used by 5 different crates in the dependency graph.

@archseer
Member

archseer commented Feb 1, 2022

I don't have any plans to drop the regex library, I also don't plan on supporting syntect. The scope of this issue should purely be on disabling highlighting on 200MB+ files.

@pppKin
Contributor

pppKin commented Jun 22, 2022

Hi, any update on this one? I opened a 21k+ line Lua file and it caused a stack overflow in tree-sitter, crashing Helix. While I try to gather more info on that particular stack overflow and report it upstream, it would be nice to be able to disable tree-sitter on large files directly within Helix.

@sudormrfbin sudormrfbin removed the E-good-first-issue Call for participation: Issues suitable for new contributors label Jul 17, 2022
@tgharib

tgharib commented Aug 12, 2022

This is a deal-breaker for me as well. As soon as I open an 820k+ LoC C file of register definitions, Helix crashes. If I change the file extension to .txt, it loads instantly.

@pppKin
Contributor

pppKin commented Aug 15, 2022

Apart from disabling tree-sitter for files larger than xx MB, I think we could also implement an "open as plain text" command in the file picker.

Edit: not just in the file picker, but also in search results, references, etc.

@msdrigg
Contributor

msdrigg commented Mar 1, 2023

I am seeing this as well in a long .json file. Thankfully I found this issue and changing the lang to .txt fixed it.

This should be the default for large files.

@archseer
Member

I still think there should be an upper bound but we can set it to something really high (200~500MB?)

@archseer archseer reopened this May 18, 2023
@pascalkuthe
Copy link
Member

pascalkuthe commented May 18, 2023

> I still think there should be an upper bound but we can set it to something really high (200~500MB?)

Are you just concerned about the 500ms delay when first opening such a large file (the time until the parser times out), or what use case did you have in mind?

At least for 200MB headers in the kernel, I have seen TS perform pretty decently (and I think it can be optimized further so it ends up pretty usable), so the limit should be pretty large (like 500MB) IMO.

@archseer
Member

I think past a certain size limit there's just no point in wasting CPU cycles to even attempt highlighting, e.g. 500MB+.

@pascalkuthe
Member

Makes sense. Adding a limit should be really straightforward now: we just need to return an error from the parser function if the file size is larger than 500MB.
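A minimal sketch of such a size guard, in Rust. Everything here is hypothetical and not Helix's actual API: the constant `MAX_HIGHLIGHT_BYTES`, the error type `SyntaxError::FileTooLarge`, and the function `check_highlight_size` are illustrative names, and the 500 MiB cutoff is an assumption taken from the figure discussed above.

```rust
use std::fmt;

// Hypothetical threshold (500 MiB); the real cutoff is a policy decision.
const MAX_HIGHLIGHT_BYTES: usize = 500 * 1024 * 1024;

// Hypothetical error type, not Helix's real one.
#[derive(Debug, PartialEq)]
enum SyntaxError {
    FileTooLarge { len: usize, limit: usize },
}

impl fmt::Display for SyntaxError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyntaxError::FileTooLarge { len, limit } => write!(
                f,
                "file is {len} bytes, above the {limit}-byte highlighting limit"
            ),
        }
    }
}

// Checked before handing the document to the parser: an Err means
// "skip tree-sitter and show the file as plain text".
fn check_highlight_size(len_bytes: usize) -> Result<(), SyntaxError> {
    if len_bytes > MAX_HIGHLIGHT_BYTES {
        Err(SyntaxError::FileTooLarge {
            len: len_bytes,
            limit: MAX_HIGHLIGHT_BYTES,
        })
    } else {
        Ok(())
    }
}
```

A caller would then fall back to plain text whenever `check_highlight_size(text.len())` returns `Err`, so the 500ms parse timeout is never even reached for such files.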

@iocron

iocron commented Jun 4, 2023

I am not even able to open huge files, because the file picker's code preview crashes before I get that far.

@MasterAwesome

Is there a way to support partial syntax highlighting that isn't based on tree-sitter? Possibly using something like syntect?

Alternatively, I also think the ability to force tree-sitter parsing through a command like :ts-force-parse might be useful, since we have a default parse timeout of 500ms and possibly file-size-based limits in the future too.

@kirawi
Copy link
Member Author

kirawi commented Jul 3, 2024

We don't want to support any highlighter except tree-sitter. It's possible that we may support LSP highlighting in the future, but regex highlighting would have to be implemented as a plugin.

@leath-dub

leath-dub commented Sep 28, 2024

Can you not just parse chunks of the file? Tree-sitter was designed to be error-tolerant, so if you make the chunk size large enough, any parse error would very likely be recovered far away from the user's view.

Edit: I suppose this is quite a refactor, though, as a lot of other features rely on the whole tree.
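A rough sketch of the window-selection half of the chunking idea, assuming the document is available as a `&str` and that a fixed-size byte window around the cursor is what gets parsed. `chunk_range` is a hypothetical helper, not part of Helix or tree-sitter; it only picks the byte range and widens it to whole lines so a chunk never starts or ends mid-line. Feeding that range to a parser, and reconciling it with features that need the whole tree, is the refactor mentioned above.

```rust
use std::ops::Range;

// Hypothetical helper: choose a window of roughly `chunk_size` bytes
// centred on `cursor`, widened outwards to whole lines.
fn chunk_range(text: &str, cursor: usize, chunk_size: usize) -> Range<usize> {
    let bytes = text.as_bytes();
    let len = bytes.len();
    let cursor = cursor.min(len);

    // Centre the window on the cursor, then shift it back if it runs past EOF.
    let start = cursor.saturating_sub(chunk_size / 2);
    let end = start.saturating_add(chunk_size).min(len);
    let start = end.saturating_sub(chunk_size);

    // Widen the start back to the beginning of its line. `\n` is a single
    // byte in UTF-8, so the byte right after it is always a char boundary.
    let start = bytes[..start]
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);

    // Widen the end forward to just past the next newline (or to EOF).
    // Starting the search one byte early keeps an end that already sits
    // at a line start where it is.
    let from = end.saturating_sub(1);
    let end = bytes[from..]
        .iter()
        .position(|&b| b == b'\n')
        .map_or(len, |i| from + i + 1);

    start..end
}
```

With, say, a 1 MiB window, a viewport in the middle of a 500 MB file would only ever parse about a megabyte of text, at the cost of possibly mis-highlighting constructs that begin outside the window.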
