Very slow editing of large files when tree-sitter is used #3072

kchibisov · 2022-07-14T19:11:47Z

Summary

Certain markdown files are very slow to edit. In particular when they have a lot of lists.
For example a CHANGELOG.md file from the helix repo is a enough to trigger it.

The problem is likely in grammar here, since neovim with tree-sitter works fine on it.

While this particular problem could be solved with slightly optimized, but less accurate grammar. However the issue manifests itself on any large file of any other type, just requires much larger file than ~600LoC(I've used 50K LoC C++ that I have around).

To avoid delay and input lag in those particular scenarios it would make sense to either run tree-sitter only on a part of the buffer (I'm not familiar how it works) or update it after the typing ended. Since right now you have a latency of like 2 seconds on 50K lines file of C++, which is impossible to work with.

Reproduction Steps

Try to edit CHANGELOG.md from the helix repo or any large file.

Helix log

No response

Platform

Linux

Terminal Emulator

alacritty

Helix Version

helix 22.05 (3cced1e)

The text was updated successfully, but these errors were encountered:

MDeiml · 2022-07-14T22:47:47Z

Author of the markdown grammar here. It seems that at the moment language injection cannot be parsed incrementally. The markdown grammar uses injection a lot so it becomes quite apparent for it. This is what I get from line 846 here:

helix/helix-core/src/syntax.rs

Lines 841 to 848 in 8681fb6

    
           let layer = self 
        
               .layers 
        
               .iter_mut() 
        
               .find(|(_, layer)| { 
        
                   layer.depth == depth && // TODO: track parent id instead 
        
                   layer.config.language == config.language && layer.ranges == ranges 
        
               }) 
        
               .map(|(id, _layer)| id);

Notice how a layer can only be reused if ranges did not change. But this is often the case when editing a file. The test here could probably be laxer.

I didn't look to deep, but (from my knowledge of treesitter) any layer should be fine here to "reuse" as long as none is reused twice. So something akin to

Choosing the layer whose ranges "match best" in some way
Changing those ranges to the new ones

should work.

archseer · 2022-07-15T01:59:08Z

We actually follow tree-sitter-cli/atom logic to update all the markers according to user edits beforehand: https://github.com/helix-editor/helix/blob/8681fb6d9ee06610fac2baafb4ca34f93f9c7f4d/helix-core/src/syntax.rs#L644-L726=

So the injection layers are tracked reliably.

We also seem to only use HTML injections & code blocks: https://github.com/helix-editor/helix/blob/master/runtime/queries/markdown/injections.scm

MDeiml · 2022-07-15T07:14:35Z

Ah ok, I guess you are still using the old version of my parser then (the main branch). You could try updating to the split_parser branch. That one has some major performance improvements, but uses injection very extensively.

archseer · 2022-07-15T10:23:25Z

Ah, good point! I'll see if I can update us to the new branch 👍

Resolves helix-editor#3072

asb · 2022-08-07T02:51:10Z

On an 8.4MiB markdown file (I keep daily notes in a single big document), helix 22.05 takes > 10s to open it and is used 1GiB of memory after doing so. I'm assuming this is tree-sitter related and the same issue described here, but let me know if you think not.

asb · 2022-08-08T13:29:48Z

The following script generates a ~5MiB file that takes >5 seconds to open with Helix. After opening, memory usage (reported via ps_mem) is ~660MiB.

#!/bin/sh

for i in $(seq 10000); do
  cat <<EOF >> big_markdown.md
# Title
## Subtitle
Some notes about things.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum

* A list
  * With a sub-item
* Item 2
* Item 3
* Item 4
* Item 5
  * Item 5.1

EOF
done

kchibisov · 2022-08-08T14:23:02Z

@asb if you disable grammar for markdown does that happen?

asb · 2022-08-08T14:28:15Z

@kchibisov I'm not sure what option to change to disable the grammar (I was trying out helix for the first time). Certainly, if mv big_markdown.md big markdown.txt and open the version of the file with a .txt extension, it opens instantly and helix consumes a more reasonable 18MiB RAM.

MDeiml · 2022-08-08T14:31:22Z

I think the simplest way to fix this is to only run the grammar for injected languages if they are visible (i.e. run it lazily).

Markdown is split into two grammars: block and inline. The block-level grammar is quite fast and produces the ranges on which the inline-level grammar is injected. So the computational cost mostly grows with the amount of inline content parsed.

I'd guess that the memory cost is also from the parsed inline-level trees, so that problem would also be solved in the same way, but I'd have to check that.

MDeiml · 2022-08-08T14:32:53Z

I'm not sure how to best implements something like this tough. Currently the parsing is handled by the Document impl, but for lazy parsing we need information from the Views that point to that document.

Yevgnen · 2022-08-15T06:36:37Z

Having trouble when open rust.org.gz 3000 line .org file and never finish. Renaming it to abc.xyz to avoid syntax highlight then the process is instant.

kirawi · 2022-09-07T20:49:29Z

I think the simplest way to fix this is to only run the grammar for injected languages if they are visible (i.e. run it lazily).

Markdown is split into two grammars: block and inline. The block-level grammar is quite fast and produces the ranges on which the inline-level grammar is injected. So the computational cost mostly grows with the amount of inline content parsed.

I'd guess that the memory cost is also from the parsed inline-level trees, so that problem would also be solved in the same way, but I'd have to check that.

I think that this could build upon the work in #2857 to keep track of the nodes and lazily highlight injections.

MDeiml · 2022-10-13T18:40:39Z

Just wanted to mention #4146 here. Maybe that fixed some problems, but probably not all of them.

kchibisov · 2022-10-15T09:39:04Z

That indeed improved the situation and the editing of markdown is at least possibly with tree-sitter.

Large files still blow up helix leading to seconds in input lag. Not sure where it should be tracked.

midnightcodr · 2023-03-08T15:08:59Z

With helix 22.12 (34be71fb) I am seeing about 1 second delay per keystroke when doing either insert or search. The file I am editing is a 129MB mysql dump file with 150 very long lines.

The machine I am working on is a MacBook Pro M1 with 16GB of RAM.

dead10ck · 2023-03-10T05:32:25Z

Wondering if #6218 fixed this

archseer · 2023-03-10T06:21:47Z

It probably fixed or improved viewing the file, but insert mode will still be just as slow on large files.

archseer · 2023-03-10T06:23:35Z

The file I am editing is a 129MB mysql dump file with 150 very long lines.

Yeah both of these are an issue. You're parsing a very large file but performance is also not optimized on long lines. Or at least it wasn't before but maybe @pascalkuthe's rewrite improved things slightly.

pascalkuthe · 2023-03-10T07:27:07Z

The file I am editing is a 129MB mysql dump file with 150 very long lines.

Yeah both of these are an issue. You're parsing a very large file but performance is also not optimized on long lines. Or at least it wasn't before but maybe @pascalkuthe's rewrite improved things slightly.

Right now it shouldnt make much if a difference. Howver performance for huge lines will be improved in a future followup PR to #5420 where I will Implement line chunking. This was omitted from the MVP but the architecture lends itself well to this optimization (and was designed with it in mind, I even had a mostly finished implementation but the changes were accidently deleted by an overwrite from a second helix instance)

kchibisov · 2023-04-07T14:19:30Z

While my original issue is resolved with the markdown.

I'd think I'll leave a real world example of a big file that is well formatted and not some dump of the logging utility, since I've mentioned in original report a C++ file, but I haven't showed it.

So that's the file I've referred to originally wrt my C++ quote.

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86ISelLowering.cpp

The :set-language text makes a normal editing experience(same perf as file of any other size).

mvdan · 2023-05-03T15:07:44Z

I use git commit --verbose, which causes the COMMIT_EDITMSG files to contain the full diff of what's being committed. At work, I just happened to commit changes weighing about ~400k lines of diff, all generated, and I experienced this slowness - each insert keystroke lagged by about a second. @kchibisov's suggestion of :set-language text works like a charm as a workaround, for now.

Still happening with 23.03, for what it's worth.

kchibisov added the C-bug Category: This is a bug label Jul 14, 2022

kirawi added a commit to kirawi/helix that referenced this issue Jul 19, 2022

Update markdown grammar

a5fdad4

Resolves helix-editor#3072

MDeiml mentioned this issue Jul 19, 2022

Use split_parser branch for markdown grammar #3108

Merged

the-mikedavis added the A-tree-sitter Area: Tree-sitter label Jul 31, 2022

kirawi mentioned this issue Mar 24, 2023

Unusable in large files (slow syntax highlight) #6427

Closed

kchibisov changed the title ~~Slow tree-sitter performance on certain files~~ Very slow *editing* of large files when tree-sitter is used Apr 7, 2023

ysooqe mentioned this issue May 27, 2024

Helix becomes very slow when the file contains ultra long lines #10826

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Very slow editing of large files when tree-sitter is used #3072

Very slow editing of large files when tree-sitter is used #3072

kchibisov commented Jul 14, 2022 •

edited

Loading

MDeiml commented Jul 14, 2022

archseer commented Jul 15, 2022

MDeiml commented Jul 15, 2022

archseer commented Jul 15, 2022

asb commented Aug 7, 2022

asb commented Aug 8, 2022

kchibisov commented Aug 8, 2022

asb commented Aug 8, 2022 •

edited

Loading

MDeiml commented Aug 8, 2022

MDeiml commented Aug 8, 2022

Yevgnen commented Aug 15, 2022

kirawi commented Sep 7, 2022

MDeiml commented Oct 13, 2022

kchibisov commented Oct 15, 2022

midnightcodr commented Mar 8, 2023

dead10ck commented Mar 10, 2023

archseer commented Mar 10, 2023

archseer commented Mar 10, 2023

pascalkuthe commented Mar 10, 2023 •

edited

Loading

kchibisov commented Apr 7, 2023

mvdan commented May 3, 2023

Very slow *editing* of large files when tree-sitter is used #3072

Very slow *editing* of large files when tree-sitter is used #3072

Comments

kchibisov commented Jul 14, 2022 • edited Loading

Summary

Reproduction Steps

Helix log

Platform

Terminal Emulator

Helix Version

MDeiml commented Jul 14, 2022

archseer commented Jul 15, 2022

MDeiml commented Jul 15, 2022

archseer commented Jul 15, 2022

asb commented Aug 7, 2022

asb commented Aug 8, 2022

kchibisov commented Aug 8, 2022

asb commented Aug 8, 2022 • edited Loading

MDeiml commented Aug 8, 2022

MDeiml commented Aug 8, 2022

Yevgnen commented Aug 15, 2022

kirawi commented Sep 7, 2022

MDeiml commented Oct 13, 2022

kchibisov commented Oct 15, 2022

midnightcodr commented Mar 8, 2023

dead10ck commented Mar 10, 2023

archseer commented Mar 10, 2023

archseer commented Mar 10, 2023

pascalkuthe commented Mar 10, 2023 • edited Loading

kchibisov commented Apr 7, 2023

mvdan commented May 3, 2023

Very slow editing of large files when tree-sitter is used #3072

Very slow editing of large files when tree-sitter is used #3072

kchibisov commented Jul 14, 2022 •

edited

Loading

asb commented Aug 8, 2022 •

edited

Loading

pascalkuthe commented Mar 10, 2023 •

edited

Loading