faster line iterator #70

pascalkuthe · 2022-11-01T16:43:53Z

Note: This is a very early draft and still contains known issues. CI failures are known and expected (multiple tests relating to the reverse iterator still fail).
It's just posted here to allow comparison with the approach in #69.

This PR is now mostly ready for review, although I still want to add some more comments and clean up the code a bit. But the PR works now, so reviewing it makes sense. I have dropped the unrelated commits and removed debug prints.

This implements a line iterator using a custom chunks iterator that tracks the lowest common parent across line boundaries to avoid a binary search in RopeSlice::new.
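To make the core idea concrete, here is a minimal sketch under heavy simplifying assumptions: the rope is modeled as a flat slice of &str chunks (no tree), only "\n" counts as a line break, and lines come back as owned Strings instead of RopeSlices. ChunkedLines is a hypothetical name, not Ropey's API. What it illustrates is the cursor: each call to next() resumes where the previous line ended instead of locating the line again from scratch.

```rust
/// Toy model of a rope: the text is a flat sequence of chunks.
/// Hypothetical type for illustration only; it is not Ropey's `Lines`.
/// Assumption: only `\n` counts as a line break.
struct ChunkedLines<'a> {
    chunks: &'a [&'a str],
    chunk_idx: usize, // chunk the cursor is currently in
    byte_idx: usize,  // byte offset of the cursor inside that chunk
    done: bool,
}

impl<'a> ChunkedLines<'a> {
    fn new(chunks: &'a [&'a str]) -> Self {
        ChunkedLines { chunks, chunk_idx: 0, byte_idx: 0, done: false }
    }
}

impl<'a> Iterator for ChunkedLines<'a> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        if self.done {
            return None;
        }
        let mut line = String::new();
        // Resume from the saved cursor instead of searching for the line again.
        while self.chunk_idx < self.chunks.len() {
            let rest = &self.chunks[self.chunk_idx][self.byte_idx..];
            if let Some(pos) = rest.find('\n') {
                // The line ends inside this chunk: take it, including the `\n`.
                line.push_str(&rest[..=pos]);
                self.byte_idx += pos + 1;
                return Some(line);
            }
            // The line continues into the next chunk.
            line.push_str(rest);
            self.chunk_idx += 1;
            self.byte_idx = 0;
        }
        // End of text: whatever remains (possibly empty) is the last line.
        self.done = true;
        Some(line)
    }
}
```

In the PR, the cursor is a path through the tree that is kept across line boundaries (up to the lowest common parent), which is what removes the per-line binary search in RopeSlice::new; this toy only shows the resume-from-cursor half of that.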

The results are pretty spectacular. On synthetic benchmarks it outperforms the old implementation by a factor between 3x and 9x.
See the run of the built-in benchmarks below (this is the wrong way around: I benchmarked the new implementation first and then the old one):

I have tested the iterator in helix with helix-editor/helix#4457: reloading a 1GB file improves the total reload time (excluding IO) from 14 seconds to 7 seconds.

cessen · 2022-11-01T20:37:07Z

This is definitely the approach I wanted to take with this. Indeed, it's subtle to get right, which is why I put it off for so long, ha ha. Thanks for taking this on! I'll take a look at the actual code when my brain is functioning properly again.

cessen · 2022-11-01T20:40:52Z

Ah... maybe I'm forgetting how github works, but it looks like the commits from your other PRs are rolled in with this one? Could you keep your PRs on separate branches so it's easier to review? (And also easier to accept some PRs without having to accept others too early.)

pascalkuthe · 2022-11-01T20:49:25Z

Hey @cessen, thanks for all the feedback. Please take the time to recover, I hope you get well soon! Regarding the extra commits, you are totally right. This PR is not ready for review yet (hence marked as draft) and not at all in a state where I would usually post it. I just wanted to post my current state for the discussion in #69. There are still comments to add, a few edge cases to fix, benchmarks to perform, and indeed the other commits to remove. I had just branched off my local master, sorry about that.

Closing the PR was a misclick. Sorry about that.

archseer · 2022-11-05T00:37:16Z

Should we also include 1594596?

pascalkuthe · 2022-11-05T05:51:33Z

> Should we also include 1594596?

I would hope that this optimizes to the same code, but I will check the assembly later to make sure. If the compiler indeed cannot optimize this, then including it might yield a small performance win.

pascalkuthe · 2022-11-05T05:58:53Z

BTW, the current implementation does not yet use SIMD for finding the next newline within a chunk, because the functions available in str_indices do not do what I need. I already have a SIMD version of find_next_line_break implemented (same as line_to_byte_idx(text, 1), but it returns None if no line break is found, to disambiguate a newline char at the very end from no newline char), and the reverse iteration function should be decently easy as well. From local testing, this has the potential to improve performance even further, since most time is actually spent finding the newlines within chunks, and the SIMD version of those functions is at least 2x faster (for lines longer than 16 bytes).

Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?
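The SIMD code itself isn't shown in this thread. As a rough, portable illustration of the same idea, the sketch below uses SWAR (treating a u64 as eight byte lanes) instead of real SIMD intrinsics, and implements find_next_line_break with the semantics described above: the byte index just past the first line break, or None if there is none. Assumptions: only LF, CR and CRLF are recognized (Ropey's default line-break set also includes several Unicode breaks such as U+2028), and the helpers splat, zero_lanes and find_cr_or_lf are hypothetical names.

```rust
/// Broadcast one byte into all eight byte lanes of a u64.
const fn splat(b: u8) -> u64 {
    (b as u64) * 0x0101_0101_0101_0101
}

/// Classic "has zero byte" trick: sets the high bit of each zero lane. Lanes
/// above the first zero lane may be falsely flagged, lanes below it never are,
/// so the lowest set bit always marks the first zero lane.
fn zero_lanes(v: u64) -> u64 {
    v.wrapping_sub(splat(0x01)) & !v & splat(0x80)
}

/// Index of the first `\n` or `\r` byte, examining 8 bytes per step.
fn find_cr_or_lf(bytes: &[u8]) -> Option<usize> {
    let mut i = 0;
    while i + 8 <= bytes.len() {
        let mut word = [0u8; 8];
        word.copy_from_slice(&bytes[i..i + 8]);
        // Little-endian load: lane 0 holds bytes[i], so low bits = earlier bytes.
        let w = u64::from_le_bytes(word);
        let mask = zero_lanes(w ^ splat(b'\n')) | zero_lanes(w ^ splat(b'\r'));
        if mask != 0 {
            return Some(i + (mask.trailing_zeros() / 8) as usize);
        }
        i += 8;
    }
    // Scalar tail for the final < 8 bytes.
    bytes[i..]
        .iter()
        .position(|&b| b == b'\n' || b == b'\r')
        .map(|p| i + p)
}

/// Byte index just past the first line break, or `None` if there is none.
/// Unlike a "start of line 1" query, this distinguishes "break at the very end"
/// (`Some(text.len())`) from "no break at all" (`None`).
fn find_next_line_break(text: &str) -> Option<usize> {
    let bytes = text.as_bytes();
    let i = find_cr_or_lf(bytes)?;
    if bytes[i] == b'\r' && bytes.get(i + 1) == Some(&b'\n') {
        Some(i + 2) // CRLF counts as a single break
    } else {
        Some(i + 1)
    }
}
```

Because the lowest flagged lane is always the first real match, one trailing_zeros per 8-byte word is enough to locate the break; a real SIMD version would typically do the same with 16 or 32 lanes per comparison.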

archseer · 2022-11-05T10:38:34Z

Maybe open a draft PR targeting this PR. GitHub will automatically switch it to target master if this one lands first.

src/str_utils.rs

tests/proptest_tests.rs

cessen · 2022-11-05T22:37:07Z

Those results look great! Awesome work.

I still don't have the energy for a full code review yet, but I left a couple of notes that I think can be addressed already.

cessen · 2022-11-05T22:42:15Z

> Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?

I don't think any changes to str_indices are necessary. I made an in-line review comment about how to implement your function in terms of lines::to_byte_idx().
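The review comment itself isn't reproduced here, so the following is only one plausible reading of "implement it in terms of lines::to_byte_idx()": ask for the start of line 1, then disambiguate the case where that equals text.len(). line_to_byte_idx below is a hypothetical LF-only stand-in written for this sketch, not the str_indices function, and the thread notes that the condition in the final PR ended up more complex once additional edge cases were considered.

```rust
/// Hypothetical stand-in for a `lines::to_byte_idx`-style query (LF only):
/// byte index of the start of line `line_idx`, clamped to `text.len()`.
fn line_to_byte_idx(text: &str, line_idx: usize) -> usize {
    if line_idx == 0 {
        return 0;
    }
    text.match_indices('\n')
        .nth(line_idx - 1)
        .map_or(text.len(), |(i, _)| i + 1)
}

/// `line_to_byte_idx(text, 1)` returns `text.len()` both when the only break
/// is the last byte and when there is no break at all; the extra check on the
/// final byte tells the two cases apart.
fn find_next_line_break(text: &str) -> Option<usize> {
    let idx = line_to_byte_idx(text, 1);
    if idx < text.len() || text.ends_with('\n') {
        Some(idx)
    } else {
        None
    }
}
```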

pascalkuthe · 2022-11-05T23:36:26Z

> Those results look great! Awesome work.
>
> I still don't have the energy for a full code review yet, but I left a couple of notes that I think can be addressed already.

Thanks for your comments! I replied to both; those are on me.
I anticipated these exact review comments (they were my first thoughts as well) and wanted to add review comments myself to preempt them, but I ended up forgetting about it. Sorry about that!
Please take your time to recover, I hope you feel better soon! Health always comes first.

pascalkuthe · 2022-11-07T14:43:19Z

@cessen After rebasing this PR on master, the tests you added actually caught a couple of bugs I didn't consider in my implementation (and the proptests caught some more, because the changes exercised some new code paths). The implementation really is extremely subtle in places.

In the process I have also switched back to line_to_byte_idx as you suggested. I did not end up implementing that as a separate function because I only needed it in one place, and when considering the additional edge cases, the condition actually ended up being a bit more complex than you suggested.

The good news is that now all tests pass so this implementation is hopefully correct.

Because this PR is already quite big (and a large performance win in every situation), I would like to leave the SIMD implementation for reverse iteration out of scope for now. I think it would definitely be good to have (and I know how to implement it), but it seems reasonable to do it in a follow-up PR, especially because I noticed that the current function needed to be adjusted to fix an edge case, and I am not 100% sure what the required API will be.

pascalkuthe · 2022-11-07T14:44:25Z

@cessen I am not sure why the three tests are failing in the miri CI. They work fine in the normal CI and locally. Maybe some SIMD-free fallback function used by MIRI behaves differently than the normal function? It doesn't seem related to this PR.

Edit: It seems the same MIRI tests fail on master too, so it's probably not related to this PR.

src/iter.rs

cessen · 2022-11-08T19:00:22Z

> Because this PR is already quite big (and a large performance win in every situation), I would like to leave the SIMD implementation for reverse iteration out of scope for now.

Yeah, I think that's a good call. Honestly, it's not totally clear to me that it's worth bothering with at all. Because although reverse Char and Byte iterators have some common performance-critical use cases, I struggle to think of similar cases for reverse Lines iterators. So it might be one of those "wait until someone has a use case" things.

> It seems the same MIRI tests fail on master too, so it's probably not related to this PR.

Yeah, I doubt it's related to this PR. Feel free to ignore them. I'll look into and fix them before the next release.

cessen · 2022-11-08T19:55:09Z

src/iter.rs

+    pub(crate) fn new_with_range(
+        node: &Arc<Node>,
+        byte_idx_range: (usize, usize),
+        line_idx_range: (usize, usize),
+    ) -> Lines {
+        Lines::new_with_range_at_impl::<false>(node, 0, byte_idx_range, line_idx_range)
+    }
+
+    pub(crate) fn new_with_range_at(
+        node: &Arc<Node>,
+        at_line: usize,
+        byte_idx_range: (usize, usize),
+        line_idx_range: (usize, usize),
+    ) -> Lines {
+        Lines::new_with_range_at_impl::<true>(node, at_line, byte_idx_range, line_idx_range)
+    }


I may be misunderstanding the code here, but it looks to me like new_with_range() and new_with_range_at() are inconsistent with each other. The following two calls should be identical:

    Lines::new_with_range(node, byte_range, line_range);
    Lines::new_with_range_at(node, 0, byte_range, line_range);

But it looks like they're different. Specifically, the former calls Lines::new_with_range_at_impl() with a false const generic parameter, and the latter with true.

I'm also a little nervous about that const generic parameter existing at all: I would expect it to either always be true or always be false. I haven't looked at the code closely enough yet, though, so I could certainly be wrong about that. But it smells a little to me.

pascalkuthe · 2022-11-08

Ok, this is definitely subtle, but it's the most elegant way to implement it I could come up with. I will try to explain:

First a small nitpick about your comment so it doesn't get confusing later:
The two iterators you posted are not always equivalent. They are only equivalent when you are iterating a full Rope (or a RopeSlice that happens to start within the first line of its top Node). However these two iterators are always equal if you use line_range.0 instead of 0:

    Lines::new_with_range(node, byte_range, line_range)
    Lines::new_with_range_at(node, line_range.0, byte_range, line_range)

It's easy to see that my code fulfills that criterion: if you look into the new_with_range_at_impl function, it special-cases line == line_range.0 right at the start of the function and always returns Lines::new_with_range(node, byte_range, line_range) in that case.
The reason this special case is required is that the first line might start before the start of the RopeSlice (so before byte_range.0). That means the line start is potentially in a different chunk than the position you are looking for. To find the correct chunk, you need to perform a tree search for the byte position byte_range.0 instead of the line line_range.0.
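A small worked example of why the first line has to be found by byte position. It assumes LF-only line breaks and models chunks as a list of byte lengths; enclosing_line_start and chunk_of_byte are hypothetical helpers, not Ropey functions.

```rust
/// Byte index where the line containing `byte_idx` starts in the full text.
/// This is where a search by *line* index would land. LF-only assumption.
fn enclosing_line_start(text: &str, byte_idx: usize) -> usize {
    text[..byte_idx].rfind('\n').map_or(0, |i| i + 1)
}

/// Index of the chunk containing byte `byte_idx`, given each chunk's byte length.
fn chunk_of_byte(chunk_lens: &[usize], byte_idx: usize) -> usize {
    let mut start = 0;
    for (i, &len) in chunk_lens.iter().enumerate() {
        if byte_idx < start + len {
            return i;
        }
        start += len;
    }
    chunk_lens.len().saturating_sub(1) // at/after the end: clamp to the last chunk
}
```

For the text "hello\nworld\n" stored as the chunks "hello\nw" (7 bytes) and "orld\n" (5 bytes), a slice starting at byte 8 lies in chunk 1, but its first line (line 1 of the full text) starts at byte 6, in chunk 0. Searching by line index would therefore start one chunk too early.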

So really the const generic indicates:

false => start the iterator at the first line

true => start the iterator at the specified line (with a dynamic check if this line is the first line to switch to false)

This special case might seem odd but I will try to explain my reasoning a bit more:
A similar problem occurs at the end of the line_range (so line >= line_range.1). I handle this special case as follows:

    let mut res = Lines::new_with_range_at(node, line_range.1 - 1, byte_range, line_range);
    res.next();
    res

This reuses the logic I already have inside the line iterator to correctly handle the end of the last line, and therefore doesn't require yet another special-cased implementation of the new function.

You might consider doing the same for the first line:

    let mut res = Lines::new_with_range_at(node, line_range.0 + 1, byte_range, line_range);
    res.prev();
    res

However, this doesn't work when there is only a single line (then these two cases just endlessly recurse into each other), so you need to handle one of them as a special case.
It's admittedly somewhat arbitrary which of the two you special-case, but there are two main reasons I special-cased the first line instead of the last line:

1. the implementation is slightly easier/more natural

2. the special case is slightly faster (because it does not involve stepping the iterator an additional time), and constructing an iterator at the first line (i.e. plain lines()) should be way more common than constructing an iterator at the last line

I think it's a bit easier to think about new_with_range and new_with_range_at as two separate functions (where new_with_range_at calls new_with_range for line == line_range.0).
The two functions are just very similar, so I used the const generic here to share most of the implementation.
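The control flow described above can be modeled with a toy type. In this hedged sketch the "rope" is just a precomputed list of lines, so the byte-based search for the first line collapses to an index; the point is the shape: a const generic bool shared by the two constructors, the dynamic fallback to the first-line path when at_line == line_range.0, and the past-the-end case reusing next(). ToyLines is hypothetical; only the method names mirror the diff. It assumes line_range is non-empty, lies within the list, and that at_line >= line_range.0.

```rust
/// Toy model: the "rope" is a precomputed list of lines, and `pos` is the
/// index of the line the next call to `next()` returns. Hypothetical type;
/// only the constructor names mirror the diff.
#[derive(Debug, PartialEq)]
struct ToyLines<'a> {
    lines: &'a [&'a str],
    line_range: (usize, usize), // assumed non-empty and within `lines`
    pos: usize,
}

impl<'a> ToyLines<'a> {
    fn new_with_range(lines: &'a [&'a str], line_range: (usize, usize)) -> Self {
        Self::new_with_range_at_impl::<false>(lines, 0, line_range)
    }

    fn new_with_range_at(lines: &'a [&'a str], at_line: usize, line_range: (usize, usize)) -> Self {
        Self::new_with_range_at_impl::<true>(lines, at_line, line_range)
    }

    /// `AT_LINE == false`: start at the first line of the range (`at_line` ignored).
    /// `AT_LINE == true`: start at `at_line`, falling back to the first-line
    /// path when `at_line == line_range.0`.
    fn new_with_range_at_impl<const AT_LINE: bool>(
        lines: &'a [&'a str],
        at_line: usize,
        line_range: (usize, usize),
    ) -> Self {
        if AT_LINE && at_line != line_range.0 {
            if at_line >= line_range.1 {
                // Past the end: build at the last line and step over it once,
                // reusing the iterator's own end-of-line handling.
                let mut res = Self::new_with_range_at(lines, line_range.1 - 1, line_range);
                res.next();
                return res;
            }
            return ToyLines { lines, line_range, pos: at_line };
        }
        // First-line path. In Ropey this is the case that needs a search by
        // byte_range.0, because the first line may start before the slice.
        ToyLines { lines, line_range, pos: line_range.0 }
    }
}

impl<'a> Iterator for ToyLines<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.pos >= self.line_range.1 {
            return None;
        }
        let line = self.lines[self.pos];
        self.pos += 1;
        Some(line)
    }
}
```

Because the first-line path never delegates, the past-the-end path can safely delegate to it even for a single-line range. If the first-line case were also written as "construct at line_range.0 + 1, then call prev()", a single-line range would bounce between the two paths forever, which is the recursion described above.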

cessen · 2022-11-09

> First a small nitpick about your comment so it doesn't get confusing later:

Ah, right. Thanks for pointing that out!

> The reason this special case is required is that the first line might start before the start of the RopeSlice (so before byte_range.0). That means the line start is potentially in a different chunk than the position you are looking for. To find the correct chunk, you need to perform a tree search for the byte position byte_range.0 instead of the line line_range.0.

I'll keep an eye out for this when I take a closer look at the code. But at first blush, I think I would have implemented the tree traversal code to account for both the line and byte starts, diving into the furthest-forward of the two as it goes. Then (unless I'm still missing something, which is possible) there wouldn't need to be any special cases.

> A similar problem occurs at the end of the line_range (so line >= line_range.1). I handle this special case as follows:

I guess part of what's throwing me off here is that it's not clear to me why these cases aren't handled in the traversal/stack-building code itself. It feels to me like there's a simpler implementation struggling to emerge here, but we haven't quite hit it yet.

Partly this is from experiences with my own code in the past: special cases that look a lot like this have very often indicated that my perspective on the problem is a little off.

But to be clear, that doesn't mean the code is incorrect. Not at all. Just that a more straightforward and also correct alternative implementation may be possible.

Having said all of that, I don't mean to block this PR on stuff like that. As long as it's correct and I can follow it, let's get this merged. Especially considering that there's already a lot of "too complex" code in Ropey anyway, which is my own doing. Ha ha. :-) I can always revisit the code and try to simplify it myself in the future.

pascalkuthe · 2022-11-10

I thought about your suggestion of doing the tree search for lines and bytes all at once. My intuition is that while it's theoretically possible, it would involve negative indexing (because you always have to keep track of your position relative to the current node) and a bunch of extra conditions to handle that correctly, so the sketch I have in mind right now is actually more complex than the current implementation.

I can experiment a bit with it (the ideas in my mind are often wrong :D), and if I find a better solution in the future I might do a follow-up PR, if you can live with the current implementation for now. This function took a while to get right (edge cases like a rope slice split inside a CRLF must be handled correctly), so improving it would take a while overall.

pascalkuthe · 2022-11-09T17:31:01Z

There was another problem with CI where medium.txt was removed which made the test I added for #67 fail.
I think I might have just forgotten to stage that file in that PR. I added a commit with that file, but feel free to just fix that directly in master. I don't think it warrants another PR (I will drop the commit then).

src/iter.rs

pascalkuthe · 2022-11-11T02:04:29Z

Sorry, I meant to address your comments today but totally forgot about it. It's pretty late here now, but I will do it tomorrow morning!

pascalkuthe · 2022-11-11T17:34:08Z

I fixed the mistakes in the comments and added a ton more. In the process I went through the entire code again and found two unnecessary branches in the new_with_range_at function. These branches simply had no effect anymore (they were only needed in previous versions of the implementation). This simplified that function quite a bit (although it did not remove the const generic).

I think/hope that the added comments make this PR a bit easier to review.
I would recommend starting with the next_impl function. The prev_impl function is a bit more tricky and I think the context of the constructor isn't really required for reviewing the next_impl function.

cessen · 2022-11-12T19:00:50Z

Thanks @pascalkuthe! I'll try to get to this as soon as I can. A somewhat more urgent task came up in another project, but I think that will probably wrap up within a week. Let me know if this gets urgent, and I'll bump up the priority.

pascalkuthe · 2022-11-13T13:21:47Z

I think @archseer was planning to do a new helix release at the end of this month and wanted to include my two PRs that are blocked on a new ropey version (helix-editor/helix#3890 and helix-editor/helix#4457). It would be awesome if that could work out. I think we could still manage it if we continue with the review next week and nothing major turns up, but of course I understand that other things have to take priority, @cessen.

cessen · 2022-11-15T05:23:26Z

Okay, so things have gotten even more hectic. Long story short: I'm working as the VFX supervisor on a film shoot this week (Mon-Fri). I thought I was going to have time in the evenings to work on the other task I mentioned, but that mostly doesn't seem to be the case. So that other task (which also has a deadline of end of the month) is also getting postponed, which would push out the full review of this PR even further.

I don't think it makes sense to keep you guys waiting on me, particularly if you want to make a release by the end of the month. So I think what I might do is this:

1. Do a cursory review, just to make sure nothing obviously wrong jumps out.
2. Merge the PR.
3. Publish a pre-release to crates.io that you guys can use for the upcoming Helix release.
4. When I get the time, review the code properly and just make any changes I'd like to see myself. And then make a proper release.

Does that sound reasonable? Basically, I don't want to block you guys, and I definitely want to merge this PR. But I also don't want to make a "proper" release without thoroughly reviewing the code first.

pascalkuthe · 2022-11-15T13:42:47Z

> Okay, so things have gotten even more hectic. Long story short: I'm working as the VFX supervisor on a film shoot this week (Mon-Fri). I thought I was going to have time in the evenings to work on the other task I mentioned, but that mostly doesn't seem to be the case. So that other task (which also has a deadline of end of the month) is also getting postponed, which would push out the full review of this PR even further.
>
> I don't think it makes sense to keep you guys waiting on me, particularly if you want to make a release by the end of the month. So I think what I might do is this:
>
> 1. Do a cursory review, just to make sure nothing obviously wrong jumps out.
> 2. Merge the PR.
> 3. Publish a pre-release to crates.io that you guys can use for the upcoming Helix release.
> 4. When I get the time, review the code properly and just make any changes I'd like to see myself. And then make a proper release.
>
> Does that sound reasonable? Basically, I don't want to block you guys, and I definitely want to merge this PR. But I also don't want to make a "proper" release without thoroughly reviewing the code first.

I fully understand that other things have to take priority.
Your proposal sounds great, thank you for offering that!
I have spoken with @archseer, and a crates.io prerelease (like 0.16.0-rc1) with a cursorily reviewed version of this PR would unblock the relevant PRs for helix.

I hope that the comments I added will allow the cursory review to be quick so it doesn't take too much of your time.
Once you have more time again and want to dive fully into the details of this PR, I will still be happy to answer any questions about the implementation (or implement improvements if you don't want to do it yourself).

pascalkuthe · 2022-11-24T13:01:24Z

@cessen sorry for the ping. Do you think you could take a look at this soon? The end of November is only a week away, so there is not much time left to land the PRs blocked on this for the next helix release. Quite a few people have been daily driving these changes in helix (via my helix PR that builds on this one). In that PR a line iteration is performed on every keypress (and incorrect iteration is immediately apparent), so it's been fuzzed quite a bit. That, together with the very detailed test suite, gives me a lot of confidence in the implementation.

cessen · 2022-11-25T05:19:31Z

Hi @pascalkuthe,

Sorry for the silence. I'll make sure to get to this by at least end of day Sunday (possibly sooner).

cessen · 2022-11-28T00:21:26Z

Thanks for your patience! November ended up being super busy for me, so I wasn't able to get to things as promptly as I prefer.

I've done a cursory review, and although there are a few things I'd like to look into a bit deeper, there's nothing that jumps out to me as incorrect. Combined with the testing and real-world usage you discussed, I'm more than happy to make an alpha release out of this for Helix to depend on. And then I can dive deeper a bit more at my leisure over the next month or so to get things ready for a proper release.

Thanks a bunch for your work on this! It's been a wart in Ropey for a while, so it's awesome to finally get it addressed.

pascalkuthe mentioned this pull request Nov 1, 2022

Simple improvement to Rope::lines() iterator by adding a cache #69

Closed

pascalkuthe closed this Nov 1, 2022

pascalkuthe reopened this Nov 1, 2022

pascalkuthe force-pushed the master branch 3 times, most recently from 6bc8b8b to df8e38e on November 4, 2022 at 21:24

pascalkuthe marked this pull request as ready for review November 4, 2022 22:01

pascalkuthe mentioned this pull request Nov 4, 2022

Significantly improve performance of :reload helix-editor/helix#4457

Merged

pascalkuthe force-pushed the master branch from df8e38e to a98e9d8 on November 4, 2022 at 22:35

pascalkuthe mentioned this pull request Nov 5, 2022

(Git) diff gutter implementation helix-editor/helix#3890

Merged

cessen reviewed Nov 5, 2022

src/str_utils.rs (outdated, resolved)

cessen reviewed Nov 5, 2022

tests/proptest_tests.rs (outdated, resolved)

cessen mentioned this pull request Nov 6, 2022

fix Hash implementation #66

Closed

pascalkuthe force-pushed the master branch 2 times, most recently from d465d04 to 6a26ecf on November 7, 2022 at 14:36

pascalkuthe requested a review from cessen November 7, 2022 14:47

cessen reviewed Nov 8, 2022

src/iter.rs (outdated, resolved)

cessen reviewed Nov 8, 2022

cessen linked an issue Nov 9, 2022 that may be closed by this pull request

Make Lines iterator more efficient #25

Closed

pascalkuthe force-pushed the master branch from 6a26ecf to aac4bac on November 9, 2022 at 17:17

high performance lines iterator

861c553

pascalkuthe force-pushed the master branch from aac4bac to 861c553 on November 9, 2022 at 17:27

chore: add missing file for test case

86e962e

cessen reviewed Nov 9, 2022

src/iter.rs (outdated, resolved)

cessen reviewed Nov 9, 2022

src/iter.rs (outdated, resolved)

add comments and remove unnecessary branches from constructor

67bd1ec

cessen mentioned this pull request Nov 27, 2022

test case failed lines_exact_size_iter_04 #72

Closed

cessen linked an issue Nov 27, 2022 that may be closed by this pull request

test case failed lines_exact_size_iter_04 #72

Closed

cessen merged commit 431aeab into cessen:master Nov 28, 2022

bjorn-ove mentioned this pull request Jan 31, 2023

subtract with overflow in new line iterator #76

Closed

pascalkuthe mentioned this pull request Jan 31, 2023

Fix Line Iterator creation for empty Rope #77

Merged

pascalkuthe mentioned this pull request May 12, 2023

File explorer and tree helper (v3) helix-editor/helix#5768

Open
