Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster line iterator #70

Merged
merged 3 commits into from
Nov 28, 2022
Merged

faster line iterator #70

merged 3 commits into from
Nov 28, 2022

Conversation

pascalkuthe
Copy link
Collaborator

@pascalkuthe pascalkuthe commented Nov 1, 2022

Note: This is a very early draft and still contains known issues. CI failures are known and expected (multiple tests relating to the reverse iterator still fail).
It's just posted here to allow comparison with the approach in #69 .

This PR is now mostly ready for review, although I still want to add some more comments and clean up the code a bit. But this PR works now so reviewing makes sense now. I have dropped the unrelated commits and removed debug prints.

This implements a line iterator using a custom chunks iterator that tracks the lowest common parent across line boundries to avoid a binary search in RopeSlice::new.

The results are pretty spectacular. On synthetic benchmarks it outperforms the old implementation by a factor between 3x and 9x.
See the run of the builtin benchmarks below (this is the wrong way around, I benchmarked the new implementation first and then the old one):
image

I have tested the iterator in helix with helix-editor/helix#4457, reloading a 1GB file improves the total reload time (excluding IO) from 14 seconds to 7 second.

@cessen
Copy link
Owner

cessen commented Nov 1, 2022

This is definitely the approach I wanted to take with this. Indeed, it's subtle to get right, which is why I put it off for so long, ha ha. Thanks for taking this on! I'll take a look at the actual code when my brain is functioning properly again.

@cessen
Copy link
Owner

cessen commented Nov 1, 2022

Ah... maybe I'm forgetting how github works, but it looks like the commits from your other PRs are rolled in with this one? Could you keep your PRs on separate branches so it's easier to review? (And also easier to accept some PRs without having to accept others too early.)

@pascalkuthe
Copy link
Collaborator Author

pascalkuthe commented Nov 1, 2022

Hey @cessen, thanks for all the feedback. Please take the time to recover, I hope you get well soon! Regarding the extra commits you are totally right about that. This PR is not ready for review yet (hence marked as draft) and not at all in a state where I would usually post it. It just wanted to post my state for the discussion in #69. There are still comments to add, a few edge cases to fix, benchmarks to perform and indeed removing the other commits. I was just starting off my local master, sorry about that.

Closing the PR was a missclick. Sorry about that

@archseer
Copy link
Contributor

archseer commented Nov 5, 2022

Should we also include 1594596?

@pascalkuthe
Copy link
Collaborator Author

Should we also include 1594596?

I would hope that this optimizes to the same code but I will check the assembly later to make sure. If the compiler can indeed not optimize this then including it might add a small performance win

@pascalkuthe
Copy link
Collaborator Author

pascalkuthe commented Nov 5, 2022

BTW the current implementation does not yet use SIMD for finding the next newline within a chunk because the functions available in str_indicies do not do what I need. I already have a SIMD version of find_next_line_break (same as line_to_byte_idx(text, 1) but returns None if no line break is found to disambiguate a newline char at the very end from no newline char) implemented and the reverse iteration function should be decently easy as well. From testing I did locally this has the potential to improve performance even further as most time is actually spent on finding the newlines within chunks and the SIMD versionof those functions is at lest 2x faster (for lines linger then 16 bytes)

Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?

@archseer
Copy link
Contributor

archseer commented Nov 5, 2022

Maybe open a draft PR targetting this PR. Github will automatically switch it to target master if this one lands first.

src/str_utils.rs Outdated Show resolved Hide resolved
tests/proptest_tests.rs Outdated Show resolved Hide resolved
@cessen
Copy link
Owner

cessen commented Nov 5, 2022

Those results look great! Awesome work.

I still don't have the energy for a full code review yet, but I left a couple of notes that I think can be addressed already.

@cessen
Copy link
Owner

cessen commented Nov 5, 2022

Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?

I don't think any changes to str_indices are necessary. I made an in-line review comment about how to implement your function in terms of lines::to_byte_idx().

@pascalkuthe
Copy link
Collaborator Author

Those results look great! Awesome work.

I still don't have the energy for a full code review yet, but I left a couple of notes that I think can be addressed already.

Thanks for your comments! I replied to these, both of those are on me.
I anticipated these exact review comments (were my first thoughts aswell) and wanted to add review comments myself to preempt them but I ended up forgetting about it. Sorry about that!
Please take your time to recover, I hope you feel better soon! Health always comes first.

@cessen cessen mentioned this pull request Nov 6, 2022
@pascalkuthe
Copy link
Collaborator Author

@cessen After rebasing this PR on master the tests you added actually cached a couple of bugs I didn't consider in my implementation (and then some more by the proptests because the changes exercised some new code paths). The implementation really is extremely subtle in places.

In the process I have also switched back to line_to_byte_idx as you suggested. I did not end up implementing that as a separate function because I only needed in one place and when considering the additional edge-cases the condition actually ended up being a bit more complex then you suggested.

The good news is that now all tests pass so this implementation is hopefully correct.

Because this PR is already quite big (and a large performance win in every situation) I would like to outscope the SIMD implementation for the reverse iteration for now. I think it would definitely be good to have that (and I know how to implement it) but it seems reasonable to do this in a followup PR (especially because I actually noticed that the current function needed to be adjusted to fix an edge-case PR and I am not 100% sure what the required API will be)

@pascalkuthe
Copy link
Collaborator Author

pascalkuthe commented Nov 7, 2022

@cessen I am not sure why the three tests are failing in the miri CI. They work fine in the normal CI/locally? Maybe some SIMD free fallback function used by MIRI behaves differently then the normal function? It doesn't seem related to this PR.

Edit: It seems the same MIRI tests fail on master too so its probably not related to this PR.

src/iter.rs Outdated Show resolved Hide resolved
@cessen
Copy link
Owner

cessen commented Nov 8, 2022

Because this PR is already quite big (and a large performance win in every situation) I would like to outscope the SIMD implementation for the reverse iteration for now.

Yeah, I think that's a good call. Honestly, it's not totally clear to me that it's worth bothering with at all. Because although reverse Char and Byte iterators have some common performance-critical use cases, I struggle to think of similar cases for reverse Lines iterators. So it might be one of those "wait until someone has a use case" things.

It seems the same MIRI tests fail on master too so its probably not related to this PR.

Yeah, I doubt it's related to this PR. Feel free to ignore them. I'll look into and fix them before the next release.

src/iter.rs Outdated
Comment on lines 1065 to 1292
pub(crate) fn new_with_range(
node: &Arc<Node>,
byte_idx_range: (usize, usize),
line_idx_range: (usize, usize),
) -> Lines {
Lines::new_with_range_at_impl::<false>(node, 0, byte_idx_range, line_idx_range)
}

pub(crate) fn new_with_range_at(
node: &Arc<Node>,
at_line: usize,
byte_idx_range: (usize, usize),
line_idx_range: (usize, usize),
) -> Lines {
Lines::new_with_range_at_impl::<true>(node, at_line, byte_idx_range, line_idx_range)
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding the code here, but it looks to me like new_with_range() and new_with_range_at() are inconsistent with each other. The following two calls should be identical:

Lines::new_with_range(node, byte_range, line_range);
Lines::new_with_range_at(node, 0, byte_range, line_range);

But it looks like they're different. Specifically, the former calls Lines::new_with_range_at_impl() with a false const generic parameter, and the latter with true.

I'm also a little nervous about that const generic parameter existing at all: I would expect it to either always be true or always be false. I haven't looked at the code closely enough yet, though, so I could certainly be wrong about that. But it smells a little to me.

Copy link
Collaborator Author

@pascalkuthe pascalkuthe Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is definitly subtle but its the most elegant way to implement it I could come up with. I will try to explain:

First a small nitpick about your comment so it doesn't get confusing later:
The two iterators you posted are not always equivalent. They are only equivalent when you are iterating a full Rope (or a RopeSlice that happens to start within the first line of its top Node). However these two iterators are always equal if you use line_range.0 instead of 0:

Lines::new_with_range(node, byte_range, line_range) 
Lines::new_with_range_at(node, line_range.0, byte_range, line_range)

Its trivial to see that my code fulfill that criteria, because if you look into the new_with_range_at_impl function it actually special cases line == line_range.0 to always return Lines::new_with_range(node, byte_range, line_range) right at the start of the function.
The reason this special case is required is because the first line might start before the start of the RopeSlice (so before byte_range.0). That means the line start is potentially in a different chunk then the position you are looking for. To find the correct chunk you need to perform a tree search for the byte position byte_range.0 instead of the line line_range.0.

So really the const generic indicates:

  • false => start the iterator at the first line
  • true => start the iterator at the specified line (with a dynamic check if this line is the first line to switch to false)

This special case might seem odd but I will try to explain my reasoning a bit more:
A similar problem occurs at the end of the line_range (so line >= line_range.1). I handle this special case as follows:

let mut res = Lines::new_with_range_at(node, line_range.1 - 1, byte_range, line_range);
res.next();
res

This reuses the logic I already have inside the line iterator to correctly handle the end of the last line and therefore doesn't require yet another specical cased implementation of the new function.

You might considered doing the same for the start of the line:

let mut res = Lines::new_with_range_at(node, line_range.0 + 1, byte_range, line_range);
res.prev();
res

However this doesn't work when there is only a single line (then these two cases just endlessly recuse into each other) so you need to handle one of them as a special case.
Its admittedly somewhat of arbitrary which one of those two you special case but there are two main couple reasons I special cased the first line instead of the last line:

  • the implementation is slightly easier/more natural
  • the special case is slightly faster (because it does not involve stepping the iterator an additional time) and constructing an iterator at the first line should be way more common (just lines) then constructing an iterator at the last line

I think its a bit easier to think about new_with_range and new_with_range_at as two separate functions (where new_with_range_at will call new_with_range for line == line_range.0).
These two functions are just very similar so I used the const generic here to share most of the implementation.

Copy link
Owner

@cessen cessen Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First a small nitpick about your comment so it doesn't get confusing later:

Ah, right. Thanks for pointing that out!

The reason this special case is required is because the first line might start before the start of the RopeSlice (so before byte_range.0). That means the line start is potentially in a different chunk then the position you are looking for. To find the correct chunk you need to perform a tree search for the byte position byte_range.0 instead of the line line_range.0.

I'll keep an eye out for this when I take a closer look at the code. But at first blush, I think I would have implemented the tree traversal code to account for both the line and byte starts, diving into the the furthest-forward of the two as it goes. Then (unless I'm still missing something, which is possible) there wouldn't need to be any special cases.

A similar problem occurs at the end of the line_range (so line >= line_range.1). I handle this special case as follows:

I guess part of what's throwing me off here is that it's not clear to me why these cases aren't handled in the traversal/stack-building code itself. It feels to me like there's a simpler implementation struggling to emerge here, but we haven't quite hit it yet.

Partly this is from experiences with my own code in the past: special cases that look a lot like this have very often indicated that my perspective on the problem is a little off.

But to be clear, that doesn't mean the code is incorrect. Not at all. Just that a more straightforward and also correct alternative implementation may be possible.

Having said all of that, I don't mean to block this PR on stuff like that. As long as it's correct and I can follow it, let's get this merged. Especially considering that there's already a lot of "too complex" code in Ropey anyway, which is my own doing. Ha ha. :-) I can always revisit the code and try to simplify it myself in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thaught about your suggestion regarding doing the tree sesach for lines/bytes all at once but my intuition is that while its theoretically possible it would involve negative indexing (because you always have to keep track of your position relatively to the current node) and a bunch of extra conditions to handle that correctly so the sketch I have in my mind right now is actually more complex then the current implementation.

I can experiment a bit with it (the ideas in my mind are often wrong :D) and if I find s better solution in the future I might do a follow-up PR if you can live with the current implementation for now. This function took a while to to get right (edgecases like a rope slice split inside a CRLF must be considered correctlly) and so overall improving this would take a while

@cessen cessen linked an issue Nov 9, 2022 that may be closed by this pull request
@pascalkuthe
Copy link
Collaborator Author

pascalkuthe commented Nov 9, 2022

There was another problem with CI where medium.txt was removed which made the test I added for #67 fail.
I think I might have just forgotten to stage that file in that PR. I added a commit with that file but feel free to just fix that straight in master. It doesn't warrant another PR I think (I will drop the commit then)

src/iter.rs Outdated Show resolved Hide resolved
src/iter.rs Outdated Show resolved Hide resolved
@pascalkuthe
Copy link
Collaborator Author

Sorry I meant to fix your comments today but totally forgot about it. Its pretty late now here but I will do it tomorrow moring!

@pascalkuthe
Copy link
Collaborator Author

pascalkuthe commented Nov 11, 2022

I fixed the mistakes in the comments and added a ton more. In the process I went trough the entire code again and found two unnecessary branches in the new_with_range_at function. These branches had simply no effect anymore (they were only needed in previous versions of the implementation). This simplified that function quite a bit (although it did not remove the const generic).

I think/hope that the added comments make this PR a easier to review.
I would recommend starting with the next_impl function. The prev_impl function is a bit more tricky and I think the context of the constructor isn't really required for reviewing the next_impl function.

@cessen
Copy link
Owner

cessen commented Nov 12, 2022

Thanks @pascalkuthe! I'll try to get to this as soon as I can. A somewhat more urgent task came up in another project, but I think that will probably wrap up within a week. Let me know if this gets urgent, and I'll bump up the priority.

@pascalkuthe
Copy link
Collaborator Author

I think @archseer was planning to do a new helix release at the end of this month and wanted to include my two PRs that are blocked on a new ropey version (helix-editor/helix#3890 and helix-editor/helix#4457). It would be awesome if that would workout. I think we could still manage that if we continue next week with the review and nothing mayor turns up but I of-course understand that other things have to take priority @cessen.

@cessen
Copy link
Owner

cessen commented Nov 15, 2022

Okay, so things have gotten even more hectic. Long story short: I'm working as the VFX supervisor on a film shoot this week (Mon-Fri). I thought I was going to have time in the evenings to work on the other task I mentioned, but that mostly doesn't seem to be the case. So that other task (which also has a deadline of end of the month) is also getting postponed, which would push out the full review of this PR even further.

I don't think it makes sense to keep you guys waiting on me, particularly if you want to make a release by the end of the month. So I think what I might do is this:

  1. Do a cursory review, just to make sure nothing obviously wrong jumps out.
  2. Merge the PR.
  3. Publish a pre-release to crates.io that you guys can use for the upcoming Helix release.
  4. When I get the time, review the code properly and just make any changes I'd like to see myself. And then make a proper release.

Does that sound reasonable? Basically, I don't want to block you guys, and I definitely want to merge this PR. But I also don't want to make a "proper" release without thoroughly reviewing the code first.

@pascalkuthe
Copy link
Collaborator Author

Okay, so things have gotten even more hectic. Long story short: I'm working as the VFX supervisor on a film shoot this week (Mon-Fri). I thought I was going to have time in the evenings to work on the other task I mentioned, but that mostly doesn't seem to be the case. So that other task (which also has a deadline of end of the month) is also getting postponed, which would push out the full review of this PR even further.

I don't think it makes sense to keep you guys waiting on me, particularly if you want to make a release by the end of the month. So I think what I might do is this:

1. Do a cursory review, just to make sure nothing obviously wrong jumps out.

2. Merge the PR.

3. Publish a pre-release to crates.io that you guys can use for the upcoming Helix release.

4. When I get the time, review the code properly and just make any changes I'd like to see myself.  And then make a proper release.

Does that sound reasonable? Basically, I don't want to block you guys, and I definitely want to merge this PR. But I also don't want to make a "proper" release without thoroughly reviewing the code first.

I fully understand that other things have to take priority.
Your proposal sounds great, thank you for offering that!
I have spoken with @archseer and a crates.io prerelease like 0.16.0-rc1) with a cursory reviewed version of this PR would unblock the relevant PRs for helix.

I hope that the comments I added will allow the cursory review to be quick so it doesn't take too much of your time.
Once you have more time again and want dive fully into the details of this PR in the future, I will still be happy to answer any questions about the implementation (or implement improvements if you don't want do it yourself).

@pascalkuthe
Copy link
Collaborator Author

@cessen sorry for the ping. Do you think you could take a look at this soon? The end of November is only a week away so there is not that much time left to land the PRs blocked on this for the next helix release. Quite a few people have been daily driving these changes in helix (by using my PR based on this as their daily driver). In that PR a line iteration is performed on every keypress (and incorrect iterations is imminently apparent). So it's been fuzzed quite a bit. That together with the very detailed testsuite gives me a lot of confidence in the implementation.

@cessen
Copy link
Owner

cessen commented Nov 25, 2022

Hi @pascalkuthe,

Sorry for the silence. I'll make sure to get to this by at least end of day Sunday (possibly sooner).

@cessen cessen linked an issue Nov 27, 2022 that may be closed by this pull request
@cessen
Copy link
Owner

cessen commented Nov 28, 2022

Thanks for your patience! November ended up being super busy for me, so I wasn't able to get to things as promptly as I prefer.

I've done a cursory review, and although there are a few things I'd like to look into a bit deeper, there's nothing that jumps out to me as incorrect. Combined with the testing and real-world usage you discussed, I'm more than happy to make an alpha release out of this for Helix to depend on. And then I can dive deeper a bit more at my leisure over the next month or so to get things ready for a proper release.

Thanks a bunch for your work on this! It's been a wart in Ropey for a while, so it's awesome to finally get it addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test case failed lines_exact_size_iter_04 Make Lines iterator more efficient
3 participants