-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster line iterator #70
Conversation
This is definitely the approach I wanted to take with this. Indeed, it's subtle to get right, which is why I put it off for so long, ha ha. Thanks for taking this on! I'll take a look at the actual code when my brain is functioning properly again. |
Ah... maybe I'm forgetting how github works, but it looks like the commits from your other PRs are rolled in with this one? Could you keep your PRs on separate branches so it's easier to review? (And also easier to accept some PRs without having to accept others too early.) |
Hey @cessen, thanks for all the feedback. Please take the time to recover, I hope you get well soon! Regarding the extra commits you are totally right about that. This PR is not ready for review yet (hence marked as draft) and not at all in a state where I would usually post it. It just wanted to post my state for the discussion in #69. There are still comments to add, a few edge cases to fix, benchmarks to perform and indeed removing the other commits. I was just starting off my local master, sorry about that. Closing the PR was a missclick. Sorry about that |
6bc8b8b
to
df8e38e
Compare
Should we also include 1594596? |
I would hope that this optimizes to the same code but I will check the assembly later to make sure. If the compiler can indeed not optimize this then including it might add a small performance win |
BTW the current implementation does not yet use SIMD for finding the next newline within a chunk because the functions available in str_indicies do not do what I need. I already have a SIMD version of Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR? |
Maybe open a draft PR targetting this PR. Github will automatically switch it to target |
Those results look great! Awesome work. I still don't have the energy for a full code review yet, but I left a couple of notes that I think can be addressed already. |
I don't think any changes to str_indices are necessary. I made an in-line review comment about how to implement your function in terms of |
Thanks for your comments! I replied to these, both of those are on me. |
d465d04
to
6a26ecf
Compare
@cessen After rebasing this PR on master the tests you added actually cached a couple of bugs I didn't consider in my implementation (and then some more by the proptests because the changes exercised some new code paths). The implementation really is extremely subtle in places. In the process I have also switched back to The good news is that now all tests pass so this implementation is hopefully correct. Because this PR is already quite big (and a large performance win in every situation) I would like to outscope the SIMD implementation for the reverse iteration for now. I think it would definitely be good to have that (and I know how to implement it) but it seems reasonable to do this in a followup PR (especially because I actually noticed that the current function needed to be adjusted to fix an edge-case PR and I am not 100% sure what the required API will be) |
@cessen I am not sure why the three tests are failing in the miri CI. They work fine in the normal CI/locally? Maybe some SIMD free fallback function used by MIRI behaves differently then the normal function? It doesn't seem related to this PR. Edit: It seems the same MIRI tests fail on master too so its probably not related to this PR. |
Yeah, I think that's a good call. Honestly, it's not totally clear to me that it's worth bothering with at all. Because although reverse
Yeah, I doubt it's related to this PR. Feel free to ignore them. I'll look into and fix them before the next release. |
src/iter.rs
Outdated
pub(crate) fn new_with_range( | ||
node: &Arc<Node>, | ||
byte_idx_range: (usize, usize), | ||
line_idx_range: (usize, usize), | ||
) -> Lines { | ||
Lines::new_with_range_at_impl::<false>(node, 0, byte_idx_range, line_idx_range) | ||
} | ||
|
||
pub(crate) fn new_with_range_at( | ||
node: &Arc<Node>, | ||
at_line: usize, | ||
byte_idx_range: (usize, usize), | ||
line_idx_range: (usize, usize), | ||
) -> Lines { | ||
Lines::new_with_range_at_impl::<true>(node, at_line, byte_idx_range, line_idx_range) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be misunderstanding the code here, but it looks to me like new_with_range()
and new_with_range_at()
are inconsistent with each other. The following two calls should be identical:
Lines::new_with_range(node, byte_range, line_range);
Lines::new_with_range_at(node, 0, byte_range, line_range);
But it looks like they're different. Specifically, the former calls Lines::new_with_range_at_impl()
with a false
const generic parameter, and the latter with true
.
I'm also a little nervous about that const generic parameter existing at all: I would expect it to either always be true or always be false. I haven't looked at the code closely enough yet, though, so I could certainly be wrong about that. But it smells a little to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this is definitly subtle but its the most elegant way to implement it I could come up with. I will try to explain:
First a small nitpick about your comment so it doesn't get confusing later:
The two iterators you posted are not always equivalent. They are only equivalent when you are iterating a full Rope
(or a RopeSlice
that happens to start within the first line of its top Node
). However these two iterators are always equal if you use line_range.0
instead of 0
:
Lines::new_with_range(node, byte_range, line_range)
Lines::new_with_range_at(node, line_range.0, byte_range, line_range)
Its trivial to see that my code fulfill that criteria, because if you look into the new_with_range_at_impl
function it actually special cases line == line_range.0
to always return Lines::new_with_range(node, byte_range, line_range)
right at the start of the function.
The reason this special case is required is because the first line might start before the start of the RopeSlice
(so before byte_range.0
). That means the line start is potentially in a different chunk then the position you are looking for. To find the correct chunk you need to perform a tree search for the byte position byte_range.0
instead of the line line_range.0
.
So really the const generic indicates:
false
=> start the iterator at the first linetrue
=> start the iterator at the specified line (with a dynamic check if this line is the first line to switch tofalse
)
This special case might seem odd but I will try to explain my reasoning a bit more:
A similar problem occurs at the end of the line_range
(so line >= line_range.1
). I handle this special case as follows:
let mut res = Lines::new_with_range_at(node, line_range.1 - 1, byte_range, line_range);
res.next();
res
This reuses the logic I already have inside the line iterator to correctly handle the end of the last line and therefore doesn't require yet another specical cased implementation of the new
function.
You might considered doing the same for the start of the line:
let mut res = Lines::new_with_range_at(node, line_range.0 + 1, byte_range, line_range);
res.prev();
res
However this doesn't work when there is only a single line (then these two cases just endlessly recuse into each other) so you need to handle one of them as a special case.
Its admittedly somewhat of arbitrary which one of those two you special case but there are two main couple reasons I special cased the first line instead of the last line:
- the implementation is slightly easier/more natural
- the special case is slightly faster (because it does not involve stepping the iterator an additional time) and constructing an iterator at the first line should be way more common (just
lines
) then constructing an iterator at the last line
I think its a bit easier to think about new_with_range
and new_with_range_at
as two separate functions (where new_with_range_at
will call new_with_range
for line == line_range.0
).
These two functions are just very similar so I used the const generic here to share most of the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First a small nitpick about your comment so it doesn't get confusing later:
Ah, right. Thanks for pointing that out!
The reason this special case is required is because the first line might start before the start of the RopeSlice (so before byte_range.0). That means the line start is potentially in a different chunk then the position you are looking for. To find the correct chunk you need to perform a tree search for the byte position byte_range.0 instead of the line line_range.0.
I'll keep an eye out for this when I take a closer look at the code. But at first blush, I think I would have implemented the tree traversal code to account for both the line and byte starts, diving into the the furthest-forward of the two as it goes. Then (unless I'm still missing something, which is possible) there wouldn't need to be any special cases.
A similar problem occurs at the end of the line_range (so line >= line_range.1). I handle this special case as follows:
I guess part of what's throwing me off here is that it's not clear to me why these cases aren't handled in the traversal/stack-building code itself. It feels to me like there's a simpler implementation struggling to emerge here, but we haven't quite hit it yet.
Partly this is from experiences with my own code in the past: special cases that look a lot like this have very often indicated that my perspective on the problem is a little off.
But to be clear, that doesn't mean the code is incorrect. Not at all. Just that a more straightforward and also correct alternative implementation may be possible.
Having said all of that, I don't mean to block this PR on stuff like that. As long as it's correct and I can follow it, let's get this merged. Especially considering that there's already a lot of "too complex" code in Ropey anyway, which is my own doing. Ha ha. :-) I can always revisit the code and try to simplify it myself in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thaught about your suggestion regarding doing the tree sesach for lines/bytes all at once but my intuition is that while its theoretically possible it would involve negative indexing (because you always have to keep track of your position relatively to the current node) and a bunch of extra conditions to handle that correctly so the sketch I have in my mind right now is actually more complex then the current implementation.
I can experiment a bit with it (the ideas in my mind are often wrong :D) and if I find s better solution in the future I might do a follow-up PR if you can live with the current implementation for now. This function took a while to to get right (edgecases like a rope slice split inside a CRLF must be considered correctlly) and so overall improving this would take a while
There was another problem with CI where |
Sorry I meant to fix your comments today but totally forgot about it. Its pretty late now here but I will do it tomorrow moring! |
I fixed the mistakes in the comments and added a ton more. In the process I went trough the entire code again and found two unnecessary branches in the I think/hope that the added comments make this PR a easier to review. |
Thanks @pascalkuthe! I'll try to get to this as soon as I can. A somewhat more urgent task came up in another project, but I think that will probably wrap up within a week. Let me know if this gets urgent, and I'll bump up the priority. |
I think @archseer was planning to do a new helix release at the end of this month and wanted to include my two PRs that are blocked on a new ropey version (helix-editor/helix#3890 and helix-editor/helix#4457). It would be awesome if that would workout. I think we could still manage that if we continue next week with the review and nothing mayor turns up but I of-course understand that other things have to take priority @cessen. |
Okay, so things have gotten even more hectic. Long story short: I'm working as the VFX supervisor on a film shoot this week (Mon-Fri). I thought I was going to have time in the evenings to work on the other task I mentioned, but that mostly doesn't seem to be the case. So that other task (which also has a deadline of end of the month) is also getting postponed, which would push out the full review of this PR even further. I don't think it makes sense to keep you guys waiting on me, particularly if you want to make a release by the end of the month. So I think what I might do is this:
Does that sound reasonable? Basically, I don't want to block you guys, and I definitely want to merge this PR. But I also don't want to make a "proper" release without thoroughly reviewing the code first. |
I fully understand that other things have to take priority. I hope that the comments I added will allow the cursory review to be quick so it doesn't take too much of your time. |
@cessen sorry for the ping. Do you think you could take a look at this soon? The end of November is only a week away so there is not that much time left to land the PRs blocked on this for the next helix release. Quite a few people have been daily driving these changes in helix (by using my PR based on this as their daily driver). In that PR a line iteration is performed on every keypress (and incorrect iterations is imminently apparent). So it's been fuzzed quite a bit. That together with the very detailed testsuite gives me a lot of confidence in the implementation. |
Hi @pascalkuthe, Sorry for the silence. I'll make sure to get to this by at least end of day Sunday (possibly sooner). |
Thanks for your patience! November ended up being super busy for me, so I wasn't able to get to things as promptly as I prefer. I've done a cursory review, and although there are a few things I'd like to look into a bit deeper, there's nothing that jumps out to me as incorrect. Combined with the testing and real-world usage you discussed, I'm more than happy to make an alpha release out of this for Helix to depend on. And then I can dive deeper a bit more at my leisure over the next month or so to get things ready for a proper release. Thanks a bunch for your work on this! It's been a wart in Ropey for a while, so it's awesome to finally get it addressed. |
Note: This is a very early draft and still contains known issues. CI failures are known and expected (multiple tests relating to the reverse iterator still fail).It's just posted here to allow comparison with the approach in #69 .
This PR is now mostly ready for review, although I still want to add some more comments and clean up the code a bit. But this PR works now so reviewing makes sense now. I have dropped the unrelated commits and removed debug prints.
This implements a line iterator using a custom chunks iterator that tracks the lowest common parent across line boundries to avoid a binary search in
RopeSlice::new
.The results are pretty spectacular. On synthetic benchmarks it outperforms the old implementation by a factor between 3x and 9x.
See the run of the builtin benchmarks below (this is the wrong way around, I benchmarked the new implementation first and then the old one):
I have tested the iterator in helix with helix-editor/helix#4457, reloading a 1GB file improves the total reload time (excluding IO) from 14 seconds to 7 second.