adt: split interval tree by right endpoint on matched left endpoints#19768
Conversation
…dpoints, to improve find() performance Signed-off-by: redwrasse <mail@redwrasse.io>
|
Hi @redwrasse. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Please include benchmark from #19769 in the PR. |
Will do, I'll investigate using benchstats. |
Will try to get benchstats added in the next few days. |
Added benchmarks to this PR description, using benchstat. cc @serathius |
|
Can you add the benchmarks to the PR so we can use them to evaluate performance in the future? We could also consider merging them first in separate PR. |
|
If there is no performance change for |
|
If I'm understanding , are you suggesting removing the code changes to I can definitely add in the benchmarks from the description, if we feel they are appropriate. |
Sorry, might have misunderstood. Didn't noticed that you changed tree structure. |
|
cc @ahrtr |
|
Cormen "Introduction to Algorithms", Chapter 14 Exercise 14.3.5 is I think relevant: 14.3-5 |
|
/ok-to-test |
| for x != ivt.sentinel { | ||
| y = x | ||
| if z.iv.Ivl.Begin.Compare(x.iv.Ivl.Begin) < 0 { | ||
| // Split on left endpoint. If left endpoints match, instead split on right endpoint. |
There was a problem hiding this comment.
Cormen "Introduction to Algorithms", Chapter 14 Exercise 14.3.5 is I think relevant:
14.3-5
Suggest modifications to the interval-tree procedures to support the new operation INTERVAL-SEARCH-EXACTLY(T,i), where T is an interval tree and i is an interval. The operation should return a pointer to a node x in T such that x.int.low = i.low and x.int.high = i.high, or T.nil if T contains no such node. All operations, including INTERVAL-SEARCH-EXACTLY, should run in O(lg n) time on an n-node interval tree.
Can you update the function docstring?
There was a problem hiding this comment.
cc @redwrasse @ahrtr
Can we address this in followup?
There was a problem hiding this comment.
Sorry, missed this comment. The comment for the Insert method isn't accurate anymore, we need to update it. We also need to add comment for the find method. @redwrasse
Lines 437 to 468 in c849507
There was a problem hiding this comment.
@serathius @ahrtr I'll open an MR for updating the comments.
There was a problem hiding this comment.
Created a pull request with docstring updates: #20015
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
... and 38 files with indirect coverage changes @@ Coverage Diff @@
## main #19768 +/- ##
==========================================
+ Coverage 68.78% 68.87% +0.09%
==========================================
Files 421 424 +3
Lines 35857 35880 +23
==========================================
+ Hits 24665 24714 +49
+ Misses 9759 9739 -20
+ Partials 1433 1427 -6 Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
|
Coverage looks pretty good https://app.codecov.io/gh/etcd-io/etcd/pull/19768?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io Just one line missing in find |
|
/retest |
| if beginCompare < 0 { | ||
| x = x.left | ||
| } else if beginCompare == 0 { | ||
| if endCompare < 0 { | ||
| x = x.left | ||
| } else { | ||
| x = x.right | ||
| } | ||
| } else { | ||
| x = x.right |
There was a problem hiding this comment.
Pls revert the change to the Insert method
| if beginCompare < 0 { | |
| x = x.left | |
| } else if beginCompare == 0 { | |
| if endCompare < 0 { | |
| x = x.left | |
| } else { | |
| x = x.right | |
| } | |
| } else { | |
| x = x.right | |
| if beginCompare < 0 { | |
| x = x.left | |
| } else { | |
| x = x.right |
| } else if beginCompare == 0 { | ||
| if z.iv.Ivl.End.Compare(x.iv.Ivl.End) < 0 { | ||
| x = x.left | ||
| } else { | ||
| x = x.right | ||
| } |
There was a problem hiding this comment.
You changed the behaviour, so it's a breaking change.
Previously it only checks the interval's Begin; an intervalTree inserts a new node to the left if newNode.Ivl.Begin < x.Ivl.Begin, otherwise inserts it to the right.
Now you not only checks the Begin, but also checks the End, and inserts the new node to the left if Begin matches and newNode's End is less.
This PR's purpose is to optimize the exact search (find), I don't see a reason why update the Insert. So please consider to revert the change. Pls also see comment for the find method.
There was a problem hiding this comment.
@ahrtr thanks very much for reviewing.
So now I'm confused. The goal I had with this MR / issue is indeed to speed up Find to logarithmic time (as, for example, the Cormen 14.3-5 excercise addresses), instead of the existing visitor implementation. To do this, in say a textbook implementation, I thought requires updating both Find and Insert operations to further split on right endpoint if the left endpoints are matched?
There was a problem hiding this comment.
to further split on right endpoint if the left endpoints are matched
As mentioned above,
- You can optimize the
find, no matter you update the Insert. Please let me know if it isn't true. - Changing the Insert changes the behaviour. So I suggest not to change it.
There was a problem hiding this comment.
I think you're proposing: keep the existing interval tree structure of splitting left if less than left endpoint, else split right. Update the Find implementation to follow this split logic, hence an optimization in the sense that Find will never be checking both left and right subtrees.
With this logic I see the existing etcd interval tree tests fail. The issue is I think this approach doesn't preserve red-black tree rotation invariance.
(code changes for below snippets on this branch 8412021)
Here is updated find, using the proposed logic of split left if less than left endpoint, else split right:
// find the exact node for a given interval
func (ivt *intervalTree) find(ivl Interval) *intervalNode {
x := ivt.root
// Search until hit sentinel or exact match.
for x != ivt.sentinel {
beginCompare := ivl.Begin.Compare(x.iv.Ivl.Begin)
endCompare := ivl.End.Compare(x.iv.Ivl.End)
if beginCompare == 0 && endCompare == 0 {
// Found a match.
return x
}
// Split on left endpoint: if less than, go left, else go right.
if beginCompare < 0 {
x = x.left
} else {
x = x.right
}
}
return x
}
and a corresponding unit test (which fails) I think illustrating the point about rotations:
func TestProposedFindExample(t *testing.T) {
//won't work because won't satisfy tree rotation invariance property required of red-black trees, eg. if [2,7] is written then can't asssme all subsequently written [2,*] entities will remain in right subtree of [2,7]- won't, because tree rotations are possible.
// OTOH with the proposed approach, all subsequent [2, x] writes with x > 7 will land in right/after of [2,7], and those with x < 7 will land to left /before. Under tree rotation ordering is preserved.
ivt := NewIntervalTree()
lEndp := int64(2)
rEndps := []int64{7, 3, 9}
val := 123
for _, re := range rEndps {
ivt.Insert(NewInt64Interval(lEndp, re), val)
}
// What we would expect from 'insert left if less than left endpoint, else insert right' without tree rotations:
// (we can generate this tree by commenting out the `ivt.insertFixup(z)` line in the `Insert` op)
// Insert [2, 7), then Insert [2, 3), then Insert [2, 9) becomes:
// [2, 7)
// \
// [2, 3)
// \
// [2, 9)
// Instead, due to rotations (rb-fixup), we get:
// [2, 3)
// / \
// [2, 7) [2, 9)
// Find fails because it assumes the former tree structure, eg. thinks it can always search right in the above example:
for _, re := range rEndps {
ivl := NewInt64Interval(lEndp, re)
assert.NotNil(t, ivt.Find(ivl))
assert.Equal(t, ivl, ivt.Find(ivl).Ivl)
}
}
There was a problem hiding this comment.
I see. I was thinking the find should work as long as it matches logic (search path) as Insert, but actually it might not match due to insertFixup.
The question for now is will insertFixup cause the same problem for this PR?
In this PR, you updated both find and Insert, and follow the same logic (search path): splitting both left(Begin) and right(End) endpoints. Due to insertFixup, is it possible that it may also cause the find fail?
There was a problem hiding this comment.
I'm not an expert on these algorithms, but my working assumption has been that the total ordering introduced by secondary split on right endpoints allows satisfying the tree rotation invariance needed of red-black tree structure. I think this is the textbook approach (eg. the Cormen exercise referenced earlier.)
Any thoughts for how to further test/guarantee correctness? During development I relied on the TestIntervalTreeRandom, which is parameterized by # of nodes maxv, set to 128. Increasing that by an order of magnitude, and rerunning, I didn't encounter any test failures.
There was a problem hiding this comment.
Any thoughts for how to further test/guarantee correctness?
Good question.
- I think both
rotateLeftandrotateRightalways keep the invariable property:x.left < x < x.right
rotateLeft (a):
a b
\ / \
b ---> a d
/ \ \
c d c
rotateRight (a):
a b
/ / \
b ---> c a
/ \ /
c d d
- I ran
go test -run TestIntervalTreeRandom -v -count 200 -failfastmultiple times, and always passed.
There was a problem hiding this comment.
As mentioned in #19768 (comment), it changes the behaviour, but I think we are good as long as we don't break the IntervalTree API
There was a problem hiding this comment.
Sounds good, thanks for reviewing and investigating @ahrtr!
ahrtr
left a comment
There was a problem hiding this comment.
LGTM
Thanks for the optimisation.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahrtr, redwrasse, serathius The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
We should add a performance item in 3.7 changelog, let me take care of together with another change. |
Splits interval trees by right endpoint on matched left endpoints, to improve exact interval matching performance.
This addresses the issue: #19769
Additionally I've attached below code snippets and outputs for non-rigorous (Go benchmarks on before/after performance of
Find()andInsert()on randomly built interval trees with multiple occurrences of given left/right endpoint values, which is the regime where I think a performance difference may appear. Run on my laptop, there appears to be no change to theInsert()op performance, but a roughly 88% at p-value of 0 (according to benchstat output below) performance increase inFind(), on such interval trees.Benchmark results for before (count of 10 in each case)
Benchmark results for after:
Benchstat (https://pkg.go.dev/golang.org/x/perf/cmd/benchstat#hdr-Example) summary:
which I believe says, for the above benchmark cases, 88% speedup with p-value of 0 for
Find, and no noticeable difference forInsert.