-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance of RadixTree#getValuesForKeysStartingWith() #18
Comments
FWIW, I did some very basic testing with a custom interface NodeVisitor {
boolean visit(Node node);
} The return value is just there to allow the client to prune / abort. I suppose there should really be a second For large trees (I am testing with a tree with about 900K values) it looks like the performance is quite a bit better. Would it be possible to include something along these lines? Would it require splitting the Node interface just so that the visitor doesn't have access to |
Hi Knut, Yes that sounds reasonable, and I'd be open to adding something like this. It might indeed as you say require splitting the Node interface. The visitor must have read-only access. I don't have much time to spend investigating or working on this myself at the moment. Is this something you could implement yourself, with a view to submitting a pull request? From what you said above, your approach and ideas make sense AFAICS. So basically, if you could follow the code style/conventions in the project, and also maintain full test coverage, then I'd be happy to accept a pull request to add this feature. Of course I'd be available and more than happy to help out and answer any questions if you want to take this on! Niall |
Hi Niall, Thanks for the feedback. Yes, I would be willing to supply a corresponding PR. I will have to try to fit it in somehow... Here some questions:
Thinking ahead I already know of one more method I would like to have. Something like an So, if you don't see how you could benefit from these extensions anywhere in your projects, I would suggest to instead make |
In any case I could supply a PR which optimizes |
Hi Knut, Well I guess if you add a generic tree.visit(Visitor) method, then the actual algorithm to traverse the tree will be encapsulated in the Visitor object which is provided by the application. Does that make sense? So you would write a VisitNodesStartingWithVisitor in your application, and then supply this to the tree.visit() method. If this is the case, then we would not need to add many API methods to the trees themselves at all, even to support esoteric traversal algorithms. Applications would be free to define their own Visitor objects, external to this library. CQEngine isn't the only user of concurrent-trees. For example recently Cassandra database has been using these trees in SASI indexes. So if by adding generic support for the visitor pattern we would allow improved performance for power users without complicating things for regular users then I think this would be worthwhile. |
Hi Niall, In my prototype I had the actual traversal algorithm in the tree. That may be a bit unusual for the visitor pattern, but I have seen so many different implementations of that pattern, so I am not even sure how unusual it actually is. But what you suggest sounds like a good idea, as it doesn't pollute your API. I will look into it. If it can be decoupled that way, it may even make sense to supply some standard visitors as part of concurrent-trees. Or maybe at least the tree implementation classes could provide an abstract visitor base class, so the visitors don't necessarily need to duplicate standard traversal logic as in Any thoughts or reservations about an |
Yes sure, we could look at including some abstract or general purpose visitors as part of concurrent-trees. Sorry I forgot to say - yes the idea for the update() method seems reasonable too! |
Great! BTW, what's the minimum Java version supported by concurrent-trees? I am asking because I wonder if I can add a default |
It targets Java 6, so we can't use default methods I'm afraid :( I use Java 8 personally, but I'd like to keep Java 6 compatibility for Sent from my Android On Sep 23, 2016 13:24, "Knut Wannheden" [email protected] wrote:
|
OK. Thanks. |
RadixTree#searchForKeysStartingWith() replaces all of the three methods RadixTree#getKeysStartingWith(), #getValuesForKeysStartingWith(), and #getKeyValuePairsForKeysStartingWith(). The new method returns a TreeSearchResult object which declares the methods asKeyIterable(), asValueIterable(), and asKeyValuePairIterable(). So instead of calling tree.getKeysStartingWith("foo") a client would now call tree.searchForKeysStartingWith("foo").asKeyIterable(). The above refactoring allows for TreeSearchResult to be extended with other kinds of traversal methods. E.g. some kind of accept(Visitor) or a collectInto(Collection).
I found a solution which avoids some of the problems I saw with a generic As you can see that solution currently breaks the API. That would not actually be necessary, I just wanted to make the intention of the refactoring as clear as possible and get your feedback on that. But I am thinking it may make sense to go in this direction and deprecate (and in the future derelease) some parts of the API. This design would then make it simple to support other query result traversals like visitors or collectors. |
Hi Niall, Did you find some time to give my proposal some thought? A slight variation would be to add some kind of The thing I don't like about the generic Knut |
Hi Knut, Sorry I've been away for a few days, but I got some time just now to look into this in more detail. It's not really an option to change the existing APIs unless there would be a significant improvement. This library has become quite widely used so I'm cautious about making changes which would not be backward compatible, especially if they would not improve performance across the board. In this case, the proposal would improve performance for the getValuesForXXX() methods, but it would not improve the performance of and it would introduce the creation of additional objects into the getKeysXXX() and getKeyValuePairsXXX() methods. Your idea to restructure the APIs looks good, and ordinarily would totally make sense! But just in the case of this library I'm especially cautious about making significant changes internally because the existing code has been battle-tested in production for a number of years, and it's already in use in Cassandra and a number of apps. But going back to your original question - your concern was about object creation in the ConcurrentRadixTree#getDescendantValues() method? Can we just refactor that method to improve performance in a targeted way without impacting the other methods? I was able to eliminate the object creation in getDescendantValues() based on your excellent points in this branch. It seems quite straightforward and it's an isolated/targeted performance improvement. Would this be enough to solve the problem? While it eliminates creation of the NodeKeyPair objects, it still uses a stack, but we need to use some kind of a stack to traverse the tree anyway right? I'm still open to adding a visitor method if you'd like also. |
Hi Niall, Thanks for getting back to me and thanks for the small performance improvement. I was planning to to create a separate PR for that. I have a few other small performance improvements I would like to suggest. I will create separate issues or PRs for those. The improvement to Regarding the visitor method: What about adding the alternative methods I suggest (returning the search result object) without deleting the current methods? To not "pollute" the With a generic visitor method there are still some methods like Knut |
Hi Knut, Sounds good! Yes the searchTree method could be made public. Or rather, I guess it would be a method similar to it (which perhaps wraps it) but which returns read-only views of Node objects as we discussed earlier? Best regards, |
I wonder about the read-only Sorry if I keep going on about this, but do I understand you correctly that you don't like the idea of adding an additional |
Hi Knut, Yes the public interface RadixTree does not extend PrettyPrintable, so this is not part of the public API (intentionally). PrettyPrintable is indeed only an internal implementation detail. It could be changed or reimplemented as a visitor. Nobody should be depending on the current implementation. That said, I don't know if it will be worth changing it, as there's not a major benefit per-se - but it would indeed suit the visitor pattern quite well. We should decide on the name of the read-only node interface. If we call it ReadOnlyNode, then it may be counterintuitive that a modifiable Node interface would extend it. What do you think about VisitableNode? Also what naming conventions are you thinking regarding the visitor methods? -
Regarding the addition of searchForKeysStartingWith() etc. This method would just exist to allow a visitor to search for nodes starting with the current key, right? In other words, to make the existing algorithms accessible to a visitor? If so then I don't think we should use this approach, because the benefit of the Visitor pattern should be that the traversal algorithms are decoupled from the tree. And then, that any kind of algorithms could be used to traverse the tree without a need for supporting methods in the public API. So if you'd like to access methods such as searchTree() from visitors, another option would be to make those public static methods, and move them into a TraversalAlgorithms class? Would this work? |
The idea behind If a client needs to control the tree traversal, then I think it is probably enough to rely on |
There is already a forEach(Consumer) method, inherited from As I mentioned, the existing methods are widely used, so I have no intention of breaking backward compatibility unless doing so will introduce some performance benefit or other improvement. So if this new visitor feature will be implemented, it must be done in a backward compatible way, with minimal disruption and changes to the library. I don't think there would be any problem in moving the existing searchTree method into a separate TreeTraversals class though. As that way, that method will likely be inlined by HotSpot anyway, giving identical performance without API changes. |
Yes, with Java 8 the provided I will try to work out a change with a |
Would you for the 3.0 release consider changing the RadixTree API to include a |
Looking at the implementation of ConcurrentRadixTree#getDescendantValues() (as used by #getValuesForKeysStartingWith()) I get the impression that the performance could be improved by not using lazyTraverseDescendants(), as it creates all these intermediate NodeKeyPair objects, while here only the Node objects are relevant.
Maybe it would even be possible / make sense to support a visitor pattern. Then it should also be possible to skip the "stack" which now has to be maintained by the Iterator. For my use case that would also work very nicely, since I in the end want to process all returned values.
The text was updated successfully, but these errors were encountered: