-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replaceordereddict #386
Replaceordereddict #386
Conversation
On branch newiterationsortedcontainers Changes to be committed: modified: ../docs/src/sorted_containers.md modified: DataStructures.jl modified: balanced_tree.jl modified: container_loops.jl modified: ../test/test_sorted_containers.jl This commit updates container_loops.jl to use the new iteration protocol (introduced in 0.7.0-DEV). It should be backwards compatible with 0.6.2. In addition, it fixes a bug in container_loops.jl in which the length() function when applied to subranges (i.e., inclusive(a,b,c) or exclusive(a,b,c)) returned the length of the whole container instead of the length of the subrange. (There should be no value returned for the length of the subrange since the data structure does not support an O(1) algorithm or even O(log n) algorithm to compute the length.) Some other smaller changes in this commit are as follows. - IntSet was changed to BitSet (name change in 0.7.0-DEV) - Small updates to documentation - Some assert statements in balanced_tree.jl that were present during development are deleted.
This PR completely rewrites OrderedDict so that it is implemented in terms of Base containers Dict and Vector instead of the previous implementation, which was a modified version of dict.jl. This is to improve long-term maintainability of the code. However, there are performance regressions with this rewrite for which I will open issues on GitHub. This PR also reimplements OrderedSet for the new version of OrderedDict. This PR removes the getindex and setindex! methods for OrderedSet. These methods were not documented, and it is not clear how they are supposed to work in the presence of deletions. It also removes the findfirst (etc) methods for OrderedSet, since those rely on getindex. The getindex, setindex! and find tests have been removed from test_ordered_set. Typos; add using statements to runtests more typos Typos added push! typo generalize push bug remove push tuple add "in" for keys; wrong assertion bug needs odx instead of od change a line fix tuple -> pair another get method typo in sizehint Fix typos in 0.6.2 iteration fixed dict_sorting fixed dict_sorting again Remove getindex and setindex from ordered set fixed next fixed pop!(c) (single arg) remove leftover line bug in pop! removed findfirst tests delete! no longer errors on missing key no change
First, some history: I wrote the first version of Regarding this version:
Going to review this now. |
Thanks for the history! I suspected something like that must have occurred. I will try to rebase these changes on the latest master and split into two PR's. In the next comment I'll post my thoughts on partially recovering performance. |
I see three ways to get a more performant
First, as observed by Martin Holters, if one repeatedly inserts and deletes keys and then carries out many loops over the whole OrderedDict, then the performance of this PR can be arbitrarily worse (i.e., a ratio greater than any constant factor) compared to the old implementation. This can be fixed as follows: Every time a key is added, one checks whether a1 has become more than twice as long as d1, and if so, one runs a compression on a1. This will return the performance ratio to a constant factor in all cases, I believe. Second, the new code often makes multiple probes (as many as 3) into the underlying Dict for a single operation. To fix this, a token interface to Dict is needed. There is already an open issue for this here: JuliaLang/julia#24454, so maybe we can press for this issue to be addressed. If these two fixes are applied, then the new code for OrderedDict will still be slower than the old; the ratio will depend on how long hashing takes, but I expect the ratio to range from 1.25 to 2. |
This PR is no longer needed because of #448. |
Addresses problems with making OrderedDict compatible with 0.7 via a complete rewrite.