Hashing Tuples Based on Type #6870
Comments
More generally, we need to figure out a coherent story for collection hashing. Options:
Other options? A related issue is ranges: they are conceptually just arrays of values, but hashing them that way is ridiculously slow, so we may not want to do that, even though it's what option 3 would dictate. |
We already figured all of that out. I think this is just a mistake: https://github.com/JuliaLang/julia/blob/master/base/hashing2.jl#L170
We should only distinguish based on the general kind of collection (tuple vs. array vs. associative vs. range). |
Ah, well, I wasn't privy to that particular decision, so I wasn't sure what to do when I implemented that part of the new hashing behavior. |
That aspect of |
That issue doesn't say anything about collections or containers in general, just ranges. You may have made a more general decision, but it wasn't communicated in the discussion thread in any way. |
Of course that issue only covers ranges, but isequal has worked that way for other types for as long as I can remember. The range thing was the only relatively recent decision in that category. |
Really?
julia> zeros(5,5) == spzeros(5,5)
true
julia> isequal(zeros(5,5), spzeros(5,5))
true
I know you mentioned that in the issue, but clearly we're not applying a consistent policy yet. |
Without the range change, we would be comparing all AbstractArrays by shape and contents. The range change introduces the possibility of container distinctions within AbstractArray. We know we want to compare strings only by character sequence, and I think the general approach to |
That's a pretty complicated policy. Could you outline it? |
Containers are isequal if they have the same general structure, and all elements are isequal. Obviously the first part is subject to interpretation. |
So the question is what it means for two collections to have the "same general structure" – that's the hard part that needs to be spelled out. So far we have the following:
What about sets? I'm not sure that sparse arrays and dense arrays should be the same. It would be good to have a generic function that maps a collection to its "hash type" – which is what I would prefer to call the "general structure" (better names would also be good). |
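To make the "hash type" idea concrete, here is a rough sketch of what such a generic function could look like. The names `hashkind` and `kindhash` are hypothetical and not part of Base, the set of kinds is only the ordered subset mentioned in this thread, and the code assumes Julia 1.x (where `AbstractRange` exists). It is an illustration of the proposal, not how Base implements hashing.

```julia
# Hypothetical helper, not part of Base: map a collection to a coarse "hash kind".
hashkind(::Tuple) = :tuple
hashkind(::AbstractArray) = :array
hashkind(::AbstractRange) = :range   # more specific than AbstractArray, so it wins dispatch

# Hypothetical: hash the kind first, then fold in each element in iteration order.
# Only ordered collections are covered; associative collections and sets would
# need an order-independent combination and are left out of this sketch.
function kindhash(c, h::UInt = zero(UInt))
    h = hash(hashkind(c), h)
    for x in c
        h = hash(x, h)
    end
    return h
end

full = "xy"
a = ("x", "y")                                       # Tuple of String
b = (SubString(full, 1, 1), SubString(full, 2, 2))   # Tuple of SubString

println(kindhash(a) == kindhash(b))              # element types do not enter the hash
println(kindhash((1, 2)) == kindhash([1, 2]))    # kinds differ: tuple vs. array
```

Because only the kind and the elements are hashed, the concrete element type of the container plays no role, which is the behavior argued for above.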
One further complication is that shape is significant for arrays.
|
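A small sketch of why shape matters, assuming Julia 1.x; the variable names are illustrative. Two arrays holding the same elements in the same memory order still compare as different when their shapes differ, so a hash for arrays has to account for shape as well as contents.

```julia
m = [1 2; 3 4]      # 2×2 matrix
v = [1, 3, 2, 4]    # the same elements in column-major order, as a vector

println(isequal(m, v))                  # false: shapes differ
println(isequal(m, reshape(v, 2, 2)))   # true: same shape and contents
println(isequal(vec(m), v))             # true: flattened, the contents match
```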
If I have a tuple ("a", "b", "c") where "a", "b", and "c" are substrings, then this is not equivalent to a tuple ("a", "b", "c") where "a", "b", and "c" are strings.
Since substrings are strings, it seems like these two tuples should be equivalent.
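A minimal way to check the case described above, assuming Julia 1.x; the names `s`, `strs`, and `subs` are illustrative. The documented contract is that `isequal(x, y)` implies `hash(x) == hash(y)`, so the last two lines would be expected to print the same answer. The report here is that the two tuples were not treated as equivalent.

```julia
s = "abc"
strs = ("a", "b", "c")                                                # Tuple of String
subs = (SubString(s, 1, 1), SubString(s, 2, 2), SubString(s, 3, 3))   # Tuple of SubString

println(typeof(strs))
println(typeof(subs))
println(isequal(strs, subs))          # element-wise comparison of the two tuples
println(hash(strs) == hash(subs))     # should agree with isequal above
```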