Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashing Tuples Based on Type #6870

Closed
AndrewBMartin opened this issue May 16, 2014 · 12 comments
Closed

Hashing Tuples Based on Type #6870

AndrewBMartin opened this issue May 16, 2014 · 12 comments
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@AndrewBMartin
Copy link

If I have a tuple ("a", "b", "c") where "a", "b", and "c" are substrings then this is not equivalent to a tuple ("a", "b", "c") where "a", "b", and "c" are strings.

Since substrings are strings, it seems like these two tuples should be equivalent.

@StefanKarpinski
Copy link
Member

More generally, we need to figure out a coherent story for collection hashing. Options:

  1. hash by collection type – this behavior would stay the same
  2. hash by collection eltype – this behavior would still stay the same
  3. hash by collection content – this behavior would change.

Other options? Related issues are ranges – ranges are conceptually just arrays of values, but hashing them that way is ridiculously slow, so we may not want to do that, even though it's what option 3 would dictate.

@JeffBezanson
Copy link
Member

We already figured all of that out. I think this is just a mistake:

https://github.com/JuliaLang/julia/blob/master/base/hashing2.jl#L170

We should only distinguish based on the general kind of collection (tuple vs. array vs. associative vs. range). isequal already does the right thing; hash just needs to follow it.

@StefanKarpinski
Copy link
Member

Ah, well, I wasn't privy to that particular decision, so I wasn't sure what to do when I implemented that part of the new hashing behavior.

@JeffBezanson
Copy link
Member

That aspect of isequal hasn't changed in a long time, except for ranges where we decided to make them unequal to arrays. That particular decision was in an issue that you filed (#5778).

@StefanKarpinski
Copy link
Member

That issue doesn't say anything about collections or containers in general, just ranges. You may have made a more general decision, but it wasn't communicated in the discussion thread in any way.

@JeffBezanson
Copy link
Member

Of course that issue only covers ranges, but isequal has worked that way for other types for as long as I can remember. The range thing was the only relatively recent decision in that category.

@StefanKarpinski
Copy link
Member

Really?

julia> zeros(5,5) == spzeros(5,5)
true

julia> isequal(zeros(5,5), spzeros(5,5))
true

I know you mentioned that in the issue, but clearly we're not applying a consistent policy yet.

@JeffBezanson
Copy link
Member

Without the range change, we would be comparing all AbstractArrays by shape and contents. The range change introduces the possibility of container distinctions within AbstractArray.

We know we want to compare strings only by character sequence, and I think the general approach to isequal on arrays and dicts is well established. An array shouldn't equal a dict, and since isequal(1, int32(1)), isequal(Int[1], Int32[1]) makes sense. So I think there is a policy in place, but one can question whether sparse should be interpreted as a different kind of container.

@StefanKarpinski
Copy link
Member

That's a pretty complicated policy. Could you outline it?

@JeffBezanson
Copy link
Member

Containers are isequal if they have the same general structure, and all elements are isequal.

Obviously the first part is subject to interpretation.

@StefanKarpinski
Copy link
Member

So the question is what it means for two collections to have the "same general structure" – that's the hard part that needs to be spelled out. So far we have the following:

  1. all strings have the same "general structure"
  2. ranges and arrays do not have the same "general structure"
  3. other arrays have the same "general structure"
  4. dictionaries all have the same "general structure"

What about sets? I'm not sure that sparse arrays and dense arrays should be the same. It would be good to have a generic function that maps a collection to its "hash type" – which is what I would prefer to call the "general structure" (better names would also be good).

@JeffBezanson
Copy link
Member

One further complication is that shape is significant for arrays.
On May 16, 2014 1:11 PM, "Stefan Karpinski" [email protected]
wrote:

So the question is what it means for two collections to have the "same
general structure" – that's the hard part that needs to be spelled out. So
far we have the following:

  1. all strings have the same "general structure"
  2. ranges and arrays do not have the same "general structure"
  3. other arrays have the same "general structure"
  4. dictionaries all have the same "general structure"

What about sets? I'm not sure that sparse arrays and dense arrays should
be the same. It would be good to have a generic function that maps a
collection to its "hash type" – which is what I prefer to call the "general
structure".


Reply to this email directly or view it on GitHubhttps://github.com//issues/6870#issuecomment-43356021
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants