hashing of ranges is awful #5778
Comments
I'm curious, why do you have to iterate through the entire range? |
That's the point, you don't. It's just that ranges lack a specific implementation of |
Oh I know how this happened: ranges are |
Add this to the list of isequal issues. We should maybe have a label for this. |
There might not be anything we can do here; I expect |
I didn't realize that |
We also have |
I feel like the |
In theory; but there would need to be an O(1) algorithm that gives the same answer as the O(n) hash of a dense array of the same values. Or we could check whether a given dense vector is equal to some range, but that doesn't sound very easy either. |
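The second option in the comment above (checking whether a dense vector is equal to some range) could look roughly like the following. This is only a sketch under stated assumptions: it is written for Julia 1.x rather than the version current in this thread, it handles `Int` elements only, and the names `range_key`, `is_rangelike` and `toy_hash` are hypothetical and not part of Base.

```julia
# Hypothetical names (`range_key`, `is_rangelike`, `toy_hash`); none are in Base.
# Assumes Julia 1.x and Int elements only.

# Canonical description of a range: (length, first, step), normalised so that
# empty and one-element ranges do not depend on an arbitrary start or step.
range_key(n, a, s) = n == 0 ? (0, 0, 0) : n == 1 ? (1, a, 0) : (n, a, s)

# O(n) scan: true when all consecutive differences are equal.
function is_rangelike(v::Vector{Int})
    length(v) < 3 && return true
    s = v[2] - v[1]
    return all(v[i] - v[i-1] == s for i in 3:length(v))
end

# O(1) for ranges: only three numbers are hashed.
toy_hash(r::AbstractRange{Int}) = hash(range_key(length(r), first(r), step(r)))

# O(n) for vectors: hash like a range when the vector is one, else fall back.
function toy_hash(v::Vector{Int})
    is_rangelike(v) || return hash(v)
    n = length(v)
    a = n >= 1 ? v[1] : 0
    s = n >= 2 ? v[2] - v[1] : 0
    return hash(range_key(n, a, s))
end
```

With these definitions `toy_hash(1:5)` and `toy_hash(collect(1:5))` agree by construction, as do empty and decreasing ranges. The vector path still costs a full scan, and floating-point ranges would need far more care than this sketch takes.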
We already decided in #3385 to make Why doesn't the same logic apply here? Have |
That decision has been an almost unmitigated disaster. We really need to change it back. |
What disastrous consequences have ensued? I just don't see how any other decision could possibly be practical if you want (a) |
I wouldn't describe it as an unmitigated disaster, but as sometimes mildly inconvenient. |
This, among other things:
julia> d = Dict{Int32,Any}()
Dict{Int32,Any}()
julia> d[1] = "foo"
ERROR: 1 is not a valid key for type Int32
in setindex! at dict.jl:502
I don't have time to give a comprehensive list of the problems, but it does not work well. It makes dictionaries super finicky and brittle. |
That seems like an orthogonal issue. Why not just define |
@simonster, what's wrong with throwing an error in that case? Seems like that's what I would want: if |
Well, this has been fun to play around with. I went ahead and wrote something to calculate the equivalent I'm solving it right now by using |
@staticfloat, you need to special case zero element ranges:
julia> my_hash(1:-1:10)
0x90cc5625201d9479
julia> my_hash([1:-1:10])
0xdb33fc5eee865dad |
NaN throws a wrench into the works again because |
also decreasing ranges don't work:
julia> my_hash([10:-1:1])
0xbd06ea18dc0adc1d
julia> my_hash(10:-1:1)
0x0158e13259077bd8 |
@jakebolewski Thanks for those, I've updated it, and added checks in that second cell. |
The problem with calling |
@staticfloat did you try fuzz testing the floating point implementation? It fails the majority of the time for me.
julia> n_failures = 0
0
julia> for _ in 1:1_000_000
           start = 0.0
           step = rand()
           stop = 10.0
           r = start:step:stop; rv = [r]
           if my_hash(r) != my_hash(rv)
               n_failures += 1
           end
       end
julia> n_failures / float(1_000_000)
0.875346 |
Sigh. I mean, I really wish this weren't such a problem, but it is. |
@jakebolewski I think that's because you were using I've updated the notebook again with a fuzz test, which shows all 10,000 cases passing. |
Cool! I didn't catch the name change. |
Fair enough, there really shouldn't be a name change if something like this ever gets used. |
I don't see the problem with relying on the dictionary type. Why shouldn't a |
The more general problem is that you lose some invariants that most people would expect to hold. There can be a value |
It's looking like we will change |
Just had another thought: if
|
Whatever may happen here, |
@JeffBezanson – that particular example would be fixed by my float range PR. There remain, however, other cases that are still broken with the current logic. This just means our equality predicate for ranges is currently wrong. This stuff is very subtle when you mix types. But we already have to face that with |
@GlenHertz adding more equality functions probably won't help (many people already think there are too many). You can never have enough; at a certain point the kind of equality needed is application-specific. @StefanKarpinski true, I just don't yet know a fast algorithm for |
make ranges and arrays unequal, faster hashing of ranges, fixes #5778
Regarding #7867, it could be possible to hash ranges the same as vectors if hashing of vectors was changed so that it looks at a few elements to figure out what range the vector would correspond to, if any. The hash for the vector could be created from the hash of that range, plus deltas between the predicted and actual elements. But it's a pretty brittle solution, and I'm not even sure it's possible given the number of range types that we have by now. |
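To make the idea in the comment above concrete, here is a minimal sketch of a vector hash that predicts a range from the first two elements and then mixes in only the deviations from that prediction. It is an illustration under assumptions, not what Base does: `delta_hash` is a hypothetical name, the code targets Julia 1.x, and it covers only 1-based `Int` vectors and `Int` ranges with wrapping integer arithmetic.

```julia
# Hypothetical sketch; `delta_hash` is not a Base function.
# Assumes Julia 1.x, 1-based Int vectors, and wrapping Int arithmetic.

# O(1) for ranges: hash the length, then the first element and the step.
function delta_hash(r::AbstractRange{Int}, h::UInt = zero(UInt))
    n = length(r)
    h = hash(n, h)
    n == 0 && return h
    h = hash(first(r), h)
    n == 1 && return h
    return hash(step(r), h)
end

# O(n) for vectors: same prefix as the range method, plus one extra mixing
# step for every element that deviates from the predicted range.
function delta_hash(v::Vector{Int}, h::UInt = zero(UInt))
    n = length(v)
    h = hash(n, h)
    n == 0 && return h
    a = v[1]
    h = hash(a, h)
    n == 1 && return h
    s = v[2] - a                      # predicted step
    h = hash(s, h)
    for i in 3:n
        d = v[i] - (a + (i - 1) * s)  # delta between actual and predicted
        d == 0 || (h = hash((i, d), h))
    end
    return h
end
```

A vector that happens to hold the values of a range never enters the deviation branch, so it hashes identically to that range. The brittleness mentioned in the comment shows up quickly: every additional range type or element type (floating point in particular) would need its own prediction rule that agrees exactly with the vector path.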
The first issue is that this is incredibly slow:
julia/base/dict.jl
Lines 246 to 252 in 72a65be
It's unclear to me why we're using indexing here instead of just iterating, but the code is very old, so maybe it made sense at the time.