|
If I remember the idea correctly (@nalimilan worked on it), we wanted to avoid the |
The reason was just that the tests had an explicit method error check for that method. Edit: Okay there are other test failures on nightly |
The
julia> code_llvm(Base.hash, Tuple{DataFrames.OnColRow})
; Function Signature: hash(DataFrames.OnColRow{T} where T)
; @ hashing.jl:28 within `hash`
define i64 @julia_hash_8877(ptr noundef nonnull %"x::<unknown type>") #0 {
top:
%jlcallframe1 = alloca [2 x ptr], align 8
; @ hashing.jl:28 within `hash` @ /Users/kc/CSV_slow/dev/DataFrames/src/join/core.jl:36
; ┌ @ Base_compiler.jl:54 within `getproperty`
store ptr %"x::<unknown type>", ptr %jlcallframe1, align 8
%0 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
store ptr @"jl_sym#h#8879.jit", ptr %0, align 8
%jl_f_getfield_ret = call nonnull ptr @jl_f_getfield(ptr null, ptr nonnull %jlcallframe1, i32 2)
store ptr %"x::<unknown type>", ptr %jlcallframe1, align 8
store ptr @"jl_sym#row#8880.jit", ptr %0, align 8
%jl_f_getfield_ret1 = call nonnull ptr @jl_f_getfield(ptr null, ptr nonnull %jlcallframe1, i32 2)
; └
; ┌ @ essentials.jl:920 within `getindex`
%memoryref_data = load ptr, ptr %jl_f_getfield_ret, align 8
%jl_f_getfield_ret1.unbox2 = load i64, ptr %jl_f_getfield_ret1, align 8
%memoryref_offset = shl i64 %jl_f_getfield_ret1.unbox2, 3
%1 = getelementptr i8, ptr %memoryref_data, i64 %memoryref_offset
%memoryref_data3 = getelementptr i8, ptr %1, i64 -8
%2 = load i64, ptr %memoryref_data3, align 8
; └
; ┌ @ int.jl:379 within `xor`
ret i64 %2
; └
}
Even if it weren't optimized away, I am 99% sure its performance impact would be unmeasurable. |
|
Based on the nightly failures I removed the xor with |
(the default is no longer zero), but it should still be constant-propagated |
|
There is some weird assumption made in this hashing code because there shouldn't be any real reason why you couldn't mix in the |
|
Nightly error is JuliaLang/julia#59857 |
|
The idea of @nalimilan was that |
|
I kind of feel like this should not be extending as here |
This was my first thought on how we should refactor this.
|
I suggest getting this in, in order to fix the immediate (quite severe) invalidation, and then further discussing if this needs to be a |
bkamins
left a comment
OK. Can you please bump the patch version of DataFrames.jl in this PR, so that we can make a patch release? Thank you. (I will also wait some time for @nalimilan to have a look at this)
|
Done |
nalimilan
left a comment
Thanks for spotting this @KristofferC. Looks OK as a quick fix, though refactoring to call a custom _hash function is probably a good idea to avoid defining a hash method which ignores its second argument.
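To make the `_hash` suggestion concrete, here is a minimal sketch of the idea. It assumes a key type with the two fields visible in the LLVM output earlier in the thread (`h` and `row`); the names `RowKey` and `_rowhash` are hypothetical and are not the actual DataFrames.jl definitions.

```julia
# Hypothetical stand-in for a row key that carries precomputed row hashes.
# This is NOT the real DataFrames.OnColRow, only an illustration.
struct RowKey
    h::Vector{UInt}   # precomputed hash for every row
    row::Int          # index of the row this key refers to
end

# Internal helper: look up the precomputed hash for this row.
# It takes no seed argument, so there is nothing to silently ignore,
# and no new method is added to Base.hash.
_rowhash(k::RowKey) = k.h[k.row]

hashes = UInt[0x11, 0x22, 0x33]
key = RowKey(hashes, 2)
_rowhash(key)   # looks up hashes[2]
```

The trade-off is that any container keyed on such a type has to call the helper explicitly, since a standard `Dict` only knows about `hash`.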
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
|
Two approvals here and CI is ok (nightly is a parsing regression). Good to go? |
|
Thank you! |
|
I had a quick look at the code and I'm actually not sure we can avoid using |
Is that an important invariant? I don't think it is satisfied.
That comment isn't written in the most explicit way. What it means is that the result of this method (used for |
I was also checking this and yes - it seems we would need to replace |
|
The performance difference of mixing in one more hash value seems minimal. Something like both avoids safety concerns of calling
Ah, my bad, I thought it was referring to |
|
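As a rough illustration of what "mixing in one more hash value" could look like, the sketch below defines a two-argument `hash` method that folds the caller's seed into the precomputed row hash instead of dropping it. The type `SeededKey` is hypothetical, and this is only one possible shape, not necessarily what the PR or DataFrames.jl ends up doing.

```julia
# Hypothetical key type holding precomputed per-row hashes (illustration only).
struct SeededKey
    h::Vector{UInt}   # precomputed hash for every row
    row::Int          # index of the row this key refers to
end

# Honor the seed: combine the precomputed row hash with `h`
# using Base's own hash for unsigned integers.
Base.hash(k::SeededKey, h::UInt) = hash(k.h[k.row], h)

key = SeededKey(UInt[0x11, 0x22], 1)
a = hash(key, UInt(0))   # row hash mixed with seed 0
b = hash(key, UInt(1))   # row hash mixed with seed 1
```

The extra cost per call is a single integer hash, which matches the expectation in the thread that the overhead should be small; it would still need to be benchmarked on the join code paths.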
Definitely worth trying if performance seems acceptable. The function could almost call |
On 1.12 the following script:
takes
and with this PR it takes