Expand copy! to handle mixed nullability and views #97
Codecov Report
@@ Coverage Diff @@
## master #97 +/- ##
==========================================
- Coverage 97.87% 97.76% -0.12%
==========================================
Files 9 9
Lines 660 672 +12
==========================================
+ Hits 646 657 +11
- Misses 14 15 +1
src/array.jl
Outdated
destinds, srcinds = linearindices(dest), linearindices(src)
(dstart ∈ destinds && dstart+n-1 ∈ destinds) || throw(BoundsError(dest, dstart:dstart+n-1))
(sstart ∈ srcinds && sstart+n-1 ∈ srcinds) || throw(BoundsError(src, sstart:sstart+n-1))
n == 0 && return dest
n < 0 && throw(ArgumentError(string("tried to copy n=", n, " elements, but n should be nonnegative")))

drefs = dest.refs
srefs = src.refs
srefs = isa(src, SubArray) ? [x.level for x in src] : src.refs
This is really inefficient. Why can't we access the refs of the parent?
src/array.jl
Outdated
@@ -433,9 +434,15 @@ function copy!(dest::CategoricalArray{T, N}, dstart::Integer,
dest
end

copy!(dest::CategoricalArray{T, N}, src::CategoricalArray{T, N}) where {T,N} =
CA_OR_CA_VIEW = Union{CategoricalArray, SubArray{A,B,C,D} where
Type names should be in CamelCase. `D` isn't needed. Actually, you should even be able to use `SubArray{<:Any, <:Any, <:CategoricalArray}`.
src/array.jl
Outdated
n::Integer=length(src)-sstart+1) where {T, N}
function copy!(dest::CategoricalArray{<:Union{T,Null}, N}, dstart::Integer,
src::S, sstart::Integer, n::Integer=length(src)-sstart+1) where
{T,N,S<:Union{CategoricalArray{<:Union{T,Null},N},SubArray{A,B,C,D}} where {A,B,C<:CategoricalArray{<:Union{T,Null},N},D}} |
You should be able to make this shorter by parameterizing `CA_OR_CA_VIEW` on the element type and using it here. Also avoid defining type parameters that are not used where possible, and note that the package currently uses spaces after commas even for type parameters.
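A minimal sketch of what a parameterized alias could look like, written against Julia 1.x rather than the version this PR targets. `ToyCat`, `ToyCatOrView`, and `sameeltype` are hypothetical names invented for the illustration, and `ToyCat` is only a stand-in for `CategoricalArray`, not its real definition.

```julia
# Hypothetical stand-in for CategoricalArray: a thin wrapper around an Array.
struct ToyCat{T, N} <: AbstractArray{T, N}
    data::Array{T, N}
end
Base.size(A::ToyCat) = size(A.data)
Base.getindex(A::ToyCat{T, N}, I::Vararg{Int, N}) where {T, N} = A.data[I...]

# CamelCase alias parameterized on the element type. Only the third SubArray
# parameter (the parent type) is constrained; unused parameters are left out.
const ToyCatOrView{T} = Union{ToyCat{T}, SubArray{<:Any, <:Any, <:ToyCat{T}}}

# The alias keeps the method signature short and ties both element types to T.
sameeltype(dest::ToyCat{T}, src::ToyCatOrView{T}) where {T} = true
sameeltype(dest::ToyCat, src::ToyCatOrView) = false

a = ToyCat([1, 2])
b = view(ToyCat([3, 4, 5]), 1:2)   # view of a ToyCat with the same element type
c = ToyCat([1.0])                  # different element type
```

Here `sameeltype(a, b)` should select the first method and `sameeltype(a, c)` the fallback. Whether the same pattern dispatches as intended for `CategoricalArray` itself would still need to be tested in the package.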
src/array.jl
Outdated
else
srefs = src.refs
end
spool = isa(src, SubArray) ? src.parent.pool : src.pool
That's a good example of why it might be good to have `CategoricalArrays.pool(A::CategoricalArray)` and `CategoricalArrays.pool(A::SubArray{CategoricalArray})`; the same for `refs()`.
All type-dependent logic would be handled automatically by dispatch, with less worry about whether type inference did its job and whether such code blocks are compiled efficiently.
It would also be easy to extend to handle e.g. `ReshapedArray{CategoricalArray}`.
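The accessor idea above could be sketched roughly as follows, in Julia 1.x syntax. `ToyCat` is a hypothetical toy type with integer codes and a level pool. It is not the `CategoricalArray` implementation, and the `refs`/`pool` functions here are illustrative definitions only.

```julia
# Hypothetical toy type: integer codes (`refs`) indexing into a vector of levels (`pool`).
struct ToyCat{N} <: AbstractArray{String, N}
    refs::Array{UInt8, N}
    pool::Vector{String}
end
Base.size(A::ToyCat) = size(A.refs)
Base.getindex(A::ToyCat{N}, I::Vararg{Int, N}) where {N} = A.pool[A.refs[I...]]

# Accessors for the array itself.
refs(A::ToyCat) = A.refs
pool(A::ToyCat) = A.pool

# Accessors for views: reuse the parent's storage through a view of its codes
# instead of rebuilding the codes element by element.
refs(A::SubArray{<:Any, <:Any, <:ToyCat}) = view(refs(parent(A)), parentindices(A)...)
pool(A::SubArray{<:Any, <:Any, <:ToyCat}) = pool(parent(A))

x = ToyCat(UInt8[1, 2, 2, 1, 3], ["a", "b", "c"])
v = view(x, 2:4)
```

With these methods, code such as `copy!` can call `refs(src)` and `pool(src)` without an `isa(src, SubArray)` branch, and supporting another wrapper type means adding two more methods.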
src/array.jl
Outdated
sstart::Integer,
n::Integer=length(src)-sstart+1) where
{T, N, S <: Union{CategoricalArray{<:Union{T ,Null}, N},
SubArray{<:Any, <:Any, <:CategoricalArray{<:Union{T, Null}, N}}}}
IIUC, the 2nd type parameter of `SubArray` is dimensionality, so it should match `N` of the destination, whereas the dimensionality of the parent should not be checked.
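A small illustration of that point using plain arrays in Julia 1.x. The function `samedims` is a hypothetical name used only for this sketch.

```julia
A = collect(reshape(1:12, 3, 4))   # two-dimensional parent (3×4 Matrix{Int})
col = view(A, :, 2)                # one-dimensional view into that parent

# The second SubArray parameter is the dimensionality of the view itself,
# so it can be matched against the destination's N while the parent's
# dimensionality is left unconstrained.
samedims(dest::AbstractArray{<:Any, N}, src::SubArray{<:Any, N}) where {N} = true
samedims(dest::AbstractArray, src::SubArray) = false
```

Here `col` has one dimension although its parent has two, so `samedims(zeros(Int, 3), col)` matches the first method, while `samedims(zeros(Int, 3, 4), col)` falls through to the second.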
How much does this `copy!()` really depend on explicit nullability? Could `<:Union{T,Null}` be simplified to `T`?
There's a use of `T` in `levels!()` below, but there you can use `leveltype(dest)`.
So in principle the declaration could be simplified to `dest::CategoricalArray{T,N}`, `src::Union{CategoricalArray{S, N}, SubArray{<:Any, N, <:CategoricalArray{S}}}`, where `T === S` or `T === Union{S, Null}` or `S === Union{T, Null}`, but it should be OK to do this check (`leveltype(dest) == leveltype(src)`) at the beginning of the function body.
Regarding nullability, I think I had left a comment in the original PR. The problem is that you're allowed to copy from a nullable array into a non-nullable array. As for what's the best way of ensuring consistency, I'd say the most compact solution is the best one.
Actually it looks like the current method is too restrictive, as we should also preserve levels when the types of source and destination are different (e.g. different integer types). So we might as well remove the restrictions on the element types (but test that it works with different types, and that in case of a conversion error we don't corrupt the destination).
@cjprybol Have you tried using `CatArrSrc` here?
So it looks like for nullability there's no other way than to do a run-time check for zeros in `src` if `dest` is non-nullable.
I'm not sure it's the most compact way, but since there's no easy way to express the requirements on the category level types of `src` and `dest` in the function signature, `src` could also be declared as `src::AbstractArray{<:Union{CatValue, Null}, N}`. There could be a weird case of `src` being neither a `CategoricalArray` nor its wrapper, but then calls like `leveltype(src)` would fail.
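A rough sketch of such a run-time check, in Julia 1.x syntax. It assumes, as in the comment above, that a code of zero marks a null entry. `checked_copy!` and its `allow_null` keyword are hypothetical and exist only for this illustration. The point is that validation happens before any write, so a failure cannot leave the destination half-written.

```julia
# Hypothetical helper: copy integer codes where 0 stands for a null entry.
function checked_copy!(dest::Vector{UInt8}, src::AbstractVector{UInt8}; allow_null::Bool)
    length(dest) == length(src) || throw(DimensionMismatch("lengths differ"))
    # Validate the whole source first; nothing has been written to `dest` yet.
    if !allow_null && any(iszero, src)
        throw(ArgumentError("source contains null entries, but the destination does not accept them"))
    end
    return copyto!(dest, src)
end

dest = UInt8[9, 9, 9]
err = try
    checked_copy!(dest, UInt8[1, 0, 2]; allow_null=false)
    nothing
catch e
    e
end
```

After the failed call, `dest` still holds its original values. The same call with `allow_null=true` performs the copy.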
Thank you both for the helpful comments!
> @cjprybol Have you tried using `CatArrSrc` here?
I originally had `src::CatArrSrc` (although it had a different name before) and I couldn't get the dispatch to work correctly. The element type `T` in the `CatArrSrc` definition wasn't being recognized as the same element type `T` of the `dest` inside the `copy!` function. When I copied and pasted the `CatArrSrc` definition into the `copy!` function parameters, everything worked. However, it sounds like we're all in agreement that `src` and `dest` don't need to have the same element type `T`, because we may want to copy CategoricalArrays where the eltypes of `src` and `dest` are supertypes or subtypes of one another. I may be able to use `src::CatArrSrc` again as long as I can get the dimensionality parameterization to work. Working on testing that out now.
Function parameterization updated to be a little cleaner. I again was unsuccessful trying to use
Have you tried with something like
Regarding the handling of different element types, it looks like you could just do
src/extras.jl
Outdated
@@ -122,3 +122,6 @@ Cut a numeric array into `ngroups` quantiles, determined using
cut(x::AbstractArray, ngroups::Integer;
labels::AbstractVector{U}=String[]) where {U<:AbstractString} =
cut(x, quantile(x, (1:ngroups-1)/ngroups); extend=true, labels=labels)

refs(A::CategoricalArray) = A.refs
Better put this in array.jl, and the `SubArray` method in subarray.jl. Same for `pool`.
src/array.jl
Outdated
@@ -412,7 +415,7 @@ function copy!(dest::CategoricalArray{T, N}, dstart::Integer,
if dstart == dstart == 1 && n == length(dest) == length(src)
By adding `!isa(dest, SubArray) && ...` here, you could relax the method signature to accept `dest::SubArray`. I don't think any other changes are needed, but testing is of course needed to check that.
Any news? Would you prefer me to finish it?
@cjprybol I've added a commit allowing for destination
Yes, your changes look great! Very glad to see that you got it to work with the 👍 I approve 👏
I'm still hesitant about whether copying into a
I haven't tried, since that seems redundant with the third parameter. But note that it needs to be
- add refs and pool functions for accessing fields
- implement internal copy checks to avoid erroring mid-copy and corrupting the dest
- introduce tests for various combinations of src and dest types and expected failures
Thanks for doing most of the work!
replaces #92