Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OffsetArray support for cat/vcat/hcat #37629

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

johnnychen94
Copy link
Member

@johnnychen94 johnnychen94 commented Sep 17, 2020

When doing concatenation OffsetArrays are treated as plain lists and offsets are ignored.

This support for OffsetArrays only works on OffsetArrays@1 where IdOffsetRange is defined.

Blocked by #37643, the first commit will be removed after that get resolved.

closes #37493
closes #37628

cc: @timholy @jishnub

if dense
for k=1:nargs
for k=eachindex(A)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces around =

@johnnychen94

This comment has been minimized.

@@ -1472,17 +1471,17 @@ function _typed_hcat(::Type{T}, A::AbstractVecOrTuple{AbstractVecOrMat}) where T
nd = ndims(Aj)
ncols += (nd==2 ? size(Aj,2) : 1)
end
B = similar(A[1], T, nrows, ncols)
pos = 1
B = similar(first(A), T, nrows, ncols)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How difficult would it be to adapt this to have B = similar(first(A), T, axes(first(A),1), axes(A,1)), so that all the offsets get propagated?

I guess that's only the case of a vector of vectors; when some elements are matrices you'd have to give up on the 2nd axis.

Copy link
Member Author

@johnnychen94 johnnychen94 Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your comment correctly.

As explained in JuliaArrays/OffsetArrays.jl#63 (comment), a meaningful offset propagation is only possible for very restricted conditions. This restriction doesn't make cats of OffsetArrays indeed useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't see the issue, let me look there first.

Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for tackling this! The overall design seems fine.

p1 = pos + Int(length(Vk))::Int - 1
a[pos:p1] = Vk
pos = p1+1
n = length(Vk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
n = length(Vk)
n = Int(length(Vk))::Int

because this often gets called with poor inference. Try @code_warntype Base._typed_vcat(Float64, AbstractVector[[1,2,3], [4.0]]).

Not putting in these annotation re-opens us to all sorts of code invalidation, see https://julialang.org/blog/2020/08/invalidations/.

This isn't a general pattern you need to start adopting in all your Julia code, but this is a very low-level function called by lots of things (all of which would be invalidated if this is invalidated) and from its signature alone you might guess it's a bit of an inference nightmare.

I'm not adding comments like these elsewhere in this function, but you'll probably need to add other annotations to the initialization of pos above. Ideally, it would be great to check with SnoopCompile and loading various packages like Makie to see if you've greatly increased the number of invalidations.

@@ -1604,7 +1602,7 @@ function __cat(A, shape::NTuple{M,Int}, catdims, X...) where M
for x in X
for i = 1:N
if concat[i]
inds[i] = offsets[i] .+ cat_indices(x, i)
inds[i] = offsets[i] .+ parent(cat_indices(x, i))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this about?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strips the offsets and uses linear indexing when computing indices. parent(r::IdOffsetRange) = r.parent.

This could be fragile and a little bit tricky, though. Phehaps there's a better way to make things work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would generally try to stay away from assumptions about how x might be structured if at all possible...

ncols = size(A[1], 2)
for j = 2:nargs
ncols = size(first(A), 2)
for j = first(axes(A))[2:end]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for j = first(axes(A))[2:end]
for j = firstindex(A)+1:lastindex(A)

@@ -22,7 +22,7 @@ using Distributed: splitrange
@test splitrange(-1, 1, 4) == Array{UnitRange{Int64},1}([-1:-1,0:0,1:1])

const BASE_TEST_PATH = joinpath(Sys.BINDIR, "..", "share", "julia", "test")
isdefined(Main, :OffsetArrays) || @eval Main include(joinpath($(BASE_TEST_PATH), "testhelpers", "OffsetArrays.jl"))
isdefined(Main, :OffsetArrays) || @eval Main @everywhere include(joinpath($(BASE_TEST_PATH), "testhelpers", "OffsetArrays.jl"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change? Could this cause problems in tests running in other workers? What if it defines methods that are supposed to fail in those tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about this, though.

Without this Distributed test is broken:

      From worker 10:	ERROR: LoadError: LoadError: TaskFailedException
      From worker 10:	
      From worker 10:	    nested task error: On worker 25:
      From worker 10:	    UndefVarError: OffsetArrays not defined
      From worker 10:	    Stacktrace:
      From worker 10:	      [1] deserialize_module
      From worker 10:	        @ /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:957
      From worker 10:	      [2] handle_deserialize
      From worker 10:	        @ /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:856
      From worker 10:	      [3] deserialize
      From worker 10:	        @ /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:774

Adding this helps remove this error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

using .Main.OffsetArrays

Copy link
Member Author

@johnnychen94 johnnychen94 Sep 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already using .Main.OffsetArrays one line down below on this.

I think it's because eachindex now requires OffsetArrays.IdOffsetRange and triggers the error, while for the old version of OffsetArrays it uses IdentityUnitRange and doesn't require anything in OffsetArrays.

It would be better to get #37643 first and I'll drop the OffsetArrays upgrade commit here.


using Base: Indices, IndexCartesian, IndexLinear, tail
using Base: Indices, tail, @propagate_inbounds
@static if !isdefined(Base, :IdentityUnitRange)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always defined on master, which is what this PR targets.

I didn't go through the rest carefully, is the idea here to basically "sync" the version here with that in OffsetArrays.jl? Seems reasonable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, it turns out that the first c2fd49e commit is broken here and I've separate this commit out in #37643. To first upgrade the OffsetArrays version.

@codecov-commenter
Copy link

Codecov Report

Merging #37629 into master will increase coverage by 0.04%.
The diff coverage is 84.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #37629      +/-   ##
==========================================
+ Coverage   87.53%   87.57%   +0.04%     
==========================================
  Files         362      362              
  Lines       71830    71888      +58     
==========================================
+ Hits        62873    62953      +80     
+ Misses       8957     8935      -22     
Impacted Files Coverage Δ
base/compiler/ssair/ir.jl 83.07% <0.00%> (-0.65%) ⬇️
base/float.jl 88.26% <ø> (ø)
base/refpointer.jl 70.68% <ø> (ø)
base/refvalue.jl 100.00% <ø> (ø)
base/compiler/optimize.jl 75.00% <50.00%> (ø)
base/binaryplatforms.jl 85.38% <82.14%> (+3.53%) ⬆️
base/compiler/ssair/inlining.jl 91.61% <85.91%> (-1.46%) ⬇️
stdlib/Artifacts/src/Artifacts.jl 61.83% <90.00%> (+8.26%) ⬆️
base/compiler/ssair/driver.jl 90.24% <100.00%> (ø)
base/compiler/typeinfer.jl 90.23% <100.00%> (ø)
... and 41 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ef8313...a61b4e0. Read the comment docs.

@jishnub
Copy link
Contributor

jishnub commented Feb 21, 2023

Gentle bump, would be good to move this along

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
7 participants