Make seed! and extract_jacobian! gpu-friendly #472
Conversation
Interestingly, when trying

```julia
ip = Iterators.product(ydual, 1:n)
out_reshaped .= map(rc -> partials(T, rc[1], rc[2]), ip)
```

I can see in the GPU stack trace:

```
ERROR: GPU compilation of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceArray{Float32,2,1}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Extruded{Array{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Extruded{Array{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{Array{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{Array{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}} which is not isbits.
      .x is of type Array{Float32,2} which is not isbits.
```

that the array in the broadcast is a plain CPU `Array{Float32,2}` rather than a `CuArray`. Not sure who to ping here.
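For context, a minimal sketch of this failure mode and a GPU-friendly alternative (not the PR's actual code; `y`, `out`, and `n` are illustrative stand-ins for `ydual`, `out_reshaped`, and the chunk size, and running it requires a CUDA-capable GPU):

```julia
using CUDA
CUDA.allowscalar(false)

y = CUDA.rand(Float32, 4)        # stand-in for ydual
out = CUDA.zeros(Float32, 4, 3)  # stand-in for out_reshaped
n = 3

# Failure mode (as in the stack trace above): mapping over the lazy product
# falls back to a CPU Array{Float32,2}, which then cannot be passed to the
# GPU broadcast kernel because it is not isbits:
# out .= map(t -> t[1] * t[2], Iterators.product(y, 1:n))

# GPU-friendly alternative: broadcast directly against an isbits index range
# reshaped to a row, so the whole computation stays on the device:
out .= y .* reshape(1:n, 1, n)
```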
(force-pushed from a08d02d to b6e6984)
Alright, I worked out the issue.

```julia
using ForwardDiff, BenchmarkTools
x = rand(1000);
cfg = ForwardDiff.GradientConfig(nothing, x);
@btime ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds));
```

results in 0 allocations, and

```julia
using CUDA, ForwardDiff, Test;
CUDA.allowscalar(false);
function f!(F, x)
    t = Tuple(x)
    F_nt = ntuple(Val(length(F))) do i
        i == 1 && (F_i = (t[1] + 3) * (t[2]^3 - 7) + 18)
        i == 2 && (F_i = sin(t[2] * exp(t[1]) - 1))
        F_i
    end
    F .= F_nt
end;
x_initial = CUDA.CuArray(Float32[0.1; 1.2]);
# x_initial = Float32[0.1; 1.2];
J = similar(x_initial, (length(x_initial), length(x_initial)));
F = similar(x_initial);
Ja = [-5.272    13.392
       1.25627   1.04689]
ForwardDiff.jacobian!(J, f!, F, x_initial);
@test all(Array(J) .≈ Ja)
```

passes.
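For reference, the entries of `Ja` match the analytic Jacobian of `f!` evaluated at `x_initial`; a quick hand check (not part of the PR):

```julia
# F1 = (x1 + 3)*(x2^3 - 7) + 18 and F2 = sin(x2*exp(x1) - 1), at (x1, x2) = (0.1, 1.2):
x1, x2 = 0.1, 1.2
J11 = x2^3 - 7                              # ∂F1/∂x1 ≈ -5.272
J12 = 3 * x2^2 * (x1 + 3)                   # ∂F1/∂x2 ≈ 13.392
J21 = cos(x2 * exp(x1) - 1) * x2 * exp(x1)  # ∂F2/∂x1 ≈ 1.25627
J22 = cos(x2 * exp(x1) - 1) * exp(x1)       # ∂F2/∂x2 ≈ 1.04689
```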
(force-pushed from b6e6984 to f39a36b)
(Title changed from "seeds! and extract_jacobian! gpu-friendly" to "seed! and extract_jacobian! gpu-friendly")
Ok, looks like Travis is passing. Some of AppVeyor is, too, but it looks like Windows has not consistently passed in CI.
This would be a good candidate for the JuliaLang GPU CI.
bump
LGTM, though I have some small comments about naming things and about using `vec` for reshaping to a vector.
This might be a tiny bit slower than it is now for `extract_jacobian_chunk!`, but I think it's probably optimized out, at least in 1.5, and that is not performance-critical anyway, so if this gets us GPU support then that is a win.
Please bump the version number as a non-breaking change.
Strong agreement re: setting up JuliaGPU tests, but I won't block the PR over that.
Once these things are addressed I will merge.
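For context, a tiny illustration of the `vec` suggestion from the review above (illustrative array, not PR code):

```julia
A = rand(3, 4)
v1 = reshape(A, length(A))  # reshape to a vector via an explicit length
v2 = vec(A)                 # equivalent but clearer; both share A's memory
v1 == v2                    # true
```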
(force-pushed from f39a36b to b4cb1ac)
I changed to `vec`, and I bumped the version number.
How did the benchmark show 0 allocations when views and broadcasting were missed in some places? Were those not benchmarked? Would be good to ensure all the stuff changed here has been benchmarked and confirmed not to allocate.
Where were views/broadcasts missing? I agree that adding a test would be nice. I'll see if I can add that somewhere. The only piece I checked locally was

```julia
julia> using ForwardDiff, BenchmarkTools

julia> x = rand(1000);

julia> cfg = ForwardDiff.GradientConfig(nothing, x);

julia> @btime ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds));
  54.821 ns (0 allocations: 0 bytes)
```
I thought e.g. #472 (comment) would have caused allocations.
I must have had the view in the version I benchmarked. For good measure, I added allocation tests.
Ah, glad I added the tests; I just realized that I missed one method.
Alright, now all of the `seed!` methods are covered by the allocation tests.
(force-pushed from f303aea to a5b9e7f)
Ok, I also squashed. I think this should be ready to go.
Allocation tests are failing on Julia 1.4 and 1.0. The failure on nightly is unrelated and is fixed by a separate PR.
test/AllocationsTest.jl (Outdated)

```julia
x = rand(1000)
cfg = ForwardDiff.GradientConfig(nothing, x)

balloc = @ballocated ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds))
```
Doesn't just `@allocated` work?
I tried, but I wasn't able to interpolate with `@allocated`, so it results in non-zero allocations.
Actually, it looks like we can use `@allocated`; we'll just need to duplicate the calls and remove the `getproperty`/`getindex` in the calls:

```julia
@testset "Test seed! allocations" begin
    x = rand(1000)
    cfg = ForwardDiff.GradientConfig(nothing, x)
    duals = cfg.duals
    seeds = cfg.seeds
    seed = cfg.seeds[1]

    # Each call is made twice: the first run includes compilation
    # allocations, so only the second measurement is tested.
    alloc = @allocated ForwardDiff.seed!(duals, x, seeds)
    alloc = @allocated ForwardDiff.seed!(duals, x, seeds)
    @test alloc == 0

    alloc = @allocated ForwardDiff.seed!(duals, x, seed)
    alloc = @allocated ForwardDiff.seed!(duals, x, seed)
    @test alloc == 0

    index = 1
    alloc = @allocated ForwardDiff.seed!(duals, x, index, seeds)
    alloc = @allocated ForwardDiff.seed!(duals, x, index, seeds)
    @test alloc == 0

    alloc = @allocated ForwardDiff.seed!(duals, x, index, seed)
    alloc = @allocated ForwardDiff.seed!(duals, x, index, seed)
    @test alloc == 0
end
```
Let me know which one is preferred
I think that originally (in e.g. 1.0) Julia tried to exclude compilation from the output of `@allocated`, but that proved unreliable, and so that was removed in 1.4 (ish?).
In response to that I added `@ballocated` to BenchmarkTools, which just pulls the allocations out of the fastest run according to `@benchmark` (which will naturally not be the one that included the compilation).
Personally I think `@ballocated` is nicer than duplicating, but I will defer to others.
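For illustration, a minimal sketch of the `@ballocated` approach (the function `h` is a hypothetical non-allocating example, not PR code):

```julia
using BenchmarkTools, Test

h(v) = sum(v)  # hypothetical non-allocating function
v = rand(100)

# @ballocated supports $-interpolation like @benchmark and reports the
# allocations of the fastest of many runs, so compilation and global-scope
# overhead never contaminate the result:
@test (@ballocated h($v)) == 0
```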
I agree that `@ballocated` seems nicer, but I'll also defer to others as this is my first PR to this repo.
`@ballocated` doesn't really make sense to me. You run the function thousands of times and then pick out the lowest one. For functions that don't allocate, that will be the second run. So why run the function 998 more times? Seems like it will just slow down the test suite.
Ok, I'll switch to the code snippet above.
Updated.
Ah, good point. I'll update to test on 1.5 and above.
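A minimal sketch of how that version gate could look in the test runner (the exact mechanism used in the PR is an assumption; the file name follows the test file shown above):

```julia
# Skip the allocation checks on Julia < 1.5, where @allocated's handling
# of compilation allocations was inconsistent (see discussion above):
if VERSION >= v"1.5"
    include("AllocationsTest.jl")
end
```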
(force-pushed from 23ed88e to 46bd294)
Bump
(force-pushed from 46bd294 to 3b6f62b)
Commits: "Use broadcast/macro consistently", "Fix jac", "Add AllocationsTest.jl"
(force-pushed from 3b6f62b to e0be439)
Rebased and re-bumped the version.
Per @vchuravy's suggestion to open a new PR: Now that JuliaLang/julia#14955 is resolved, this PR revives the work of #353 and #406.