
Conversation

@devmotion (Member)

Fixes #738.

Since I had made #695, I felt somewhat responsible for #738. Defining seed! and extract_gradient! for triangular matrices seems to fix the example.

This should probably be extended to additional matrix types.
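For illustration, the core idea is to restrict seeding and extraction to the stored triangle. A rough standalone sketch (the helper name is hypothetical, not part of this diff):

using LinearAlgebra

# Hypothetical helper: the structurally nonzero indices of a triangular
# matrix, in column-major order within the stored triangle.
structural_indices(A::LowerTriangular) =
    ((i, j) for j in axes(A, 2) for i in j:size(A, 1))
structural_indices(A::UpperTriangular) =
    ((i, j) for j in axes(A, 2) for i in 1:j)

# The seed!/extract_gradient! specializations loop over such indices and write
# through parent(A), so that setindex! never touches a structural zero.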

@testset "LowerTriangular and UpperTriangular" begin
    M = rand(3, 3)
    for T in (LowerTriangular, UpperTriangular)
        @test ForwardDiff.gradient(sum, T(randn(3, 3))) == T(ones(3, 3))

Member

Unfortunately this fails if I increase the size of the matrix above the chunk size:

julia> @testset "LowerTriangular and UpperTriangular" begin
           M = rand(3, 3)
           for T in (LowerTriangular, UpperTriangular)
               @test ForwardDiff.gradient(sum, T(randn(3, 3))) == T(ones(3, 3))
               @test ForwardDiff.gradient(x -> dot(M, x), T(randn(3, 3))) == T(M)
           end
       end
Test Summary:                       | Pass  Total  Time
LowerTriangular and UpperTriangular |    4      4  1.0s
Test.DefaultTestSet("LowerTriangular and UpperTriangular", Any[], 4, false, false, true, 1.74360103628159e9, 1.743601037319704e9, false, "REPL[8]")

julia> @testset "LowerTriangular and UpperTriangular" begin
           M = rand(10, 10)
           for T in (LowerTriangular, UpperTriangular)
               @test ForwardDiff.gradient(sum, T(randn(10, 10))) == T(ones(10, 10))
               @test ForwardDiff.gradient(x -> dot(M, x), T(randn(10, 10))) == T(M)
           end
       end
LowerTriangular and UpperTriangular: Error During Test at REPL[9]:4
  Test threw exception
  Expression: ForwardDiff.gradient(sum, T(randn(10, 10))) == T(ones(10, 10))
  ArgumentError: cannot set index in the upper triangular part (1, 2) of an LowerTriangular matrix to a nonzero value (Dual{ForwardDiff.Tag{typeof(sum), Float64}}(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0))
  Stacktrace:
    [1] throw_nonzeroerror(T::Type, x::Any, i::Int64, j::Int64)
      @ LinearAlgebra ~/.julia/juliaup/julia-1.11.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/triangular.jl:295
    [2] setindex!
      @ ~/.julia/juliaup/julia-1.11.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/triangular.jl:326 [inlined]
    [3] _unsafe_setindex!
      @ ./reshapedarray.jl:297 [inlined]
    [4] setindex!
      @ ./reshapedarray.jl:286 [inlined]
    [5] setindex!
      @ ./subarray.jl:372 [inlined]
    [6] macro expansion
      @ ./broadcast.jl:973 [inlined]
    [7] macro expansion
      @ ./simdloop.jl:77 [inlined]
    [8] copyto!
      @ ./broadcast.jl:972 [inlined]
    [9] copyto!
      @ ./broadcast.jl:925 [inlined]
   [10] materialize!
      @ ./broadcast.jl:883 [inlined]
   [11] materialize!
      @ ./broadcast.jl:880 [inlined]
   [12] seed!(duals::LowerTriangular{ForwardDiff.Dual{ForwardDiff.Tag{typeof(sum), Float64}, Float64, 12}, Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(sum), Float64}, Float64, 12}}}, x::LowerTriangular{Float64, Matrix{Float64}}, index::Int64, seeds::NTuple{12, ForwardDiff.Partials{12, Float64}}, chunksize::Int64)
      @ ForwardDiff ~/.julia/packages/ForwardDiff/L0kjR/src/apiutils.jl:69
   [13] seed!
      @ ~/.julia/packages/ForwardDiff/L0kjR/src/apiutils.jl:66 [inlined]
   [14] chunk_mode_gradient(f::typeof(sum), x::LowerTriangular{Float64, Matrix{Float64}}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(sum), Float64}, Float64, 12, LowerTriangular{ForwardDiff.Dual{ForwardDiff.Tag{typeof(sum), Float64}, Float64, 12}, Matrix{ForwardDiff.Dual{ForwardDiff.Tag{typeof(sum), Float64}, Float64, 12}}}})
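
For reference, the root cause is reproducible without ForwardDiff: setindex! on a triangular matrix rejects nonzero writes into the structurally zero part, which the generic chunked seed! attempts via a reshaped view of the input (frames [1]-[5] above):

using LinearAlgebra

L = LowerTriangular(zeros(3, 3))
L[2, 1] = 1.0  # fine: inside the stored triangle
L[1, 2] = 1.0  # throws the ArgumentError shown in the stacktrace above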

Member Author

Ah good catch 👍

Member Author

Another catch: this PR doesn't (yet) fix the fact that seeds are still generated for every element of the matrices, even though (IMO) we should limit them to the structurally nonzero entries.
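
To put rough numbers on it (assuming the default chunk threshold of 12; the actual chunk sizes get rebalanced, so the counts are approximate):

n = 10
n^2              # 100 entries seeded when every element gets a seed
n * (n + 1) ÷ 2  # 55 structurally nonzero entries of an n×n triangular matrix
cld(100, 12), cld(55, 12)  # roughly 9 vs. 5 chunks, i.e. evaluations of f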

Member

I have forgotten how all this code works... but 5-arg seed! implies the iteration is happening elsewhere?

The other thing worth checking would be GradientConfig and DiffResult things from https://juliadiff.org/ForwardDiff.jl/stable/user/advanced/ as I'm not sure what paths they take.
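
For concreteness, those paths can be exercised with just the documented API (untested against this branch, so treat it as a sketch):

using ForwardDiff, DiffResults, LinearAlgebra

x = LowerTriangular(rand(5, 5))
cfg = ForwardDiff.GradientConfig(sum, x)  # preallocated-config path
ForwardDiff.gradient(sum, x, cfg)

res = DiffResults.GradientResult(x)       # DiffResults path
ForwardDiff.gradient!(res, sum, x, cfg)
DiffResults.value(res), DiffResults.gradient(res)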

@devmotion (Member Author) · Apr 2, 2025

Yeah, seed! (and extract_gradient_chunk!) with > 3 arguments are called if gradients are computed in multiple chunks:

# seed work vectors
xdual = cfg.duals
seeds = cfg.seeds
seed!(xdual, x)
# do first chunk manually to calculate output type
seed!(xdual, x, 1, seeds)
ydual = f(xdual)
$(result_definition)
extract_gradient_chunk!(T, result, ydual, 1, N)
seed!(xdual, x, 1)
# do middle chunks
for c in middlechunks
    i = ((c - 1) * N + 1)
    seed!(xdual, x, i, seeds)
    ydual = f(xdual)
    extract_gradient_chunk!(T, result, ydual, i, N)
    seed!(xdual, x, i)
end
# do final chunk
seed!(xdual, x, lastchunkindex, seeds, lastchunksize)
ydual = f(xdual)
extract_gradient_chunk!(T, result, ydual, lastchunkindex, lastchunksize)
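
(If I remember the internals correctly, the 3-argument seed!(xdual, x, i) calls reset the partials of the chunk that was just processed back to zero, so one chunk's perturbations don't leak into the next.)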

@mcabbott (Member) · Apr 2, 2025

Those look good. Some chance this is flawed copy-pasting at my end, but for chunk mode I see strange effects:

julia> ForwardDiff.gradient(x -> begin @show(eltype(x)); sum(x) end, UpperTriangular(rand(5, 5)))
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#27#28", Float64}, Float64, 8}
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#27#28", Float64}, Float64, 8}
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#27#28", Float64}, Float64, 8}
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#27#28", Float64}, Float64, 8}
5×5 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  1.0  1.0  1.0  1.0
  ⋅   1.0  1.0  1.0  1.0
  ⋅    ⋅   1.0  1.0  1.0
  ⋅    ⋅    ⋅   1.0  1.0
  ⋅    ⋅    ⋅    ⋅   1.0

julia> sum(ans)
15.0

julia> ForwardDiff.gradient(x -> begin @show(eltype(x)); sum(x) end, Diagonal(rand(11)));  # ok
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#21#22", Float64}, Float64, 11}

julia> ForwardDiff.gradient(x -> begin @show(eltype(x)); sum(x) end, Diagonal(rand(13)));  # weird
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#23#24", Float64}, Float64, 7}
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#23#24", Float64}, Float64, 7}
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#23#24", Float64}, Float64, 7}
[.... >15 more]

Member Author

Hmm, strange. I get

julia> ForwardDiff.gradient(x -> begin @show(eltype(x)); sum(x) end, UpperTriangular(rand(5, 5)))
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#40#41", Float64}, Float64, 8}
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#40#41", Float64}, Float64, 8}
5×5 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  1.0  1.0  1.0  1.0
  ⋅   1.0  1.0  1.0  1.0
  ⋅    ⋅   1.0  1.0  1.0
  ⋅    ⋅    ⋅   1.0  1.0
  ⋅    ⋅    ⋅    ⋅   1.0

julia> sum(ans)
15.0

julia> ForwardDiff.gradient(x -> begin @show(eltype(x)); sum(x) end, Diagonal(rand(11)));
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#42#43", Float64}, Float64, 11}

julia> ForwardDiff.gradient(x -> begin @show(eltype(x)); sum(x) end, Diagonal(rand(13)));
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#44#45", Float64}, Float64, 7}
eltype(x) = ForwardDiff.Dual{ForwardDiff.Tag{var"#44#45", Float64}, Float64, 7}

which seems expected given

function Chunk(input_length::Integer, threshold::Integer = DEFAULT_CHUNK_THRESHOLD)
    N = pickchunksize(input_length, threshold)
    Base.@nif 12 d->(N == d) d->(Chunk{d}()) d->(Chunk{N}())
end
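
Concretely, both observed chunk sizes drop out of the same arithmetic once only the structural nonzeros are seeded. A from-memory sketch of the pickchunksize logic (not the exact source):

function pick(input_length, threshold = 12)
    input_length <= threshold && return input_length
    nchunks = cld(input_length, threshold)  # fewest chunks respecting the threshold
    return cld(input_length, nchunks)       # balance the entries across those chunks
end

pick(15)  # 5×5 UpperTriangular: 15 structural nonzeros → chunk size 8, 2 chunks
pick(13)  # Diagonal(rand(13)):  13 structural nonzeros → chunk size 7, 2 chunks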

Member Author

The only test failures are the ones already present on the master branch (possibly related to some recent changes to acosh or NaNMath.acosh?).

Member

My weird results above must have been my mistake; I was too lazy to check out the branch...

We could add tests of how many function evaluations are used in chunk mode (not sure whether there are any such tests now), but it's not essential.

Member Author

I added a test in d2a8af7.
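
A sketch of how such a count can be checked (not necessarily the exact code in d2a8af7):

using ForwardDiff, LinearAlgebra, Test

calls = Ref(0)
ForwardDiff.gradient(x -> (calls[] += 1; sum(x)), UpperTriangular(rand(5, 5)))
@test calls[] == 2  # 15 structural nonzeros, chunk size 8 → 2 chunks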

@devmotion changed the title from "Fix gradient with LowerTriangular and UpperTriangular input" to "Do not seed structural zeros" on Apr 2, 2025.
@devmotion merged commit fce7f76 into master on Apr 2, 2025 (2 of 3 checks passed).
@devmotion deleted the dw/lower_upper_triangular branch on April 3, 2025.

Closes: UpperTriangular / LowerTriangular broken on 1.0.0 (#738)