Skip to content

Conversation

@wsmoses
Copy link
Contributor

@wsmoses wsmoses commented Apr 19, 2025

No description provided.

@wsmoses
Copy link
Contributor Author

wsmoses commented May 1, 2025

@mcabbott @ToucheSir a lot of these remaining errors seem to be bugs in the testing infra?

Enzyme grad check ConvTranspose: Error During Test at /home/runner/work/Flux.jl/Flux.jl/test/ext_reactant/reactant.jl:46
  Got exception outside of a @test
  all arguments must have at least the same length of the firs one
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] _map(::Function, ::Tuple{KeyPath{Tuple{Int64, Symbol, Int64}}, KeyPath{Tuple{Int64, Symbol, Int64}}}, ::Tuple{Int64, Int64}, ::Vararg{Any})
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:2
    [3] (::Functors.StructuralWalkWithPath)(recurse::Function, kp::KeyPath{Tuple{Int64, Symbol}}, x::Tuple{Int64, Int64}, ys::Nothing)
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:108
    [4] (::Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)})(recurse::Function, kp::KeyPath{Tuple{Int64, Symbol}}, x::Tuple{Int64, Int64}, ys::Nothing)
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:135
    [5] (::Functors.CachedWalkWithPath{Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, Functors.NoKeyword, Functors.WalkCache{Any, Any, Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, IdDict{Any, Any}}})(recurse::Function, kp::KeyPath{Tuple{Int64, Symbol}}, x::Tuple{Int64, Int64}, ys::Nothing)
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:199
    [6] (::Functors.var"#recurse#27"{Functors.CachedWalkWithPath{Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, Functors.NoKeyword, Functors.WalkCache{Any, Any, Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, IdDict{Any, Any}}}})(::KeyPath{Tuple{Int64, Symbol}}, ::Vararg{Any})
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:54

Do you have any idea what's happening (or could help make an isolated example of what's going awry)

@mcabbott
Copy link
Member

mcabbott commented May 6, 2025

From reading the log... error is from this utility function:

function check_equal_leaves(a, b; rtol=1e-4, atol=1e-4)
fmapstructure_with_path(a, b) do kp, x, y
# @show kp
if x isa AbstractArray
@test x y rtol=rtol atol=atol
elseif x isa Number
@test x y rtol=rtol atol=atol

Seems to be called with 1st argument Zygote gradient, and 2nd argument Enzyme gradient:

check_equal_leaves(g, g_ez; rtol, atol)

And Zygote will often truncate branches which are like (nothing, nothing). I guess that's what's happening?

Signature in another error is other order... but it looks like Enzyme has left Tuple{Int64, Int64} in the gradient, where Zygote has nothing. Then in the next recursion inwards, you could get map(..., x::Tuple{Int64, Int64}, ys::Nothing) as above, I think?

map(f::Function, 
   t1::NTuple{6, KeyPath{Tuple{Int64, Symbol}}}, # who knows why, ignore
   t2::Tuple{Vector{Float32}, Tuple{Int64, Int64}, NTuple{4, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64},  # presumably Enzyme
   ts::Tuple{Vector{Float32}, Vararg{Nothing, 5}},  # presumably Zygote
)

@wsmoses
Copy link
Contributor Author

wsmoses commented May 13, 2025

yeah Enzyme keeps the primal and shadow structures the same, so it sounds like the test works -- but the test function itself is busted here?

@mcabbott
Copy link
Member

Yes, that's right. Sorry I stopped before the conclusion!

Somehow the function for doing approximate equality on nested structs needs to be made smart enough to handle the difference between Zygote & Enzyme conventions.

@wsmoses
Copy link
Contributor Author

wsmoses commented Jul 14, 2025

@mcabbott do you know what's needed to move the needle here?

@wsmoses
Copy link
Contributor Author

wsmoses commented Oct 24, 2025

bumping again here @mcabbott @CarloLucibello

@mcabbott
Copy link
Member

Sorry I don't have bandwidth to think about this.

If anyone else does, trying to simplify gradient tests would be very welcome. Zygote may be on the way out, so far there appears to be nobody trying to make it run on 1.12. Possibly working on tests and on removing this would best be done together.

@wsmoses
Copy link
Contributor Author

wsmoses commented Jan 2, 2026

@mcabbott @CarloLucibello this should now pass, if it can get a review/merge?

@CarloLucibello
Copy link
Member

CUDA and Metal buildkite tests are failing

@codecov
Copy link

codecov bot commented Jan 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.47%. Comparing base (2a91836) to head (fdb0238).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2600       +/-   ##
===========================================
+ Coverage   32.09%   68.47%   +36.38%     
===========================================
  Files          32       33        +1     
  Lines        2038     2087       +49     
===========================================
+ Hits          654     1429      +775     
+ Misses       1384      658      -726     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@wsmoses
Copy link
Contributor Author

wsmoses commented Jan 2, 2026

Looks like all passes, including all Julia versions and devices

@CarloLucibello CarloLucibello merged commit 7589641 into FluxML:master Jan 2, 2026
9 of 11 checks passed
@wsmoses wsmoses deleted the ret branch January 2, 2026 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants