
Error with StaticArrays #126

Closed
bjack205 opened this issue Nov 19, 2021 · 9 comments

@bjack205

I'm getting an error when running the following code:

using FiniteDiff, StaticArrays
n = 5
x = @SVector randn(n)
f(x) = x'x + sin(x[1])
grad = zeros(n)
cache = FiniteDiff.GradientCache(grad, zeros(n),Val(:forward))
FiniteDiff.finite_difference_gradient!(grad, f, x, cache)

which produces this error:

ERROR: LoadError: MethodError: no method matching ndims(::Type{Nothing})
Closest candidates are:
  ndims(::Number) at number.jl:83
  ndims(::LinearAlgebra.UniformScaling) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/uniformscaling.jl:87
  ndims(::Base.Iterators.ProductIterator) at iterators.jl:967
  ...
Stacktrace:
 [1] Base.Broadcast.BroadcastStyle(#unused#::Type{Nothing})
   @ Base.Broadcast ./broadcast.jl:103
 [2] combine_styles(c::Nothing)
   @ Base.Broadcast ./broadcast.jl:420
 [3] combine_styles(c1::Nothing, c2::Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1}, Nothing, typeof(FiniteDiff.compute_epsilon), Tuple{Base.RefValue{Val{:forward}}, SVector{5, Float64}, Float64, Float64, Bool}})
   @ Base.Broadcast ./broadcast.jl:421
 [4] materialize!
   @ ./broadcast.jl:891 [inlined]
 [5] finite_difference_gradient!(df::Vector{Float64}, f::typeof(f), x::SVector{5, Float64}, cache::FiniteDiff.GradientCache{Nothing, Nothing, Nothing, Vector{Float64}, Val{:forward}(), Float64, Val{true}()}; relstep::Float64, absstep::Float64, dir::Bool)
   @ FiniteDiff ~/.julia/packages/FiniteDiff/msXcU/src/gradients.jl:140
 [6] finite_difference_gradient!(df::Vector{Float64}, f::Function, x::SVector{5, Float64}, cache::FiniteDiff.GradientCache{Nothing, Nothing, Nothing, Vector{Float64}, Val{:forward}(), Float64, Val{true}()})
   @ FiniteDiff ~/.julia/packages/FiniteDiff/msXcU/src/gradients.jl:138
 [7] top-level scope
   @ ~/.julia/dev/RobotDynamics/test/scalar_function_test.jl:165
in expression starting at /home/brian/.julia/dev/RobotDynamics/test/scalar_function_test.jl:165

I'm explicitly not passing in x to the GradientCache because I want to be able to also support normal Vector inputs with the same cache. There definitely seem to be some errors in the non-StridedVector gradient function, since there are calls to compute_epsilon passing in the vector x instead of an element of x.

If I copy the StridedVector code and allow a StaticVector, the gradient computes just fine and without any allocations. Since you're already depending on StaticArrays, maybe make the StridedVector version take in a Union{StridedVector{<:Number},<:StaticVector{<:Any,<:Any,<:Number}} for the input vector x?

I'm not quite following all the logic in the GradientCache, and maybe my simple suggestion is missing something, but it'd be nice to allow for this type of use-case.

@ChrisRackauckas
Member

The in-place forms cannot use static arrays because static arrays are immutable. Did you try the out-of-place form?

@bjack205
Author

Does in-place vs. out-of-place even make sense in the vector-to-scalar case? It has to return a scalar, whether the input vector is immutable or not.

With my proposed change, you could (with the same cache) pass in either static vectors or normal (mutable) vectors. Right now it produces some rather obscure errors if you try to mix the cache and input types.

@ChrisRackauckas
Member

You never want to use static vectors and mutable vectors in the same code. If you do, then one of the two is seeing a very suboptimal code path. Just use the out-of-place form for the differentiation of static arrays: you don't want to use it on vectors because it's allocating, but of course it's not allocating for static arrays.
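A minimal sketch of that out-of-place usage, assuming FiniteDiff's finite_difference_gradient with its default settings (the function and sizes are just the ones from the original report):

using FiniteDiff, StaticArrays

f(x) = x'x + sin(x[1])
x = @SVector randn(5)

# out-of-place: no cache; the gradient is returned (as a static vector) rather than written into a buffer
grad = FiniteDiff.finite_difference_gradient(f, x)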

@bjack205
Author

Yeah, I agree it's usually not a great way to go, but there are cases where it's warranted (like leveraging fast computational kernels that use static arrays to populate pieces of larger arrays). I'll see if I can rethink my code design a bit to simplify this.

Out of curiosity, would allowing a Union{StridedVector, StaticVector} break the functionality of the in-place version?

@ChrisRackauckas
Member

You literally can never use StaticVector with in-place codes. Everything about it always fails.

using StaticArrays
x = SA[1.0,2.0]
x[1] = 3.0 # errors

Any code that mutates static arrays is thus either accidentally not using static arrays (and so is inefficient for static arrays), or is not mutating (and so is not efficient for regular arrays). There are two code paths because they are literally not compatible.

@bjack205
Author

Yeah, that's technically only true for SVector, not StaticVector, right?
I'd love to pick your brain some more about writing a codebase that works efficiently with both paths without having a massive amount of duplicate code, but I don't think this is the right forum 😃

As a little background, I'm the lead author of TrajectoryOptimization.jl, which has SotA performance thanks to StaticArrays, but I'm currently working on allowing it to use non-static arrays since the compilation time blows up for anything over a given size. I rely on super-efficient computation kernels for computing lots of 1st and 2nd order Taylor approximations and then caching them to solve the large optimization problem.

Feel free to close this issue if you think it's not a concern (I'll just continue with my current hack until I figure something else out).

@ChrisRackauckas
Member

Yeah, that's technically only true for SVector, not StaticVector, right?

No, because StaticVector is just an alias for StaticArray{Tuple{N}, T, 1}.

I'd love to pick your brain some more about writing a codebase that works efficiently with both paths without having a massive amount of duplicate code, but I don't think this is the right forum 😃

Right now you can't. DiffEq has code duplication for this reason. There is work going on to try and alleviate the issue, specifically JuliaLang/julia#42465, but even with that, what will happen is that non-mutating code could be as efficient as mutating code, but only if the compiler can sufficiently prove everything it needs to. And that PR does not include the compilation passes required for any of those proofs, so we're quite a long way away from being able to really use immutable vectors and let the compiler transform them into fast mutating code. When ImmutableArrays are good enough for DiffEq they should be good enough for other places, but right now it's a core compiler dev dream to make that a reality.

As for proper use of out of place, see the test:

https://github.com/JuliaDiff/FiniteDiff.jl/blob/master/test/out_of_place_tests.jl#L1-L20

Again, there is no cache because caching static vectors doesn't make sense: if you make and mutate a cache it will allocate, while if there is no cache nothing allocates, so not caching with immutables is the better idea. Because of that I'll close this, but feel free to ask any further questions.

@bjack205
Author

Thanks for the links, Chris! So while it's obviously most efficient to work entirely with static arrays (for small sizes) to avoid copy operations and keep everything on the stack, you can efficiently copy from a static array to a normal one. My current approach is to represent all my cached gradients and vectors using mutable containers, usually a SizedArray because it can be converted to an SArray (at basically the expense of another copy) if needed. I usually do this if I have a very linear-algebra-dense computation, like a Riccati recursion, but only if the sizes are small enough, thereby isolating the cases when I need to have separate static-vs-non-static code into the core computational kernels.
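A rough sketch of that SizedArray pattern, assuming StaticArrays' SizedVector and SVector (the kernel here is purely illustrative, not from any of the packages discussed):

using StaticArrays

# mutable cache that still carries a static size
cache = SizedVector{5}(zeros(5))

# a small static computational kernel (hypothetical)
kernel(x::SVector) = 2 .* x .+ 1

# convert to an SVector (one copy), run the static kernel, write the result back into the cache
cache .= kernel(SVector{5}(cache))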

But since a lot of the computational overhead of the algorithms I deal with is actually evaluating the gradients, Hessians, or Jacobians I need (I have a unified interface for user-defined, ForwardDiff, and FiniteDiff, and hopefully Symbolics in the near future), it's usually a win (for medium-sized input vectors) to pass in an SVector and save the result to a mutable container. This unifies the output interface while allowing the input to use either an in-place function operating on mutable containers or something that returns an SVector. I think having the option to pay some performance (the price of copying to a mutable container) for a unified output interface, reducing code duplication, is a nice one to have. In most of my benchmarks, I've found that the price of the copy is usually pretty small for the small sizes we typically work with for StaticArrays. The other reason this is nice is that it extends the input dimension that can feasibly work with static arrays: the input vector alone can be represented by a StaticVector while the output matrices aren't, saving you from needing to compile all the methods for those static matrices, which gets prohibitively expensive for an input dimension above about 20 and seems to increase quadratically.
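A hedged sketch of that "static input, mutable output" pattern, reusing FiniteDiff's out-of-place gradient from the earlier example (this is not TrajectoryOptimization's actual code):

using FiniteDiff, StaticArrays

f(x) = x'x + sin(x[1])
x = @SVector randn(5)

grad = zeros(5)                                      # mutable output container
grad .= FiniteDiff.finite_difference_gradient(f, x)  # static input; result copied into the mutable container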

When you do similar things and need to cache the output, do you just store a vector of arrays like Vector{M} where M <: AbstractArray and change the storage type based on whether it's in-place or out-of-place? That way you just overwrite the immutable array in the vector of cached elements?

@ChrisRackauckas
Member

you can efficiently copy from a static array to a normal one

No you can't, that will allocate.

My current approach is to represent all my cached gradients and vectors using mutable containers, usually a SizedArray because it can be converted to an SArray (at basically the expense of another copy) if needed. I usually do this if I have a very linear-algebra-dense computation, like a Riccati recursion, but only if the sizes are small enough, thereby isolating the cases when I need to have separate static-vs-non-static code into the core computational kernels.

This will hit the speed of the static array kernels but still requires allocating the mutables. Depending on the operations being done this can be a good approach in some cases, but in the "smallest" cases never allocating the mutable will be better. It depends on the application whether that makes an appreciable difference.

But since a lot of the computational overhead of the algorithms I deal with is actually evaluating the gradients, Hessians, or Jacobians I need (I have a unified interface for user-defined, ForwardDiff, and FiniteDiff, and hopefully Symbolics in the near future)

Use AbstractDifferentiation.jl?

In most of my benchmarks, I've found that the price of the copy is usually pretty small for the small sizes we typically work with for StaticArrays.

The real key here is that the algorithms you're using are likely O(n^k) for k>1 somewhere (lu-factorization?) while this copy cost is just O(n), and so sooner or later it won't matter as much as getting some optimized microkernels. That's what's done for example in RecursiveFactorization.jl or Octavian.jl, though for very small matrices (<20 x 20 or so) a purely static approach will win.

When you do similar things and need to cache the output, do you just store a vector of arrays like Vector{M} where M <: AbstractArray and change the storage type based on whether it's in-place or out-of-place?

Yes, and it lazily reconstructs the time series via the RecursiveArrayTools.jl VectorOfArray abstraction.
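For readers unfamiliar with it, a minimal illustration of that abstraction, assuming RecursiveArrayTools' VectorOfArray (the stored arrays here are just placeholders):

using RecursiveArrayTools, StaticArrays

# each step's output is stored as its own (possibly static) array...
steps = [@SVector(randn(3)) for _ in 1:5]

# ...and VectorOfArray lazily presents them as one time series without copying
va = VectorOfArray(steps)
va[2, 4]   # element 2 of the 4th stored array
va[:, 4]   # the 4th stored array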
