Are Dual4 and GraDual trying to do the same thing? #17

Open · davidanthoff opened this issue May 13, 2015 · 13 comments
@davidanthoff

@mlubin: I saw the work on Dual4 you started, and that looks VERY exciting to me. I assume you eventually want to use tuples (i.e. fixed-size arrays) to store the epsilons, so that this would essentially be a DualN datatype, right?

I am just wondering how that relates to the GraDual type in ForwardDiff.jl. Is that essentially the same thing, except for the implementation details? Right now GraDual uses an array to store the epsilons, which probably makes it very slow. But once fixed-size arrays are part of core Julia, one would just have to change the implementation details and probably make this really fast?
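To make the comparison concrete, here is a minimal sketch (in Julia 0.4-era syntax; not actual Dual4 or GraDual code, and the name DualN and its fields are hypothetical) of what a tuple-backed dual number with N epsilon components might look like:

```julia
import Base: +, *

# Hypothetical sketch: a dual number with N epsilon components stored
# in a tuple. `immutable` + NTuple makes the whole value a bits type,
# so it can live on the stack, unlike a Vector-backed GraDual.
immutable DualN{T<:Real,N} <: Number
    value::T               # real part
    partials::NTuple{N,T}  # one epsilon component per input direction
end

# `map` over tuples returns a tuple, so no heap allocation is needed.
+(a::DualN, b::DualN) = DualN(a.value + b.value, map(+, a.partials, b.partials))

# Product rule applied componentwise to the epsilons.
*(a::DualN, b::DualN) = DualN(a.value * b.value,
                              map((da, db) -> a.value*db + b.value*da,
                                  a.partials, b.partials))
```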

I guess this issue is just to clarify whether these two efforts could be unified. In some ways it seems to me that this kind of functionality actually fits better into the ForwardDiff.jl package, but I'm not sure.

Also pinging @Scidom.

@papamarkou
Contributor

Hi there @davidanthoff, thanks for keeping me posted. I have been slow lately (and will be for the next 2-3 months) due to a change of research position and town. About your question, I haven't followed the fixed-size array work - it would be great if you could provide some directions or a link. The same holds for Dual4.

@mlubin
Contributor

mlubin commented May 13, 2015

@davidanthoff, yes, I wrote Dual4 to do some preliminary experiments with multiple epsilon components, and eventually it will use tuples.
It's the same idea as GraDual but actually fast enough to be useful for JuMP. I don't really mind which package it ends up in though.

@davidanthoff
Author

@mlubin, I think it would be nicest if this functionality just ends up as a speedup of GraDual in the existing ForwardDiff.jl package, if that is possible. There are a whole bunch of benefits: we wouldn't have two packages that are essentially doing the same thing, the faster implementation would automatically be picked up for Hessians etc. that are already implemented in ForwardDiff, and finally I think it is more discoverable in that package for people. And then JuMP could just use ForwardDiff.

@Scidom, my understanding is that Julia 0.4 has a new tuple system that is essentially equivalent to fixed-size arrays, and that if you declare immutables with tuple members you might end up without any heap allocations at all.

Any implementation like this would probably also pick up all the SIMD instruction work that has gone into Julia. This seems almost like the ideal application for SIMD instructions, right?
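One quick way to check the heap-allocation claim (a sketch, assuming the hypothetical DualN type above and Julia 0.4, where the relevant predicate is called isbits):

```julia
# An immutable whose fields are all bits types (including an NTuple of
# bits types) is itself a bits type: it can be stack allocated and
# packed into SIMD registers.
isbits(DualN{Float64,4})    # true for the sketch above

# By contrast, a Vector{Float64} field forces heap allocation, which
# is essentially the current GraDual situation.
isbits(Vector{Float64})     # false
```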

@papamarkou
Contributor

Thanks @davidanthoff, I just read the updated documentation for the new tuple types. I haven't tried any benchmark comparisons yet, but it sounds like a great improvement.

I agree that it would make sense to integrate this new functionality as a speedup of GraDual; I'm happy whichever way this gets settled. @mlubin, I think you have admin access to ForwardDiff if you decide to merge your speedup.

@mlubin
Contributor

mlubin commented May 13, 2015

I wouldn't say that DualN would be a drop-in replacement for GraDual. Assuming DualN is stack allocated, it could still be useful to have a type which is heap allocated if you want 50, 100, or 1000 epsilon components.

@davidanthoff
Author

That is a good point. It would still be nice if they had the same interface and lived in the same package, and if FADHessian could take either one for its internal storage.

@mlubin
Contributor

mlubin commented May 14, 2015

Would you also want a stack-allocated FADHessian?

@mlubin
Contributor

mlubin commented May 14, 2015

(Because there's a separate storage vector in FADHessian)

@papamarkou
Contributor

The analogue that came to my mind is that of some of Julia's "intelligent" linear algebra routines, which pick under the hood which algorithm to run depending on input size. What we could do here is either choose automatically between stack- and heap-allocated storage depending on vector size, or provide an optional argument that lets the user decide, @davidanthoff.

@mlubin, the answer is I don't know; we may want to run some benchmarks on FADHessian to see if stack allocation is of any use there :)
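Such a selection layer could be as small as the following sketch (the function name and the cutoff are made up, and GraDual's actual type parameters may differ):

```julia
# Hypothetical dispatch layer: use the stack-allocated DualN for small
# gradients and a heap-allocated GraDual-style type for large ones.
# The cutoff of 10 is a placeholder that would need benchmarking.
dual_type{T<:Real}(::Type{T}, n::Integer) = n <= 10 ? DualN{T,n} : GraDual{T}
```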

@mlubin
Contributor

mlubin commented May 14, 2015

@Scidom, yes, it could definitely be useful to have some layer which makes that decision, but as far as I know it would have to be at some level above the definition of the types themselves. So first of all we need to implement this DualN.
If there's no abstraction penalty (post 0.4 release), I'd actually like to have the existing Dual{T} type just be an instance of DualN{T,1}, which would lead me to argue for keeping the code in this package. It will certainly need a bit of experimentation, but I believe that the current uses of Dual to calculate gradients (e.g., in Optim) could be made faster by having more than one epsilon component (but still a small number).
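In Julia 0.4 that relationship could be expressed with a type alias, roughly like this (a sketch building on the hypothetical DualN above):

```julia
# Sketch: the single-epsilon Dual becomes a special case of DualN.
# (0.4-era `typealias`; newer Julia would write `const Dual{T} = DualN{T,1}`.)
typealias Dual{T} DualN{T,1}

d = DualN(2.0, (1.0,))     # the old Dual(2.0, 1.0) as a one-epsilon DualN
isa(d, Dual{Float64})      # true: one type covers both use cases
```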

@papamarkou
Contributor

@mlubin, I will let you and @davidanthoff decide on the most convenient package to host DualN; I don't mind about the logistics. I agree with you that, whichever way it's sorted, we will need an extra layer to switch between use cases on the basis of user input.

@davidanthoff
Author

Well, I certainly don't want to decide; whoever implements this should make that call.

My sense is that autodiff is just one application of dual numbers, and I'm not sure whether multiple epsilons make sense for any of the other uses. For forward autodiff it clearly should be hugely beneficial. If that is so, it seems more logical to have the DualNumbers package hold the "pure and general" implementation, and the ForwardDiff package the implementation that is optimized for the autodiff case.
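To illustrate why multiple epsilons help forward autodiff specifically: with the hypothetical DualN sketch above, the full gradient of a two-argument function comes out of a single evaluation, one epsilon component per input:

```julia
# Seed one epsilon component per input variable; a single evaluation
# of f then propagates both partial derivatives at once.
f(x, y) = x*y + x

x = DualN(3.0, (1.0, 0.0))   # seed the d/dx direction
y = DualN(2.0, (0.0, 1.0))   # seed the d/dy direction

z = f(x, y)
# z.value    == 9.0          (3*2 + 3)
# z.partials == (3.0, 3.0)   (df/dx = y + 1 = 3, df/dy = x = 3)
```

With the one-epsilon Dual, the same gradient would take two separate evaluations of f.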

It actually seems to me that e.g. Optim should just use ForwardDiff. The autodiff function in Optim is essentially doing the same thing as the autodiff in ForwardDiff, so that is just another code duplication that could go away.

For FADHessian, I'm not sure. I don't know whether tuples of tuples will also be stack allocated in 0.4, but at least in theory it might also be useful to have a stack-allocated FADHessian at some point.
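If tuples of tuples turn out to be problematic, a flat packed layout would be one alternative; purely as an illustration (nothing about the actual FADHessian internals is implied):

```julia
# Illustrative second-order type: gradient in one tuple, the Hessian's
# lower triangle packed into another, so no nested tuples are needed.
# M must equal N*(N+1)/2; here that invariant is only documented.
immutable HessN{T<:Real,N,M} <: Number
    value::T
    partials::NTuple{N,T}  # gradient components
    lower::NTuple{M,T}     # packed lower triangle of the Hessian
end

isbits(HessN{Float64,3,6})  # true: still a stack-allocatable bits type
```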

@papamarkou
Contributor

I agree @davidanthoff. ForwardDiff is a reasonably narrow, theme-specific package, as it focuses on a subset of automatic differentiation (namely forward-mode AD), so in my opinion the code becomes unreasonably fragmented if slight variations in the internals of forward AD (such as memory management) lead to separate packages.

Then again, what I or we consider logical is only one side of the story - I also agree that whoever implements this should have the last say and place their code wherever they feel most comfortable.
