
Flux.jl #457

Closed
yebai opened this issue Jul 24, 2018 · 5 comments

yebai (Member) commented Jul 24, 2018

Hi @ChrisRackauckas, I recently learned about the existence of the Flux.jl package.

I'm now looking at this package and have a question related to AD: It seems Flux.jl has its own AD; is this going to be supported or will it be replaced when a new one based on Cassette.jl is out? Thanks!

ChrisRackauckas (Collaborator) commented:

@MikeInnes

MikeInnes commented:

Yes, Flux.Tracker will continue to be supported. The more advanced ADs we are building will take some time to mature, so we won't be changing the default for a while yet.

The plan is that the current and future ADs will be API compatible, with the only real difference being that the current data and param functions become no-ops. Code using the current AD should be able to update or try out the new one with a one-line change, so there's pretty low risk in depending on it.

Happy to answer any other questions that would help you get going with it.
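
For reference, a minimal sketch of the param/data API in question, based on the 2018-era Flux.Tracker interface (assumed here; details may differ in later Flux versions):

```julia
using Flux                            # exports `param`
using Flux.Tracker: back!, grad, data

# `param` wraps an array so operations on it are tracked for reverse-mode AD.
W = param(rand(2, 3))
x = rand(3)

y = sum(W * x)   # y is a tracked scalar
back!(y)         # run the reverse pass
grad(W)          # gradient of y with respect to W
data(W)          # unwrap the TrackedArray back to a plain Array
```

Under the API-compatible future AD described above, `param` and `data` would simply become no-ops, so code like this should keep working unchanged.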

yebai (Member, Author) commented Jul 30, 2018

@MikeInnes Many thanks.

At the moment, Turing makes use of ForwardDiff.jl and ReverseDiff.jl for AD. We are considering switching to Capstan when it's more mature.

It would be nice if Flux's AD can stay API compatible with Capstan. We might consider switching earlier if it represents Julia's future AD. What are the advantages of Flux.jl's AD engine over these existing ones?
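
For context, gradients in both packages are taken like this; `logdensity` below is just an illustrative stand-in, not Turing's actual interface:

```julia
using ForwardDiff, ReverseDiff

# Illustrative log-density of a parameter vector θ (standard normal).
logdensity(θ) = -0.5 * sum(abs2, θ)

θ = randn(5)
ForwardDiff.gradient(logdensity, θ)   # forward-mode AD
ReverseDiff.gradient(logdensity, θ)   # reverse-mode AD
```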

MikeInnes commented:

The main reason Flux has its own AD is in order to support modularity in parameters (discussed at length in denizyuret/Knet.jl#144). Other than that, things like performance characteristics are broadly similar.
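
For example (a rough sketch against the 2018-era Flux API, so details may have changed since): each layer owns its own tracked parameters, and they can be collected from an arbitrarily nested model instead of being declared as one flat vector up front.

```julia
using Flux

# Each Dense layer internally wraps its weights and biases with `param`.
m = Chain(Dense(10, 5, relu), Dense(5, 2))

# Collect every tracked parameter array from the nested model.
θ = params(m)
length(θ)   # 4 arrays: two weight matrices and two bias vectors
```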

yebai (Member, Author) commented Jul 30, 2018

> The main reason Flux has its own AD is in order to support modularity in parameters (discussed at length in denizyuret/Knet.jl#144). Other than that, things like performance characteristics are broadly similar.

We struggled with the same issue. Basically, Turing models can introduce new variables/parameters anywhere inside the @model block. This means that, before running the model function, we know nothing about:

  • the total number of parameters
  • the dimension of each parameter
  • the data type of each parameter

@xukai92 managed to hack ForwardDiff.jl and ReverseDiff.jl in order to support the Turing modelling syntax. The implementation involves several runs of the model function to prepare the AD. It would therefore be great if the modularity support from Flux could be incorporated into Turing.
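
To make the issue concrete, here is a toy sketch (purely illustrative; not Turing's or ForwardDiff.jl's actual internals) of the "run the model once to discover the parameter dimension, then differentiate" pattern described above:

```julia
using ForwardDiff

# A hypothetical model runner: it consumes parameters lazily from `draw`,
# so the number of parameters is only known after the model has been run.
function model(draw, data)
    μ = draw()            # first parameter appears here
    σ = exp(draw())       # second parameter appears here
    lp = 0.0
    for x in data
        lp += -0.5 * ((x - μ) / σ)^2 - log(σ)
    end
    return lp
end

# Pass 1: run the model with dummy draws just to count the parameters.
function count_params(data)
    n = Ref(0)
    model(() -> (n[] += 1; 0.0), data)
    return n[]
end

# Pass 2: differentiate the log density w.r.t. a flat vector of that length.
function logdensity(θ, data)
    i = Ref(0)
    model(() -> θ[i[] += 1], data)
end

data = randn(100)
d = count_params(data)                            # 2 in this toy example
ForwardDiff.gradient(θ -> logdensity(θ, data), randn(d))
```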

@xukai92 @MikeInnes

Related issue: #421 (comment)

@xukai92 xukai92 self-assigned this Sep 1, 2018
yebai added a commit that referenced this issue Sep 4, 2018
@yebai yebai closed this as completed Sep 4, 2018
yebai added a commit that referenced this issue Sep 18, 2018