
Support both Theano and Tensorflow backend #611

Open
trungnt13 opened this issue Feb 16, 2016 · 17 comments

@trungnt13

The idea comes from Keras: https://github.com/fchollet/keras/tree/master/keras/backend

Some functions have the same functionality, but the two backends can return results in different formats. However, this is not a big issue if we abstract them carefully.

The biggest advantage of adding a TensorFlow backend is that it significantly speeds up building and running on CPU, which helps a lot during development.

I ported updates.py and objectives.py from Lasagne to use multiple backends. Calculating the objective with TensorFlow can be 3 times faster than with Theano on CPU.

I am aware that some Lasagne modules are specialised for Theano only; however, I think it is possible to adapt all the common layers to both frameworks.
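For concreteness, here is a minimal sketch of the kind of dispatch module I mean (the BACKEND environment variable and the function names are made up for illustration, not an existing Lasagne or Keras API):

```python
# backend.py -- illustrative sketch: pick an implementation at import time.
import os

_BACKEND = os.environ.get('BACKEND', 'theano')  # hypothetical switch

if _BACKEND == 'theano':
    import theano.tensor as T

    def square(x):
        return T.sqr(x)

    def mean(x, axis=None):
        return T.mean(x, axis=axis)

elif _BACKEND == 'tensorflow':
    import tensorflow as tf

    def square(x):
        return tf.square(x)

    def mean(x, axis=None):
        return tf.reduce_mean(x, axis=axis)

# An objective written once against this module runs on either backend:
def squared_error(prediction, target):
    return mean(square(prediction - target))
```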

@f0k
Member

f0k commented Feb 16, 2016

We discussed this as well when TensorFlow came out, and decided to wait until it's more mature and/or faster than Theano on GPU. One of the problems is that unlike Keras, Lasagne does not (and does not want to) abstract away from Theano, but accepts and returns Theano expressions. This makes it a bit harder to transparently switch backends.

PS: I've seen that your "odin" project is a potpourri of Keras, Lasagne and your own (even including the README). Please make sure to adhere to the licenses of Keras and Lasagne, i.e., include their license texts where appropriate ("The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."). Thanks!

@trungnt13
Author

Thanks for the reminder. In fact, I have only just started the code and haven't settled on a clear direction yet. Re-implementing all Lasagne layers to be compatible with both backends would be a huge amount of work, so I would prefer to integrate this directly into Lasagne if possible.

Since TensorFlow and Theano expressions both represent a computational graph, it is also possible to manually manipulate the expressions produced by Keras networks. I think this would be a perfect feature for Lasagne.

However, it seems that for now the advantage of faster CPU execution does not outweigh the drawback of the huge amount of work.

@benanne
Member

benanne commented Feb 16, 2016

I think a better strategy would be to create a sister library for TensorFlow that provides largely the same API as Lasagne (where possible), but returns TensorFlow objects where Lasagne would return Theano objects. That would make it much easier for users and devs who only care about one or the other. Of course we'd need a few devs involved in both of them, to keep the APIs roughly in sync. But it would of course be fine if they didn't support exactly the same stuff.

The main thing I would want to avoid is for it to slow things down. If we commit to supporting everything on both backends, suddenly implementing a feature is twice as much work. I think that would be a bad idea.
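To illustrate what I mean (purely hypothetical code, assuming the TF 1.x-era graph API), such a sister library could keep the Lasagne-style layer interface while producing TensorFlow tensors:

```python
# Hypothetical sketch of a "sister library" layer: Lasagne-like API,
# but TensorFlow tensors in and out. All names are illustrative only.
import tensorflow as tf  # assumes the TF 1.x graph-mode API

class DenseLayer:
    def __init__(self, incoming, num_units, nonlinearity=tf.nn.relu):
        self.input_layer = incoming
        self.num_units = num_units
        self.nonlinearity = nonlinearity
        num_inputs = incoming.output_shape[1]
        self.W = tf.Variable(tf.random_normal([num_inputs, num_units], stddev=0.01))
        self.b = tf.Variable(tf.zeros([num_units]))
        self.output_shape = (incoming.output_shape[0], num_units)

    def get_output_for(self, input):
        # Returns a tf.Tensor where Lasagne would return a Theano expression.
        return self.nonlinearity(tf.matmul(input, self.W) + self.b)
```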

@trungnt13
Author

I think that would be too much work. What about just creating a wrapper that gives TensorFlow an interface similar to Theano's?

Basically, all the code in Lasagne could be kept unchanged (though some operators like x**2 might have to be changed to T.pow(x, 2)). At the highest level of abstraction, from theano import tensor as T would be interchangeable with from theano_tf import tensor as T. I haven't thought this through completely, but I think it is quite possible.

In that case, even if the TensorFlow wrapper messed something up, it wouldn't affect the Theano part anyway.
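A rough sketch of what such a shim could look like (the theano_tf module is hypothetical, only a few functions are shown, and the TF 1.x graph API is assumed):

```python
# theano_tf/tensor.py -- hypothetical shim exposing a theano.tensor-like
# namespace backed by TensorFlow (TF 1.x graph-mode API assumed).
import tensorflow as tf

def matrix(name=None):
    # Analogue of theano.tensor.matrix(): a 2-D float symbolic input.
    return tf.placeholder(tf.float32, shape=[None, None], name=name)

def pow(x, a):
    return tf.pow(x, a)

def dot(a, b):
    return tf.matmul(a, b)

def mean(x, axis=None):
    return tf.reduce_mean(x, axis=axis)

# Lasagne code would then switch backends by changing a single import:
# from theano import tensor as T      # Theano backend
# from theano_tf import tensor as T   # TensorFlow backend
```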

By the way, TensorFlow just released a new version: it allows any installed CUDA >= 7.0 and cuDNN >= R2, and adds support for cuDNN R4. It should be faster, but the question is by how much.

@ebenolson
Member

Possibly relevant thread on r/machinelearning yesterday. I haven't looked at Keras' code myself in any detail, but some commenters felt that the dual backend made the source harder to understand.

Personally I think it sounds like a nightmare trying to keep everything working consistently and efficiently - we've had plenty of issues just with different Theano optimizations/implementations creating corner cases.

@benanne
Member

benanne commented Feb 17, 2016

I'm with @ebenolson here, which is why I think in the long term, having separate libraries is the way to go. The design principles underlying Lasagne, and even most of the interface choices we have made, are not closely tied to Theano. They could just as well be applied in a library built on top of TensorFlow instead (or even MXNet, CGT, ...).

I don't think this would be too much work -- for one, other people could work on it (i.e., TensorFlow users), and there would be no extra burden for either "side" of the divide to keep up with the other. They would just follow the same design principles and make similar interface choices. An added benefit would be that it would be really easy to switch from one to the other (although this would not be a primary goal, more of a side effect).

@ebattenberg
Contributor

Agree with @ebenolson and @benanne.

@aukejw

aukejw commented Feb 29, 2016

Will Lasagne team members still be involved in its development, or oversee the major design decisions somehow? IMO it's important that the Lasagne team has a say in what happens: to provide some sort of quality assurance if this sister library carries the Lasagne label, and (to a lesser extent) to ensure that credit is given where it is due.

Anyone can copy Lasagne and replace the Theano operations with their TensorFlow equivalents, but high test coverage and active discussion on issues and PRs are just as important -- if not more so.

@benanne
Member

benanne commented Feb 29, 2016

Absolutely. I think most of the Lasagne core team is still using Theano almost exclusively, so it might take a while before such an undertaking even becomes feasible.

@f0k
Member

f0k commented Feb 29, 2016

With the current number of active developers we won't be able to develop two libraries in parallel. Even on Lasagne alone we're not exactly progressing fast (partly because we're thinking things through extensively before making design decisions, but also because participation in those discussions is... mhm... choppy).

Maybe the most likely scenario is that we'll switch over to TensorFlow if and when it becomes more mature and faster than Theano (for now it has more disadvantages than advantages, at least for what I'm using Lasagne for). I.e., at some point we might create a sister library and focus on that, and just oversee porting things back to the Theano side as needed. Who knows!

@cancan101

What about using something like: https://github.com/dementrock/tensorfuse ?

@benanne
Member

benanne commented Feb 29, 2016

Sounds good in theory, but that project looks quite limited and has no tests... also we would probably miss out on some features that TF offers over Theano. I think whatever we end up doing for the TF case, it should be as 'vanilla' as possible, because one of the key ideas behind Lasagne is not to hide the underlying library behind abstraction layers.

@f0k
Member

f0k commented Mar 1, 2016

+1 -- a Lasagne for TensorFlow will probably look a bit different in some respects, as far as needed to support multi-GPU and distributed models. I haven't looked at the TensorFlow API in detail, but I doubt a generic theano-to-tensorflow wrapper would suffice.

@benanne
Member

benanne commented Mar 10, 2016

This just got open-sourced: https://github.com/tensorflow/models/blob/master/inception/slim/README.md

Some of the design principles are quite similar to those of Lasagne, although there are a few things that are not as flexible as they could be, imo (most notably weight initialization). Still, worth taking a look at :)

@dnouri
Member

dnouri commented Mar 11, 2016

Link in the previous comment is now broken. Here's a working one: https://github.com/tensorflow/models/blob/master/inception/inception/slim/README.md

@justheuristic

One more cross-breed: https://github.com/openai/rllab/blob/master/sandbox/rocky/tf/core/layers.py#L377
It wraps several core layers with exactly the same interface.

@benanne
Member

benanne commented Oct 29, 2016

Interesting, thanks for the heads-up!
