Support both Theano and Tensorflow backend #611
Comments
We discussed this as well when TensorFlow came out, and decided to wait until it's more mature and/or faster than Theano on GPU. One of the problems is that unlike Keras, Lasagne does not (and does not want to) abstract away from Theano, but accepts and returns Theano expressions. This makes it a bit harder to transparently switch backends. PS: I've seen that your "odin" project is a potpourri of Keras, Lasagne and your own (even including the README). Please make sure to adhere to the licenses of Keras and Lasagne, i.e., include their license texts where appropriate ("The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."). Thanks!
Thanks for the reminder; the fact is I just started the code and don't have a clear direction yet. Re-implementing all Lasagne layers to be compatible with both backends would be a huge amount of work, so I would prefer to integrate it directly into Lasagne if possible. Since TensorFlow and Theano expressions both represent a computational graph, it is also possible to manually manipulate the expressions from Keras networks, and I think this would be a perfect feature for Lasagne. However, it seems that the advantage of faster CPU execution does not outweigh the drawback of the huge amount of work right now.
I think a better strategy would be to create a sister library for TensorFlow, that provides largely the same API as Lasagne (where possible), but returns TensorFlow objects where Lasagne would return Theano objects. That would make it much easier for users and devs who only care about one or the other. Of course we'd need a few devs involved in both of them, to keep the APIs roughly in sync. But it would of course be fine if they didn't support exactly the same stuff. The main thing I would want to avoid is for it to slow things down. If we commit to supporting everything on both backends, suddenly implementing a feature is twice as much work. I think that would be a bad idea.
I think that would be too much work. What about only creating a wrapper that gives TensorFlow an interface similar to Theano's? Basically, all the code in Lasagne could be kept unchanged (though some operators like x**2 might have to be changed to T.pow(x, 2)). At the highest level of abstraction, from theano import tensor as T would be identical to from theano_tf import tensor as T. I haven't thought this one through, but I think it is highly possible. In that case, even if the TensorFlow wrapper messes up, it doesn't affect the Theano part anyway. By the way, TensorFlow just released a new version: it allows installed CUDA >= 7.0 and cuDNN >= R2, and adds support for cuDNN R4. It should be faster, but the key question is how much.
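To make the wrapper idea above concrete, here is a minimal, purely hypothetical sketch of what a "theano_tf" shim could look like. NumPy stands in for the TensorFlow ops so the sketch is self-contained; a real shim would delegate to the corresponding tf.* calls. The names `_TensorShim` and `tensor` are illustrative, not part of any existing library.

```python
# Hypothetical sketch: a module-level API mirroring theano.tensor, so
# `from theano_tf import tensor as T` could work unchanged. NumPy is a
# stand-in backend here; a real shim would call into TensorFlow.
import numpy as np


class _TensorShim:
    """Wraps a backend value and overloads operators like ** so that
    Lasagne code written as `x ** 2` keeps working without edits."""

    def __init__(self, value):
        self.value = np.asarray(value)

    def __pow__(self, exponent):
        # A real shim would call tf.pow(self.value, exponent) here.
        return _TensorShim(np.power(self.value, exponent))


class tensor:
    """Stands in for `theano.tensor`; only `pow` is sketched."""

    @staticmethod
    def pow(x, exponent):
        x = x if isinstance(x, _TensorShim) else _TensorShim(x)
        return x ** exponent


T = tensor
x = _TensorShim([1.0, 2.0, 3.0])
# Both spellings go through the same overload, so Lasagne code using
# either `x ** 2` or `T.pow(x, 2)` would behave identically.
assert np.allclose((x ** 2).value, T.pow(x, 2).value)
```

The point of the operator overload is exactly the `x**2` vs `T.pow(x, 2)` concern raised above: if the wrapped type implements `__pow__`, existing expressions need not be rewritten at all.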
Possibly relevant thread on r/machinelearning yesterday. I haven't looked at Keras' code myself in any detail, but some commenters felt that the dual backend made the source harder to understand. Personally I think it sounds like a nightmare trying to keep everything working consistently and efficiently - we've had plenty of issues just with different Theano optimizations/implementations creating corner cases.
I'm with @ebenolson here, which is why I think in the long term, having separate libraries is the way to go. The design principles underlying Lasagne, and even most of the interface choices we have made, are not closely tied to Theano. They could just as well be applied in a library built on top of TensorFlow instead (or even MXNet, CGT, ...). I don't think this would be too much work -- for one, other people could work on it (i.e., TensorFlow users), and there would be no extra burden for either "side" of the divide to keep up with the other. They would just follow the same design principles and make similar interface choices. An added benefit would be that it would be really easy to switch from one to the other (although this would not be a primary goal, more of a side effect).
Agree with @ebenolson and @benanne. |
Will Lasagne team members still be involved in its development, or oversee the major design decisions somehow? IMO it's important that the Lasagne team has a say in what happens: to have some sort of quality assurance if this sister library carries the Lasagne label, and (to a lesser extent) to ensure that credit is given where it is due. Anyone can copy Lasagne and replace the theano operations by their tensorflow-equivalents, but high test-coverage and active discussion on issues and PRs is just as important -- if not more.
Absolutely. I think most of the Lasagne core team is still using Theano almost exclusively, so it might take a while before such an undertaking even becomes feasible.
With the current number of active developers we won't be able to develop two libraries in parallel. Even on Lasagne alone we're not exactly progressing fast (partly because we're thinking things through extensively before making design decisions, but also because participation in those discussions is... mhm... choppy). Maybe the most likely scenario is that we'll switch over to TensorFlow if and when it becomes more mature and faster than Theano (for now it has more disadvantages than advantages, at least for what I'm using Lasagne for). I.e., at some point we might create a sister library and focus on that, and just oversee porting things back to the Theano side as needed. Who knows!
What about using something like: https://github.com/dementrock/tensorfuse ?
Sounds good in theory, but that project looks quite limited and has no tests... also we would probably miss out on some features that TF offers over Theano. I think whatever we end up doing for the TF case, it should be as 'vanilla' as possible, because one of the key ideas behind Lasagne is not to hide the underlying library behind abstraction layers.
+1 -- a Lasagne for TensorFlow will probably look a bit different in some respects, as far as needed to support multi-GPU and distributed models. I haven't looked at the TensorFlow API in detail, but I doubt a generic theano-to-tensorflow wrapper would suffice.
This just got open-sourced: https://github.com/tensorflow/models/blob/master/inception/slim/README.md Some of the design principles are quite similar to those of Lasagne, although there are a few things that are not as flexible as they could be, imo (most notably weight initialization). Still, worth taking a look at :)
Link in the previous comment is now broken. Here's a working one: https://github.com/tensorflow/models/blob/master/inception/inception/slim/README.md
One more cross-breed: https://github.com/openai/rllab/blob/master/sandbox/rocky/tf/core/layers.py#L377
Interesting, thanks for the heads-up!
The idea comes from Keras: https://github.com/fchollet/keras/tree/master/keras/backend
Some functions have the same functionality, but the two backends can return results in different formats. However, that is not a big issue if we abstract them carefully.
The biggest advantage of adding a TensorFlow backend is that it significantly speeds up the build-and-run cycle on CPU, which helps a lot during development.
I ported updates.py and objectives.py from Lasagne to support multiple backends. Calculating the objective itself can be 3 times faster than with Theano on CPU.
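As a rough illustration of what a backend-agnostic objectives function could look like, here is a minimal sketch under stated assumptions: the abstract ops `mean` and `square` are faked with NumPy (a real port would resolve them to Theano or TensorFlow at import time), and the `squared_error` signature shown here is illustrative, not a claim about the actual ported code.

```python
# Illustrative sketch: an objective written only against abstract
# backend ops, so the same source runs under either backend. NumPy
# stands in for the resolved backend here.
import numpy as np


def mean(x):
    # Stand-in for the backend's mean op (e.g. T.mean / tf.reduce_mean).
    return np.mean(x)


def square(x):
    # Stand-in for the backend's square op (e.g. T.sqr / tf.square).
    return np.square(x)


def squared_error(predictions, targets):
    """Mean squared error expressed purely in terms of backend ops."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return mean(square(predictions - targets))
```

Because the objective never touches a backend-specific type directly, swapping the `mean`/`square` bindings is the only change needed to move it between frameworks.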
I am aware that some Lasagne modules are specialised for Theano only; however, I think it is possible to adapt all common layers to both frameworks.