Currently, the TensorFlow (TF) backend is about 2-4x slower than the Theano backend on GPU at runtime.
I was really intrigued by this result, since according to the benchmark, vanilla TF's performance actually comes quite close to most of the best performers in the field.
After spending the past few days tinkering with the TF backend, I found two possible ways to bring down the TF runtime, and I think I should share them publicly so others can save time.
1. The image_dim_ordering setting
According to this thread, setting the dim_ordering parameter to 'tf' can cut the runtime in half. This is because TF's default input shape for images differs from Theano's, and Keras performs additional transpose operations whenever it encounters the 'th' dim_ordering. (The transpose ops seem quite redundant as of TF 0.8.0, and I have opened another issue discussing why: #3149.)
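For reference, a minimal sketch of the setting; the exact keys in ~/.keras/keras.json depend on your Keras version, so treat this as an assumption rather than the canonical config:

    {
        "image_dim_ordering": "tf",
        "backend": "tensorflow",
        "floatx": "float32",
        "epsilon": 1e-07
    }

If you only want to change it per layer, the convolutional layers also accept a dim_ordering='tf' argument directly.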
2. Modifying the TF backend's relu code
Theano has built-in leaky ReLU support, and TF doesn't. To stay backend-agnostic, Keras adds its own leaky ReLU support in the TF backend, and the code looks like this
(from keras/backend/tensorflow_backend.py)
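The snippet itself did not survive in this copy; below is a sketch of what that relu implementation looked like at the time, reconstructed from the Keras 1.x TF backend, so treat the exact details as approximate:

    import numpy as np
    import tensorflow as tf

    _FLOATX = 'float32'  # module-level default float type in the Keras backend

    def relu(x, alpha=0., max_value=None):
        # the negative part is always built into the graph, even when alpha == 0
        negative_part = tf.nn.relu(-x)
        x = tf.nn.relu(x)
        if max_value is not None:
            x = tf.clip_by_value(x, tf.cast(0., dtype=_FLOATX),
                                 tf.cast(max_value, dtype=_FLOATX))
        if isinstance(alpha, (tuple, list, np.ndarray)) or np.isscalar(alpha):
            alpha = tf.constant(alpha, dtype=_FLOATX)
        # the leaky term is applied unconditionally
        x -= alpha * negative_part
        return x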
However, with this implementation, TF is forced to compute the values and the gradients for the negative parts even when alpha is 0. To avoid this, Theano uses a switch internally to skip the calculation when alpha is 0. I tried to mimic the switch operation with
    def relu(x, alpha=0., max_value=None):
        negative_part = tf.nn.relu(-x)
        x = tf.nn.relu(x)
        if max_value is not None:
            x = tf.clip_by_value(x, tf.cast(0., dtype=_FLOATX),
                                 tf.cast(max_value, dtype=_FLOATX))
        if isinstance(alpha, (tuple, list, np.ndarray)) or np.isscalar(alpha):
            alpha = tf.constant(alpha, dtype=_FLOATX)
        leaked_x = x - alpha * negative_part
        x = switch(alpha, leaked_x, x)  # switch is defined in the original tensorflow_backend.py
        return x
but for some reason it doesn't reduce the runtime at all, so for now I just temporarily commented out the leaky calculations, like this.
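That snippet is also missing here; a sketch of what I mean, with the leaky branch commented out so only the plain ReLU path remains:

    def relu(x, alpha=0., max_value=None):
        # negative_part = tf.nn.relu(-x)
        x = tf.nn.relu(x)
        if max_value is not None:
            x = tf.clip_by_value(x, tf.cast(0., dtype=_FLOATX),
                                 tf.cast(max_value, dtype=_FLOATX))
        # if isinstance(alpha, (tuple, list, np.ndarray)) or np.isscalar(alpha):
        #     alpha = tf.constant(alpha, dtype=_FLOATX)
        # x -= alpha * negative_part
        return x

Note this is only a temporary workaround: with the leaky math commented out, anything that relies on alpha > 0 (e.g. a LeakyReLU layer) silently falls back to a plain ReLU.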
Right now, with both fixes, I was able to bring the runtime of mnist_cnn.py with the TF backend down to about 1.5x that of the Theano backend. There is probably still room for improvement, but I haven't found a good way to profile Keras code, so anything beyond this will be quite hard for me.
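For what it's worth, the crude timing I've been comparing is just a per-epoch stopwatch via a Keras callback; this is only a sketch and is no substitute for a real profiler:

    import time
    from keras.callbacks import Callback

    class EpochTimer(Callback):
        # records wall-clock time per epoch; a rough proxy for backend speed
        def on_train_begin(self, logs=None):
            self.times = []

        def on_epoch_begin(self, epoch, logs=None):
            self._start = time.time()

        def on_epoch_end(self, epoch, logs=None):
            self.times.append(time.time() - self._start)

    # usage: pass it to model.fit(..., callbacks=[EpochTimer()]) and compare
    # the recorded times between the TF and Theano backends.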