debug filterShapes and match network to citation #158
Conversation
If widenFactor is greater than 1, the input shape of the first residual block had the wrong number of input channels. Similarly, the same filterShape cannot be used for both conv1 and conv2 when the number of channels is being increased.

Added identity connections, dropout between convolutions, and preactivation in the 1x1 conv2D shortcuts.

The paper also uses a weight decay of 5e-4. I can add a function for this, but I haven't seen an implementation of L2 decay in any of the models, so I wondered whether there is a plan to add it somewhere else in the API.
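To make the shape fix concrete, here is a hedged sketch of a corrected block (type and parameter names like `WideBasicBlock` are illustrative, not necessarily the ones in this PR): conv1 maps inChannels to outChannels, conv2 then uses outChannels on both sides of its filterShape, and the 1x1 shortcut sees the preactivated input whenever the channel count or stride changes.

```swift
import TensorFlow

struct WideBasicBlock: Layer {
    var bn1: BatchNorm<Float>
    var conv1: Conv2D<Float>
    var dropout: Dropout<Float>
    var bn2: BatchNorm<Float>
    var conv2: Conv2D<Float>
    var shortcut: Conv2D<Float>
    @noDerivative let useProjection: Bool

    init(inChannels: Int, outChannels: Int, stride: Int, dropProbability: Double = 0.3) {
        bn1 = BatchNorm(featureCount: inChannels)
        // conv1 changes the channel count, so its filterShape differs from conv2's.
        conv1 = Conv2D(filterShape: (3, 3, inChannels, outChannels),
                       strides: (stride, stride), padding: .same)
        dropout = Dropout(probability: dropProbability)
        bn2 = BatchNorm(featureCount: outChannels)
        // conv2 stays at outChannels on both sides of its filter.
        conv2 = Conv2D(filterShape: (3, 3, outChannels, outChannels),
                       strides: (1, 1), padding: .same)
        // 1x1 projection shortcut, used only when the shape actually changes.
        useProjection = stride != 1 || inChannels != outChannels
        shortcut = Conv2D(filterShape: (1, 1, inChannels, outChannels),
                          strides: (stride, stride), padding: .same)
    }

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        // Preactivation: BN + ReLU before each conv; the same preactivated
        // tensor feeds the 1x1 shortcut when projecting.
        let preact = relu(bn1(input))
        var out = conv1(preact)
        out = conv2(relu(bn2(dropout(out))))
        var residual = input
        if useProjection { residual = shortcut(preact) }
        return out + residual
    }
}
```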
Thanks for debugging this and adding dropout! I was working from this implementation for reference; is that where I missed the identity layer? Re weight decay: I haven't seen a pattern yet for defining model-specific optimizer values, so if you have any suggestions or ideas, they're certainly welcome!
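One pattern that avoids touching the optimizers entirely is to fold the penalty into the loss. A minimal sketch, assuming a hypothetical `TinyModel` whose two conv filters are penalized explicitly (a real version would walk every parameter tensor):

```swift
import TensorFlow

// Stand-in model; only its two conv filters are penalized below.
struct TinyModel: Layer {
    var conv1 = Conv2D<Float>(filterShape: (3, 3, 3, 16), padding: .same, activation: relu)
    var conv2 = Conv2D<Float>(filterShape: (3, 3, 16, 16), padding: .same, activation: relu)
    var pool = GlobalAvgPool2D<Float>()
    var dense = Dense<Float>(inputSize: 16, outputSize: 10)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(pool(conv2(conv1(input))))
    }
}

var model = TinyModel()
let images = Tensor<Float>(randomNormal: [8, 32, 32, 3])
let labels = Tensor<Int32>(zeros: [8])

let (loss, grads) = valueWithGradient(at: model) { model -> Tensor<Float> in
    let logits = model(images)
    let crossEntropy = softmaxCrossEntropy(logits: logits, labels: labels)
    // L2 penalty with the paper's 5e-4 coefficient, added to the loss.
    let l2 = (model.conv1.filter * model.conv1.filter).sum()
           + (model.conv2.filter * model.conv2.filter).sum()
    return crossEntropy + 5e-4 * l2
}
```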
I just spent a couple of days trying to train the network. Ultimately I discovered a bug in _vjpRsqrt, which is called in the gradient calculation of the BatchNorm layer. Now it looks like the changes to the autodifferentiation code (possibly related to the removal of CotangentVector) are breaking differentiableReduce. Anyway, I was able to work around it by manually iterating through the blocks of the network instead.
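The workaround looks roughly like this, a sketch with hypothetical Dense blocks standing in for the residual blocks: store each block as its own property and chain the calls by hand so the differentiableReduce path is never exercised.

```swift
import TensorFlow

// Each block is a stored property rather than an element of a [Layer] array,
// so no differentiableReduce call is needed in the forward pass.
struct ManuallyUnrolled: Layer {
    var block1 = Dense<Float>(inputSize: 8, outputSize: 8, activation: relu)
    var block2 = Dense<Float>(inputSize: 8, outputSize: 8, activation: relu)
    var block3 = Dense<Float>(inputSize: 8, outputSize: 8, activation: relu)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        // Equivalent in effect to: blocks.differentiableReduce(input) { $1($0) }
        block3(block2(block1(input)))
    }
}
```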
I’ll take a look at the differentiableReduce breakage tonight.
This pull request runs with differentiableReduce as updated by Brett. The crash I was seeing was because I had added an L2 loss locally. It also appears to be related to differentiableReduce and the updated autodiff code, but I will try to create a minimal reproduction and file the bug separately.
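A hedged guess at the shape of such a reproduction, combining differentiableReduce over an array of layers with an L2 term over the same layers (not the actual test case):

```swift
import TensorFlow

// Differentiate through differentiableReduce while the loss also reads
// the layers' parameters directly, as an L2 penalty does.
let layers = [Dense<Float>(inputSize: 4, outputSize: 4, activation: relu),
              Dense<Float>(inputSize: 4, outputSize: 4)]
let input = Tensor<Float>(randomNormal: [1, 4])

let grads = gradient(at: layers) { layers -> Tensor<Float> in
    let output = layers.differentiableReduce(input) { $1($0) }
    let l2 = layers.differentiableReduce(Tensor<Float>(0)) {
        $0 + ($1.weight * $1.weight).sum()
    }
    return output.sum() + 5e-4 * l2
}
print(grads)
```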
I have created TF-533 for the differentiableReduce bug mentioned above.
…orflow#158)
* Moving the tests for TensorGroup from the swift repo to swift-apis.
* Fix indentation and whitespace issues.
Replaced by #193.