
Conversation

@mikowals
Contributor

If widenFactor is greater than 1, the input shape of the first residual block had the wrong number of input channels. Similarly, the same filterShape cannot be used for both conv1 and conv2 when the number of channels is being increased.

Added identity connections, dropout between convolutions, and the use of preactivation in the 1x1 conv2D shortcuts.
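For reference, here is a minimal sketch of the block structure described above, written against the Swift for TensorFlow layer API (the API of this era spelled `callAsFunction` as `call`); the type and parameter names (`WideResidualBlock`, `dropoutProbability`, and so on) are illustrative rather than the exact code in this PR:

```swift
import TensorFlow

// A minimal sketch, assuming the usual wide-resnet layout (BN -> ReLU -> conv1 ->
// BN -> ReLU -> dropout -> conv2) with a 1x1 conv on the pre-activated input
// whenever the channel count or stride changes. Names are illustrative.
struct WideResidualBlock: Layer {
    var bn1: BatchNorm<Float>
    var conv1: Conv2D<Float>
    var bn2: BatchNorm<Float>
    var dropout: Dropout<Float>
    var conv2: Conv2D<Float>
    var shortcut: Conv2D<Float>
    @noDerivative let useShortcut: Bool

    init(inChannels: Int, outChannels: Int, stride: Int, dropoutProbability: Double = 0.3) {
        useShortcut = inChannels != outChannels || stride != 1
        bn1 = BatchNorm(featureCount: inChannels)
        // conv1 maps inChannels -> outChannels, so it cannot share a filterShape with conv2.
        conv1 = Conv2D(filterShape: (3, 3, inChannels, outChannels),
                       strides: (stride, stride), padding: .same)
        bn2 = BatchNorm(featureCount: outChannels)
        dropout = Dropout(probability: dropoutProbability)
        conv2 = Conv2D(filterShape: (3, 3, outChannels, outChannels),
                       strides: (1, 1), padding: .same)
        // 1x1 shortcut used only when the identity connection cannot be applied directly.
        shortcut = Conv2D(filterShape: (1, 1, inChannels, outChannels),
                          strides: (stride, stride), padding: .same)
    }

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        let preactivated = relu(bn1(input))
        var residual = conv1(preactivated)
        residual = conv2(dropout(relu(bn2(residual))))
        if useShortcut {
            // Preactivation shortcut: the 1x1 conv sees the BN + ReLU output, not the raw input.
            return residual + shortcut(preactivated)
        }
        return residual + input
    }
}
```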

The paper also uses a weight decay of 5e-4. I can add a function for this, but I have not seen any implementations of L2 decay in any of the models, so I wondered if there was a plan to add it somewhere else in the API.
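One possible shape for the weight-decay question, as a hedged sketch: add an L2 penalty scaled by 5e-4 directly to the loss inside the training loop. `model`, `images`, `labels`, and `optimizer` are assumed to exist in the surrounding loop, and only two convolution filters are penalized here for brevity (the paper decays all weights):

```swift
// Sketch only: assumes a training loop with `model`, `images`, `labels`, and `optimizer`
// already defined, and a model exposing `conv1`/`conv2` as stored Conv2D layers.
let weightDecay: Float = 5e-4
let (loss, grads) = valueWithGradient(at: model) { model -> Tensor<Float> in
    let logits = model(images)
    let crossEntropy = softmaxCrossEntropy(logits: logits, labels: labels)
    // L2 penalty on two filters for illustration; the paper decays every weight tensor.
    let l2 = (model.conv1.filter * model.conv1.filter).sum()
           + (model.conv2.filter * model.conv2.filter).sum()
    return crossEntropy + weightDecay * l2
}
optimizer.update(&model, along: grads)
```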

@brettkoonce
Contributor

Thanks for debugging this + adding dropout! I was working off of this implementation for reference; is that where I missed the identity layer?

Re weight decay, I haven't seen a pattern yet for defining model-specific optimizer values; if you have any suggestions or ideas, I'm sure they're welcome!

@mikowals
Contributor Author

I just spent a couple of days trying to train the network. Ultimately I discovered a bug in _vjpRsqrt, which is called in the gradient calculation of the BatchNorm layer.

Now it looks like the changes to the autodifferentiation code (possibly related to the removal of CotangentVector) are breaking Array.differentiableReduce. I attempted a quick fix replacing all CotangentVector references with TangentVector, but no luck. I am working in Colab with "(LLVM 082dec2e22, Swift f0e8864)". The Colab session crashes with no errors displayed when building the call function with Blocks.differentiableReduce, so I assume the attempt to autodiff that call function is causing the crash.

Anyway, by manually iterating through the blocks of a depthFactor 6, widenFactor 4 model, I can see that the network does now train as expected.
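For context, a sketch of the two ways of applying the block array discussed here: folding the input through the blocks with `differentiableReduce` (the call that was crashing under autodiff) versus the manual iteration used to confirm training. `WideResidualBlock` is the hypothetical block type from the earlier sketch:

```swift
import TensorFlow

// Reduce-based form: this is the call whose differentiation was crashing the session.
struct BlockStack: Layer {
    var blocks: [WideResidualBlock] = []

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        return blocks.differentiableReduce(input) { last, block in block(last) }
    }
}

// Manual iteration over the same blocks, used to check that the forward pass behaves as expected.
func forwardManually(_ input: Tensor<Float>, through blocks: [WideResidualBlock]) -> Tensor<Float> {
    var output = input
    for block in blocks {
        output = block(output)
    }
    return output
}
```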

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@rxwei
Contributor

rxwei commented May 24, 2019

I’ll take a look at the differentiableReduce breakage tonight.

@mikowals
Contributor Author

This pull request runs with differentiableReduce as updated by Brett.

The crash I was seeing was because I had added an L2 loss locally. It also appears to be related to differentiableReduce and the updated autodiff code, but I will try to create a minimal reproduction and file the bug separately.

@mikowals
Contributor Author

I have created TF-533 for the differentiableReduce bug mentioned above.

pschuh pushed a commit to pschuh/swift-models that referenced this pull request Jul 30, 2019
…orflow#158)

* Moving the tests for TensorGroup from swift repo to swift-apis.
* Fix indentation and whitespace issues.
@mikowals
Contributor Author

mikowals commented Aug 7, 2019

Replaced by #193.

@mikowals mikowals closed this Aug 7, 2019