
Implementation of ATKD #9

Merged
merged 2 commits into master from atkd on Jun 7, 2020
Conversation

akshaykulkarni07
Member

@akshaykulkarni07 akshaykulkarni07 commented May 31, 2020

Adding an implementation of Attention Transfer KD (ATKD), from an ICLR '17 paper. Some of the source code takes inspiration from their official implementation.

Please go through the code (in case there are any obvious mistakes).

I ran one experiment with ResNet10 on full data, and the result is 92% validation accuracy (comparable to the 92.2% of simultaneous KD with the same settings).

@akshaykulkarni07 akshaykulkarni07 added the enhancement New feature or request label May 31, 2020
@akshaykulkarni07 akshaykulkarni07 self-assigned this May 31, 2020
@akshaykulkarni07 akshaykulkarni07 marked this pull request as ready for review May 31, 2020 07:00
@akshaykulkarni07
Member Author

Loss Function of ATKD
The authors mention that they use beta = 1000 / (batch size * number of elements in the attention map), which comes out to around 0.1. They also say that they decay this beta parameter when using it along with KD.
Now, we have 2 choices:

  1. Use beta = 1, since our implementation doesn't rely on such weighting.
  2. Use beta as in the paper. However, they don't mention how they actually decay it, and their code is not exactly readable (but if you can go through it and find out, that would be good).

Please advise on which option to take; as far as implementation is concerned, both are equally straightforward. A rough sketch of the loss is below.
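
As a reference point only, here is a minimal sketch of an attention-transfer loss plus the paper's beta formula in PyTorch. This is not the code in this PR; the function names (attention_map, at_loss, paper_beta), the mean-over-channels attention definition, and the example tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    # One common formulation: mean of squared activations over channels,
    # flattened and L2-normalized per sample.
    return F.normalize(feat.pow(2).mean(dim=1).flatten(start_dim=1), dim=1)

def at_loss(feat_s, feat_t):
    # L2 distance between student and teacher attention maps.
    return (attention_map(feat_s) - attention_map(feat_t)).pow(2).mean()

def paper_beta(feat):
    # Option 2 above: beta = 1000 / (batch size * number of elements in the attention map).
    batch_size = feat.size(0)
    num_elements = feat.size(2) * feat.size(3)  # H * W of the attention map
    return 1000.0 / (batch_size * num_elements)

# Hypothetical student/teacher feature maps of shape (N, C, H, W).
feat_s = torch.randn(128, 64, 8, 8)
feat_t = torch.randn(128, 256, 8, 8)
loss = paper_beta(feat_s) * at_loss(feat_s, feat_t)  # option 1 would use beta = 1
```

With a batch size of 128 and an 8x8 attention map, paper_beta gives 1000 / (128 * 64) ≈ 0.12, consistent with the "around 0.1" figure mentioned above.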

Member

@navidpanchi navidpanchi left a comment


Everything looks good to me.

Member

@SharathRaparthy SharathRaparthy left a comment


Not completely aware of this method, but approving.

@SharathRaparthy
Member

Loss Function of ATKD
The authors mention that they use beta = 1000 / (batch size * number of elements in the attention map), which comes out to around 0.1. They also say that they decay this beta parameter when using it along with KD.
Now, we have 2 choices:

  1. Use beta = 1, since our implementation doesn't rely on such weighting.
  2. Use beta as in the paper. However, they don't mention how they actually decay it, and their code is not exactly readable (but if you can go through it and find out, that would be good).

Please advise on which option to take; as far as implementation is concerned, both are equally straightforward.

Implement option 1 for now and let's see how the experiments go. In the meantime, you can look into the decay. Are they using any schedulers?

@akshaykulkarni07
Member Author

akshaykulkarni07 commented Jun 7, 2020

@navidpanchi @SharathRaparthy
Some more information about their experiments:

  1. They use SGD with weight decay of 1e-4 (i.e. L2 regularization); we use Adam without weight decay.
  2. They use step learning rate scheduling: multiply the LR by 0.1 at epochs 30, 60 and 90 (we don't use any LR scheduling in the image classification experiments). A sketch of such a schedule is below.
  3. There is no information about decaying the beta parameter. They mention in their README that they plan to add the code for it, but the last commit was in July 2018, and there is an open issue about this with no reply.
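
For context, a minimal sketch of what that optimizer and step schedule look like in PyTorch. Only the 1e-4 weight decay and the 0.1x LR steps at epochs 30/60/90 come from the points above; the placeholder model, base LR of 0.1, and momentum of 0.9 are assumed values for illustration.

```python
import torch

# Placeholder model; the optimizer/scheduler setup is the point here.
model = torch.nn.Linear(512, 10)

# SGD with weight decay 1e-4 (L2 regularization), as in point 1 above.
# lr=0.1 and momentum=0.9 are assumed, not taken from this thread.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Point 2 above: multiply the LR by 0.1 at epochs 30, 60 and 90.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(100):
    # ... run one training epoch here ...
    scheduler.step()  # advance the LR schedule once per epoch
```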

@akshaykulkarni07 akshaykulkarni07 merged commit 0b7e95d into master Jun 7, 2020
@akshaykulkarni07 akshaykulkarni07 deleted the atkd branch June 7, 2020 18:51