SJE Normalization #2

Open
tchittesh opened this issue Apr 28, 2021 · 1 comment

Comments

@tchittesh

I believe that line 112 of SJE/sje.py (replicated here) serves to normalize the projected image feature vector before taking its dot product with the class embeddings.
```python
XW = preprocessing.scale(XW)  # line 112
```

However, this function standardizes XW to zero mean and unit variance rather than dividing it by its L2 norm. That changes the direction of XW itself, making the subsequent dot product with the class embeddings meaningless.

Should this be changed to the following?
```python
XW = XW / np.linalg.norm(XW)  # line 112
```

What's surprising to me is that decent results are achieved even with the original version. I tested a few hyperparameter configs with the new code (by no means the full grid search) and achieved similar results on AWA2 with a lower margin of 0.25 (which makes sense, given that XW now actually has unit norm).
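To make the difference concrete, here is a small sketch (with made-up feature values) contrasting the two operations: `preprocessing.scale` standardizes each feature *column* across samples, which rotates each row vector, while per-row L2 normalization only rescales each row to unit length, preserving its direction.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical batch of projected image features (2 samples, 3 dims).
XW = np.array([[3.0, 4.0, 0.0],
               [1.0, 2.0, 2.0]])

# preprocessing.scale: zero mean, unit variance per column.
# The resulting rows point in different directions and are not unit-norm.
scaled = preprocessing.scale(XW)
print(np.linalg.norm(scaled, axis=1))  # not [1. 1.]

# L2 normalization: divide each row by its own norm.
# Direction is preserved, and each row has unit norm.
l2 = XW / np.linalg.norm(XW, axis=1, keepdims=True)
print(np.linalg.norm(l2, axis=1))  # → [1. 1.]
```

With L2-normalized rows (and L2-normalized class embeddings), the dot product becomes a cosine similarity bounded in [-1, 1], which is also why a smaller margin like 0.25 becomes reasonable.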

@mvp18
Owner

mvp18 commented May 14, 2021

Hi @tchittesh,

Sorry for the super late reply. In hindsight, I can't really recall the logic behind line 112. Going by the algorithm in the paper, XW shouldn't need normalization of any sort. But L2 normalization could be an option, considering the class embeddings are also L2 normalized.

Let me know if you get better numbers on any of the datasets with L2 normalization or with no scaling. I'll be happy to update accordingly.
