This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fix minor mistakes in vocab-embedding extension #2899

Closed
wants to merge 3 commits

Conversation

HarshTrivedi
Contributor


1. Make sure `num_embeddings` is updated after embedding extension.
2. Actually pass `pretrained_file` from `from_params` to `Embedding` initializer. Don't know why this was missed!
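
For context, a minimal sketch of what fix 1 guards against. The class and `extend_vocab` signature below are simplified stand-ins for AllenNLP's actual `Embedding` module, not its real implementation:

```python
import torch

class Embedding(torch.nn.Module):
    """Simplified stand-in for AllenNLP's Embedding module (illustrative only)."""

    def __init__(self, num_embeddings: int, embedding_dim: int) -> None:
        super().__init__()
        self.num_embeddings = num_embeddings
        self.embedding_dim = embedding_dim
        self.weight = torch.nn.Parameter(torch.randn(num_embeddings, embedding_dim))

    def extend_vocab(self, extended_num_embeddings: int) -> None:
        # Grow the weight matrix with freshly initialised rows for the new tokens.
        extra = extended_num_embeddings - self.num_embeddings
        if extra <= 0:
            return
        new_rows = torch.randn(extra, self.embedding_dim)
        self.weight = torch.nn.Parameter(torch.cat([self.weight.data, new_rows], dim=0))
        # Fix 1 in this PR: keep the bookkeeping in sync; without this line the
        # module would still report the pre-extension vocabulary size.
        self.num_embeddings = extended_num_embeddings
```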
@schmmd schmmd added this to the 0.8.4 milestone May 29, 2019
@joelgrus
Contributor

looks fine, if this was secretly broken should we add a test for it?

@schmmd
Member

schmmd commented May 30, 2019

@harsh19 I'm merging to unblock cutting a release--but please follow up with a PR with tests.

@HarshTrivedi
Contributor Author

@joelgrus sorry, I didn't see the message yesterday and have been busy with something since morning. I will add the tests.

@HarshTrivedi
Contributor Author

Making sure the `_pretrained_file` attribute is set is still a problem. The goal of that attribute was to remember which pretrained file was used, so that we can use it directly for extension (at fine-tuning/evaluation time). But when an archive is loaded, `remove_pretrained_embedding_params` strips it from the params, so `from_params` of `Embedding` never sees the `pretrained_file` key.
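
To make the failure mode concrete, here is a rough sketch of the stripping step being described, written over plain nested dicts rather than AllenNLP's `Params` objects; the real helper differs in detail:

```python
from typing import Any, Dict

def remove_pretrained_embedding_params(config: Dict[str, Any]) -> None:
    """Recursively drop every `pretrained_file` key, mimicking the stripping
    step that runs when a model archive is loaded."""
    config.pop("pretrained_file", None)
    for value in config.values():
        if isinstance(value, dict):
            remove_pretrained_embedding_params(value)

# Illustrative config: after stripping, Embedding.from_params never receives a
# `pretrained_file` key, so the attribute meant to remember the original file
# (for later vocab extension) is lost.
archived_config = {
    "tokens": {"type": "embedding", "embedding_dim": 300,
               "pretrained_file": "/path/to/glove.txt"},
}
remove_pretrained_embedding_params(archived_config)
assert "pretrained_file" not in archived_config["tokens"]
```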

@joelgrus any thoughts on what can be done?

@joelgrus
Contributor

I can think of two obvious options (the first is sketched below):

  1. modify the `remove_pretrained_embeddings` function to skip `Embedding` modules
  2. rename that parameter to `original_pretrained_file` or something like that
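
A rough sketch of option 1, again over plain dicts; spotting an `Embedding` sub-config via its `type` key is an assumption made here purely for illustration:

```python
from typing import Any, Dict

def remove_pretrained_embedding_params(config: Dict[str, Any]) -> None:
    """Option 1: strip `pretrained_file` everywhere except inside Embedding
    sub-configs, so from_params can still see and remember the key."""
    if config.get("type") == "embedding":
        return  # leave the Embedding's pretrained_file in place
    config.pop("pretrained_file", None)
    for value in config.values():
        if isinstance(value, dict):
            remove_pretrained_embedding_params(value)
```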

@schmmd
Member

schmmd commented May 31, 2019

Well shoot--I meant to merge this but instead I updated the branch and never revisited this so it's not in the last release. From the discussion above perhaps that's a good thing though.

@HarshTrivedi
Contributor Author

  1. If I skip `pretrained_file` keys of `Embedding` in `remove_pretrained_embedding_params`, I think it defeats its purpose. When the model archive is loaded, constructing the `Embedding` can fail if the path to the pre-trained embedding file is no longer available.

  2. I don't understand how this can help on its own. The problem is that the params passed to `Embedding.from_params` don't contain `pretrained_file` when the archive is loaded.

A combination of both should work, though. I can change the `remove_pretrained_embedding_params` function to replace `pretrained_file` keys of the `Embedding` module with `original_pretrained_file`, and then change the `from_params` of `Embedding` to handle both `pretrained_file` and `original_pretrained_file`. Basically, changing the relevant line to:

```python
pretrained_file = params.pop("pretrained_file", None) or params.pop("original_pretrained_file", None)
```
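
Combined, the proposal might look roughly like the following sketch (plain dicts again; not the actual patch):

```python
from typing import Any, Dict

def remove_pretrained_embedding_params(config: Dict[str, Any]) -> None:
    """Rename `pretrained_file` to `original_pretrained_file` instead of
    deleting it, so the original path stays recorded in the archived config."""
    if "pretrained_file" in config:
        config["original_pretrained_file"] = config.pop("pretrained_file")
    for value in config.values():
        if isinstance(value, dict):
            remove_pretrained_embedding_params(value)

# On the Embedding.from_params side, accept whichever key is present, as in the
# one-liner quoted above:
#   pretrained_file = params.pop("pretrained_file", None) \
#       or params.pop("original_pretrained_file", None)
```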

@matt-gardner
Contributor

@joelgrus, @HarshTrivedi, what's the status of this PR?

@HarshTrivedi
Contributor Author

@matt-gardner the problem isn't solved yet. This will require some more work. I am planning to get back on this after my internship, which is ending in 2 weeks. Hope that's fine.

@schmmd
Member

schmmd commented Sep 6, 2019

@harsh19 do you still intend to work on this?

@harsh19

harsh19 commented Sep 6, 2019

I think you intended to tag @HarshTrivedi

@schmmd
Member

schmmd commented Sep 6, 2019

Oops--sorry. @HarshTrivedi do you still intend to work on this?

@schmmd
Member

schmmd commented Dec 13, 2019

Please comment here if you'd like us to reopen this PR.

@schmmd schmmd closed this Dec 13, 2019