Trainable Tokens: Support for Weight Tying #2399
Draft
githubnemo wants to merge 10 commits into huggingface:main from githubnemo:feature/custom-token-tuner-weight-tying
Conversation
Force-pushed from 69948b9 to ac70db6.
Notably, we are removing the duplication filter of `named_modules` when searching for the (tied) target modules, since tied weights are by definition duplicates.
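As a rough illustration of that point (toy model, not PEFT's actual search code): `named_modules()` skips a module object it has already yielded under another name, so a tied module that is literally the same object would be hidden from the search unless `remove_duplicate=False` is passed.

```python
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed_tokens = nn.Embedding(10, 8)
        self.output_embeddings = self.embed_tokens  # tied: the very same module object

model = Toy()
print([name for name, _ in model.named_modules()])
# ['', 'embed_tokens']  -- the tied alias is filtered out as a duplicate
print([name for name, _ in model.named_modules(remove_duplicate=False)])
# ['', 'embed_tokens', 'output_embeddings']
```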
It's now possible to let the adapter decide which layer is the input embedding layer, based on the output of `model.get_input_embeddings()`. If that fails, the default is still `embed_tokens`.
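A hypothetical helper (illustrative names, not the adapter's actual code) showing that lookup order:

```python
def find_input_embedding(model, default_name="embed_tokens"):
    # Ask the model first; transformers models implement get_input_embeddings().
    get_fn = getattr(model, "get_input_embeddings", None)
    if callable(get_fn):
        try:
            embedding = get_fn()
            if embedding is not None:
                return embedding
        except NotImplementedError:
            pass
    # Fallback: the conventional attribute name used by many decoder models.
    for name, module in model.named_modules():
        if name.split(".")[-1] == default_name:
            return module
    raise ValueError("Could not determine the input embedding layer.")
```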
This is probably just a case of model misconfiguration, but there are cases in the tests where `tie_embedding_weights` is set to true in the config but no `tied_weights_keys` is set on the model.
Before this change, only the selection of the module that was supposed to have the queried attribute was delegated to the wrapper implementation (via `_{has,get}attr_wrapped`). Now the full `getattr()` call is done by the implementation. This change is motivated by the need to access `embedding.weight` at certain times, which is not a problem for `ModulesToSaveWrapper` but is for `TrainableTokensWrapper`, since the original module's weights (potentially) differ from the current weights. What we do now is merge the weights and return those when `embedding.weight` is accessed. No other attributes are currently forwarded.
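A rough sketch of the idea, with assumed names and simplified logic rather than PEFT's actual wrapper classes: the wrapper answers the full `getattr()` call itself, so accessing `weight` can return the original rows with the trainable-token rows merged in instead of the wrapped module's unmodified weight.

```python
import torch
import torch.nn as nn

class SketchTrainableTokensWrapper(nn.Module):
    def __init__(self, embedding: nn.Embedding, token_indices, token_deltas):
        super().__init__()
        self.original_module = embedding
        self.token_indices = token_indices  # rows being trained
        self.token_deltas = token_deltas    # learned replacement rows

    def _merged_weight(self):
        # Return a copy of the original weight with the trained rows swapped in.
        weight = self.original_module.weight.detach().clone()
        weight[self.token_indices] = self.token_deltas
        return weight

    def __getattr__(self, name):
        # Registered members (e.g. original_module) are resolved by nn.Module;
        # everything else is forwarded to the wrapped module, except `weight`,
        # which is returned in merged form.
        try:
            return super().__getattr__(name)
        except AttributeError:
            if name == "weight":
                return self._merged_weight()
            return getattr(self.original_module, name)

emb = nn.Embedding(10, 4)
wrapped = SketchTrainableTokensWrapper(emb, [2, 5], torch.zeros(2, 4))
print(wrapped.weight.shape)   # merged view: rows 2 and 5 replaced
print(wrapped.embedding_dim)  # other attributes are forwarded as-is
```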
Mixed batch is still broken, though.
This is a follow-up PR to #2376 to add support for weight-tying. Do not merge this before the other PR is merged.
What is this
Some models, such as gpt2, tie the weights between the LM head and the input embeddings for various reasons. If we use the trainable tokens adapter, we're changing the result of the `forward()` of the input embeddings but we do not change the weights (unless we `merge()`). This means that the changes are not reflected in the tied weights, such as the LM head, leading to wrong results when training.
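For concreteness, the tying described above can be checked directly on gpt2:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
# The LM head and the input embedding share one Parameter object, so changing
# only the embedding's forward() output leaves the LM head untouched.
assert model.get_output_embeddings().weight is model.get_input_embeddings().weight
```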
How it is solved

The current approach is to search for tied layers and put `TrainableTokensLayer` adapters on them as well, but initialized to use the parameters from the embedding layer's `TrainableTokensLayer`. This is done via the `tied_adapter` argument of `TrainableTokensLayer.__init__()`.
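A minimal sketch of that sharing idea (hypothetical class, not the actual `TrainableTokensLayer` implementation): the adapter placed on the tied layer reuses the parameters owned by the embedding layer's adapter instead of allocating its own, so both stay in sync during training.

```python
import torch.nn as nn

class SketchTokenAdapter(nn.Module):
    def __init__(self, base: nn.Module, token_indices, tied_adapter=None):
        super().__init__()
        self.base = base
        self.token_indices = token_indices
        if tied_adapter is None:
            # This adapter owns the trainable rows.
            self.deltas = nn.Parameter(base.weight[token_indices].detach().clone())
        else:
            # Tied case: share the owner's rows instead of creating new ones.
            self.deltas = tied_adapter.deltas

embed = nn.Embedding(10, 4)
lm_head = nn.Linear(4, 10, bias=False)
lm_head.weight = embed.weight  # weight tying, as in gpt2

embed_adapter = SketchTokenAdapter(embed, [1, 3])
head_adapter = SketchTokenAdapter(lm_head, [1, 3], tied_adapter=embed_adapter)
assert head_adapter.deltas is embed_adapter.deltas  # one set of trainable rows
```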
What needs to be done

`TrainableTokens` adapter