
Change tie_weights to no-op and add docstring for weight tying clarification #503

Merged — 8 commits merged into allenai:main on Apr 2, 2024

Conversation

djliden
Contributor

@djliden djliden commented Mar 15, 2024

This pull request addresses #485: it updates the `tie_weights` function in the `OLMoForCausalLM` class to be a no-op and adds a docstring explaining the reasoning behind this change.

The tie_weights function is being made a no-op because weight tying is already handled explicitly in other parts of the codebase. This change avoids redundant weight tying logic and improves code clarity.

The added docstring explains why the function is a no-op and directs developers to the specific locations where weight tying is handled. This improves code readability and maintainability.

This doesn't fix the warning message from the accelerate library, which expects a fundamentally different mechanism for weight tying.

@2015aroras ready for your initial review—thanks!
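A minimal sketch of the change described above (the class here is a stand-in, not the actual OLMo source; the docstring paraphrases the one added in this PR):

```python
class OLMoForCausalLM:
    # Hypothetical stand-in for the real class in olmo/model.py.
    def tie_weights(self):
        """Intentionally a no-op.

        Weight tying is already handled explicitly elsewhere:
        1. At initialization, the `ff_out` layer is only created when
           `config.weight_tying` is False.
        2. In `forward`, when `config.weight_tying` is True, logits are
           computed directly from the embedding weights via
           `F.linear(x, self.transformer.wte.weight, None)`.

        Performing additional tying here would be redundant.
        """
        pass
```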

djliden and others added 4 commits March 15, 2024 13:04
The `tie_weights` function in the `OLMoForCausalLM` class is now a no-op
because weight tying is already handled in other parts of the codebase:

1. During model initialization in `olmo/model.py`:
   - The `ff_out` layer is conditionally defined based on the `weight_tying`
     configuration using `if not config.weight_tying: self.transformer.update(...)`.

2. When computing logits in the `forward` method:
   - The `wte` weights are used directly if `weight_tying` is enabled using
     `if self.config.weight_tying: logits = F.linear(x, self.transformer.wte.weight, None)`.

Since weight tying is handled explicitly in these places, there is no need
for the `tie_weights` function to perform any additional weight tying.

The function is updated with a docstring that explains the reasoning behind
the no-op and provides references to the relevant code snippets where weight
tying is handled.

This change improves code clarity and avoids redundant weight tying logic.
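The two code paths above can be illustrated with a small NumPy sketch (the real code uses PyTorch; shapes and the `weight_tying` flag here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model = 10, 4
wte = rng.standard_normal((vocab, d_model))  # embedding table (wte.weight)
x = rng.standard_normal((2, d_model))        # final hidden states

weight_tying = True
if weight_tying:
    # Reuse the embedding matrix as the output projection,
    # analogous to F.linear(x, self.transformer.wte.weight, None).
    logits = x @ wte.T
else:
    # Otherwise a separately initialized ff_out layer is used.
    ff_out = rng.standard_normal((vocab, d_model))
    logits = x @ ff_out.T

assert logits.shape == (2, vocab)
```

Because the tied branch reads the embedding matrix directly rather than copying it, there is a single parameter tensor by construction, which is why no separate `tie_weights` step is needed.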
@2015aroras 2015aroras requested a review from AkshitaB March 15, 2024 22:31
@2015aroras 2015aroras self-requested a review March 18, 2024 23:38
Collaborator

@2015aroras 2015aroras left a comment


Thank you!

@2015aroras 2015aroras merged commit db2dee2 into allenai:main Apr 2, 2024
11 checks passed
3 participants