From 5fbebd936ea97445e3ad61a7789bcb56a7554a49 Mon Sep 17 00:00:00 2001
From: djliden <7102904+djliden@users.noreply.github.com>
Date: Fri, 15 Mar 2024 13:04:21 -0400
Subject: [PATCH 1/3] redefines tie_weights as no-op

The `tie_weights` function in the `OLMoForCausalLM` class is now a no-op
because weight tying is already handled in other parts of the codebase:

1. During model initialization in `olmo/model.py`:
   - The `ff_out` layer is conditionally defined based on the `weight_tying`
     configuration using `if not config.weight_tying: self.transformer.update(...)`.

2. When computing logits in the `forward` method:
   - The `wte` weights are used directly if `weight_tying` is enabled using
     `if self.config.weight_tying: logits = F.linear(x, self.transformer.wte.weight, None)`.

Since weight tying is handled explicitly in these places, there is no need for
the `tie_weights` function to perform any additional weight tying. The function
is updated with a docstring that explains the reasoning behind the no-op and
provides references to the relevant code snippets where weight tying is handled.

This change improves code clarity and avoids redundant weight tying logic.
---
 hf_olmo/modeling_olmo.py | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hf_olmo/modeling_olmo.py b/hf_olmo/modeling_olmo.py
index a1cc569f7..d80a54217 100644
--- a/hf_olmo/modeling_olmo.py
+++ b/hf_olmo/modeling_olmo.py
@@ -148,8 +148,18 @@ def set_output_embeddings(self, value: torch.nn.Module):
         self.model.transformer.ff_out = value
 
     def tie_weights(self):
-        if self.config.weight_tying:
-            self.model.transformer.ff_out = self.model.transformer.wte
+        """
+        This function is intentionally left as a no-op.
+
+        Weight tying is handled as follows:
+        - When the model is initialized, the `ff_out` layer is conditionally defined based on the `weight_tying` configuration.
+          See: `if not config.weight_tying: self.transformer.update(...)` in `olmo/model.py`.
+        - When computing logits, the `wte` weights are used directly if `weight_tying` is enabled.
+          See: `if self.config.weight_tying: logits = F.linear(x, self.transformer.wte.weight, None)` in the `forward` method.
+
+        Therefore, there is no need to explicitly tie the weights in this function.
+        """
+        pass
 
 
 # Register the model so that it is available for transformer pipelines, auto-loading, etc.

From c3465f0312563f7a275ccee148fdda7de25f22fe Mon Sep 17 00:00:00 2001
From: djliden <7102904+djliden@users.noreply.github.com>
Date: Fri, 15 Mar 2024 13:14:39 -0400
Subject: [PATCH 2/3] updates changelog

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b013e863c..c8c048508 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,9 +16,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Removed `AMDLayerNorm`, since the original layer norm bug has been fixed and we don't need this workaround anymore.
 
 ### Fixed
-
 - Don't log garbage on nodes that aren't rank 0
 - Don't crash in the HF code when we are referring to a tokenizer in a local file
+- Changed `tie_weights` method to a no-op as weight tying is handled in olmo/model.py
 
 ## [v0.2.5](https://github.com/allenai/OLMo/releases/tag/v0.2.5) - 2024-03-06
 

From f9a68c35e150933374b7b6958b8dc9bfb47e5ee0 Mon Sep 17 00:00:00 2001
From: djliden <7102904+djliden@users.noreply.github.com>
Date: Fri, 15 Mar 2024 13:20:29 -0400
Subject: [PATCH 3/3] fixes spacing

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c8c048508..c9bd80e0a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Removed `AMDLayerNorm`, since the original layer norm bug has been fixed and we don't need this workaround anymore.
 
 ### Fixed
+
 - Don't log garbage on nodes that aren't rank 0
 - Don't crash in the HF code when we are referring to a tokenizer in a local file
 - Changed `tie_weights` method to a no-op as weight tying is handled in olmo/model.py
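
Reviewer's note on PATCH 1/3: the sketch below illustrates the weight-tying pattern the commit message describes, i.e. why an explicit `tie_weights` step becomes unnecessary when the output projection is only created in the untied case and the embedding matrix is reused directly at logit time. It is a minimal, self-contained approximation, not the actual OLMo code: `ToyConfig`, `ToyModel`, `d_model`, and `vocab_size` are hypothetical stand-ins, and only the two conditionals mirror the snippets quoted in the commit message.

```python
# Illustrative sketch of the weight-tying pattern described in PATCH 1/3.
# ToyConfig/ToyModel and their fields are hypothetical, not the real OLMo classes.
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class ToyConfig:
    d_model: int = 64
    vocab_size: int = 100
    weight_tying: bool = True


class ToyModel(nn.Module):
    def __init__(self, config: ToyConfig):
        super().__init__()
        self.config = config
        # Token embedding (analogous to `wte`).
        self.wte = nn.Embedding(config.vocab_size, config.d_model)
        # The output projection (analogous to `ff_out`) is only created when
        # weight tying is disabled -- mirroring
        # `if not config.weight_tying: self.transformer.update(...)`.
        if not config.weight_tying:
            self.ff_out = nn.Linear(config.d_model, config.vocab_size, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.wte(input_ids)  # stand-in for the transformer stack
        if self.config.weight_tying:
            # Reuse the embedding matrix as the output projection, mirroring
            # `logits = F.linear(x, self.transformer.wte.weight, None)`.
            logits = F.linear(x, self.wte.weight, None)
        else:
            logits = self.ff_out(x)
        return logits

    def tie_weights(self) -> None:
        # No-op: tying is already enforced structurally above, so there is
        # nothing left for this hook to do.
        pass


if __name__ == "__main__":
    model = ToyModel(ToyConfig(weight_tying=True))
    logits = model(torch.randint(0, 100, (1, 8)))
    print(logits.shape)  # torch.Size([1, 8, 100])
```

Running the sketch with `weight_tying=True` produces logits of shape `(1, 8, 100)` without ever instantiating a separate output layer, which is the structural guarantee that lets the no-op `tie_weights` be safe.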