From 5fbebd936ea97445e3ad61a7789bcb56a7554a49 Mon Sep 17 00:00:00 2001
From: djliden <7102904+djliden@users.noreply.github.com>
Date: Fri, 15 Mar 2024 13:04:21 -0400
Subject: [PATCH 1/3] redefines tie_weights as no-op

The `tie_weights` function in the `OLMoForCausalLM` class is now a no-op
because weight tying is already handled in other parts of the codebase:

1. During model initialization in `olmo/model.py`:
   - The `ff_out` layer is conditionally defined based on the `weight_tying`
     configuration using `if not config.weight_tying: self.transformer.update(...)`.

2. When computing logits in the `forward` method:
   - The `wte` weights are used directly if `weight_tying` is enabled using
     `if self.config.weight_tying: logits = F.linear(x, self.transformer.wte.weight, None)`.

Since weight tying is handled explicitly in these places, there is no need for
the `tie_weights` function to perform any additional weight tying. The function
is updated with a docstring that explains the reasoning behind the no-op and
provides references to the relevant code snippets where weight tying is handled.

This change improves code clarity and avoids redundant weight tying logic.
---
 hf_olmo/modeling_olmo.py | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hf_olmo/modeling_olmo.py b/hf_olmo/modeling_olmo.py
index a1cc569f7..d80a54217 100644
--- a/hf_olmo/modeling_olmo.py
+++ b/hf_olmo/modeling_olmo.py
@@ -148,8 +148,18 @@ def set_output_embeddings(self, value: torch.nn.Module):
         self.model.transformer.ff_out = value
 
     def tie_weights(self):
-        if self.config.weight_tying:
-            self.model.transformer.ff_out = self.model.transformer.wte
+        """
+        This function is intentionally left as a no-op.
+
+        Weight tying is handled as follows:
+        - When the model is initialized, the `ff_out` layer is conditionally defined based on the `weight_tying` configuration.
+          See: `if not config.weight_tying: self.transformer.update(...)` in `olmo/model.py`.
+        - When computing logits, the `wte` weights are used directly if `weight_tying` is enabled.
+          See: `if self.config.weight_tying: logits = F.linear(x, self.transformer.wte.weight, None)` in the `forward` method.
+
+        Therefore, there is no need to explicitly tie the weights in this function.
+        """
+        pass
 
 
 # Register the model so that it is available for transformer pipelines, auto-loading, etc.

From c3465f0312563f7a275ccee148fdda7de25f22fe Mon Sep 17 00:00:00 2001
From: djliden <7102904+djliden@users.noreply.github.com>
Date: Fri, 15 Mar 2024 13:14:39 -0400
Subject: [PATCH 2/3] updates changelog

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b013e863c..c8c048508 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,9 +16,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Removed `AMDLayerNorm`, since the original layer norm bug has been fixed and we don't need this workaround anymore.
 
 ### Fixed
-
 - Don't log garbage on nodes that aren't rank 0
 - Don't crash in the HF code when we are referring to a tokenizer in a local file
+- Changed `tie_weights` method to a no-op as weight tying is handled in olmo/model.py
 
 ## [v0.2.5](https://github.com/allenai/OLMo/releases/tag/v0.2.5) - 2024-03-06
 

From f9a68c35e150933374b7b6958b8dc9bfb47e5ee0 Mon Sep 17 00:00:00 2001
From: djliden <7102904+djliden@users.noreply.github.com>
Date: Fri, 15 Mar 2024 13:20:29 -0400
Subject: [PATCH 3/3] fixes spacing

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c8c048508..c9bd80e0a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Removed `AMDLayerNorm`, since the original layer norm bug has been fixed and we don't need this workaround anymore.
 
 ### Fixed
+
 - Don't log garbage on nodes that aren't rank 0
 - Don't crash in the HF code when we are referring to a tokenizer in a local file
 - Changed `tie_weights` method to a no-op as weight tying is handled in olmo/model.py
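
Reviewer's note on PATCH 1/3: the sketch below illustrates the weight-tying pattern the commit message describes, i.e. why an explicit `tie_weights` step becomes unnecessary when the output projection is only created in the untied case and the embedding matrix is reused directly at logit time. It is a minimal, self-contained approximation, not the actual OLMo code: `ToyConfig`, `ToyModel`, `d_model`, and `vocab_size` are hypothetical stand-ins, and only the two conditionals mirror the snippets quoted in the commit message.

```python
# Illustrative sketch of the weight-tying pattern described in PATCH 1/3.
# ToyConfig/ToyModel and their fields are hypothetical, not the real OLMo classes.
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class ToyConfig:
    d_model: int = 64
    vocab_size: int = 100
    weight_tying: bool = True


class ToyModel(nn.Module):
    def __init__(self, config: ToyConfig):
        super().__init__()
        self.config = config
        # Token embedding (analogous to `wte`).
        self.wte = nn.Embedding(config.vocab_size, config.d_model)
        # The output projection (analogous to `ff_out`) is only created when
        # weight tying is disabled -- mirroring
        # `if not config.weight_tying: self.transformer.update(...)`.
        if not config.weight_tying:
            self.ff_out = nn.Linear(config.d_model, config.vocab_size, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.wte(input_ids)  # stand-in for the transformer stack
        if self.config.weight_tying:
            # Reuse the embedding matrix as the output projection, mirroring
            # `logits = F.linear(x, self.transformer.wte.weight, None)`.
            logits = F.linear(x, self.wte.weight, None)
        else:
            logits = self.ff_out(x)
        return logits

    def tie_weights(self) -> None:
        # No-op: tying is already enforced structurally above, so there is
        # nothing left for this hook to do.
        pass


if __name__ == "__main__":
    model = ToyModel(ToyConfig(weight_tying=True))
    logits = model(torch.randint(0, 100, (1, 8)))
    print(logits.shape)  # torch.Size([1, 8, 100])
```

Running the sketch with `weight_tying=True` produces logits of shape `(1, 8, 100)` without ever instantiating a separate output layer, which is the structural guarantee that lets the no-op `tie_weights` be safe.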