Transformer bridge layer norm folding #1071
Open
bryce13950 wants to merge 11 commits into dev-3.x from dev-3.x-folding
Conversation
* created individual processing functions
* extracted state dict and inserted back into instance after processing
* created weight processing shared class
* added test coverage for new functions
* updated hooked transformer to use new shared functions
* created test
* moved over weight processing
* replaced keys
* used the correct function
* created test for making sure path translation works correctly
* fixed weight processing
* added additional tests
* formatted tests a bit
* cleaned up
* fixed unit test
* fixed indentation
* fixed doc string
* fixed unit test
* fixed type
* fixed some tests
* fixed test
* fixed setup of tests
* cleaned up test
* started working through individual matches
* added test coverage
* tested function a bit
* integrated weight conversion into weight processing
* simplified functions
* identified individual problem lines
* identified divergences more clearly
* brought back error lines
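For context on the folding step these commits refer to: the general idea of layer norm folding is to absorb a LayerNorm's affine parameters into the linear map that follows it, so the processed weights already carry the scale and bias. The sketch below illustrates that transformation only; it is not the TransformerLens implementation, and the function name and shapes are illustrative assumptions.

```python
import torch


def fold_ln_into_linear(
    ln_weight: torch.Tensor,  # LayerNorm scale, shape [d_model]
    ln_bias: torch.Tensor,    # LayerNorm bias, shape [d_model]
    W: torch.Tensor,          # following linear weight, shape [d_model, d_out], used as x @ W
    b: torch.Tensor,          # following linear bias, shape [d_out]
) -> tuple[torch.Tensor, torch.Tensor]:
    """Absorb the LayerNorm affine parameters into the next linear layer.

    (x_norm * ln_weight + ln_bias) @ W + b
        == x_norm @ (ln_weight[:, None] * W) + (ln_bias @ W + b)
    """
    W_folded = ln_weight[:, None] * W
    b_folded = ln_bias @ W + b
    return W_folded, b_folded
```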
…for already initialized components (#1066)
* improved accuracy a bit
* got models to match
* removed forward pass stuff
* cleaned up weight processing a bit
* removed working attention
* restored files
* created loop to verify weight conversion
* finished compatibility layer
* finished testing hugging face weights
* setup correct init
* added some tests
* removed separate component
* fixed some integration tests
* fixed typing issue
* fixed typing and format issues
* fixed ci issues
* ran format
* fixed mypy issues
* removed extra file
* removed old scripts
* tested format
* fixed some tests
* ran format
* fixed tests
* fixed acceptance tests
* fixed some more tests
* synced functionality completely
* reduced old references
* removed remaining references
* moved forward functions
* removed forward
* tested various forwards
* worked on getting original forwards back into place
* added more coverage
* cleaned up model
* git status
* Fix automatic weight extraction to use reference HookedTransformer

  This restores the working weight extraction mechanism that creates a reference HookedTransformer internally and extracts exact processed weights for perfect compatibility with ablation studies.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* moved embed stuff from bridge
* moved MLP stuff
* cleaned up a bit
* cleaned up a bit
* removed extra block
* created pos embed bridge
* fixed unembed

---------

Co-authored-by: Claude <[email protected]>
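The "reference HookedTransformer" mechanism mentioned above can be pictured roughly as follows: build a HookedTransformer with the standard processing flags, take its processed state dict, and re-key it to the bridge's naming scheme before loading. This is only a sketch under those assumptions; `translate_key` is a hypothetical helper standing in for whatever path-translation logic the bridge actually uses.

```python
from typing import Callable

import torch
from transformer_lens import HookedTransformer


def extract_processed_weights(
    model_name: str,
    translate_key: Callable[[str], str],  # hypothetical key-translation helper
) -> dict[str, torch.Tensor]:
    # Build a reference model with the usual weight processing applied.
    reference = HookedTransformer.from_pretrained(
        model_name,
        fold_ln=True,
        center_writing_weights=True,
        center_unembed=True,
    )
    # Re-key the processed state dict so it can be loaded elsewhere.
    return {translate_key(k): v for k, v in reference.state_dict().items()}
```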
* moved final layer norm
* moved layer norm forward
* cleaned up more things
* updated attention weight loading
* fixed function names
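Once the scale and bias have been folded away, the layer norm forward that remains only centres and normalises its input, along the lines of the sketch below (comparable in spirit to TransformerLens's LayerNormPre; the class name here is illustrative).

```python
import torch


class FoldedLayerNorm(torch.nn.Module):
    """LayerNorm whose affine parameters have been folded into downstream weights."""

    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Centre, then divide by the standard deviation; no learned scale or bias.
        x = x - x.mean(dim=-1, keepdim=True)
        return x / torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
```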
* fixed some ci issues
* fixed type issues
* ran format
* fixed test
* fixed type issues
* fixed type issue
* fixed type issue
* fixed test
* fixed test
* fixed issues
* ran format
* fixed typing
* fixed tests
* fixed tests
* simplified test
* sped up tests
* added check for kv cache
* ran format
* skipped some tests
* marked a couple tests to skip
* ran some more optimizations
* ran poetry lock
* regenerated lock
* fixed commands
* set random seed
* updated parallelism prop
* updated command
* reverted some changes
* updated notebook settings
* updated verbosity
* removed extra test
* cleaned up tests some more
* marked test as skipped
* fixed more tests
* sped up CI
* reverted CI changes
* reverted actions changes
* improved cache
* sped up some tests
* optimized more tests
* sped up some more tests
* made more speed improvements
* fixed error
* fixed typing
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist: