bryce13950
Collaborator

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

bryce13950 and others added 8 commits September 21, 2025 17:57
* created individual processing functions

* extracted state dict and inserted back into instance after processing

* created weight processing shared class

* added test coverage for new functions

* updated hooked transformer to use new shared functions

* created test

* moved over weight processing

* replaced keys

* used the correct function

* created test for making sure path translation works correctly

* fixed weight processing

* added additional tests

* formatted tests a bit

* cleaned up

* fixed unit test

* fixed indentation

* fixed doc string

* fixed unit test

* fixed type

* fixed some tests

* fixed test

* fixed setup of tests

* cleaned up test

* started working through individual matches

* added test coverage

* tested function a bit

* integrated weight conversion into weight processing

* simplified functions

* identified individual problem lines

* identified divergences more clearly

* brought back error lines
* improved accuracy a bit

* got models to match

* removed forward pass stuff

* cleaned up weight processing a bit

* removed working attention

* restored files

* created loop to verify weight conversion

* finished compatibility layer

* finished testing hugging face weights

* setup correct init

* added some tests

* removed separate component

* fixed some integration tests
bryce13950 changed the title from "Dev 3.x folding" to "Transformer bridge layer norm folding" on Sep 27, 2025
bryce13950 and others added 3 commits September 29, 2025 22:33
* fixed typing issue

* fixed typing and format issues

* fixed ci issues

* ran format

* fixed mypy issues

* removed extra file

* removed old scripts

* tested format

* fixed some tests

* ran format

* fixed tests

* fixed acceptance tests

* fixed some more tests

* synced functionality completely

* reduced old references

* removed remaining references

* moved forward functions

* removed forward

* tested various forwards

* worked on getting original forwards back into place

* added more coverage

* cleaned up model

* git status

* Fix automatic weight extraction to use reference HookedTransformer

This restores the working weight extraction mechanism that creates a reference
HookedTransformer internally and extracts exact processed weights for perfect
compatibility with ablation studies.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* moved embed stuff from bridge

* moved MLP stuff

* cleaned up a bit

* cleaned up a bit

* removed extra block

* created pos embed bridge

* fixed unembed

---------

Co-authored-by: Claude <[email protected]>
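
For context on the weight processing these commits refer to, the basic layer norm folding step can be sketched as below. This is a minimal pure-Python illustration under stated assumptions, not the code from this PR: the helper names (`normalize`, `layer_norm`, `linear`, `fold_layer_norm`) are hypothetical, weights are assumed to be stored as `[d_in][d_out]`, and only the scale-and-bias fold is shown (no weight centering or other processing steps).

```python
import math
import random


def normalize(x, eps=1e-5):
    """Center x and scale it to unit variance (layer norm without parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def layer_norm(x, ln_w, ln_b, eps=1e-5):
    """Standard layer norm: normalize, then apply learned scale and bias."""
    return [n * w + b for n, w, b in zip(normalize(x, eps), ln_w, ln_b)]


def linear(x, W, b):
    """Linear layer with W stored as [d_in][d_out] (an assumption of this sketch)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def fold_layer_norm(ln_w, ln_b, W, b):
    """Fold the layer norm scale and bias into the following linear layer.

    linear(layer_norm(x)) == linear_folded(normalize(x)) because
    (n * ln_w + ln_b) @ W + b == n @ (diag(ln_w) @ W) + (ln_b @ W + b).
    """
    d_in, d_out = len(W), len(b)
    folded_b = [b[j] + sum(ln_b[i] * W[i][j] for i in range(d_in)) for j in range(d_out)]
    folded_W = [[ln_w[i] * W[i][j] for j in range(d_out)] for i in range(d_in)]
    return folded_W, folded_b


# Check on random data that folding leaves the layer's output unchanged.
random.seed(0)
d_in, d_out = 4, 2
x = [random.gauss(0, 1) for _ in range(d_in)]
ln_w = [random.gauss(1, 0.1) for _ in range(d_in)]
ln_b = [random.gauss(0, 0.1) for _ in range(d_in)]
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
b = [random.gauss(0, 1) for _ in range(d_out)]

reference_out = linear(layer_norm(x, ln_w, ln_b), W, b)
folded_W, folded_b = fold_layer_norm(ln_w, ln_b, W, b)
folded_out = linear(normalize(x), folded_W, folded_b)

max_diff = max(abs(r - f) for r, f in zip(reference_out, folded_out))
assert max_diff < 1e-9
```

After folding, the layer norm can be replaced by the parameter-free `normalize` step, which is why a comparison against a reference model's processed weights is a natural way to verify the conversion.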
* moved final layer norm

* moved layer norm forward

* cleaned up more things

* updated attention weight loading

* fixed function names
* fixed some ci issues

* fixed type issues

* ran format

* fixed test

* fixed type issues

* fixed type issue

* fixed type issue

* fixed test

* fixed test

* fixed issues

* ran format

* fixed typing

* fixed tests

* fixed tests

* simplified test

* sped up tests

* added check for kv cache

* ran format

* skipped some tests

* marked a couple tests to skip

* ran some more optimizations

* ran poetry lock

* regenerated lock

* fixed commands

* set random seed

* updated parallelism prop

* updated command

* reverted some changes

* updated notebook settings

* updated verbosity

* removed extra test

* cleaned up tests some more

* marked test as skipped

* fixed more tests

* sped up CI

* reverted CI changes

* reverted actions changes

* improved cache

* sped up some tests

* optimized more tests

* sped up some more tests

* made more speed improvements

* fixed error

* fixed typing

2 participants