This repository implements SAELing, a system that uses sparse autoencoders to interpret language mechanisms in large language models. SAELing aims to reveal and control the internal linguistic knowledge of these models, and we use it to extract a large number of causal features from them. For details, see the paper "Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models".
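For readers unfamiliar with sparse autoencoders, the following is a minimal, generic sketch of the idea and is not the code in this repository. It assumes a single-layer autoencoder with a ReLU encoder and an L1 sparsity penalty, applied to model activations. All names (`init_sae`, `encode`, `decode`, `sae_loss`, `ablate_feature`), the dimensions, and the `l1_coeff` value are hypothetical and chosen only for illustration.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Create randomly initialised parameters (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, (d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, 0.1, (d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }


def encode(params, x):
    """Map activations x (batch, d_model) to non-negative feature activations."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    return np.maximum(0.0, pre)  # ReLU keeps features non-negative


def decode(params, f):
    """Reconstruct activations from feature activations f (batch, d_hidden)."""
    return f @ params["W_dec"] + params["b_dec"]


def sae_loss(params, x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    f = encode(params, x)
    x_hat = decode(params, f)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity


def ablate_feature(params, x, idx):
    """Zero out one feature and decode, a simple form of intervention."""
    f = encode(params, x)
    f[:, idx] = 0.0
    return decode(params, f)
```

In a sketch like this, inspecting which inputs activate a given feature corresponds to the "reveal" side, and editing a feature before decoding (as in `ablate_feature`) corresponds to the "control" side. The actual training procedure and intervention method used by SAELing are described in the paper.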
THU-KEG/Linguistic-SAE