Tuned Lens 🔎

Tools for understanding how transformer predictions are built layer-by-layer.

This package provides a simple interface for training and evaluating tuned lenses. A tuned lens allows us to peek at the iterative computations a transformer uses to compute the next token.

What is a Lens?

A lens into a transformer with n layers allows you to replace the last m layers of the model with an affine transformation (we call these affine translators). Each affine translator is trained to minimize the KL divergence between its prediction and the final output distribution of the original model. This means that after training, the tuned lens allows you to skip over these last few layers and see the best prediction that can be made from the model's intermediate representations, i.e., the residual stream, at layer n - m.
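The training objective described above can be written compactly. This is a sketch under assumed notation, none of which appears in this README: h_ℓ(x) is the residual stream at layer ℓ for input x, (A_ℓ, b_ℓ) are the weights and bias of that layer's affine translator, W_U is the unembedding matrix, and p_final is the original model's output distribution. The final layer normalization is omitted for brevity.

```latex
\[
\operatorname*{arg\,min}_{A_\ell,\, b_\ell} \;
\mathbb{E}_{x}\Big[
  D_{\mathrm{KL}}\big(
    p_{\text{final}}(\cdot \mid x)
    \,\big\|\,
    \operatorname{softmax}\big(W_U (A_\ell\, h_\ell(x) + b_\ell)\big)
  \big)
\Big]
\]
```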

The reason we need to train an affine translator is that the representations may be rotated, shifted, or stretched from layer to layer. This training differentiates this method from simpler approaches that unembed the residual stream of the network directly using the unembedding matrix, i.e., the logit lens. We explain this process and its applications in the paper Eliciting Latent Predictions from Transformers with the Tuned Lens.
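The difference between the two approaches can be illustrated with a toy, dependency-free sketch. It does not use this package's API: every name below (`logit_lens`, `tuned_lens`, `unembed`, `later_layers_A`, and so on) is hypothetical, the "remaining layers" are assumed to be a single rotation plus shift, and the translator is set by hand rather than trained. Layer normalization is left out.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def logit_lens(hidden, unembed):
    """Unembed the intermediate hidden state directly."""
    return softmax(matvec(unembed, hidden))


def tuned_lens(hidden, A, b, unembed):
    """Apply an affine translator (A, b) first, then unembed."""
    translated = [y + bi for y, bi in zip(matvec(A, hidden), b)]
    return softmax(matvec(unembed, translated))


# Toy setup (all values are illustrative assumptions):
# hidden size 2, vocabulary size 3.
unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [1.0, 2.0]  # residual stream at an intermediate layer

# Pretend the remaining layers rotate the representation by 90 degrees
# and shift it. A real model's later layers are not affine.
later_layers_A = [[0.0, -1.0], [1.0, 0.0]]
later_layers_b = [0.5, 0.0]
final_hidden = [
    y + bi for y, bi in zip(matvec(later_layers_A, hidden), later_layers_b)
]
final_probs = softmax(matvec(unembed, final_hidden))

# The logit lens ignores the rotation and shift; a translator that
# matches them recovers the final distribution.
kl_logit = kl_divergence(final_probs, logit_lens(hidden, unembed))
kl_tuned = kl_divergence(
    final_probs, tuned_lens(hidden, later_layers_A, later_layers_b, unembed)
)
print(f"KL(final || logit lens) = {kl_logit:.4f}")
print(f"KL(final || tuned lens) = {kl_tuned:.4f}")
```

In this toy case the hand-set translator matches the later layers exactly, so its KL divergence is zero, while the logit lens pays a penalty for ignoring the change of basis. In a real model the translator would only approximate the later layers, and it would be learned by gradient descent on the KL objective rather than set by hand.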

Acknowledgments

Originally conceived by Igor Ostrovsky and Stella Biderman at EleutherAI, this library was built as a collaboration between FAR and EleutherAI researchers.

Install Instructions

Installing from PyPI

First, you will need to install the basic prerequisites into a virtual environment:

Python 3.9+
PyTorch 1.13.0+

Then, you can simply install the package using pip.

pip install tuned-lens

Installing the container

If you prefer to run the training scripts from within a container, you can use the provided Docker container.

docker pull ghcr.io/alignmentresearch/tuned-lens:latest
docker run --rm ghcr.io/alignmentresearch/tuned-lens:latest tuned-lens --help

Contributing

Make sure to install the dev dependencies and install the pre-commit hooks.

$ git clone https://github.com/AlignmentResearch/tuned-lens.git
$ cd tuned-lens
$ pip install -e ".[dev]"
$ pre-commit install

Citation

If you find this library useful, please cite it as:

@article{belrose2023eliciting,
  title={Eliciting Latent Predictions from Transformers with the Tuned Lens},
  author={Belrose, Nora and Furman, Zach and Smith, Logan and Halawi, Danny and McKinney, Lev and Ostrovsky, Igor and Biderman, Stella and Steinhardt, Jacob},
  journal={to appear},
  year={2023}
}

Warning: This package has not reached 1.0. Expect the public interface to change regularly and without major version bumps.
