Feature Request: Transformer Debugger - Debugging and controlling the behavior of transformer-based LLMs. #1513

@abhaskumarsinha

Description

Short Description

Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team with the goal of supporting investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders.
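To make the sparse-autoencoder idea concrete, here is a minimal NumPy sketch of the general technique: an overcomplete ReLU autoencoder trained on a model's activations, with an L1 penalty that pushes most latents to exactly zero. All dimensions and weights here are illustrative placeholders, not taken from TDB.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse autoencoder over a model's residual-stream activations.
# d_latent > d_model gives an overcomplete latent dictionary.
d_model, d_latent = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(0, 0.1, (d_latent, d_model))

def encode(x):
    # ReLU zeroes out most latents, yielding a sparse code.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(z):
    return z @ W_dec

x = rng.normal(size=(8, d_model))   # a batch of captured activations
z = encode(x)
x_hat = decode(z)

recon_loss = np.mean((x - x_hat) ** 2)
l1_penalty = np.mean(np.abs(z))     # sparsity term added to the loss
loss = recon_loss + 1e-3 * l1_penalty
```

In practice the autoencoder is trained so that each latent tends to fire on an interpretable feature of the input; TDB then surfaces those latents alongside neurons and attention heads.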

TDB enables rapid exploration before needing to write code, with the ability to intervene in the forward pass and see how it affects a particular behavior. It can be used to answer questions like, "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?" It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.
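The "intervene in the forward pass" workflow can be sketched in a few lines: run a toy attention layer once normally, once with a head zero-ablated, and measure how the ablation changes the logit difference between token A and token B. Everything below (weights, token ids, layer shape) is a hypothetical stand-in, not the TDB implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-layer setup: multi-head attention plus an unembedding.
seq, d_model, n_heads, vocab = 4, 8, 2, 10
d_head = d_model // n_heads
x = rng.normal(size=(seq, d_model))
Wq = rng.normal(0, 0.3, (n_heads, d_model, d_head))
Wk = rng.normal(0, 0.3, (n_heads, d_model, d_head))
Wv = rng.normal(0, 0.3, (n_heads, d_model, d_head))
Wo = rng.normal(0, 0.3, (n_heads, d_head, d_model))
W_unembed = rng.normal(0, 0.3, (d_model, vocab))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(ablate_head=None):
    out = x.copy()
    for h in range(n_heads):
        if h == ablate_head:
            continue  # zero-ablate this head's contribution
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(d_head))
        out = out + (attn @ v) @ Wo[h]
    return out[-1] @ W_unembed  # logits at the final position

token_a, token_b = 3, 7  # arbitrary token ids for illustration
base = forward()
ablated = forward(ablate_head=0)
# How much does head 0 push token A over token B for this prompt?
effect = (base[token_a] - base[token_b]) - (ablated[token_a] - ablated[token_b])
```

A large `effect` would flag head 0 as a contributor to the A-vs-B behavior; TDB automates this kind of comparison across all components and traces the connections between the ones that matter.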

Paper
https://arxiv.org/pdf/2211.00593v1.pdf

Existing Implementations

Other Information
This tool could be a valuable guide for people working on the interpretability of LLMs. KerasNLP already provides many LLMs, and engineers might find such a tool very useful when deploying those models, to help ensure their safety, reliability, interpretability, and controllability.
