UFO-101/auto-circuit

AutoCircuit

A library for efficient patching and automatic circuit discovery.

Read the paper

Transformer Circuit Metrics are not Robust (Oral spotlight, COLM 2024)

Getting Started

pip install auto-circuit

Easy and Efficient Edge Patching

patch_edges = [
    "Resid Start->MLP 2",
    "MLP 2->A2.4.Q",
    "A2.4->Resid End",
]
with patch_mode(model, ablations, patch_edges):
    patched_out = model(tokens)
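
To make the idea of patching a single edge concrete, here is a toy sketch in plain Python. It is not AutoCircuit's implementation: the helper `dest_input`, the scalar activations, and the ablation values are all hypothetical, and it assumes that each destination node reads the sum of its upstream source outputs. Patching an edge then swaps in the ablated value of one source for one destination only, leaving every other destination untouched.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source node name, destination node name)


def dest_input(
    dest: str,
    src_outs: Dict[str, float],
    ablations: Dict[str, float],
    patched: Set[Edge],
) -> float:
    """Sum the source outputs feeding `dest`, using the ablated value
    for every source whose edge into `dest` is patched."""
    return sum(
        ablations[src] if (src, dest) in patched else out
        for src, out in src_outs.items()
    )


# Hypothetical scalar activations standing in for real tensors.
src_outs = {"Resid Start": 1.0, "MLP 2": 2.0}
ablations = {"Resid Start": 10.0, "MLP 2": 20.0}
patched = {("MLP 2", "A2.4.Q")}

q_input = dest_input("A2.4.Q", src_outs, ablations, patched)      # edge patched
end_input = dest_input("Resid End", src_outs, ablations, patched)  # unaffected
```

In this toy, `q_input` uses the ablated output of `MLP 2`, while `end_input` still sees its clean output, which is the distinction between patching an edge and patching a whole node.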

Different Ablation Methods

ablations = src_ablations(model, test_loader, AblationType.TOKENWISE_MEAN_CORRUPT)
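
As a rough illustration of what a token-wise mean ablation computes, the sketch below averages activations over the batch dimension separately at each token position. It assumes, as the name `TOKENWISE_MEAN_CORRUPT` suggests, that the mean is taken over corrupt inputs per token position. The function `tokenwise_mean` and the nested-list layout `[batch][token][hidden]` are hypothetical and are not part of the library.

```python
from typing import List

Activations = List[List[List[float]]]  # indexed as [batch][token][hidden]


def tokenwise_mean(acts: Activations) -> List[List[float]]:
    """Average over the batch dimension, keeping token and hidden dims."""
    n_batch = len(acts)
    n_tokens = len(acts[0])
    n_hidden = len(acts[0][0])
    return [
        [
            sum(acts[b][t][h] for b in range(n_batch)) / n_batch
            for h in range(n_hidden)
        ]
        for t in range(n_tokens)
    ]


# Two hypothetical corrupt samples, two token positions, hidden size two.
corrupt_acts = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 6.0], [5.0, 8.0]],
]
mean_acts = tokenwise_mean(corrupt_acts)
```

The result has one vector per token position, so each position is ablated with its own average rather than a single mean shared across the sequence.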

Automatic Circuit Discovery

attribution_patching_scores: PruneScores = mask_gradient_prune_scores(
    model=model,
    dataloader=test_loader,
    official_edges=None,
    grad_function="logit",
    answer_function="avg_diff",
)
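
The idea behind a mask-gradient score can be sketched on a toy function. This is an assumption about the general technique (attribution patching), not AutoCircuit's code: each edge gets a mask `m` that interpolates between its clean value (`m = 0`) and its ablated value (`m = 1`), and the score is the derivative of the output metric with respect to `m` at `m = 0`. The `metric` function, the helper `mask_gradient_scores`, and the finite-difference gradient are all hypothetical stand-ins for a real model and autograd.

```python
from typing import Callable, List


def metric(edge_values: List[float]) -> float:
    """Hypothetical stand-in for a model's output metric."""
    a, b = edge_values
    return 3.0 * a + a * b


def mask_gradient_scores(
    f: Callable[[List[float]], float],
    clean: List[float],
    corrupt: List[float],
    eps: float = 1e-6,
) -> List[float]:
    """Estimate d f / d mask at mask = 0 for each edge by central differences."""
    scores = []
    for i in range(len(clean)):

        def f_masked(m: float) -> float:
            vals = list(clean)
            # Interpolate only edge i between its clean and corrupt values.
            vals[i] = (1.0 - m) * clean[i] + m * corrupt[i]
            return f(vals)

        scores.append((f_masked(eps) - f_masked(-eps)) / (2.0 * eps))
    return scores


clean = [1.0, 2.0]
corrupt = [0.0, 5.0]
scores = mask_gradient_scores(metric, clean, corrupt)
```

Each score is a first-order estimate of how much the metric would change if that edge alone were fully patched. The estimate is exact here only because the toy metric is linear in each edge value taken separately; for a real network it is an approximation.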

Visualization

fig = draw_seq_graph(model, prune_scores)