Pico Analyze is a companion toolkit to pico-train, designed specifically for studying and visualizing the learning dynamics of language models. Whether you want to track activation sparsity, compare layers across checkpoints, or probe the evolution of specific attention heads, Pico Analyze has you covered.
For a detailed run-through, check out the full tutorial on our website at picolm.io.
- Rich Checkpoint Compatibility
  - Seamlessly loads model states, gradients, and activations stored by pico-train
  - Automatically handles standard PyTorch and Hugging Face–compatible checkpoints
- Modular Analysis System
  - Components: Specify which parts of the model (e.g., weights, gradients, activations) to analyze
  - Metrics: Apply built-in metrics like CKA, PWCCA, PER, Gini, Hoyer, and more
- Deep Learning Dynamics Insights
  - Compare multiple checkpoints from different training steps
  - Visualize how parameters evolve over time using comprehensive logs or Weights & Biases integration
- Config-Driven & Extensible
  - Simple YAML config to define which steps, layers, metrics, and components to analyze
  - Easily register custom metrics or components by subclassing and decorating with `@register_metric` or `@register_component`
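
As a rough illustration of what sparsity metrics such as Gini and Hoyer measure, here is a minimal pure-Python sketch. It uses the standard textbook definitions applied to absolute values. The function names `gini` and `hoyer` are illustrative only, and Pico Analyze's own implementations may differ in details such as normalization or tensor handling.

```python
import math


def gini(values):
    """Gini coefficient of |values|: 0 for a uniform vector, close to 1 for a sparse one."""
    xs = sorted(abs(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch
    # Rank-weighted sum over the ascending-sorted magnitudes (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def hoyer(values):
    """Hoyer sparsity: 0 when all magnitudes are equal, 1 when only one entry is non-zero."""
    xs = [abs(v) for v in values]
    n = len(xs)
    l1 = sum(xs)
    l2 = math.sqrt(sum(x * x for x in xs))
    if n < 2 or l2 == 0:
        return 0.0  # convention chosen for this sketch
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


dense = [1.0, 1.0, 1.0, 1.0]
sparse = [0.0, 0.0, 0.0, 1.0]
print(gini(dense), hoyer(dense))    # 0.0 0.0
print(gini(sparse), hoyer(sparse))  # 0.75 1.0
```

Both scores rise as more of the total magnitude concentrates in fewer entries, which is why they are useful for tracking sparsity across checkpoints.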
- Clone the Repository

      git clone https://github.com/pico-lm/pico-analyze
      cd pico-analyze

- Configure Environment

  Create a `.env` file at the root with your Hugging Face and Weights & Biases tokens:

      export HF_TOKEN=your_huggingface_token
      export WANDB_API_KEY=your_wandb_key

- Install Dependencies

      source setup.sh

  This script checks your environment, installs necessary tools, and sets up a Poetry virtual environment.
- Prepare Your Checkpoints

  Make sure you have checkpoints generated by pico-train, either locally or hosted on Hugging Face.

- Create an Analysis Config

  Define a YAML file specifying:
  - Which checkpoints to analyze (by step or revision tag)
  - Which components (weights, activations, gradients)
  - Which metrics (CKA, Gini, etc.)

      # configs/my_analysis_config.yaml
      analysis_name: "my_analysis"
      steps:
        - 0
        - 1000
        - 5000
      metrics:
        - metric_name: cka
          data_split: "val"
          target_checkpoint: 5000
          components:
            - component_name: simple
              data_type: "weights"
              layer_suffixes: "attention.o_proj"
              layers: [0, 1, 2]
      monitoring:
        output_dir: "analysis_results"
        save_to_wandb: true
        wandb:
          entity: "pico-lm"
          project: "pico-analysis"
- Run the Analysis

      poetry run analyze \
        --config_path configs/my_analysis_config.yaml \
        --repo_id pico-lm/pico-decoder-small \
        --branch pico-decoder-small-1

  - `--repo_id`: The Hugging Face repository hosting your checkpoints (e.g., `pico-lm/pico-decoder-small`)
  - `--branch`: The repo branch or "revision" (e.g., `pico-decoder-small-1`)
  - Or use `--run_path` to analyze local checkpoints
- Review Output
  - Results are saved to `analysis_results/my_analysis`
  - Inspect JSON logs for each step, or open Weights & Biases to see dynamic charts
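
To inspect the JSON logs programmatically, something like the following minimal sketch may help. The exact file names and directory nesting inside the output directory are not specified here, so the hypothetical helper `load_json_logs` makes no assumption about them: it simply walks the directory recursively and parses every `.json` file it finds.

```python
import json
from pathlib import Path


def load_json_logs(results_dir):
    """Return {relative_path: parsed_json} for every .json file under results_dir."""
    root = Path(results_dir)
    logs = {}
    for path in sorted(root.rglob("*.json")):
        with path.open() as f:
            # Key each log by its path relative to the results directory.
            logs[path.relative_to(root).as_posix()] = json.load(f)
    return logs


# Example usage (path taken from the config above; adjust to your output_dir):
# logs = load_json_logs("analysis_results/my_analysis")
# for name, payload in logs.items():
#     print(name, type(payload).__name__)
```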
- Metrics
  - Single-checkpoint (e.g., norm, gini, condition_number)
  - Comparative (e.g., cka, pwcca)
- Components
  - simple: Directly extracts a single tensor (weights, gradients, or activations)
  - ov_circuit: Combines attention value and output projections for interpretability
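
To make the idea of a comparative metric concrete, here is a minimal pure-Python sketch of linear CKA between two matrices, for example the same component taken from two checkpoints. It follows the standard linear CKA formula with rows treated as samples and columns as features. The function names are illustrative, and Pico Analyze's built-in `cka` metric may be implemented differently.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


def cross_sq_frobenius(a, b):
    """Squared Frobenius norm of a^T b, computed column pair by column pair."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered inputs."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_sq_frobenius(xc, yc)
    denominator = math.sqrt(cross_sq_frobenius(xc, xc) * cross_sq_frobenius(yc, yc))
    return numerator / denominator if denominator > 0 else 0.0


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
scaled = [[3.0 * v for v in row] for row in x]
# Identical or uniformly rescaled representations give a score of 1 (up to rounding).
print(linear_cka(x, x), linear_cka(x, scaled))
```

A score near 1 means the two representations are similar up to rotation and uniform scaling; a score near 0 means they share little linear structure.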
Add custom metrics or components by registering them in the code:

    # src/metrics/custom.py
    @register_metric("my_custom_metric")
    class MyCustomMetric(BaseMetric):
        ...
- Add New Metrics

  Create a class inheriting from `BaseMetric` (or `BaseComparativeMetric`) and register it with `@register_metric(...)`.

- Add New Components

  Subclass `BaseComponent` to define a new data extraction strategy and register with `@register_component(...)`.
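
For readers unfamiliar with decorator-based registration, the following self-contained sketch shows how such a pattern can work in general. Everything here is a toy stand-in defined locally: `register_metric` and `BaseMetric` are re-implemented for illustration rather than imported from Pico Analyze, and `METRIC_REGISTRY`, `compute`, and `MeanAbsMetric` are hypothetical names. The real base classes may require different methods.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping metric names to metric classes.
METRIC_REGISTRY = {}


def register_metric(name):
    """Toy decorator factory: store the decorated class under `name`."""
    def decorator(cls):
        if name in METRIC_REGISTRY:
            raise ValueError(f"metric {name!r} is already registered")
        METRIC_REGISTRY[name] = cls
        return cls
    return decorator


class BaseMetric(ABC):
    """Toy base class; the `compute` method name is an assumption of this sketch."""

    @abstractmethod
    def compute(self, values):
        ...


@register_metric("mean_abs")
class MeanAbsMetric(BaseMetric):
    """Example custom metric: mean absolute value of a flat list of numbers."""

    def compute(self, values):
        return sum(abs(v) for v in values) / len(values)


# A config-driven tool can now look metrics up by name.
metric = METRIC_REGISTRY["mean_abs"]()
result = metric.compute([-1.0, 2.0, -3.0])
print(result)  # 2.0
```

The appeal of this design is that a YAML config only needs to carry the string name (such as `metric_name: cka`), while the registry resolves it to a class at run time.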
- Report issues or request features via GitHub Issues
- We welcome contributions! Feel free to open a Pull Request
Pico Analyze is open-source under the Apache 2.0 License. If you use it in academic or professional work, please cite:
@software{pico2025,
author = {Diehl Martinez, Richard},
title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
year = {2025},
url = {https://github.com/pico-lm}
}
Happy Analyzing!
Check out our website or star our repos for updates, tutorials, and more on the Pico ecosystem.