A companion toolkit to pico-train for quantifying, comparing, and visualizing how language models evolve during training.

📊 Pico Analyze

Pico Analyze is a companion toolkit to pico-train, designed specifically for studying and visualizing the learning dynamics of language models. Whether you want to track activation sparsity, compare layers across checkpoints, or probe the evolution of specific attention heads, Pico Analyze has you covered.

For a detailed run-through, check out the full tutorial on our website at picolm.io.


Key Features

  1. Rich Checkpoint Compatibility

    • Seamlessly loads model states, gradients, and activations stored by pico-train
    • Automatically handles standard PyTorch and Hugging Face–compatible checkpoints
  2. Modular Analysis System

    • Components: Specify which parts of the model (e.g., weights, gradients, activations) to analyze
    • Metrics: Apply built-in metrics like CKA, PWCCA, PER, Gini, Hoyer, and more
  3. Deep Learning Dynamics Insights

    • Compare multiple checkpoints from different training steps
    • Visualize how parameters evolve over time using comprehensive logs or Weights & Biases integration
  4. Config-Driven & Extensible

    • Simple YAML config to define which steps, layers, metrics, and components to analyze
    • Easily register custom metrics or components by subclassing and decorating with @register_metric or @register_component
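As a concrete illustration of the sparsity metrics mentioned above, the Gini coefficient can be computed from any flattened tensor with the standard formula. The sketch below uses plain NumPy and is independent of Pico Analyze's internal API:

```python
import numpy as np

def gini(weights: np.ndarray) -> float:
    """Gini coefficient of absolute weight magnitudes: 0 means perfectly
    uniform, values near 1 mean highly sparse/concentrated."""
    w = np.sort(np.abs(weights).ravel())  # sort ascending
    n = w.size
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * w) / (n * np.sum(w)))

uniform = gini(np.ones(100))                      # 0.0: all weights equal
sparse = gini(np.array([0.0] * 99 + [1.0]))       # 0.99: one dominant weight
```

A layer whose Gini score rises over training is concentrating its mass into fewer weights, which is exactly the kind of trend the per-step analysis surfaces.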

Installation

  1. Clone the Repository

    git clone https://github.com/pico-lm/pico-analyze
    cd pico-analyze
  2. Configure Environment

    Create a .env file at the root with your Hugging Face and Weights & Biases tokens:

    export HF_TOKEN=your_huggingface_token
    export WANDB_API_KEY=your_wandb_key
  3. Install Dependencies

    source setup.sh

    This script checks your environment, installs necessary tools, and sets up a Poetry virtual environment.


Basic Usage

  1. Prepare Your Checkpoints
Make sure you have checkpoints generated by pico-train, either locally or hosted on Hugging Face.

  2. Create an Analysis Config
    Define a YAML file specifying:

    • Which checkpoints to analyze (by step or revision tag)
    • Which components (weights, activations, gradients)
    • Which metrics (CKA, Gini, etc.)
    # configs/my_analysis_config.yaml
    
    analysis_name: "my_analysis"
    steps:
      - 0
      - 1000
      - 5000
    metrics:
      - metric_name: cka
        data_split: "val"
        target_checkpoint: 5000
        components:
          - component_name: simple
            data_type: "weights"
            layer_suffixes: "attention.o_proj"
            layers: [0, 1, 2]
    monitoring:
      output_dir: "analysis_results"
      save_to_wandb: true
      wandb:
        entity: "pico-lm"
        project: "pico-analysis"
  3. Run the Analysis

    poetry run analyze \
        --config_path configs/my_analysis_config.yaml \
        --repo_id pico-lm/pico-decoder-small \
        --branch pico-decoder-small-1
    • --repo_id: The Hugging Face repository hosting your checkpoints (e.g., pico-lm/pico-decoder-small)
    • --branch: The repo branch or “revision” (e.g., pico-decoder-small-1)
    • Or use --run_path to analyze local checkpoints
  4. Review Output

    • Results are saved to analysis_results/my_analysis
    • Inspect JSON logs for each step, or open Weights & Biases to see dynamic charts

Configurable Metrics & Components

  • Metrics

    • Single-checkpoint (e.g., norm, gini, condition_number)
    • Comparative (e.g., cka, pwcca)
  • Components

    • simple: Directly extracts a single tensor (weights, gradients, or activations)
    • ov_circuit: Combines attention value and output projections for interpretability
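The ov_circuit component reflects a standard interpretability construction: composing a head's value projection with its output projection gives the head's direct path through the residual stream. A sketch with hypothetical dimensions and random stand-ins (not real checkpoint weights):

```python
import numpy as np

# Hypothetical head dimensions; Pico Analyze extracts the real matrices
# from checkpoints, here we just use random stand-ins.
d_model, d_head = 128, 32
W_V = np.random.default_rng(0).normal(size=(d_model, d_head))  # value projection
W_O = np.random.default_rng(1).normal(size=(d_head, d_model))  # output projection

# The OV circuit maps the residual stream directly to the head's output,
# independent of the attention pattern: shape (d_model, d_model), rank <= d_head.
W_OV = W_V @ W_O
assert W_OV.shape == (d_model, d_model)
assert np.linalg.matrix_rank(W_OV) <= d_head
```

Analyzing the combined `W_OV` matrix (rather than `W_V` and `W_O` separately) is useful because metrics like effective rank or condition number are only meaningful on the composed map.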

Add custom metrics or components by registering them in the code:

# src/metrics/custom.py

@register_metric("my_custom_metric")
class MyCustomMetric(BaseMetric):
    ...

Extensibility

  1. Add New Metrics
    Create a class inheriting from BaseMetric (or BaseComparativeMetric) and register it with @register_metric(...).

  2. Add New Components
    Subclass BaseComponent to define a new data extraction strategy and register with @register_component(...).
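Both decorators follow the common registry pattern. A minimal, standalone sketch of that pattern (illustrative only, not Pico Analyze's actual implementation):

```python
# Minimal registry-decorator pattern, illustrative only.
METRIC_REGISTRY: dict[str, type] = {}

def register_metric(name: str):
    """Decorator factory: stores the class under `name` and returns it unchanged."""
    def decorator(cls):
        METRIC_REGISTRY[name] = cls
        return cls
    return decorator

@register_metric("l2_norm")
class L2NormMetric:
    def compute(self, tensor):
        return sum(x * x for x in tensor) ** 0.5

# The framework can now look metrics up by the name used in the YAML config.
metric = METRIC_REGISTRY["l2_norm"]()
print(metric.compute([3.0, 4.0]))  # 5.0
```

This is why a config file can refer to metrics purely by string name: registration at import time wires the name to the class.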


Community & Contributions

  • Report issues or request features via GitHub Issues
  • We welcome contributions! Feel free to open a Pull Request

License & Citation

Pico Analyze is open-source under the Apache 2.0 License. If you use it in academic or professional work, please cite:

@software{pico2025,
    author = {Diehl Martinez, Richard},
    title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
    year = {2025},
    url = {https://github.com/pico-lm}
}

Happy Analyzing!
Check out our website or star our repos for updates, tutorials, and more on the Pico ecosystem.
