Corresponding paper: https://arxiv.org/abs/2307.12601
This project implements concept backpropagation, a method for visualising how a given concept (found through standard concept detection) is represented in a probed neural network model. Simply put, given a neural network model, an input sample, and a trained concept probe that indicates the presence of the concept in one of the model's intermediate layers, the method searches for the smallest perturbation of the input that maximises the probe's detection of the concept.
In practice, this means altering chess positions to maximise threats on certain pieces, or maximising the "loopiness" of standard MNIST digits. A minimal sketch of the optimisation is shown below.
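The following is a minimal sketch of the perturbation search, not the repository's actual implementation. The names `model`, `probe`, and `layer_name`, as well as the regularisation weight `lam` that trades off concept detection against perturbation size, are illustrative assumptions:

```python
import tensorflow as tf

def concept_backprop(model, probe, layer_name, x,
                     steps=200, lr=0.05, lam=0.1):
    """Search for a small input perturbation that maximises a concept probe.

    `model` is a tf.keras.Model, `probe` maps the activations at
    `layer_name` to a concept score, and `x` is a single input of
    shape (1, ...).
    """
    # Truncated model exposing the probed intermediate activations.
    feature_extractor = tf.keras.Model(
        inputs=model.input,
        outputs=model.get_layer(layer_name).output,
    )
    delta = tf.Variable(tf.zeros_like(x))  # perturbation to optimise
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            concept_score = probe(feature_extractor(x + delta))
            # Maximise concept detection while keeping the perturbation
            # small via a squared-norm penalty weighted by `lam`.
            loss = (-tf.reduce_mean(concept_score)
                    + lam * tf.reduce_sum(tf.square(delta)))
        grads = tape.gradient(loss, [delta])
        opt.apply_gradients(zip(grads, [delta]))
    return x + delta, delta
```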
This codebase includes routines for the application areas described in the corresponding paper. The chess models and chess-related concept datasets were retrieved from the repository for Explainable Minichess.
All experiments are provided as standard Python notebooks. They should work with standard versions of Python and with varying versions of the scientific Python packages they use (i.e. TensorFlow and NumPy). They also require the `larq` package.
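Assuming a recent Python environment, the dependencies can typically be installed with pip (exact package versions are not pinned here):

```bash
pip install tensorflow numpy larq
```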
The main method remains largely the same across all of the notebooks, and is therefore straightforward to adapt to other problems; a hypothetical end-to-end example is sketched below.
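As an illustration, the sketch above could be applied to a new problem roughly as follows. The model, probe, and layer name here are untrained toy stand-ins, not artefacts from the notebooks:

```python
import tensorflow as tf

# Hypothetical stand-ins: a small MNIST-style classifier and a linear
# probe on one of its intermediate layers, both untrained for brevity.
inputs = tf.keras.Input(shape=(28, 28, 1))
h = tf.keras.layers.Conv2D(8, 3, activation="relu", name="probed_layer")(inputs)
outputs = tf.keras.layers.Dense(10, activation="softmax")(
    tf.keras.layers.Flatten()(h))
model = tf.keras.Model(inputs, outputs)

probe = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

x = tf.zeros((1, 28, 28, 1))  # placeholder input; use a real digit in practice
x_perturbed, delta = concept_backprop(model, probe, "probed_layer", x)
```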