Understanding causal relationships is fundamental to scientific discovery, enabling researchers to move beyond mere correlation and establish the underlying mechanisms that drive natural and social phenomena. Recent years have witnessed significant theoretical advancements in causal discovery, yielding a diverse array of sophisticated methodologies. However, the complexity of these methods—each with its distinct assumptions, applicability conditions, and technical nuances—has created substantial barriers for scientists outside the field of causal analysis, often deterring them from adopting these powerful analytical tools in their research.
Causal-Copilot is an LLM-oriented toolkit for automatic causal analysis that uniquely integrates domain knowledge from large language models with established expertise from causal discovery researchers. Designed for scientific researchers and data scientists, it facilitates the identification, analysis, and interpretation of causal relationships within real-world datasets through natural dialogue. The system autonomously orchestrates the entire analytical pipeline: analyzing statistics, selecting optimal causal analysis algorithms, configuring appropriate hyperparameters, synthesizing executable code, conducting uncertainty quantification, and generating comprehensive PDF reports, all while requiring minimal expertise in causal methods. This seamless integration of conversational interaction and rigorous methodology enables researchers across disciplines to focus on domain-specific insights rather than technical implementation details.
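To give a flavor of the orchestration idea, the snippet below is a minimal sketch of statistics-driven algorithm selection via an LLM: summarize basic dataset statistics, then ask a model to pick a discovery algorithm. It is an illustration only, not the actual Causal-Copilot implementation; the prompt wording, candidate list, statistics, and the `gpt-4o-mini` model choice are assumptions.

```python
# Minimal sketch: statistics-driven algorithm selection via an LLM.
# Illustrative only; prompt, candidates, and statistics are assumptions,
# and numeric columns are assumed throughout.
import json

import numpy as np
import pandas as pd
from openai import OpenAI
from scipy import stats


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Collect simple statistics that could inform algorithm choice."""
    normality_pvals = [
        stats.shapiro(df[c].sample(min(len(df), 500), random_state=0))[1]
        for c in df.columns
    ]
    return {
        "n_samples": len(df),
        "n_variables": df.shape[1],
        "roughly_gaussian": bool(np.median(normality_pvals) > 0.05),
    }


def select_algorithm(df: pd.DataFrame, api_key: str) -> str:
    """Ask an LLM to pick a causal discovery algorithm given the statistics."""
    client = OpenAI(api_key=api_key)
    prompt = (
        "Given these dataset statistics, choose one causal discovery algorithm "
        "from [PC, FCI, GES, NOTEARS, DirectLiNGAM] and answer with its name only:\n"
        + json.dumps(summarize_dataset(df))
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```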
🔍 Try out our interactive demo: Causal-Copilot Live Demo
We provide several example reports automatically generated by our system for open-source datasets.
- Automated Causal Analysis: Harnesses the power of large language models combined with domain expertise to select optimal causal analysis algorithms and hyperparameters. Incorporates proven methodological insights from causal discovery researchers to ensure analytical reliability, without requiring expertise in causality or extensive parameter tuning.
- Statistical-LLM Hybrid Post-Processing: Provides edge uncertainty quantification via bootstrapping, as well as graph pruning and edge-direction revision driven by the LLM's prior knowledge (see the sketch after this list).
- Chat-based User-friendly Interface: Navigate complex causal analysis through natural dialogue, and visualize data statistics and causal graphs through clear, intuitive figures, without wrestling with technical details.
- Comprehensive Analysis Report: Provides a well-formulated scientific report on the whole causal analysis process, with detailed documentation of the complete analytical pipeline, intuitive visualizations, and in-depth interpretation of the findings.
- Extensibility: Maintains open interfaces for integrating new causal analysis algorithms and supports seamless incorporation of emerging causality-related libraries and methodologies.
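As a rough illustration of the bootstrap-based edge uncertainty check mentioned in the post-processing item above, the sketch below resamples the data, reruns a discovery algorithm (causal-learn's PC is used here purely as an example), and reports how often each adjacency appears. It is a simplified stand-in, not the actual Causal-Copilot post-processing code.

```python
# Sketch of bootstrap edge-frequency estimation (illustrative, not the
# exact Causal-Copilot post-processing logic).
import numpy as np
from causallearn.search.ConstraintBased.PC import pc


def bootstrap_edge_frequencies(data: np.ndarray, n_boot: int = 50, alpha: float = 0.05):
    """Return a (d, d) matrix of how often each adjacency appears across bootstraps."""
    n, d = data.shape
    counts = np.zeros((d, d))
    for _ in range(n_boot):
        idx = np.random.choice(n, size=n, replace=True)   # resample rows with replacement
        cg = pc(data[idx], alpha=alpha, show_progress=False)
        adj = cg.G.graph                                   # causal-learn endpoint matrix
        # Count an adjacency between i and j whenever any endpoint mark is present.
        counts += (adj != 0).astype(float)
    return counts / n_boot
```

Low-frequency edges are natural candidates for pruning, while the LLM's prior knowledge can be consulted for ambiguous directions.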
- Causal-Copilot consists of four components: preprocessing, decision making, post-processing, and interpretation, all supported by SOTA LLMs (e.g., GPT-4o, GPT-4o-mini).
- We evaluate the automatic causal discovery ability of Causal-Copilot on 180 simulated datasets in total, covering different functional forms, graph sparsity levels, noise types, and degrees of heterogeneity, against a robust baseline: the PC algorithm with its default settings.
- The results show that Causal-Copilot achieves markedly better performance, indicating the effectiveness of its autonomous algorithm selection and hyperparameter configuration strategy (an illustrative evaluation sketch follows the table below).
| Metric | Baseline | Causal-Copilot |
|---|---|---|
| Precision | 78.6% | 81.6% |
| Recall | 78.2% | 81.0% |
| F1-score | 76.1% | 79.3% |
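For reference, the sketch below shows one way such an evaluation could be set up: simulate data from a known DAG (here a linear-Gaussian SEM), run the PC baseline from causal-learn, and score the recovered skeleton with precision, recall, and F1. The simulation settings and scoring choices are illustrative assumptions, not the exact protocol behind the numbers above.

```python
# Illustrative evaluation sketch: simulate data from a random DAG, run the PC
# baseline, and compute skeleton precision / recall / F1. The exact simulation
# and scoring protocol used in Causal-Copilot's benchmark may differ.
import numpy as np
from causallearn.search.ConstraintBased.PC import pc


def simulate_linear_sem(d: int = 10, n: int = 1000, edge_prob: float = 0.2, seed: int = 0):
    """Generate data from a random upper-triangular DAG with Gaussian noise."""
    rng = np.random.default_rng(seed)
    true_dag = np.triu(rng.random((d, d)) < edge_prob, k=1).astype(float)
    weights = true_dag * rng.uniform(0.5, 1.5, size=(d, d))
    X = np.zeros((n, d))
    for j in range(d):  # columns 0..d-1 are already in topological order
        X[:, j] = X @ weights[:, j] + rng.normal(size=n)
    return X, true_dag


def skeleton_metrics(est_adj: np.ndarray, true_dag: np.ndarray):
    """Precision/recall/F1 on undirected adjacencies (upper triangle only)."""
    est = (est_adj != 0) | (est_adj != 0).T
    truth = (true_dag != 0) | (true_dag != 0).T
    iu = np.triu_indices_from(truth, k=1)
    tp = np.sum(est[iu] & truth[iu])
    fp = np.sum(est[iu] & ~truth[iu])
    fn = np.sum(~est[iu] & truth[iu])
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    return prec, rec, f1


X, true_dag = simulate_linear_sem()
cg = pc(X, alpha=0.05, show_progress=False)
print(skeleton_metrics(cg.G.graph, true_dag))
```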
- Python 3.8+
- Required Python libraries (specified in `requirements.txt`)
Ensure you have the necessary dependencies installed by running:

```bash
pip install -r requirements.txt
```

Then run Causal-Copilot with:

```bash
python main.py --data_file your_data --apikey your_openai_apikey --initial_query your_user_query
```
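The `--data_file` argument expects a tabular dataset. The snippet below is a hypothetical example of preparing a CSV input with pandas; the column names, values, and file name are placeholders, and the exact accepted formats may differ.

```python
# Hypothetical example of preparing a CSV input for --data_file.
# Column names, values, and file name are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(20, 5, 500),
    "humidity": rng.normal(60, 10, 500),
})
df["mold_growth"] = 0.3 * df["temperature"] + 0.5 * df["humidity"] + rng.normal(0, 1, 500)
df.to_csv("your_data.csv", index=False)

# Then run, for example:
#   python main.py --data_file your_data.csv --apikey your_openai_apikey \
#       --initial_query "What causes mold growth?"
```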
Distributed under the MIT License. See `LICENSE` for more information.
- Our code for causal discovery comes from the causal-learn and CausalNex projects, currently including PC, FCI, CDNOD, GES, NOTEARS, DirectLiNGAM, and ICALiNGAM.
- Our PDF template is based on this Overleaf project.
- Our example datasets are from Bioinformatics-Abalone, Architecture-CCS, and Bioinformatics-Sachs.
- Our deployment code is built with Gradio.
- Xinyue Wang*, Kun Zhou* (equal contribution), Wenyi Wu, Fang Nan, Shivam Singh, Biwei Huang
For additional information, questions, or feedback, please contact us at [email protected], [email protected], [email protected], [email protected], and [email protected]. We welcome contributions! Come and join us now!
If you use Causal-Copilot in your research, please cite it as follows:
```bibtex
@inproceedings{causalcopilot,
  title={Causal-Copilot: An Autonomous Causal Analysis Agent},
  author={Wang, Xinyue and Zhou, Kun and Wu, Wenyi and Nan, Fang and Huang, Biwei},
  year={2024}
}
```