<h1 align="center">GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research</h1>

<br>
<div align="center">
<img src="/pics/Eval_result.png" alt="Violin plots of speedup distributions" width="65%">
</div>

Compiler developers can use GraphNet samples to evaluate tensor compilers (e.g., CINN, TorchInductor, TVM) on target tasks. The figure above shows the speedup of two compilers (CINN and TorchInductor) across two tasks (CV and NLP).

<div align="center">
![](https://img.shields.io/badge/version-v0.1-brightgreen)
![](https://img.shields.io/github/issues/PaddlePaddle/GraphNet?label=open%20issues)
[![Documentation](https://img.shields.io/badge/documentation-blue)](./GraphNet_technical_report.pdf)
<a href="https://github.com/user-attachments/assets/125e3494-25c9-4494-9acd-8ad65ca85d03"><img src="https://img.shields.io/badge/WeChat-green?logo=wechat"></a>
</div>

**GraphNet** is a large-scale dataset of deep learning **computation graphs**, built as a standard benchmark for **tensor compiler** optimization. It provides over 2.7K computation graphs extracted from state-of-the-art deep learning models spanning diverse tasks and ML frameworks. With standardized formats and rich metadata, GraphNet enables fair comparison and reproducible evaluation of the general optimization capabilities of tensor compilers, thereby supporting advanced research directions such as AI for Compiler.

## 📣 News
- [2025-10-14] ✨ Our technical report is out: a detailed study of dataset construction and compiler benchmarking, introducing the novel performance metrics Speedup Score S(t) and Error-aware Speedup Score ES(t). [📘 GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research](./GraphNet_technical_report.pdf)
- [2025-08-20] 🚀 The second round of [open contribution tasks](https://github.com/PaddlePaddle/Paddle/issues/74773) was released. (completed ✅)
- [2025-07-30] 🚀 The first round of [open contribution tasks](https://github.com/PaddlePaddle/GraphNet/issues/44) was released. (completed ✅)

## 📊 Benchmark Results
We evaluate two representative tensor compiler backends, CINN (PaddlePaddle) and TorchInductor (PyTorch), on GraphNet's NLP and CV subsets. The evaluation adopts two quantitative metrics proposed in the [Technical Report](./GraphNet_technical_report.pdf):
- **Speedup Score** S(t) — evaluates compiler performance under varying numerical tolerance levels.
<div align="center">
<img src="/pics/St-result.jpg" alt="Speedup Score S_t Results" width="80%">
</div>

- **Error-aware Speedup Score** ES(t) — further accounts for runtime and compilation errors.
<div align="center">
<img src="/pics/ESt-result.jpg" alt="Error-aware Speedup Score ES_t Results" width="80%">

</div>
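The exact definitions of S(t) and ES(t) are given in the technical report. As an illustrative sketch only (the per-sample fields, the gating rule, and the failure penalty below are assumptions for exposition, not the report's formulas), a tolerance-aware speedup score can be computed like this:

```python
import math

def speedup_score(samples, t):
    # Toy tolerance-gated score: a sample whose numerical error exceeds
    # the tolerance t gets no credit (its speedup is clipped to 1.0);
    # the score is the geometric mean over all samples.
    vals = [s["speedup"] if s["error"] <= t else 1.0 for s in samples]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

def error_aware_speedup_score(samples, t):
    # Toy error-aware variant: a sample that failed to compile or run
    # counts as a fixed penalty speedup of 0.5 (purely an assumption).
    vals = [0.5 if s.get("failed") else (s["speedup"] if s["error"] <= t else 1.0)
            for s in samples]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

samples = [
    {"speedup": 2.0, "error": 1e-6},
    {"speedup": 1.5, "error": 1e-3},
    {"speedup": 0.8, "error": 1e-8, "failed": False},
    {"speedup": 1.0, "error": 0.0, "failed": True},
]
print(round(speedup_score(samples, t=1e-4), 3))
print(round(error_aware_speedup_score(samples, t=1e-4), 3))
```

Tightening t drives more samples toward the no-credit value, which is why the curves in the figures above vary with the tolerance level.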

## ⚡ Quick Start
This section shows how to evaluate tensor compilers and reproduce benchmark results (for compiler users and developers),
as well as how to contribute new computation graphs (for GraphNet contributors).

### ⚖️ Compiler Evaluation

**Step 1: Benchmark**

Use `graph_net.torch.test_compiler` to benchmark GraphNet samples with specific batch and logging configurations:

```bash
# Set your benchmark directory
export GRAPH_NET_BENCHMARK_PATH=/path/to/benchmark_output

# Benchmark the samples
python -m graph_net.torch.test_compiler
```

After executing, `graph_net.torch.test_compiler` will write per-sample benchmark logs recording runtime, correctness, and failure information.

**Step 2: Generate JSON Record**

Extract runtime, correctness, and failure information from the benchmark logs; the results are saved to per-model `model_compiler.json` files:

```bash
python -m graph_net.log2json
```
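The JSON records can then be consumed programmatically. A minimal sketch, assuming only the file layout described in this section (the per-record field names are not specified here, so none are interpreted):

```python
import json
import pathlib

def load_records(benchmark_path):
    # Collect every model_compiler.json under the benchmark directory,
    # keyed by the name of the directory that contains it.
    records = {}
    for f in sorted(pathlib.Path(benchmark_path).rglob("model_compiler.json")):
        records[f.parent.name] = json.loads(f.read_text())
    return records
```

Each value is whatever the corresponding `model_compiler.json` contains; consult the files produced by `graph_net.log2json` for the actual schema.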

**Step 3: Analysis**

Use `graph_net.violin_analysis` to generate [violin plots](https://en.m.wikipedia.org/wiki/Violin_plot) and `graph_net.S_analysis` to generate S(t) and ES(t) plots from the JSON results.

```bash
python -m graph_net.violin_analysis

python -m graph_net.S_analysis
```

The scripts expect a directory layout of `/benchmark_path/category_name/`; items on the x-axis are identified by the names of the sub-directories. After execution, summary plots grouped by category (model tasks, libraries, ...) are exported to `$GRAPH_NET_BENCHMARK_PATH`.
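The grouping convention can be sketched as follows (a toy stdlib-only illustration of the layout described above; the analysis scripts' internals may differ):

```python
import pathlib

def categories(benchmark_path):
    # Each sub-directory of the benchmark path is one x-axis item
    # (a category such as a model task or library).
    return sorted(p.name for p in pathlib.Path(benchmark_path).iterdir()
                  if p.is_dir())
```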

### 🧱 Construction & Contribution Guide
Want to understand how GraphNet is built or contribute new samples?
Check out the [Construction Guide](./docs/README_contribute.md) for details on the extraction and validation workflow.


## 🚀 Future Roadmap

1. Scale GraphNet to 10K+ graphs.
2. Further annotate GraphNet samples into more granular sub-categories.

**Vision**: GraphNet aims to lay the foundation for AI for Compiler by enabling **large-scale, systematic evaluation** of tensor compiler optimizations, and providing a **dataset for models to learn** and transfer optimization strategies.

## 💬 GraphNet Community

You can join our community via the following group chats. Feel free to ask any questions about using and building GraphNet.


## License and Acknowledgement

GraphNet is released under the [MIT License](./LICENSE).

If you find this project helpful, please cite:

```bibtex
@misc{li2025graphnet,
  title  = {GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research},
  author = {Xinqi Li and Yiqun Liu and Shan Jiang and Enrong Zheng and Huaijin Zheng and Wenhao Dai and Haodong Deng and Dianhai Yu and Yanjun Ma},
  year   = {2025},
  url    = {https://github.com/PaddlePaddle/GraphNet/blob/develop/GraphNet_technical_report.pdf}
}
```
# Contributing to GraphNet
To guarantee the dataset’s overall quality, reproducibility, and cross-compiler compatibility, we define the following construction **constraints**:

1. Computation graphs must be executable in imperative (eager) mode.
2. Computation graphs and their corresponding Python code must support serialization and deserialization.
3. The full graph can be decomposed into two disjoint subgraphs.
4. Operator names within each computation graph must be statically parseable.
5. If custom operators are used, their implementation code must be fully accessible.
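Constraint 4, for instance, can be approximated with a static scan of the sample's Python source. The sketch below is illustrative only (it is not the validator's actual implementation) and simply collects attribute-style call names without executing the code:

```python
import ast

def called_ops(source):
    # Statically parse the sample's source and collect the name of every
    # attribute-style call (e.g. torch.relu, torch.matmul).
    ops = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            ops.add(node.func.attr)
    return sorted(ops)

print(called_ops("y = torch.relu(torch.matmul(a, b))"))  # ['matmul', 'relu']
```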

## Graph Extraction & Validation
GraphNet provides automated tools for graph extraction and validation.

<div align="center">
<img src="/pics/graphnet_overview.jpg" alt="GraphNet Architecture Overview" width="65%">
</div>

**Demo: Extract & Validate ResNet‑18**
```bash
git clone https://github.com/PaddlePaddle/GraphNet.git
cd GraphNet

# Set your workspace directory
export GRAPH_NET_EXTRACT_WORKSPACE=/home/yourname/graphnet_workspace/

# Extract the ResNet‑18 computation graph
python graph_net/test/vision_model_test.py

# Validate the extracted graph (e.g. /home/yourname/graphnet_workspace/resnet18/)
python -m graph_net.torch.validate \
--model-path $GRAPH_NET_EXTRACT_WORKSPACE/resnet18/
```

**Illustration – Extraction Workflow**

<div align="center">
<img src="/pics/dataset_composition.png" alt="GraphNet Extract Sample" width="65%">
</div>

* Source code for a custom op is required **only when** the corresponding operator is used in the module, and **no specific format** is required.

**Step 1: graph_net.torch.extract**

Wrap the model with the extractor — that’s all you need:

```python
import graph_net

# Instantiate the model (e.g. a torchvision model)
model = ...

# Extract your own model
model = graph_net.torch.extract(name="model_name", dynamic="True")(model)
```

After running, the extracted graph will be saved to: `$GRAPH_NET_EXTRACT_WORKSPACE/model_name/`.

For more details, see the docstring of `graph_net.torch.extract` defined in `graph_net/torch/extractor.py`.

**Step 2: graph_net.torch.validate**

To verify that the extracted model meets the requirements, we run `graph_net.torch.validate` in CI and also ask contributors to self-check in advance:

```bash
python -m graph_net.torch.validate \
--model-path $GRAPH_NET_EXTRACT_WORKSPACE/model_name
```

All the **construction constraints** are examined automatically. After validation passes, a unique `graph_hash.txt` is generated and later checked in the CI procedure to avoid redundant submissions.
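The deduplication idea can be sketched as follows. The actual hashing scheme is whatever `graph_net.torch.validate` implements; hashing every `.py` file in the sample directory here is purely an assumption for illustration:

```python
import hashlib
import pathlib

def graph_hash(model_dir):
    # Hash the serialized graph sources in a deterministic order so that
    # the same graph always yields the same digest for CI to compare.
    h = hashlib.sha256()
    for f in sorted(pathlib.Path(model_dir).glob("*.py")):
        h.update(f.read_bytes())
    return h.hexdigest()
```

Two submissions of an identical graph then produce identical digests, which is what lets CI reject the redundant copy.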

## 📁 Repository Structure
This repository is organized as follows:

| Directory | Description |
|------------|--------------|
| **graph_net/** | Core module for graph extraction, validation, and benchmarking |
| **paddle_samples/** | Computation graph samples extracted from PaddlePaddle |
| **samples/** | Computation graph samples extracted from PyTorch |
| **docs/** | Technical documents and contributor guides |

Below is the structure of the **graph_net/**:
```text
graph_net/
├─ config/ # Config files, params
├─ paddle/ # PaddlePaddle graph extraction & validation
├─ torch/ # PyTorch graph extraction & validation
├─ test/ # Unit tests and example scripts
└─ *.py # Benchmark & analysis scripts
```