This is the official repository of TwiBot-22 @ NeurIPS 2022, Datasets and Benchmarks Track. This dataset is collected from the Twitter website before 2022.
TwiBot-22 is the largest and most comprehensive Twitter bot detection benchmark to date. Specifically, TwiBot-22 is designed to address the challenges of limited dataset scale, incomplete graph structure, and low annotation quality in previous datasets. For more details, please refer to the TwiBot-22 paper and statistics.
Each dataset contains `node.json` (or `tweet.json`, `user.json`, `list.json`, and `hashtag.json` for TwiBot-22), `label.csv`, `split.csv`, and `edge.csv` (for datasets with graph structure). See here for a detailed description of these files.
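As a rough illustration of how one dataset directory might be read, here is a minimal sketch using only the Python standard library. The helper name `load_dataset` is hypothetical, and the column names (`id`, `label`, `split`) are assumptions made for this example; check the detailed file description for the actual schema. It also assumes `node.json` is a single JSON array small enough to load into memory, which may not hold for the TwiBot-22 files.

```python
import csv
import json
from pathlib import Path


def load_dataset(root):
    """Read the files of one dataset directory into plain Python objects.

    Hypothetical helper: the column names used below are assumptions,
    not a schema confirmed by this README.
    """
    root = Path(root)

    def read_csv(name):
        path = root / name
        if not path.exists():
            # e.g. edge.csv is only present for datasets with graph structure
            return []
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    # ASSUMED columns: label.csv -> id,label ; split.csv -> id,split
    labels = {row["id"]: row["label"] for row in read_csv("label.csv")}
    splits = {row["id"]: row["split"] for row in read_csv("split.csv")}
    edges = read_csv("edge.csv")  # rows kept as dicts, whatever the columns are

    node_path = root / "node.json"
    if node_path.exists():
        nodes = json.loads(node_path.read_text(encoding="utf-8"))
    else:
        nodes = []

    return {"nodes": nodes, "labels": labels, "splits": splits, "edges": edges}
```

Calling `load_dataset("datasets/<dataset_name>")` would then return a dictionary with the keys `nodes`, `labels`, `splits`, and `edges`; missing CSV files simply yield empty lists or dictionaries.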
TwiBot-22 is available at Google Drive.
Please apply for access by contacting shangbin at cs.washington.edu from your institutional email address, clearly stating your institution, your research advisor (if any), and your intended use of TwiBot-22.
For TwiBot-20, visit the TwiBot-20 GitHub repository.
For other datasets, please visit the Bot Repository.
After downloading these datasets, you can transform them into the 4-file format detailed in "Dataset Format". Alternatively, you can directly download our preprocessed version:
For TwiBot-20, visit the TwiBot-20 GitHub repository and apply for TwiBot-20 access; there will be a `TwiBot-20-Format22.zip` in the TwiBot-20 Google Drive link.
For other datasets, you can directly download them from Google Drive. You should adhere to the license of each dataset, the "Content redistribution" section of the Twitter Developer Agreement and Policy, the rules set by the Bot Repository, and only use these datasets for research purposes.
- pip: `pip install -r requirements.txt`
- conda: `conda install --yes --file requirements.txt`
- clone this repo by running `git clone https://github.com/LuoUndergradXJTU/TwiBot-22.git`
- make the dataset directory with `mkdir datasets` and download the datasets to `./datasets`
- change directory to `src/{name_of_the_baseline}`
- run experiments under the guidance of the corresponding `readme.md`
baseline | paper | acc on Twibot-22 | f1 on Twibot-22 | type | tags |
---|---|---|---|---|---|
Abreu et al. | link | 0.7066 | 0.5344 | F | random forest |
Alhosseini et al. | link | 0.4772 | 0.3810 | F G | gcn |
BGSRD | link | 0.7188 | 0.2114 | F | BERT GAT |
Bot Hunter | link | 0.7279 | 0.2346 | F | random forest |
Botometer | link | 0.4987 | 0.4257 | F T G | |
BotRGCN | link | 0.7966 | 0.5750 | F T G | BotRGCN |
Cresci et al. | link | - | - | T | DNA |
Dehghan et al. | link | - | - | F T G | Graph |
Efthimion et al. | link | 0.7408 | 0.2758 | F T | efthimion |
EvolveBot | link | 0.7109 | 0.1409 | F T G | random forest |
FriendBot | link | - | - | F T G | random forest |
Kipf et al. | link | 0.7839 | 0.5496 | F T G | Graph Neural Network |
Velickovic et al. | link | 0.7948 | 0.5586 | F T G | Graph Neural Network |
GraphHist | link | - | - | F T G | random forest |
Hayawi et al. | link | 0.7650 | 0.2474 | F | lstm |
HGT | link | 0.7491 | 0.3960 | F T G | Graph Neural Networks |
SimpleHGN | link | 0.7672 | 0.4544 | F T G | Graph Neural Networks |
Kantepe et al. | link | 0.7640 | 0.5870 | F T | random forest |
Knauth et al. | link | 0.7125 | 0.3709 | F T G | random forest |
Kouvela et al. | link | 0.7644 | 0.3003 | F T | random forest |
Kudugunta et al. | link | 0.6587 | 0.5167 | F | SMOTENN, random forest |
Lee et al. | link | 0.7628 | 0.3041 | F T | random forest |
LOBO | link | 0.7570 | 0.3857 | F T | random forest |
Miller et al. | link | 0.3037 | 0.4529 | F T | k means |
Moghaddam et al. | link | 0.7378 | 0.3207 | F G | random forest |
NameBot | link | 0.7061 | 0.0050 | F | Logistic Regression |
RGT | link | 0.7647 | 0.4294 | F T G | Graph Neural Networks |
RoBERTa | link | 0.7207 | 0.2053 | F T | RoBERTa |
Rodriguez-Ruiz | link | 0.4936 | 0.5657 | F T G | SVM |
Santos et al. | link | - | - | F T | decision tree |
SATAR | link | - | - | F T G | |
SGBot | link | 0.7508 | 0.3659 | F T | random forest |
T5 | link | 0.7205 | 0.2027 | T | T5 |
Varol et al. | link | 0.7392 | 0.2754 | F T | random forest |
Wei et al. | link | 0.7020 | 0.5360 | T |
where `-` indicates that the baseline could not scale to the TwiBot-22 dataset.
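Several baselines in the table above combine a fairly high accuracy with a very low F1 (NameBot, for instance, is listed with 0.7061 accuracy and 0.0050 F1). The sketch below shows how that can happen when one class dominates. It is a generic illustration under stated assumptions, not the evaluation code of this repository: `binary_metrics` is a hypothetical helper, the toy labels are made up, and treating bots as the positive class (label `1`) is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task.

    Hypothetical helper for illustration; `positive` marks the class
    treated as "bot" (an assumption, not stated in this README).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data: 3 bots among 10 accounts, and a classifier that finds only one bot.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

On this toy input the classifier is right on 8 of 10 accounts (accuracy 0.8) but recovers only one of three bots, so recall is 1/3 and F1 drops to 0.5 even though precision is 1.0.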
Precision | Botometer-feedback-2019 | Cresci-2015 | Cresci-2017 | Cresci-rtbust-2019 | Cresci-stock-2018 | Gilani-2017 | Midterm-2018 | Twibot-20 | Twibot-22 |
---|---|---|---|---|---|---|---|---|---|
Abreu et al. | 63.63 | 99.05 | 98.34 | 78.57 | 75.45 | 76.82 | 97.28 | 72.20 | 50.92 |
Alhosseini et al. | - | 87.69 | - | - | - | - | - | 57.81 | 29.99 |
BGSRD | 27.50 | 86.52 | 75.85 | 58.13 | 52.78 | 25.43 | 84.40 | 67.64 | 22.55 |
BotHunter | - | 98.55 | 98.65 | 81.92 | 84.29 | 78.99 | 99.44 | 72.77 | 68.09 |
Botometer | 21.05 - | 50.54 - | 93.35 - | 65.22 - | 68.50 - | 62.99 - | 31.18 - | 55.67 - | 30.81 - |
BotRGCN | - | 95.51 | - | - | - | - | - | 84.52 | 74.81 |
Cresci | - | 0.59 - | 12.96 - | - | - | - | - | 7.66 - | - |
Dehghan et al. | - | 96.15 | - | - | - | - | - | 94.72 | - |
Efthimion et al. | 0.00 | 93.82 | 94.58 | 68.29 | 82.75 | 37.50 | 98.01 | 64.20 | 77.78 |
EvolveBot | - | 85.03 | - | - | - | - | - | 66.93 | 56.38 |
FriendBot | - | 95.29 | 77.55 | - | - | - | - | 72.64 | - |
GCN | - | 95.59 | - | - | - | - | - | 75.23 | 71.19 |
GAT | - | 96.10 | - | - | - | - | - | 81.39 | 76.23 |
GraphHist | - | 73.12 | - | - | - | - | - | 51.27 | - |
Hayawi et al. | 25.00 | 92.96 | 95.47 | 48.82 | 50.73 | 51.44 | 85.30 | 71.61 | 80.00 |
HGT | - | 94.80 | - | - | - | - | - | 85.55 | 68.22 |
SimpleHGN | - | 95.68 | - | - | - | - | - | 84.76 | 72.57 |
Kantepe et al. | - | 81.30 | 83.00 | - | - | - | - | 63.40 | 78.60 |
Knauth et al. | 57.41 | 85.70 | 91.56 | 57.41 | 99.89 | 35.17 | 99.91 | 96.56 | - |
Kouvela et al. | 48.00 | 99.54 | 99.24 | 82.27 | 82.17 | 79.69 | 97.56 | 79.33 | 69.30 |
Kudugunta et al. | 56.67 | 100.0 | 98.53 | 66.09 | 54.87 | 85.44 | 99.06 | 80.40 | 44.31 |
Lee et al. | 58.97 | 98.65 | 99.56 | 79.37 | 84.75 | 77.58 | 97.36 | 76.60 | 67.23 |
LOBO | - | 98.47 | 99.30 | - | - | - | - | 74.83 | 75.43 |
Miller et al. | 0.00 | 72.07 | 77.21 | 52.17 | 54.78 | 48.89 | 83.85 | 60.71 | 29.46 |
Moghaddam et al. | - | 98.33 | - | - | - | - | - | 72.29 | 67.61 |
NameBot | 45.45 | 76.81 | 80.39 | 65.00 | 58.34 | 58.21 | 86.93 | 58.72 | 67.73 |
RGT | - | 96.38 | - | - | - | - | - | 85.15 | 75.03 |
RoBERTa | - | 97.58 | 92.43 | - | - | - | - | 73.88 | 63.28 |
Rodriguez-Ruiz et al. | - | 78.64 | 79.47 | - | - | - | - | 61.60 | 33.23 |
Santos et al. | 50.00 | 72.86 | 81.71 | 75.68 | 65.39 | 32.26 | 88.05 | 62.73 | - |
SATAR | - | 90.66 | - | - | - | - | - | 81.50 | - |
SGBot | 59.70 | 99.45 | 98.26 | 83.08 | 83.90 | 82.68 | 99.35 | 76.40 | 73.11 |
T5 | - | 91.04 | 94.48 | - | - | - | - | 72.19 | 63.27 |
Varol et al. | - | 92.22 | - | - | - | - | - | 78.04 | 75.74 |
Wei et al. | - | 91.70 | 85.90 | - | - | - | - | 61.00 | 62.70 |
Recall | Botometer-feedback-2019 | Cresci-2015 | Cresci-2017 | Cresci-rtbust-2019 | Cresci-stock-2018 | Gilani-2017 | Midterm-2018 | Twibot-20 | Twibot-22 |
---|---|---|---|---|---|---|---|---|---|
Abreu et al. | 46.66 | 62.13 | 91.97 | 89.18 | 75.67 | 58.87 | 98.63 | 82.81 | 11.73 |
Alhosseini et al. | - | 97.16 | - | - | - | - | - | 95.69 | 56.75 |
BGSRD | 8.57 | 95.56 | 100.0 | 35.14 | 70.40 | 60.00 | 97.66 | 73.19 | 19.90 |
BotHunter | - | 91.48 | 85.40 | 83.02 | 79.92 | 62.29 | 99.66 | 86.75 | 14.07 |
Botometer | 57.14 - | 98.95 - | 99.69 - | 100.0 - | 94.96 - | 89.91 - | 87.88 - | 50.82 - | 69.80 - |
BotRGCN | - | 99.17 | - | - | - | - | - | 90.19 | 46.80 |
Cresci | - | 66.67 - | 95.30 - | - | - | - | - | 67.47 - | - |
Dehghan et al. | - | 83.88 | - | - | - | - | - | 82.19 | - |
Efthimion et al. | 0.00 | 94.38 | 89.23 | 75.68 | 58.02 | 2.80 | 94.04 | 70.63 | 16.76 |
EvolveBot | - | 95.83 | - | - | - | - | - | 72.81 | 8.04 |
FriendBot | - | 100.0 | 100.0 | - | - | - | - | 88.94 | - |
GCN | - | 98.81 | - | - | - | - | - | 87.62 | 44.80 |
GAT | - | 99.11 | - | - | - | - | - | 89.53 | 44.12 |
GraphHist | - | 100.0 | - | - | - | - | - | 99.05 | - |
Hayawi et al. | 17.78 | 79.31 | 92.19 | 81.25 | 71.16 | 28.00 | 98.64 | 83.50 | 14.99 |
HGT | - | 99.11 | - | - | - | - | - | 91.00 | 28.03 |
SimpleHGN | - | 99.29 | - | - | - | - | - | 92.06 | 32.90 |
Kantepe et al. | - | 75.30 | 76.10 | - | - | - | - | 61.00 | 46.80 |
Knauth et al. | 59.09 | 97.40 | 95.35 | 51.24 | 88.83 | 44.00 | 83.99 | 76.30 | - |
Kouvela et al. | 20.00 | 96.79 | 98.98 | 80.00 | 78.78 | 57.20 | 98.92 | 95.17 | 19.17 |
Kudugunta et al. | 45.33 | 60.95 | 85.88 | 50.67 | 47.54 | 35.14 | 90.24 | 33.47 | 61.98 |
Lee et al. | 44.00 | 98.46 | 99.13 | 86.45 | 80.30 | 60.19 | 98.37 | 83.66 | 19.65 |
LOBO | - | 99.05 | 96.13 | - | - | - | - | 87.81 | 25.91 |
Miller et al. | 0.00 | 100.0 | 99.11 | 37.50 | 58.89 | 77.19 | 99.81 | 97.44 | 97.89 |
Moghaddam et al. | - | 59.23 | - | - | - | - | - | 84.38 | 21.02 |
NameBot | 33.33 | 91.12 | 91.79 | 70.27 | 64.13 | 36.45 | 96.82 | 70.47 | 0.03 |
RGT | - | 99.23 | - | - | - | - | - | 91.06 | 30.10 |
RoBERTa | - | 94.11 | 96.27 | - | - | - | - | 72.38 | 12.27 |
Rodriguez-Ruiz et al. | - | 99.11 | 92.88 | - | - | - | - | 98.75 | 81.32 |
Santos et al. | 13.33 | 85.80 | 84.40 | 75.68 | 64.95 | 9.35 | 97.24 | 58.13 | - |
SATAR | - | 99.88 | - | - | - | - | - | 91.22 | - |
SGBot | 45.33 | 63.67 | 90.86 | 81.62 | 81.03 | 63.62 | 99.66 | 94.91 | 24.32 |
T5 | - | 87.71 | 90.26 | - | - | - | - | 69.05 | 12.09 |
Varol et al. | - | 97.40 | - | - | - | - | - | 84.37 | 16.83 |
Wei et al. | - | 75.30 | 72.10 | - | - | - | - | 54.00 | 46.80 |
F1 | Botometer-feedback-2019 | Cresci-2015 | Cresci-2017 | Cresci-rtbust-2019 | Cresci-stock-2018 | Gilani-2017 | Midterm-2018 | Twibot-20 | Twibot-22 |
---|---|---|---|---|---|---|---|---|---|
Abreu et al. | 53.84 | 76.36 | 95.04 | 83.54 | 76.93 | 66.66 | 97.95 | 77.14 | 53.44 |
Alhosseini et al. | - | 92.17 | - | - | - | - | - | 72.07 | 38.10 |
BGSRD | 13.03 | 90.80 | 86.27 | 41.08 | 58.18 | 35.72 | 90.50 | 70.05 | 21.14 |
BotHunter | 49.57 | 97.22 | 91.60 | 82.90 | 82.17 | 69.18 | 99.59 | 79.09 | 23.46 |
Botometer | 30.77 - | 66.90 - | 96.12 - | 78.95 - | 79.59 - | 77.39 - | 46.03 - | 53.13 - | 42.75 - |
BotRGCN | - | 97.30 | - | - | - | - | - | 87.25 | 57.50 |
Cresci | - | 1.17 - | 22.81 - | - | - | - | - | 13.69 - | - |
Dehghan et al. | - | 88.34 | - | - | - | - | - | 76.20 | - |
Efthimion et al. | 0.00 | 94.10 | 91.83 | 71.79 | 68.21 | 5.22 | 95.98 | 67.26 | 27.58 |
EvolveBot | - | 90.07 | - | - | - | - | - | 69.75 | 14.09 |
FriendBot | - | 97.58 | 87.35 | - | - | - | - | 79.97 | - |
GCN | - | 97.17 | - | - | - | - | - | 80.86 | 54.96 |
GAT | - | 97.58 | - | - | - | - | - | 85.25 | 55.86 |
GraphHist | - | 84.47 | - | - | - | - | - | 67.56 | - |
Hayawi et al. | 20.49 | 85.56 | 93.78 | 60.87 | 60.75 | 34.67 | 91.48 | 77.05 | 24.74 |
HGT | - | 96.93 | - | - | - | - | - | 88.19 | 39.60 |
SimpleHGN | - | 97.28 | - | - | - | - | - | 88.25 | 45.44 |
Kantepe et al. | - | 78.17 | 79.41 | - | - | - | - | 62.23 | 58.71 |
Knauth et al. | 41.27 | 91.18 | 93.42 | 54.15 | 94.03 | 39.10 | 91.26 | 85.24 | 37.09 |
Kouvela et al. | 28.10 | 98.15 | 99.11 | 81.10 | 80.44 | 66.57 | 98.23 | 86.53 | 30.03 |
Kudugunta et al. | 49.61 | 75.74 | 91.74 | 49.22 | 50.94 | 49.75 | 94.45 | 47.26 | 51.67 |
Lee et al. | 50.34 | 98.56 | 99.35 | 82.74 | 82.46 | 67.78 | 97.87 | 79.98 | 30.41 |
LOBO | - | 98.76 | 97.69 | - | - | - | - | 80.80 | 38.57 |
Miller et al. | 0.00 | 83.77 | 86.80 | 43.64 | 56.76 | 59.86 | 91.14 | 74.81 | 45.29 |
Moghaddam et al. | - | 73.93 | - | - | - | - | - | 77.87 | 32.07 |
NameBot | 38.46 | 83.36 | 85.71 | 67.53 | 61.10 | 44.83 | 91.61 | 65.06 | 0.50 |
RGT | - | 97.78 | - | - | - | - | - | 88.01 | 42.94 |
RoBERTa | - | 95.86 | 94.30 | - | - | - | - | 73.09 | 20.53 |
Rodriguez-Ruiz et al. | - | 87.70 | 85.65 | - | - | - | - | 63.10 | 56.57 |
Santos et al. | 21.05 | 78.80 | 83.03 | 75.68 | 65.17 | 14.49 | 92.42 | 60.34 | - |
SATAR | - | 95.05 | - | - | - | - | - | 86.07 | - |
SGBot | 49.60 | 77.91 | 94.61 | 82.26 | 82.34 | 72.10 | 99.52 | 84.90 | 36.59 |
T5 | - | 89.35 | 92.32 | - | - | - | - | 70.57 | 20.27 |
Varol et al. | - | 94.73 | - | - | - | - | - | 81.08 | 27.54 |
Wei et al. | - | 82.65 | 78.43 | - | - | - | - | 57.33 | 53.61 |
model | Acc | F1 | precision | recall |
---|---|---|---|---|
Moghaddam et al. | 89.41 | 24.98 | 16.57 | 50.79 |
SGBot | 91.87 | 47.43 | 76.16 | 34.48 |
BotHunter | 91.44 | 40.39 | 78.28 | 27.24 |
GAT | 91.14 | 47.00 | 64.83 | 36.95 |
BotRGCN | 88.74 | 65.89 | 79.82 | 56.23 |
RGT | 92.8 | 23.39 | 58.33 | 14.66 |
model | Acc | F1 | precision | recall |
---|---|---|---|---|
Moghaddam et al. | 83.93 | 18.49 | 11.58 | 45.94 |
SGBot | 84.72 | 26.00 | 54.55 | 17.11 |
BotHunter | 85.63 | 23.38 | 73.67 | 13.95 |
GAT | 84.93 | 30.47 | 55.64 | 21.05 |
BotRGCN | 85.59 | 55.45 | 67.45 | 47.17 |
RGT | 87.1 | 38.02 | 58.50 | 28.57 |
model | Acc | F1 | precision | recall |
---|---|---|---|---|
Moghaddam et al. | 87.61 | 22.34 | 14.48 | 49.00 |
SGBot | 89.52 | 38.96 | 68.97 | 27.18 |
BotHunter | 89.53 | 33.77 | 76.62 | 21.66 |
GAT | 89.09 | 40.58 | 61.84 | 30.28 |
BotRGCN | 87.92 | 59.46 | 76.88 | 48.66 |
RGT | 89.6 | 26.89 | 56.49 | 18.05 |
Please cite TwiBot-22 if you use the TwiBot-22 dataset or this repository:
@inproceedings{fengtwibot,
title={TwiBot-22: Towards Graph-Based Twitter Bot Detection},
author={Feng, Shangbin and Tan, Zhaoxuan and Wan, Herun and Wang, Ningnan and Chen, Zilong and Zhang, Binchi and Zheng, Qinghua and Zhang, Wenqian and Lei, Zhenyu and Yang, Shujie and others},
booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track}
}
- New dataset: convert the original data to the TwiBot-22 defined schema.
- New baseline: load well-formatted dataset from the dataset directory and define your model.
PRs are welcome!
Feel free to open issues in this repository! Instead of emails, GitHub issues are much better at facilitating a conversation between you and our team to address your needs. You can also contact Zhaoxuan Tan through tanzhaoxuan at stu.xjtu.edu.cn.