ADGym performs large-scale benchmarking and automatic selection of AD design choices. The design dimensions and their candidate choices (listed in the table below) are obtained by decoupling the standard AD research pipeline:
Data Augmentation → Data Preprocessing → Network Construction → Network Training
Currently, ADGym is mainly designed for tabular data.
Each part of the pipeline can be instantiated by multiple components (core components are marked in bold):
| Pipeline | Detailed Components | Value |
|---|---|---|
| Data Augmentation | | [Oversampling, SMOTE, Mixup, GAN] |
| Data Preprocessing | | [MinMax, Normalization] |
| Network Construction | Network Architecture | [MLP, AutoEncoder, ResNet, FTTransformer] |
| | Hidden Layers | [[20], [100, 20], [100, 50, 20]] |
| | Activation | [Tanh, ReLU, LeakyReLU] |
| | Dropout | [0.0, 0.1, 0.3] |
| | Initialization | [PyTorch default, Xavier, Kaiming] |
| Network Training | Loss Function | [BCE, Focal, Minus, Inverse, Hinge, Deviation, Ordinal] |
| | Optimizer | [SGD, Adam, RMSprop] |
| | Batch Resampling | [False, True] |
| | Epochs | [20, 50, 100] |
| | Batch Size | [16, 64, 256] |
| | Learning Rate | [1e-2, 1e-3] |
| | Weight Decay | [1e-2, 1e-4] |
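Concretely, one point in this design space is a combination of one value per dimension. The following is a purely illustrative sketch of such a combination written as a Python dict; the keys mirror the table above and do not reflect ADGym's internal format:

```python
# One illustrative combination of the design choices listed in the table above
# (hypothetical key names; not the format used internally by ADGym).
design = {
    'data_augmentation': 'SMOTE',
    'data_preprocessing': 'MinMax',
    'network_architecture': 'MLP',
    'hidden_layers': [100, 20],
    'activation': 'ReLU',
    'dropout': 0.1,
    'initialization': 'Xavier',
    'loss_function': 'BCE',
    'optimizer': 'Adam',
    'batch_resampling': False,
    'epochs': 50,
    'batch_size': 256,
    'learning_rate': 1e-3,
    'weight_decay': 1e-4,
}
```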
- To reproduce the experimental results of all the components, open `test_ADGym.py` and run:
```python
adgym = ADGym(la=5, grid_mode='small', grid_size=1000, suffix='test')
adgym.run()
```
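Here, `la=5` is the number of labeled anomalies made available during training (cf. the 2023.01.04 entry in the update log below); `grid_mode` and `grid_size` appear to control how many design-choice combinations are sampled from the full grid, and `suffix` tags the saved results.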
- To reproduce the experimental results of the current SOTA semi-supervised and fully supervised models, open `test_SOTA.py` and run:
```python
pipeline = RunPipeline(suffix='SOTA', parallel='semi-supervise', mode='nla')
pipeline.run()

pipeline = RunPipeline(suffix='SOTA', parallel='supervise', mode='nla')
pipeline.run()
```
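In these calls, `parallel` switches between the semi-supervised and fully supervised groups of baselines, and `mode='nla'` appears to denote specifying the number (rather than the ratio) of labeled anomalies (cf. the 2023.01.04 entry in the update log below).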
- To reproduce the experimental results of the meta classifier (and its counterpart baseline), open `meta.py` and run:
```python
# two-stage meta classifier, using the meta-feature extractor from MetaOD
run(suffix='', grid_mode='small', grid_size=1000, mode='two-stage')

# end-to-end meta classifier
run(suffix='', grid_mode='small', grid_size=1000, mode='end-to-end')
```
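To make the two-stage idea concrete: meta-features of each dataset are first extracted (the two-stage mode uses MetaOD's meta-feature extractor for this), and a predictor is then trained to map (meta-features, design-choice encoding) to the observed detection performance, so that on a new dataset the best-scoring combination can be selected; the end-to-end mode presumably learns this mapping, including the dataset representation, jointly. The following is a minimal, purely illustrative sketch of the two-stage idea on random placeholder data; it uses scikit-learn's GradientBoostingRegressor as a stand-in for the actual meta predictor (the repository's ML-based predictor uses XGBoost) and is not ADGym's implementation.

```python
# Illustrative two-stage meta-predictor sketch (placeholder data, not ADGym's code).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stage 1 (assumed done): meta-features per historical dataset + an encoding
# of each evaluated design-choice combination, concatenated into one vector.
n_history, n_meta, n_design = 200, 8, 5
X = rng.normal(size=(n_history, n_meta + n_design))  # [meta-features | design encoding]
y = rng.uniform(size=n_history)                      # observed performance (e.g., rank ratio)

# Stage 2: fit a regressor mapping (meta-features, design encoding) -> performance.
meta_predictor = GradientBoostingRegressor().fit(X, y)

# For a new dataset: score every candidate design and select the best one.
new_meta = rng.normal(size=n_meta)
candidates = rng.normal(size=(50, n_design))         # 50 candidate design encodings
scores = meta_predictor.predict(np.hstack([np.tile(new_meta, (50, 1)), candidates]))
best_design = candidates[scores.argmax()]
```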
- iteration_utilities==0.11.0
- metaod==0.0.6
- scikit-learn==0.24
- imbalanced-learn==0.7.0
- torch==1.9.0
- tensorflow==2.8.0
- tabgan==1.2.1
- rtdl==0.0.13
- protobuf==3.20.*
- numpy==1.21.6
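Assuming the pins above are collected in a `requirements.txt` file, they can be installed in one step with `pip install -r requirements.txt`.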
- 2022.11.17: run the experiments over the current component combinations
- 2022.11.23: add the GAN-based data augmentation method
- 2022.11.25: add the oversampling and SMOTE data augmentation methods
- 2022.11.25: add the binary cross entropy loss and focal loss
- 2023.01.04: add the Mixup data augmentation method
- 2023.01.04: add different network initialization methods
- 2023.01.04: add the ordinal loss used in the PReNet model
- 2023.01.04: switch from specifying the ratio of labeled anomalies to specifying their number
- 2023.02.20: restart ADGym
- 2023.02.20: add two baselines: random selection and model selection based on the partially labeled data
- 2023.02.22: provide both two-stage and end-to-end versions of the meta predictor
- 2023.02.23: improve the training efficiency of the meta classifier
- 2023.02.28: support a GPU version of the meta predictors and fix some bugs
- 2023.03.01: provide an ML-based meta predictor
- 2023.03.01: use the performance rank ratio (instead of raw performance) as the training target
- 2023.04.23: learning-to-rank + ensemble strategy
- 2023.05.09: add two loss functions for meta predictor
- 2023.05.09: add early-stopping mechanism for meta predictor
- 2023.05.09: add CORAL method for transfer learning in meta features
- 2023.05.13: provide ensembling of the top-k components
- 2023.05.22: fixed a bug that modified the original data while running the experiments
- 2023.05.22: optimized code efficiency
- 2023.05.29: fixed the bug in REPEN model
- 2023.06.01: fixed the bug in data preprocessing of meta predictor (end-to-end mode)
- 2023.06.01: replace LightGBM with XGBoost (which is faster) for the ML-based meta predictor