Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS

afzalxo/Accel-NASBench

TLDR: We propose a compute-efficient approach to NAS benchmark construction for large-scale datasets using training proxies. Using the searched proxy configuration, we build a surrogate NAS benchmark for the ImageNet2012 dataset on the MnasNet search space (Please see the specification of the search space in Appendix B of the paper and here). We also offer inference throughput surrogates for 6 hardware accelerators: Cloud TPUv2 and TPUv3, A100 and RTX3090 GPUs, Xilinx Ultrascale+ ZCU102 and Versal AI Core VCK190 FPGAs, and latency surrogates for the FPGA platforms. The benchmark allows evaluation without model and dataset proxies and can be utilized for benchmarking discrete NAS optimizers.

The XGB surrogates are available on figshare here. They will be automatically downloaded if you run the example.py file using python3 example.py.

To install the requirements, clone the repository and run the following command inside the cloned directory:

pip3 install -r requirements.txt

Then run the example using

python3 example.py

The example script downloads the XGB surrogates to the anb_models_0_9 directory inside the project directory. The surrogates allow evaluation of accuracy, throughput on the 6 accelerators, and latency on the FPGAs. A search space sample can be specified manually using

from configurationspaces.searchspaces import EfficientNetSS as ss
# Create search space instance
search_space = ss()
# Specify sample instance
test_sample_man = search_space.manual_sample(
    [
        [1, 6, 6, 6, 6, 6, 6],  # Expansion Factor for the 7 blocks
        [3, 3, 5, 3, 5, 5, 3],  # Kernel Sizes
        [1, 2, 2, 3, 3, 4, 1],  # Number of Layers in block
        [True, True, True, True, True, True, True],   # Squeeze-Excite state
    ]
)

or sample 4 random architectures using

test_samples_rand = search_space.random_sample(4)

An accuracy surrogate instance can be created using anb.ANBEnsemble('xgb'). A throughput surrogate instance can be created as follows:

ensemble_inst_thr = anb.ANBEnsemble("xgb", device="tpuv2", metric="throughput")

The supported devices and their corresponding metrics are as follows:

supported_metrics = {
    "3090": ["throughput"],
    "a100": ["throughput"],
    "tpuv2": ["throughput"],
    "tpuv3": ["throughput"],
    "zcu102": ["latency", "throughput"],
    "vck190": ["latency", "throughput"],
}
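
The mapping above lends itself to a quick validity check before a surrogate is constructed. The following is a minimal sketch that copies the mapping from this README; the helper name `is_supported` is hypothetical and is not part of the repository's API.

```python
# Device -> supported metrics, copied from the mapping in this README.
SUPPORTED_METRICS = {
    "3090": ["throughput"],
    "a100": ["throughput"],
    "tpuv2": ["throughput"],
    "tpuv3": ["throughput"],
    "zcu102": ["latency", "throughput"],
    "vck190": ["latency", "throughput"],
}


def is_supported(device: str, metric: str) -> bool:
    """Hypothetical helper: True if `metric` is listed for `device`."""
    return metric in SUPPORTED_METRICS.get(device, [])


if __name__ == "__main__":
    # Check one FPGA latency pair and one GPU latency pair.
    for device, metric in [("zcu102", "latency"), ("a100", "latency")]:
        print(device, metric, is_supported(device, metric))
```

Checking the pair up front gives a clearer error than a failed surrogate lookup would, for example when latency is requested for a GPU or TPU.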

A possible result of running example.py is as follows. The results for the randomly sampled architectures differ from run to run, while the results for the manually specified sample are the same on every run:

Mean Accuracy: [51.687283 65.736916]  # [Acc of sample 1, Acc of sample 2]
Std Acc: [0.13972819 0.1365913 ]
Mean Throughput: [1320.735   883.5296]  # [Throughput of sample 1, of sample 2] in images/sec
Std Thr: [7.4857635 8.001069 ]  # Standard deviation in throughput is measured in images/sec

Since we passed two samples to the .query methods, the results are returned as arrays: the first element of each array corresponds to the first sample, and the second element to the second sample.
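
To make the per-sample layout concrete, the sketch below regroups the four result arrays into one record per queried sample. The numbers are typed in from the example output above; in practice they would come from the surrogates. The function name `per_sample_rows` and the dictionary keys are hypothetical.

```python
# Values copied from the example output in this README.
mean_acc = [51.687283, 65.736916]
std_acc = [0.13972819, 0.1365913]
mean_thr = [1320.735, 883.5296]
std_thr = [7.4857635, 8.001069]


def per_sample_rows(mean_acc, std_acc, mean_thr, std_thr):
    """Zip the four result arrays into one dict per queried sample."""
    rows = []
    for i, (ma, sa, mt, st) in enumerate(zip(mean_acc, std_acc, mean_thr, std_thr)):
        rows.append(
            {
                "sample": i + 1,   # 1-based position in the query batch
                "acc_mean": ma,    # accuracy
                "acc_std": sa,
                "thr_mean": mt,    # throughput in images/sec
                "thr_std": st,
            }
        )
    return rows


rows = per_sample_rows(mean_acc, std_acc, mean_thr, std_thr)
for row in rows:
    print(row)
```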

Accel-NASBench dataset

The dataset used to train the surrogates is provided in JSON format here, in a layout similar to that used by NASBench-301. Please see a sample result_x.json file to understand its fields. Each result_x.json file contains the architecture specification, accuracy, training time, and the mean and standard deviation of every on-device throughput/latency measurement. We train the surrogates on the mean throughput/latency values. Accuracy is evaluated with a single seed only.
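
A quick way to get familiar with the fields is to list the top-level keys of each file. The sketch below assumes the extracted files match the pattern `result_*.json` somewhere under the dataset root; it does not assume any particular field names. The function name `summarize_results` is hypothetical.

```python
import json
from pathlib import Path


def summarize_results(dataset_root):
    """Map each result_*.json file name to the sorted top-level keys it contains."""
    summary = {}
    # rglob searches recursively, since the exact directory layout is not assumed.
    for path in sorted(Path(dataset_root).rglob("result_*.json")):
        with path.open() as f:
            record = json.load(f)
        # Report keys only when the file holds a JSON object.
        summary[path.name] = sorted(record) if isinstance(record, dict) else []
    return summary
```

Calling `summarize_results("<path/to/extracted/dataset/>")` returns a dictionary that can be printed or compared across files to confirm that every record exposes the same fields.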

Fit surrogates using the dataset

Although we provide the trained surrogate models, they can also be trained manually on the provided dataset. Follow the steps below to train the surrogates.

  1. Download and extract the anb_dataset_jsons.tar.gz archive.
  2. Take note of the root directory of the extracted dataset.
  3. Run the following command inside the project directory to fit the accuracy XGB surrogate on random train/val/test splits with a 0.8/0.1/0.1 ratio:
python3 fit_model.py --dataset_root <path/to/extracted/dataset/> --model xgb --model_config_path ./configs/model_configs/gradient_boosting/xgb_configspace.json --data_config_path configs/data_configs/nb_fpga.json --log_dir experiments/ --seed <seed>

An example result of the above command using seed=3 is shown below; it will be different each run:

train metrics: {'mae': 0.20502309085633813, 'mse': 0.06897010112105674, 'rmse': 0.2626215930213217, 'r2': 0.9937321545409954, 'kendall_tau': 0.9485453372279331, 'kendall_tau_2_dec': 0.9490165549382614, 'kendall_tau_1_dec': 0.9521390453390867, 'spearmanr': 0.9963628656408133}
valid metrics: {'mae': 0.32042279419469843, 'mse': 0.19266443301611308, 'rmse': 0.43893556818297724, 'r2': 0.9827909607086215, 'kendall_tau': 0.9246529281557777, 'kendall_tau_2_dec': 0.9252142975144962, 'kendall_tau_1_dec': 0.9283951048238933, 'spearmanr': 0.991347023504538}
test metrics {'mae': 0.3169779104452133, 'mse': 0.16787281044461427, 'rmse': 0.4097228458905047, 'r2': 0.9839661171659461, 'kendall_tau': 0.9170514701750982, 'kendall_tau_2_dec': 0.9175674110264382, 'kendall_tau_1_dec': 0.9204615872718475, 'spearmanr': 0.98994312462432}
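
For readers unfamiliar with the reported metrics, the following pure-Python sketch shows how MAE, MSE, RMSE, R2, and a Kendall rank correlation can be computed from true and predicted values. It is not the code used by fit_model.py. It implements Kendall tau-a, which can differ from library implementations (for example tau-b) when ties are present. The `_2_dec` and `_1_dec` variants in the log presumably round predictions before ranking; that reading is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = mse * n                                    # residual sum of squares
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": 1.0 - ss_res / ss_tot}


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 5.0]
    print(regression_metrics(y_true, y_pred))
    print(kendall_tau_a(y_true, y_pred))
```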

To fit the throughput XGB surrogate for ZCU102 FPGA on random train/val/test splits:

python3 fit_model.py --dataset_root <path/to/extracted/dataset/> --model xgb_accel --device zcu102 --metric throughput --model_config_path ./configs/model_configs/gradient_boosting/xgb_accel_zcu102_throughput_configspace.json --data_config_path configs/data_configs/nb_fpga.json --log_dir experiments/ --seed <seed>

When fitting surrogates for throughput/latency, use models <xgb/lgb/sklearn_forest/svr/svr_nu>_accel, combined with --device <zcu102/vck190/tpuv2/tpuv3/a100/3090> and --metric <throughput/latency>. Throughput is supported by all 6 devices, while latency is supported only by the FPGAs. The relevant model config files for each model/device/metric combination are located here.
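
When fitting many model/device/metric combinations, it can help to assemble the command programmatically and reject unsupported pairs early. The sketch below uses only the flags shown in the commands above; the function name `build_fit_command` is hypothetical, and the model config path is passed in by the caller rather than guessed.

```python
# Device -> supported metrics, as listed earlier in this README.
SUPPORTED_METRICS = {
    "3090": ["throughput"],
    "a100": ["throughput"],
    "tpuv2": ["throughput"],
    "tpuv3": ["throughput"],
    "zcu102": ["latency", "throughput"],
    "vck190": ["latency", "throughput"],
}
# Model names expanded from <xgb/lgb/sklearn_forest/svr/svr_nu>_accel.
ACCEL_MODELS = ["xgb_accel", "lgb_accel", "sklearn_forest_accel", "svr_accel", "svr_nu_accel"]


def build_fit_command(dataset_root, model, device, metric, model_config_path, seed,
                      data_config_path="configs/data_configs/nb_fpga.json",
                      log_dir="experiments/"):
    """Hypothetical helper: return the fit_model.py invocation as an argument list."""
    if model not in ACCEL_MODELS:
        raise ValueError(f"unknown accelerator model: {model}")
    if metric not in SUPPORTED_METRICS.get(device, []):
        raise ValueError(f"metric {metric!r} is not supported on device {device!r}")
    return [
        "python3", "fit_model.py",
        "--dataset_root", dataset_root,
        "--model", model,
        "--device", device,
        "--metric", metric,
        "--model_config_path", model_config_path,
        "--data_config_path", data_config_path,
        "--log_dir", log_dir,
        "--seed", str(seed),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the project directory.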

The dataset splits used in this work were generated using the create_data_splits.py file. The splits are located in the configs directory here. When training on these manual splits rather than random splits, please place the dataset in the directory structure specified inside the splits JSON files.
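
For orientation, a seeded random 0.8/0.1/0.1 split of the kind mentioned above can be sketched as follows. This is an illustration only and is not the logic of create_data_splits.py; the function name `random_split` is hypothetical.

```python
import random


def random_split(items, seed, ratios=(0.8, 0.1, 0.1)):
    """Shuffle `items` with a fixed seed and cut them into train/val/test lists."""
    items = list(items)
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rng.shuffle(items)
    n_train = round(ratios[0] * len(items))
    n_val = round(ratios[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test split
    return train, val, test


if __name__ == "__main__":
    train, val, test = random_split(range(100), seed=3)
    print(len(train), len(val), len(test))
```

Using the same seed reproduces the same split, which keeps surrogate comparisons fair across models.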

Hyperparameter Optimization

The hyperparameters of the surrogates were optimized using SMAC3. Please see do_hpo.py and shell/hpo_all.sh for details, and file an issue if you run into problems while performing HPO.

The searched hyperparameters for various device/metric pairs can be found here.

ANB Evaluation

Please see anb-eval

Dataset Collection Pipelines

Owing to the complex instrumentation of dataset collection, we have an entire repository that details collection pipelines for accuracy, throughput, and latency. Please see ANB-DatasetCollection. Please note that collection of throughput/latency requires specialized hardware such as TPUs and FPGAs.

Acknowledgements

This project was supported by Cloud TPUs from Google's TPU Research Cloud (TRC) program. GPU compute was supported by CloudLabs and the Turing AI Compute Cluster @ HKUST.

This repository builds upon code from the following repositories:

We are grateful to the authors of these repositories for their contributions.
