• fast to get a working result
• the inaccurate prediction from the supernet degrades the final network's performance
The one-shot LaNAS uses a pretrained supernet to predict the performance of a proposed architecture via masking. The following figure illustrates the search procedure.
The training of the supernet is the same as regular training, except that we apply a random mask at each iteration.
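As a rough illustration, the training loop looks roughly like the sketch below. The `Supernet` class, the `sample_random_mask` helper, the `num_edges` attribute, and the `forward(images, mask)` signature are illustrative placeholders, not the exact API used in supernet_train.py.

```python
# Sketch of supernet training with a random mask per iteration.
# All names below (sample_random_mask, supernet.num_edges, the
# supernet(images, mask) signature) are illustrative placeholders.
import torch
import torch.nn as nn

def sample_random_mask(num_edges):
    # one 0/1 entry per candidate operation in the NASNet-style cell
    return torch.randint(0, 2, (num_edges,), dtype=torch.float32)

def train_one_epoch(supernet, loader, optimizer, device="cuda"):
    criterion = nn.CrossEntropyLoss()
    supernet.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        mask = sample_random_mask(supernet.num_edges).to(device)
        optimizer.zero_grad()
        logits = supernet(images, mask)  # only the unmasked operations are active
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
```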
NASBench-101 contains a fairly small number of architectures (~420K), which can be predicted easily by a learned predictor. A supernet is a good alternative for this problem because it covers a search space of 10^21 architectures. Therefore, our supernet can also be used as a benchmark to evaluate different search algorithms; see Fig. 6 in the LaNAS paper, and check how LaNAS interacts with the supernet to sample an architecture and its accuracy.
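For intuition, a search algorithm could query the supernet as a benchmark roughly as sketched below; the `query` helper and the `supernet(images, mask)` signature are assumptions for illustration, not the repo's actual interface.

```python
# Sketch: evaluate one proposed architecture encoding (a 0/1 mask) on the
# validation set using the weight-sharing supernet; names are illustrative.
import torch

@torch.no_grad()
def query(supernet, encoding, val_loader, device="cuda"):
    supernet.eval()
    mask = torch.tensor(encoding, dtype=torch.float32, device=device)
    correct, total = 0, 0
    for images, targets in val_loader:
        images, targets = images.to(device), targets.to(device)
        logits = supernet(images, mask)
        correct += (logits.argmax(dim=1) == targets).sum().item()
        total += targets.numel()
    return 100.0 * correct / total  # accuracy returned to the search algorithm
```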
You can skip this step if you use our pre-trained supernet.
Our supernet is designed for the NASNet search space, and adapting it to a new design space requires some changes to the code. We're working on this and will update later. Training the supernet is straightforward; simply run
python train.py
- Training on ImageNet
Please use the training pipeline from Pytorch-Image-Models. The steps are as follows:
- get the supernet model from supernet_train.py, line 94
- go to Pytorch-Image-Models
- find pytorch-image-models/blob/master/timm/models/factory.py, replace line 57 as follows
# model = create_fn(**model_args, **kwargs)
model = our_supernet  # the supernet instance obtained from supernet_train.py (line 94)
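For context, the replacement might look like the sketch below inside factory.py; the import path and the class name `Supernet` are placeholders for whatever supernet_train.py actually defines around line 94.

```python
# Hypothetical import; point this at the supernet defined in supernet_train.py (line 94).
from supernet_train import Supernet

# model = create_fn(**model_args, **kwargs)   # original timm line 57
model = Supernet()                            # hand the timm training pipeline our supernet instead
```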
You can download the supernet pre-trained by us from here. Place it in the same folder, and start the search with
python train.py
The search results will be written to results.txt, and you can read them with
python read_result.py
The program outputs every sample with its test accuracy, e.g.
[[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] 81.69 3774
[1.0 .. 0.0] is the architecture encoding, which can be used to train a network later.
81.69 is the test accuracy predicted by the supernet via weight sharing.
3774 means this is the 3774th sample.
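If you want to post-process results.txt yourself, a small parser along the lines below should work, assuming every line follows the format shown above (encoding, accuracy, sample index); read_result.py remains the supported way to read the results.

```python
# Sketch: parse results.txt lines of the form "[[<0/1 encoding>]] <accuracy> <index>".
import ast

def parse_results(path="results.txt"):
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            encoding_str, acc, idx = line.rsplit(" ", 2)   # split off the last two fields
            encoding = ast.literal_eval(encoding_str)[0]   # outer list wraps the mask
            samples.append((encoding, float(acc), int(idx)))
    return samples

# e.g. pick the sample with the highest predicted accuracy
# best_encoding, best_acc, best_idx = max(parse_results(), key=lambda s: s[1])
```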
Once you pick a network after reading the results, you can train the network in the Evaluate folder.
cd Evaluate
# Note: you need to supply the encoding of the target architecture via the masked_code argument
python super_individual_train.py --cutout --auxiliary --batch_size=16 --init_ch=36 --masked_code='[1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0]'
One-shot NAS substantially reduces the computation cost by training only one supernet to approximate the performance of every architecture in the search space via weight sharing. However, this performance estimation can be very inaccurate due to the co-adaptation among operations. Recently, we proposed few-shot NAS, which uses multiple supernets, called sub-supernets, each covering a different region of the search space, to alleviate the undesired co-adaptation. Since each sub-supernet only covers a small part of the search space, few-shot NAS improves the accuracy of architecture evaluation over one-shot NAS with only a small increase in evaluation cost. Please see the following paper for details.
Few-shot Neural Architecture Search
in submission
Yiyang Zhao (WPI), Linnan Wang (Brown), Yuandong Tian (FAIR), Rodrigo Fonseca (Brown), Tian Guo (WPI)
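To make the partitioning idea concrete, here is a conceptual sketch, not the implementation from the paper or the repository linked below: each sub-supernet fixes the operation choice on one edge and therefore only covers the architectures consistent with that choice. The `num_ops_per_edge` argument and the `fixed_choices` attribute are illustrative assumptions.

```python
# Conceptual sketch of splitting a search space into sub-supernets by fixing
# the operation on one edge; all names here are illustrative placeholders.
from copy import deepcopy

def split_search_space(supernet, edge_id, num_ops_per_edge):
    sub_supernets = []
    for op in range(num_ops_per_edge):
        sub = deepcopy(supernet)           # each sub-supernet starts from the parent's weights
        sub.fixed_choices = {edge_id: op}  # this copy only evaluates architectures whose
                                           # edge `edge_id` uses operation `op`
        sub_supernets.append(sub)
    return sub_supernets
```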
To evaluate few-shot NAS, please check this repository. The following figures show the performance improvement of few-shot NAS.
These figures show that few-shot NAS is an effective trade-off between one-shot NAS and vanilla NAS (i.e., training each architecture from scratch): it retains both the accurate performance estimation of vanilla NAS and the fast speed of one-shot NAS.