
⚡Super fast and lightweight anchor-free object detection model. 🔥Only 1.8MB and run 97FPS on cellphone🔥

zhang0557kui/nanodet

 
 


NanoDet

Super fast and lightweight anchor-free object detection model. Real-time on mobile devices.

  • ⚡Super lightweight: Model file is only 1.8 MB.
  • ⚡Super fast: 97fps(10.23ms) on mobile ARM CPU.
  • 😎Training friendly: Much lower GPU memory cost than other models. A batch size of 80 fits on a GTX 1060 6GB.
  • 😎Easy to deploy: Provides a C++ implementation and an Android demo based on the ncnn inference framework.

NEWS!!!

  • [2021.03.12] Apply the Transformer encoder to NanoDet! Introducing NanoDet-t, which replaces the PAN in NanoDet-m with a TAN (Transformer Attention Net) and reaches 21.7 mAP (+1.1) on COCO val 2017. Check nanodet-t.yml for more details.

  • [2021.03.03] Updated the NanoDet-m-416 COCO pretrained model. COCO mAP(0.5:0.95)=23.5. Download it from the Model Zoo.

  • [2021.02.03] Added support for EfficientNet-Lite and RepVGG backbones. Please check the config folder. Download the models from the Model Zoo.

  • [2021.01.10] NanoDet-g, which has a lower memory access cost and is designed for edge NPUs or GPUs, is now available! Check config/nanodet-g.yml and download it from the Model Zoo.

More...

Benchmarks

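| Model       | Resolution | COCO mAP | Latency (ARM 4xCore) | FLOPS | Params | Model Size (ncnn fp16) |
|-------------|------------|----------|----------------------|-------|--------|------------------------|
| NanoDet-m   | 320*320    | 20.6     | 10.23ms              | 0.72B | 0.95M  | 1.8MB                  |
| NanoDet-m   | 416*416    | 23.5     | 16.44ms              | 1.2B  | 0.95M  | 1.8MB                  |
| NanoDet-g   | 416*416    | 22.9     | Not Designed For ARM | 4.2B  | 3.81M  | 7.7MB                  |
| YoloV3-Tiny | 416*416    | 16.6     | 37.6ms               | 5.62B | 8.86M  | 33.7MB                 |
| YoloV4-Tiny | 416*416    | 21.7     | 32.81ms              | 6.96B | 6.06M  | 23.0MB                 |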

Note:

  • Performance is measured on Kirin 980(4xA76+4xA55) ARM CPU based on ncnn. You can test latency on your phone with ncnn_android_benchmark.

  • NanoDet mAP(0.5:0.95) is validated on the COCO val2017 dataset without test-time augmentation.

  • YOLO mAP values are taken from Scaled-YOLOv4: Scaling Cross Stage Partial Network.

  • NanoDet-g is designed for edge NPU, GPU or TPU with high parallel computing power but low memory bandwidth. It has much lower memory access cost than NanoDet-m.
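
The FPS figure quoted in the feature list follows directly from the latency column. A minimal sketch of that conversion, assuming the latency covers one forward pass per image (the function name `latency_to_fps` is illustrative and not part of NanoDet):

```python
# Latencies in milliseconds, copied from the benchmark table above.
LATENCY_MS = {
    "NanoDet-m 320*320": 10.23,
    "NanoDet-m 416*416": 16.44,
    "YoloV3-Tiny 416*416": 37.6,
    "YoloV4-Tiny 416*416": 32.81,
}


def latency_to_fps(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        print(f"{name}: {ms} ms -> {latency_to_fps(ms):.1f} FPS")
```

For NanoDet-m at 320*320 this gives roughly 97 FPS, matching the headline number. End-to-end throughput in an app will likely be lower once image pre-processing and box decoding are included.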


NanoDet is an FCOS-style one-stage anchor-free object detection model that uses ATSS for target sampling and Generalized Focal Loss for classification and box regression. Please refer to these papers for more details.

FCOS: Fully Convolutional One-Stage Object Detection

ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
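
To give a feel for the classification part of Generalized Focal Loss, here is a minimal scalar sketch of the Quality Focal Loss term as defined in that paper: a cross-entropy against a soft quality target, scaled by |target - prediction|^beta. It is an illustration only, with beta = 2 assumed as the default; NanoDet's actual loss operates on tensors and may differ in details.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def quality_focal_loss(logit: float, quality: float, beta: float = 2.0) -> float:
    """Scalar Quality Focal Loss for one class score.

    logit   -- raw classification output before the sigmoid
    quality -- soft target in [0, 1] (IoU of the predicted box for
               positives, 0 for negatives)
    beta    -- modulating exponent (assumed default of 2)
    """
    eps = 1e-12
    p = min(max(sigmoid(logit), eps), 1.0 - eps)  # clamp to keep log() finite
    cross_entropy = -((1.0 - quality) * math.log(1.0 - p) + quality * math.log(p))
    return abs(quality - p) ** beta * cross_entropy


if __name__ == "__main__":
    # The loss vanishes when the predicted score equals the quality target.
    print(quality_focal_loss(0.0, 0.5))
    # It grows as the prediction moves away from the target.
    print(quality_focal_loss(-3.0, 0.9))
```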

Introduction in Chinese on Zhihu | QQ discussion group: 908606542 (join answer: 炼丹)


Demo

Android demo

The Android demo project is in the demo_android_ncnn folder. Please refer to the Android demo guide.

Here is a better implementation 👉 ncnn-android-nanodet

NCNN C++ demo

The C++ demo based on ncnn is in the demo_ncnn folder. Please refer to the Cpp demo guide.

MNN demo

Inference code using Alibaba's MNN framework is in the demo_mnn folder, including both Python and C++ versions. Please refer to the MNN demo guide.

PyTorch demo

First, install the requirements and set up NanoDet following the installation guide. Then download the COCO pretrained weights:

👉COCO pretrain weight (Google Drive)

  • Inference images
python demo/demo.py image --config CONFIG_PATH --model MODEL_PATH --path IMAGE_PATH
  • Inference video
python demo/demo.py video --config CONFIG_PATH --model MODEL_PATH --path VIDEO_PATH
  • Inference webcam
python demo/demo.py webcam --config CONFIG_PATH --model MODEL_PATH --camid YOUR_CAMERA_ID
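
The three invocations above differ only in the mode and in whether the input is a path or a camera id. A small sketch of a helper that builds the argument list for scripting; the helper name `demo_command` is hypothetical, while the flags are the ones shown above:

```python
def demo_command(mode, config_path, model_path, source):
    """Build the argv list for demo/demo.py.

    mode   -- "image", "video" or "webcam"
    source -- image/video path, or the camera id when mode is "webcam"
    """
    if mode not in ("image", "video", "webcam"):
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [
        "python", "demo/demo.py", mode,
        "--config", str(config_path),
        "--model", str(model_path),
    ]
    # Webcam mode takes a camera id; the other modes take a file path.
    if mode == "webcam":
        cmd += ["--camid", str(source)]
    else:
        cmd += ["--path", str(source)]
    return cmd


if __name__ == "__main__":
    print(" ".join(demo_command("image", "CONFIG_PATH", "MODEL_PATH", "IMAGE_PATH")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.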

We also provide a notebook here that demonstrates how to run NanoDet with PyTorch.


Install

Requirements

  • Linux or macOS
  • CUDA >= 10.0
  • Python >= 3.6
  • PyTorch >= 1.6
  • Windows (experimental support; note that Windows does not support distributed training before PyTorch 1.7)
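
A quick way to compare an existing environment against these minimums is sketched below. The helper names are illustrative; the script only reports what it finds and treats a missing PyTorch as "not installed" rather than an error.

```python
import re
import sys


def parse_version(text):
    """Leading numeric components of a version string, e.g. '1.7.1+cu110' -> (1, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum):
    """True if version_text is at least `minimum` (a tuple such as (1, 6))."""
    found = parse_version(version_text)
    width = max(len(found), len(minimum))
    # Pad with zeros so that "10" and (10, 0) compare as equal.
    found += (0,) * (width - len(found))
    wanted = tuple(minimum) + (0,) * (width - len(minimum))
    return found >= wanted


if __name__ == "__main__":
    print("Python >= 3.6:", sys.version_info >= (3, 6))
    try:
        import torch
    except ImportError:
        print("PyTorch: not installed")
    else:
        print("PyTorch >= 1.6:", meets_minimum(torch.__version__, (1, 6)))
        cuda = torch.version.cuda  # None for CPU-only builds
        print("CUDA >= 10.0:", cuda is not None and meets_minimum(cuda, (10, 0)))
```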

Step

  1. Create a conda virtual environment and then activate it.
conda create -n nanodet python=3.8 -y
conda activate nanodet
  2. Install PyTorch
conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c conda-forge
  3. Install requirements
pip install Cython termcolor numpy tensorboard pycocotools matplotlib pyaml opencv-python tqdm pytorch-lightning torchmetrics
  4. Setup NanoDet
git clone https://github.com/RangiLyu/nanodet.git
cd nanodet
python setup.py develop
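
After the last step, a short check can confirm that the packages are importable. This is a sketch only: the import names below are assumptions mapped from the pip package names (for example `cv2` for opencv-python and `pytorch_lightning` for pytorch-lightning), and `nanodet` is assumed to be the module name installed by `setup.py develop`.

```python
import importlib.util

# Assumed import names for the packages installed in the steps above.
REQUIRED_MODULES = [
    "torch", "torchvision", "Cython", "termcolor", "numpy", "tensorboard",
    "pycocotools", "matplotlib", "pyaml", "cv2", "tqdm",
    "pytorch_lightning", "torchmetrics", "nanodet",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```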

Model Zoo

NanoDet supports a variety of backbones. Go to the config folder to see the sample training config files.

| Model                 | Backbone           | Resolution | COCO mAP | FLOPS | Params | Pre-train weight |
|-----------------------|--------------------|------------|----------|-------|--------|------------------|
| NanoDet-m             | ShuffleNetV2 1.0x  | 320*320    | 20.6     | 0.72B | 0.95M  | Download         |
| NanoDet-m-416         | ShuffleNetV2 1.0x  | 416*416    | 23.5     | 1.2B  | 0.95M  | Download         |
| NanoDet-t (NEW)       | ShuffleNetV2 1.0x  | 320*320    | 21.7     | 0.96B | 1.36M  | Download         |
| NanoDet-g             | Custom CSP Net     | 416*416    | 22.9     | 4.2B  | 3.81M  | Download         |
| NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320    | 24.7     | 1.72B | 3.11M  | Download         |
| NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416    | 30.3     | 4.06B | 4.01M  | Download         |
| NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512    | 32.6     | 7.12B | 4.71M  | Download         |
| NanoDet-RepVGG        | RepVGG-A0          | 416*416    | 27.8     | 11.3B | 6.75M  | Download         |
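
When choosing among these models for a device with a compute or size budget, the table can be filtered programmatically. A minimal sketch with the numbers copied from the table above (the type and function names are illustrative):

```python
from typing import NamedTuple, Optional


class ZooEntry(NamedTuple):
    name: str
    backbone: str
    resolution: int   # square input side in pixels
    coco_map: float   # COCO mAP(0.5:0.95)
    flops_b: float    # FLOPS column, in billions
    params_m: float   # Params column, in millions


MODEL_ZOO = [
    ZooEntry("NanoDet-m", "ShuffleNetV2 1.0x", 320, 20.6, 0.72, 0.95),
    ZooEntry("NanoDet-m-416", "ShuffleNetV2 1.0x", 416, 23.5, 1.2, 0.95),
    ZooEntry("NanoDet-t", "ShuffleNetV2 1.0x", 320, 21.7, 0.96, 1.36),
    ZooEntry("NanoDet-g", "Custom CSP Net", 416, 22.9, 4.2, 3.81),
    ZooEntry("NanoDet-EfficientLite", "EfficientNet-Lite0", 320, 24.7, 1.72, 3.11),
    ZooEntry("NanoDet-EfficientLite", "EfficientNet-Lite1", 416, 30.3, 4.06, 4.01),
    ZooEntry("NanoDet-EfficientLite", "EfficientNet-Lite2", 512, 32.6, 7.12, 4.71),
    ZooEntry("NanoDet-RepVGG", "RepVGG-A0", 416, 27.8, 11.3, 6.75),
]


def best_under_budget(max_flops_b: Optional[float] = None,
                      max_params_m: Optional[float] = None) -> Optional[ZooEntry]:
    """Highest-mAP entry within the given budgets, or None if nothing fits."""
    candidates = [
        entry for entry in MODEL_ZOO
        if (max_flops_b is None or entry.flops_b <= max_flops_b)
        and (max_params_m is None or entry.params_m <= max_params_m)
    ]
    return max(candidates, key=lambda entry: entry.coco_map, default=None)


if __name__ == "__main__":
    print(best_under_budget(max_flops_b=1.0))
    print(best_under_budget(max_params_m=1.0))
```

FLOPS and parameter count are only proxies for speed; as the benchmark notes point out, memory access cost also matters on NPUs and GPUs.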

How to Train

  1. Prepare dataset

    If your dataset annotations are in Pascal VOC XML format, refer to config/nanodet_custom_xml_dataset.yml.

    Otherwise, convert your dataset annotations to MS COCO format (COCO annotation format details).

  2. Prepare config file

    Copy and modify an example yml config file from the config/ folder.

    Change save_dir to where you want to save the model.

    Change num_classes in model->arch->head.

    Change the image path and annotation path in both data->train and data->val.

    Set gpu ids, num workers and batch size in device to fit your hardware.

    Set total_epochs, lr and lr_schedule according to your dataset and batch size.

    If you want to modify the network, data augmentation or other settings, refer to Config File Detail.

  3. Start training

    NanoDet now uses PyTorch Lightning for training.

    For both single-GPU and multi-GPU training, run:

    python tools/train.py CONFIG_FILE_PATH

    For Windows users: if you have problems with the new Lightning trainer, try tools/deprecated/train.py as follows.

    For a single GPU, run:

    python tools/deprecated/train.py CONFIG_FILE_PATH

    For multiple GPUs, NanoDet uses distributed training (note: Windows does not support distributed training before PyTorch 1.7). Run:

    python -m torch.distributed.launch --nproc_per_node=GPU_NUM --master_port 29501 tools/deprecated/train.py CONFIG_FILE_PATH
  4. Visualize Logs

    TensorBoard logs are saved in the save_dir set in your config file.

    To visualize the TensorBoard logs, run:

    cd <YOUR_SAVE_DIR>
    tensorboard --logdir ./
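
The config edits and launch commands from the steps above can be sketched in Python as follows. This works on a plain nested dict such as one loaded from a yml file. The nesting `model -> arch -> head -> num_classes` and `data -> train / val` comes from the steps above; the leaf keys `img_path` and `ann_path` and the parent key `schedule` are assumptions, so check them against the example config you copied. The function names are illustrative.

```python
import copy


def set_nested(cfg, keys, value):
    """Set cfg[k0][k1]...[kn] = value, creating intermediate dicts as needed."""
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def customize_config(base_cfg, save_dir, num_classes, train_paths, val_paths, total_epochs):
    """Return a modified copy of base_cfg; the input dict is left untouched.

    train_paths / val_paths are (image_dir, annotation_file) pairs.
    """
    cfg = copy.deepcopy(base_cfg)
    set_nested(cfg, ["save_dir"], save_dir)
    set_nested(cfg, ["model", "arch", "head", "num_classes"], num_classes)
    for split, (img_path, ann_path) in (("train", train_paths), ("val", val_paths)):
        set_nested(cfg, ["data", split, "img_path"], img_path)   # assumed key name
        set_nested(cfg, ["data", split, "ann_path"], ann_path)   # assumed key name
    set_nested(cfg, ["schedule", "total_epochs"], total_epochs)  # assumed parent key
    return cfg


def train_command(config_path, gpu_num=1, deprecated=False):
    """Build the argv list for the training commands listed above."""
    if not deprecated:
        # The Lightning trainer uses the same command for one or several GPUs.
        return ["python", "tools/train.py", config_path]
    if gpu_num <= 1:
        return ["python", "tools/deprecated/train.py", config_path]
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpu_num}", "--master_port", "29501",
        "tools/deprecated/train.py", config_path,
    ]


if __name__ == "__main__":
    base = {"model": {"arch": {"head": {"num_classes": 80}}}}
    cfg = customize_config(base, "workspace/my_model", 3,
                           ("data/train", "data/train.json"),
                           ("data/val", "data/val.json"), 100)
    print(cfg["model"]["arch"]["head"]["num_classes"])
    print(" ".join(train_command("config/my_model.yml")))
```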

How to Deploy

NanoDet provides C++ and Android demos based on the ncnn library.

  1. Convert model

    To convert a NanoDet PyTorch model to ncnn, use the route pytorch->onnx->ncnn.

    To export the ONNX model, run tools/export_onnx.py:

    python tools/export_onnx.py --cfg_path ${CONFIG_PATH} --model_path ${PYTORCH_MODEL_PATH}

    Then use onnx-simplifier to simplify the ONNX graph:

    python -m onnxsim ${INPUT_ONNX_MODEL} ${OUTPUT_ONNX_MODEL}

    Run onnx2ncnn from the ncnn tools to generate the ncnn .param and .bin files.

    After that, use ncnnoptimize to optimize the ncnn model.

    If you have questions about converting to ncnn, refer to the ncnn wiki: https://github.com/Tencent/ncnn/wiki

  2. Run NanoDet model with C++

    Please refer to demo_ncnn.

  3. Run NanoDet on Android

    Please refer to android_demo.
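
The conversion route in step 1 can be laid out as an ordered list of commands. The sketch below only builds the argument lists. The first two commands use the flags shown above; the argument order for onnx2ncnn (input, .param, .bin) and ncnnoptimize (input pair, output pair, trailing storage flag) is an assumption based on the usual usage of the ncnn tools and should be checked against the ncnn wiki. The location where export_onnx.py writes its output is not stated here, so `exported_onnx` is a placeholder you must supply.

```python
def conversion_commands(config_path, checkpoint_path, exported_onnx,
                        simplified_onnx, ncnn_prefix, optimize_flag="0"):
    """Ordered argv lists for the pytorch -> onnx -> ncnn route.

    exported_onnx -- path of the file produced by export_onnx.py (placeholder)
    ncnn_prefix   -- output stem for the .param/.bin pairs (hypothetical naming)
    optimize_flag -- trailing ncnnoptimize argument (assumed: "0" keeps fp32)
    """
    param, weights = f"{ncnn_prefix}.param", f"{ncnn_prefix}.bin"
    opt_param, opt_weights = f"{ncnn_prefix}-opt.param", f"{ncnn_prefix}-opt.bin"
    return [
        ["python", "tools/export_onnx.py",
         "--cfg_path", config_path, "--model_path", checkpoint_path],
        ["python", "-m", "onnxsim", exported_onnx, simplified_onnx],
        # Assumed argument order for the ncnn command-line tools.
        ["onnx2ncnn", simplified_onnx, param, weights],
        ["ncnnoptimize", param, weights, opt_param, opt_weights, optimize_flag],
    ]


if __name__ == "__main__":
    steps = conversion_commands("CONFIG_PATH", "PYTORCH_MODEL_PATH",
                                "INPUT_ONNX_MODEL", "OUTPUT_ONNX_MODEL", "nanodet")
    for cmd in steps:
        print(" ".join(cmd))
```

Each list can be run in order with `subprocess.run(cmd, check=True)` so that a failure in one stage stops the pipeline.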


Thanks

https://github.com/Tencent/ncnn

https://github.com/open-mmlab/mmdetection

https://github.com/implus/GFocal

https://github.com/cmdbug/YOLOv5_NCNN

https://github.com/rbgirshick/yacs
