FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

website | paper

Abstract

Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind sparse convolution-based models (3$\times$ slower), hindering their usage in resource-constrained, latency-sensitive applications (such as autonomous driving). This inefficiency comes from point clouds' sparse and irregular nature, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal sizes rather than windows of equal shapes. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on Waymo Open Dataset with 4.6x speedup over (transformer-based) SST and 1.4x speedup over (sparse convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.

Results

All the results are reproducible with this repo. Regrettably, we are unable to provide the pre-trained model weights due to Waymo Dataset License Agreement. Discussions are definitely welcome if you could not obtain satisfactory performances with FlatFormer in your projects.

3D Object Detection (on Waymo validation)

Model	#Sweeps	mAP/H_L1	mAP/H_L2	Veh_L1	Veh_L2	Ped_L1	Ped_L2	Cyc_L1	Cyc_L2
FlatFormer	1	76.1/73.4	69.7/67.2	77.5/77.1	69.0/68.6	79.6/73.0	71.5/65.3	71.3/70.1	68.6/67.5
FlatFormer	2	78.9/77.3	72.7/71.2	79.1/78.6	70.8/70.3	81.6/78.2	73.8/70.5	76.1/75.1	73.6/72.6
FlatFormer	3	79.6/78.0	73.5/72.0	79.7/79.2	71.4/71.0	82.0/78.7	74.5/71.3	77.2/76.1	74.7/73.7

Usage

Prerequisites

The code is built with following libraries:

Python >= 3.6, <3.8
PyTorch >= 1.9, <= 1.10.2
mmcv = 1.4.0
mmdetection = 2.14.0

After installing these dependencies, please run this command to install the codebase:

pip install -v -e .

Dataset Preparation

Please follow the instructions from MMDetection3D to download and preprocess the Waymo Open Dataset. After data preparation, you will be able to see the following directory structure (as is indicated in mmdetection3d):

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── waymo
│   │   ├── waymo_format
│   │   │   ├── training
│   │   │   ├── validation
│   │   │   ├── testing
│   │   │   ├── gt.bin
│   │   ├── kitti_format
│   │   │   ├── ImageSets
│   │   │   ├── training
│   │   │   ├── testing
│   │   │   ├── waymo_gt_database
│   │   │   ├── waymo_infos_trainval.pkl
│   │   │   ├── waymo_infos_train.pkl
│   │   │   ├── waymo_infos_val.pkl
│   │   │   ├── waymo_infos_test.pkl
│   │   │   ├── waymo_dbinfos_train.pkl

Training

# multi-gpu training
bash tools/dist_train.sh configs/flatformer/$CONFIG.py 8 --work-dir $CONFIG/ --cfg-options evaluation.pklfile_prefix=./work_dirs/$CONFIG/results evaluation.metric=waymo

Evaluation

# multi-gpu testing
bash tools/dist_test.sh configs/flatformer/$CONFIG.py /work_dirs/$CONFIG/latest.pth 8 --eval waymo

Citation

If FlatFormer is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

@inproceedings{liu2023flatformer,
  title={FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer},
  author={Liu, Zhijian and Yang, Xinyu and Tang, Haotian and Yang, Shang and Han, Song},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

Acknowledgments

This project is based on the following codebases.

We would like to thank Tianwei Yin, Lue Fan and Ligeng Mao for providing detailed results of CenterPoint, SST/FSD and VoTr, and Yue Wang and Yukang Chen for their helpful discussions. This work was supported by National Science Foundation, MIT-IBM Watson AI Lab, NVIDIA, Hyundai and Ford. Zhijian Liu was partially supported by the Qualcomm Innovation Fellowship.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
configs		configs
mmdet3d		mmdet3d
tools		tools
.gitignore		.gitignore
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

website | paper

Abstract

Results

3D Object Detection (on Waymo validation)

Usage

Prerequisites

Dataset Preparation

Training

Evaluation

Citation

Acknowledgments

About

Releases

Packages

Languages

mit-han-lab/flatformer

Folders and files

Latest commit

History

Repository files navigation

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

website | paper

Abstract

Results

3D Object Detection (on Waymo validation)

Usage

Prerequisites

Dataset Preparation

Training

Evaluation

Citation

Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages