Recently, the proliferation of Dynamic Scale Convolution modules has simplified feature correspondence between multiple views. Concurrently, Transformers have proven effective in enhancing multi-view stereo (MVS) reconstruction by facilitating feature interactions across views. In this paper, we present CT-MVSNet based on an in-depth study of feature extraction and matching in MVS. By exploring inter-view relationships and using surface curvature to measure the receptive field size and feature information on the image surface, our method adapts to candidate regions of varying curvature scales. Consequently, this module outperforms existing networks in adaptively extracting more detailed features for precise cost computation. Furthermore, to better identify inter-view similarity relationships, we introduce a Transformer-based feature matching module. Leveraging Transformer principles, we align features from multiple source views with those of the reference view, enhancing the accuracy of feature matching. Additionally, guided by the proposed curvature-guided dynamic scale convolution and Transformer-based feature matching, we introduce a feature-matching similarity measurement module that tightly integrates curvature and inter-view similarity measurement, leading to improved reconstruction accuracy. Our approach demonstrates advanced performance on the DTU dataset and the Tanks and Temples benchmark. Details are described in our paper:
CT-MVSNet: Curvature-Guided Multi-View Stereo with Transformers
Licheng Sun, Liang Wang
CT-MVSNet is more robust in challenging regions and generates more accurate depth maps. The resulting point clouds are more complete and the details are finer.
If you find any errors in our code, please feel free to ask questions.
- PyTorch 1.9.1
- Python 3.7
Training Data. We adopt the full resolution ground-truth depth provided in CasMVSNet or MVSNet. Download DTU training data and Depth raw.
Unzip them and put the Depth_raw folder into the dtu_training folder. The structure is as follows:
dtu_training
├── Cameras
├── Depths
├── Depths_raw
└── Rectified
Testing Data. Download DTU testing data and unzip it. The structure is as follows:
dtu_testing
├── Cameras
├── scan1
├── scan2
├── ...
Testing Data. Download Tanks and Temples and unzip it. Here, we adopt the camera parameters of the short depth range version (included in your download); therefore, you should manually replace the cams folder in the intermediate folder with the short depth range version (a sketch of this step follows the directory listing below). The structure is as follows:
tanksandtemples
├── advanced
│ ├── Auditorium
│ ├── ...
└── intermediate
├── Family
├── ...
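As a rough, hedged sketch of the cams replacement mentioned above (the location and layout of the short depth range download are assumptions; adjust the paths to what you actually unpacked):

# Hypothetical sketch: overwrite each intermediate scene's cams folder with
# the short depth range version. SHORT_CAMS is a placeholder path.
SHORT_CAMS=<path to short depth range cams>
TT_ROOT=tanksandtemples/intermediate
for scene in "$TT_ROOT"/*/; do
    name=$(basename "$scene")
    rm -rf "$scene/cams"
    cp -r "$SHORT_CAMS/$name/cams" "$scene/cams"
done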
Put the model in <your model path>.
Fusibile installation. Since we adopt Gipuma to filter and fuse the point clouds on the DTU dataset, you need to install Fusibile first. Download Fusibile to <your fusibile path> and execute the following commands:
cd <your fusibile path>
cmake .
make
If nothing goes wrong, you will get an executable named fusibile. Most build or runtime errors are caused by a mismatch between the compiled CUDA architecture and your GPU's compute capability.
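If compilation succeeds but fusion later crashes or produces empty point clouds, the CUDA architecture baked into the build likely does not match your GPU. As a hedged sketch (the exact variable names inside fusibile's CMakeLists.txt may differ between versions), the idea is to point the nvcc -gencode flags at your compute capability and rebuild cleanly:

# Hypothetical sketch: edit fusibile's CMakeLists.txt so the nvcc flags match
# your GPU, e.g. for compute capability 7.5:
#   -gencode arch=compute_75,code=sm_75
# then rebuild from a clean state:
cd <your fusibile path>
rm -rf CMakeCache.txt CMakeFiles
cmake .
make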
Point generation. To reproduce the results from our paper, first specify datapath as <your dtu_testing path>, outdir as <your output save path>, resume as <your model path>, and fusibile_exe_path as <your fusibile path>/fusibile in the shell file ./scripts/dtu_test.sh, and then run:
bash ./scripts/dtu_test.sh
Note that we use the CT-MVSNet_dtu checkpoint when testing on DTU.
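For reference, the variables to edit in ./scripts/dtu_test.sh would look roughly like the following; the exact variable names and any extra flags depend on the script shipped in this repository, so treat this as a sketch rather than the file's definitive content:

# Hypothetical sketch of the paths to set in ./scripts/dtu_test.sh
DATAPATH="<your dtu_testing path>"
OUTDIR="<your output save path>"
RESUME="<your model path>"                        # e.g. the CT-MVSNet_dtu checkpoint
FUSIBILE_EXE_PATH="<your fusibile path>/fusibile"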
Point testing. You need to move the point clouds generated for each scene into a folder named dtu_points. Meanwhile, you need to rename each point cloud in the mvsnet001_l3.ply format (the middle three digits represent the scene number).
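A minimal sketch of this renaming step is shown below, assuming the fused point cloud of each scene ends up under a scanN directory in the output folder; the source file name is a placeholder, so substitute the actual .ply produced for each scene:

# Hypothetical sketch: gather and rename per-scene point clouds as mvsnetNNN_l3.ply
mkdir -p dtu_points
for scan_dir in <your output save path>/scan*/; do
    num=$(basename "$scan_dir" | sed 's/scan//')
    printf -v padded '%03d' "$num"
    cp "$scan_dir/<fused point cloud>.ply" "dtu_points/mvsnet${padded}_l3.ply"
done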
Then specify dataPath, plyPath and resultsPath in ./dtu_eval/BaseEvalMain_web.m and ./dtu_eval/ComputeStat_web.m. Finally, run ./dtu_eval/BaseEvalMain_web.m in MATLAB to evaluate the DTU point clouds scene by scene, then execute ./dtu_eval/ComputeStat_web.m to get the average metrics for the entire dataset.
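If you prefer to run the evaluation from a terminal instead of the MATLAB GUI, both scripts can be invoked non-interactively, for example (assuming matlab is on your PATH and the paths inside the scripts are already set):

# Run the per-scene evaluation first, then the aggregation script.
matlab -nodisplay -nosplash -r "run('./dtu_eval/BaseEvalMain_web.m'); exit"
matlab -nodisplay -nosplash -r "run('./dtu_eval/ComputeStat_web.m'); exit"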
To visualize the depth map in pfm format, run:
python main.py --vis --depth_path <your depth path> --depth_img_save_dir <your depth image save directory>
The visualized depth map will be saved as <your depth image save directory>/depth.png. For visualization of point clouds, existing software such as MeshLab can be used.
To train the model from scratch on DTU, first specify the datapath and log_dir in ./scripts/dtu_train.sh and then run:
bash ./scripts/dtu_train.sh
By default, we employ DistributedDataParallel mode to train our model; you can also train the model on a single GPU (see the sketch below).
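As an illustration only (the entry point name and its arguments below are placeholders, not the actual contents of ./scripts/dtu_train.sh), a DistributedDataParallel launch under PyTorch 1.9 typically goes through torch.distributed.launch, while a single-GPU run simply drops the launcher:

# Hypothetical sketch: multi-GPU (DistributedDataParallel) launch, PyTorch 1.9 style
python -m torch.distributed.launch --nproc_per_node=4 <training script> <your training args>
# Hypothetical sketch: single-GPU run of the same entry point
python <training script> <your training args>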
Thanks to MVSNet, MVSNet_pytorch and CasMVSNet.