
Image-based Scene Flow Estimation for 3D Point Clouds

Project in DUT Media Lab

Abstract

In recent years, with the development and spread of 3D scanning devices, 3D data such as point clouds have entered everyday life and are growing explosively, and scene flow estimation for point clouds has become an active research topic. Scene flow is the 3D motion field formed by the motion of a scene in space. Its estimation is important for object detection, virtual reality, scene understanding, and dynamic scene reconstruction.

Most current research on point cloud scene flow is based on deep learning, where point clouds are processed by end-to-end neural networks. Although these methods achieve better results than traditional approaches, they still face bottlenecks. Annotated data are scarce (labeling cost is very high) and objective evaluation metrics are lacking, so there is still a long way to go before engineering applications. In addition, the large spatial scale of the computation limits the speed of scene flow estimation, and the sheer volume of point cloud data is another important factor constraining this line of research.

To address the above problems, this project combines optical flow estimation with 3D point cloud processing to estimate scene flow. First, the FlowNet [2] network is trained and then used to perform optical flow estimation on the KITTI dataset, generating the required optical flow images. Based on the correspondence that the projection of scene flow onto the 2D image plane is optical flow, the camera intrinsic matrix, the rotation matrix from the reference camera to the target camera image plane, and the extrinsic matrix from the LiDAR to the reference camera are combined into a projection matrix, and the point cloud is projected onto the corresponding image to achieve sensor fusion. Finally, based on ICP registration of the point clouds, lines are drawn between all corresponding point pairs of two adjacent frames and combined with the optical flow information of the image, yielding a scene flow estimate and visualization with low computational cost.

Training of the neural network

Two datasets are used: MPI Sintel and FlyingChairs. MPI Sintel provides .flo and .png files; FlyingChairs provides .ppm images and .flo flows. The paths of each image pair and its ground-truth flow are located according to the source dataset, grouped into a list, and the dataset is then split into training and validation sets. After training, the model is evaluated on MPI Sintel with an end-point error (EPE) of 1.485, comparable to the results reported in the original paper. Figure 3.8 shows the test visualization results, and Figure 3.9 shows the corresponding ground truth.
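As a rough illustration of this data preparation step, the sketch below groups image pairs with their ground-truth flows by dataset type and splits them into training and validation sets. The directory layouts, file-name patterns, and the 90/10 split ratio are assumptions for illustration, not the exact code used in this project.

```python
import glob
import os
import random

def list_flying_chairs(root):
    """Group FlyingChairs samples as (img1.ppm, img2.ppm, flow.flo) triplets."""
    triplets = []
    for flow in sorted(glob.glob(os.path.join(root, "*_flow.flo"))):
        prefix = flow[:-len("_flow.flo")]
        triplets.append((prefix + "_img1.ppm", prefix + "_img2.ppm", flow))
    return triplets

def list_sintel(root, subset="clean"):
    """Group MPI Sintel samples as (frame_t.png, frame_t+1.png, flow.flo) triplets."""
    triplets = []
    for scene in sorted(os.listdir(os.path.join(root, subset))):
        frames = sorted(glob.glob(os.path.join(root, subset, scene, "*.png")))
        flows = sorted(glob.glob(os.path.join(root, "flow", scene, "*.flo")))
        triplets += list(zip(frames[:-1], frames[1:], flows))
    return triplets

def split_train_val(samples, val_ratio=0.1, seed=0):
    """Shuffle the grouped sample list and split it into training and validation sets."""
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_ratio)
    return samples[n_val:], samples[:n_val]
```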

Testing results in MPI Sintel:

image

MPI Sintel Ground Truth:

image

Optical flow estimation on KITTI Road:

Input image pair:

image image

Output color coded flow field:

image

Fusion of sensors

Sensor setup [39]:

image

Object coordinates [39]:

image

The transformation relation is shown as follows:

y = P_rect_00 * R_rect_00 * Tr_velo_to_cam * X

Here Tr_velo_to_cam * X projects a point X given in Velodyne (LiDAR) coordinates into the coordinate system of camera 00 (the reference camera). R_rect_00 * Tr_velo_to_cam * X additionally applies the rectifying rotation of the reference camera, i.e. the image coplanar alignment correction that is required for 3D projection with the KITTI dataset. Finally, P_rect_00 * R_rect_00 * Tr_velo_to_cam * X projects the rectified point into the pixel coordinate system of the reference camera, yielding its image coordinates.
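For reference, a minimal sketch of this projection chain is given below. It assumes the calibration matrices have already been read from the KITTI calibration files; the function name and interface are illustrative. It also performs the removal of points that fall outside the image plane described next.

```python
import numpy as np

def project_velo_to_image(points, Tr_velo_to_cam, R_rect_00, P_rect_00, img_w, img_h):
    """Project Velodyne points (N, 3) into the rectified reference-camera image plane."""
    Tr = np.eye(4)
    Tr[:3, :] = np.asarray(Tr_velo_to_cam).reshape(3, 4)   # LiDAR -> camera 00 extrinsics
    R = np.eye(4)
    R[:3, :3] = R_rect_00                                   # rectifying rotation of camera 00

    X = np.hstack([points, np.ones((len(points), 1))]).T   # homogeneous coordinates, (4, N)
    y = P_rect_00 @ R @ Tr @ X                              # y = P_rect_00 * R_rect_00 * Tr_velo_to_cam * X

    in_front = y[2] > 0                                     # keep points in front of the camera
    uv = (y[:2, in_front] / y[2, in_front]).T               # normalize by depth -> (u, v) pixels

    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < img_w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < img_h))         # drop points outside the image plane
    return uv[inside], np.where(in_front)[0][inside]        # pixel coords + original point indices
```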

Point cloud P1 after removing points outside the image plane:

image

Projection of P1 to image P after removal of some points:

image

Approximate estimation and visualization of scene flow

Iterative Closest Point (ICP) registration. Point cloud before registration:

image
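This README does not include the registration code, but a minimal Open3D sketch of the ICP step might look like the following; the point cloud variables, the 0.5 m correspondence threshold, and the identity initialization are illustrative assumptions.

```python
import numpy as np
import open3d as o3d

def register_frames(pcd_t, pcd_t1, threshold=0.5):
    """Point-to-point ICP between two consecutive frames (already-loaded PointClouds)."""
    result = o3d.pipelines.registration.registration_icp(
        pcd_t, pcd_t1, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    # result.transformation aligns frame t to frame t+1; result.correspondence_set
    # lists index pairs (i, j) of matched points between the two clouds.
    return result.transformation, np.asarray(result.correspondence_set)
```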

Lineset of the point cloud projected to the interior of the image plane:

image
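The line set connecting corresponding point pairs between two adjacent frames can be built with Open3D. The sketch below assumes the two frames' points and the ICP correspondences from the previous step; the variable names are illustrative.

```python
import numpy as np
import open3d as o3d

def make_flow_lineset(points_t, points_t1, correspondences):
    """Connect corresponding point pairs of two adjacent frames with lines."""
    pts = np.vstack([points_t, points_t1])                 # stack both frames' points
    offset = len(points_t)                                 # indices of frame t+1 start here
    lines = [[i, offset + j] for i, j in correspondences]  # one line per correspondence (i, j)
    return o3d.geometry.LineSet(
        points=o3d.utility.Vector3dVector(pts),
        lines=o3d.utility.Vector2iVector(lines))

# The line set (together with the point clouds) can then be shown with
# o3d.visualization.draw_geometries([...]).
```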

The optical flow is visualized using a color wheel based on the Munsell color system. The space of the Munsell system is roughly cylindrical, as shown in Figure 4.12. The vertical (north-south) axis represents value, running from black to white. The angular position (longitude) represents hue; the circle is divided equally into five principal hues and five intermediate hues: red (R), yellow-red (YR), yellow (Y), green-yellow (GY), green (G), blue-green (BG), blue (B), purple-blue (PB), purple (P), and red-purple (RP). The arc between two adjacent hues is further divided into 10 steps, giving 100 hue steps in total. The distance from the axis represents chroma, i.e. the purity of the hue; it increases from 0 at the center outward and has no theoretical upper limit. A specific color is written in the form hue + value + chroma.

Munsell Color System:

image
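A simplified sketch of the general color-wheel idea (flow direction mapped to hue, magnitude to saturation) is shown below. It uses OpenCV's HSV color space as a stand-in for the Munsell wheel, so it only approximates the scheme described above; the normalization choice is illustrative.

```python
import cv2
import numpy as np

def flow_to_color(flow):
    """Encode a (H, W, 2) optical flow field as a color image via the HSV wheel."""
    u = flow[..., 0].astype(np.float32)
    v = flow[..., 1].astype(np.float32)
    mag, ang = cv2.cartToPolar(u, v, angleInDegrees=True)

    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = (ang / 2).astype(np.uint8)        # hue: flow direction (OpenCV hue range is 0-179)
    hsv[..., 1] = cv2.normalize(mag, None, 0, 255,
                                cv2.NORM_MINMAX).astype(np.uint8)  # saturation: flow magnitude
    hsv[..., 2] = 255                               # full value
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```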

Visualization results of the scene flow (multiple perspectives):

image

image

image

Future work

In this work, scene flow and optical flow estimation have been studied in depth and promising results have been obtained, but this does not mean the accuracy matches that of end-to-end deep learning methods. Because the experiment relies on ICP registration, it can only be concluded preliminarily that the scene flow prediction is reasonably accurate, and the pipeline should be improved in the future. As a next step, each line-set vector should be mapped to a two-dimensional vector in the image plane via the projection matrix and compared with the optical flow vector of the corresponding pixel computed by FlowNet, for example using cosine similarity, so that accuracy can be measured quantitatively and guide subsequent optimization.
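A sketch of how such a quantitative check could look is given below; the projection matrix P, the array layout of the optical flow, and all names are assumptions for illustration.

```python
import numpy as np

def flow_cosine_similarity(p_start, p_end, P, flow):
    """Compare projected 3D displacements with the 2D optical flow via cosine similarity.

    p_start, p_end: (N, 3) endpoints of the line-set vectors in 3D.
    P: (3, 4) projection matrix from LiDAR coordinates to pixel coordinates.
    flow: (H, W, 2) optical flow field predicted by FlowNet.
    """
    def to_pixels(pts):
        X = np.hstack([pts, np.ones((len(pts), 1))]).T   # homogeneous 3D points, (4, N)
        y = P @ X
        return (y[:2] / y[2]).T                          # pixel coordinates, (N, 2)

    uv0, uv1 = to_pixels(p_start), to_pixels(p_end)
    proj_vec = uv1 - uv0                                  # projected 2D motion of each line

    h, w = flow.shape[:2]
    px = np.clip(np.round(uv0).astype(int), 0, [w - 1, h - 1])
    flow_vec = flow[px[:, 1], px[:, 0]]                   # optical flow sampled at the start pixels

    num = np.sum(proj_vec * flow_vec, axis=1)
    den = np.linalg.norm(proj_vec, axis=1) * np.linalg.norm(flow_vec, axis=1) + 1e-8
    return num / den                                      # per-pair cosine similarity
```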

Reference

[2] Dosovitskiy A, Fischer P, Ilg E, et al. FlowNet: Learning optical flow with convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 2758-2766.

[39] Geiger A, Lenz P, Stiller C, et al. Vision meets robotics: The KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237.
