Commit 80f719f: initial commit

yiming107 committed Sep 22, 2020
1 parent 5d85264 commit 80f719f

Showing 67 changed files with 5,587 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
script/datagen
script/nbv_exp
2 changes: 2 additions & 0 deletions .idea/.gitignore

11 changes: 11 additions & 0 deletions .idea/ExHistCNN.iml

6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

4 changes: 4 additions & 0 deletions .idea/misc.xml

8 changes: 8 additions & 0 deletions .idea/modules.xml

6 changes: 6 additions & 0 deletions .idea/vcs.xml

51 changes: 50 additions & 1 deletion README.md
@@ -13,7 +13,56 @@ alt="IMAGE ALT TEXT HERE" width="480" height="360" border="10" /></a> </br>
## Introduction
In this project, we address the problem of autonomous 3D exploration of an unknown indoor environment using a depth camera. We cast the problem as the estimation of the Next Best View (NBV) that maximises the coverage of the unknown area. We do this by re-formulating NBV estimation as a classification problem, and we propose a novel learning-based metric that encodes both the current 3D observation (a depth frame) and the history of the ongoing reconstruction. A major contribution of this work is a new representation of the 3D reconstruction history as an auxiliary utility map, which is efficiently coupled with the current depth observation. With both pieces of information, we train a lightweight CNN, named ExHistCNN, that estimates the NBV as a set of directions towards which the depth sensor finds the most unexplored area.
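To make this interface concrete, here is a minimal sketch of the input/output contract described above, with an illustrative stand-in network; the channel stacking and layer sizes are our assumptions, not the exact ExHistCNN architecture:

```python
import torch
import torch.nn as nn


class TinyNBVNet(nn.Module):
    """Illustrative stand-in for ExHistCNN: a 2-channel input (depth frame +
    utility map) mapped to 4 scores, one per candidate NBV direction.
    The layer sizes here are assumptions, not the published architecture."""

    def __init__(self):
        super(TinyNBVNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 4)

    def forward(self, depth, utility_map):
        # Stack the current depth observation with the reconstruction-history
        # utility map as two image channels: (B, H, W) + (B, H, W) -> (B, 2, H, W).
        x = torch.stack([depth, utility_map], dim=1)
        return self.head(self.features(x).flatten(1))  # (B, 4) direction logits
```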

In this repo, we share the code, the dataset, and our pretrained models. The shared scripts cover dataset organisation and model training. While the NBV-related scripts cannot be fully shared due to redistribution constraints, we provide the full result data of the paper for comparison.


## Installation
The code is tested on Ubuntu 16.04 LTS with CUDA version 9.2.148. It is recommended to use a virtual environment, e.g. conda. You can create a conda virtual env using the provided environment file

`conda env create -f environment.yml`

Clone the project and activate the virtual environment (`conda activate memo_nbv`, using the environment name from environment.yml), and we are good to go.

## Dataset

Download the <a href="https://istitutoitalianotecnologia-my.sharepoint.com/:u:/g/personal/yiming_wang_iit_it/EWReM9pGq1NOop3jxORE_S8BT9bkGg27uJUzLvdOHq6AfA?e=6QrYXp">dataset</a> through the link. The dataset contains the rendered rooms in SUNCG and mp3d that are used in this paper, together with the ground-truth annotation JSON files.
The renderings (depth + RGB) per room can be generated with existing tools. For SUNCG, you can use the [SUNCG toolbox](https://github.com/tinytangent/SUNCGtoolbox), while for real-world room scans from mp3d you can use [HabitatSim](https://github.com/facebookresearch/habitat-sim).
You can follow the dataset generation procedure in the video:
<p align="center">
<a href="https://www.youtube.com/watch?v=m1UtcLF0GpE" target="_blank"><img src="http://img.youtube.com/vi/m1UtcLF0GpE/0.jpg"
alt="IMAGE ALT TEXT HERE" width="480" height="360" border="10" /></a> </br>
<a href="http://www.youtube.com/watch?v=r_YE-oIccxQ">Video for dataset generation!</a>
</p>

We encourage placing the dataset folder in the project folder, so that the scripts can be run without adapting paths.

### Dataset organisation
You can download the [csv files](https://istitutoitalianotecnologia-my.sharepoint.com/:u:/g/personal/yiming_wang_iit_it/Ec1AvAnEpehLgOWY2E3ZEJUB3gkNDVAUcxaGJkNzJDxj-Q?e=MLEy14) that organise the dataset for training, validation and testing.
It is recommended to place the dataset organisation files inside the data/ folder.

Optionally, you can organise your own dataset split for training/validation/testing by adapting and running the script script/train/organise_dataset.py.
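For illustration, a minimal sketch of such a split (the CSV column name and the room-folder layout are assumptions; the actual organise_dataset.py may differ):

```python
import csv
import os
import random

random.seed(0)
# Assume one sub-folder per rendered room inside dataset/.
rooms = sorted(d for d in os.listdir("dataset")
               if os.path.isdir(os.path.join("dataset", d)))
random.shuffle(rooms)

# Assumed 70/15/15 split over rooms.
n_train = int(0.7 * len(rooms))
n_val = int(0.15 * len(rooms))
splits = {"train.csv": rooms[:n_train],
          "val.csv": rooms[n_train:n_train + n_val],
          "test.csv": rooms[n_train + n_val:]}

for name, split in splits.items():
    with open(os.path.join("data", name), "w") as f:
        writer = csv.writer(f)
        writer.writerow(["room"])  # assumed column header
        for room in split:
            writer.writerow([room])
```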

### Dataset preprocessing
We first pre-process the dataset, reading the images and applying the transforms, and save the data to H5 files. This facilitates speedy training.
You can pre-process the dataset using the script script/train/prepare_H5.py.
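As a rough illustration, a minimal sketch of this kind of preprocessing (the folder layout, image size, and millimetre depth encoding are assumptions; the actual prepare_H5.py may differ):

```python
import glob

import h5py
import numpy as np
from PIL import Image


def depth_to_array(path, size=(128, 128)):
    # Load a depth PNG, resize, and convert to metres (mm encoding is an assumption).
    img = Image.open(path).resize(size, Image.NEAREST)
    return np.asarray(img, dtype=np.float32) / 1000.0


depth_files = sorted(glob.glob("dataset/room0/depth/*.png"))  # hypothetical room
frames = np.stack([depth_to_array(p) for p in depth_files])
with h5py.File("data/room0_depth.h5", "w") as f:
    f.create_dataset("depth", data=frames, compression="gzip")
```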

In addition, if you want to train the MLP classifiers described in the paper, you can prepare the ResNet features for training with script/train/prepare_resnet_features_MLP.py once you have the H5 files.
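For the feature step, a hedged sketch using torchvision (the ResNet variant and the extraction layer are assumptions; the actual script may differ):

```python
import torch
import torchvision.models as models

# ResNet-18 backbone with the classification head removed (an assumption;
# the actual script may use another variant or intermediate layer).
backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    batch = torch.randn(8, 3, 224, 224)  # placeholder image batch
    feats = backbone(batch)              # (8, 512) feature vectors
```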

## Network
### Training
You can train the models using the provided training scripts.
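For example, a minimal training-loop sketch for the MLP variant in network/MLP.py (the data loading, optimiser settings, and checkpoint filename are placeholders, not the repo's exact recipe):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from network.MLP import MLP

# Placeholder features/labels; the real script would read them from the H5 files.
features = torch.randn(256, 10240)
labels = torch.randint(0, 4, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

model = MLP()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for x, y in loader:
        optimiser.zero_grad()
        loss = criterion(model(x), y)  # 4-way direction classification
        loss.backward()
        optimiser.step()

torch.save(model.state_dict(), "checkpoint/mlp.pth")  # assumed filename
```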

Optionally, you can download the pretrained models from [here](https://istitutoitalianotecnologia-my.sharepoint.com/:u:/g/personal/yiming_wang_iit_it/EUBOjPb27VFOsNSuF7b8__EBoQ5WemMzOOxSJxdHyrnGAg?e=BfePqo).
Please place the pretrained models in the checkpoint folder.
### Evaluation
You can evaluate the models by running script/train/evaluate_network.py.
The evaluation metrics will be saved and can be visualised by running script/train/plot_result.py.
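At its core this is 4-way classification accuracy; a self-contained sketch under that assumption (placeholder data, assumed checkpoint filename):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from network.MLP import MLP

# Placeholder test split; the real script reads the held-out H5 data.
test_loader = DataLoader(TensorDataset(torch.randn(64, 10240),
                                       torch.randint(0, 4, (64,))),
                         batch_size=32)

model = MLP()
model.load_state_dict(torch.load("checkpoint/mlp.pth"))  # assumed filename
model.eval()  # disable dropout for evaluation

correct, total = 0, 0
with torch.no_grad():
    for x, y in test_loader:
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
print("direction accuracy: %.3f" % (correct / float(total)))
```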

## Visualise NBV results
You can download the metadata of the [results](https://istitutoitalianotecnologia-my.sharepoint.com/:u:/g/personal/yiming_wang_iit_it/EQ9TTdvz-f1MpV2OEjpJ6ksBqca_JQsJN2byhEvO7Vhutg?e=QiAXxe) for each NBV strategy reported in the paper.
Run script/result_analysis/analyse_result.py to obtain the figures reported in the paper.

You can also visualise the point cloud at each time step, for each run and each method, using the script script/result_analysis/reconstruct_NBV.py.
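For a single reconstruction, a minimal sketch using the Open3D package from the environment (the result file path is hypothetical; the actual metadata layout may differ):

```python
import open3d as o3d

# Hypothetical path to one reconstructed cloud at a given time step.
pcd = o3d.io.read_point_cloud("result/run0/method0/cloud_t010.ply")
o3d.visualization.draw_geometries([pcd])
```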

## Citation
If you find our work useful in your research, please consider citing:
1 change: 1 addition & 0 deletions checkpoint
38 changes: 38 additions & 0 deletions config.yml
@@ -0,0 +1,38 @@
#common
project_name: auto3dmemo
dataset_folder: "dataset"
depth_folder: "depth"
visibility_folder: "visibility_map"
param_folder: "param"
temp_folder: "temp"

## dataset generation related
#logger_file: "logger"
#pose_file: "pose.txt"
#ee_pose_file: "pose_ee.txt"
#neigh_info_file: "neighbourinfo.json"
#neigh_id_file: "pose_neighbour.txt"
#neigh_direction_id_file: "pose_neighbour_direcion.txt"
#limit_neigh_num: 10
#enforce_generation_neighbour: False
#motion_label_file: "motionlabel.json"

# visibility related
info_with_random: true
max_range: 6.0
octomap_resolution: 0.05
point_subsample: 1
sensor_min_dis: 0.1
sensor_max_dis: 5.5
sensor_angle_pan: 0.32 # unit pi*rad
sensor_angle_tilt: 0.25 #0.24
sensor_angle_gap: 0.005

# reconstruction_related:
intrinsic_file: "camera.json"
hand_eye_file: "hand_eye.txt"

## NBV result related
#result_folder: "result"
#vismap_runtime_folder: "vismap_runtime"
#enforce_generation_nbv: False
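
These values can be read with PyYAML, which is included in the environment; a minimal sketch (our addition for illustration, not a script from the repo):

```python
import yaml

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

print(cfg["octomap_resolution"])  # 0.05
print(cfg["sensor_angle_pan"])    # 0.32, in units of pi rad
```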
1 change: 1 addition & 0 deletions data
1 change: 1 addition & 0 deletions dataset
171 changes: 171 additions & 0 deletions environment.yml
@@ -0,0 +1,171 @@
name: memo_nbv
channels:
- open3d-admin
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- attrs=19.2.0=py_0
- backports=1.0=py_2
- backports.shutil_get_terminal_size=1.0.0=py27_2
- backports_abc=0.5=py_0
- blas=1.0=mkl
- bleach=3.1.0=py27_0
- bokeh=1.3.4=py27_0
- ca-certificates=2020.1.1=0
- cairo=1.14.8=0
- certifi=2019.11.28=py27_0
- cffi=1.12.3=py27h2e261b9_0
- click=7.0=py27_0
- cloudpickle=1.2.2=py_0
- configparser=4.0.2=py27_0
- cudatoolkit=9.2=0
- cycler=0.10.0=py27_0
- cytoolz=0.10.0=py27h7b6447c_0
- dask=1.2.2=py_0
- dask-core=1.2.2=py_0
- dbus=1.13.6=h746ee38_0
- decorator=4.4.0=py27_1
- defusedxml=0.6.0=py_0
- distributed=1.28.1=py27_0
- entrypoints=0.3=py27_0
- enum34=1.1.6=py27_1
- expat=2.2.6=he6710b0_0
- fontconfig=2.12.1=3
- freetype=2.5.5=2
- functools32=3.2.3.2=py27_1
- future=0.17.1=py27_0
- futures=3.3.0=py27_0
- glib=2.56.2=hd408876_0
- gmp=6.1.2=h6c8ec71_1
- gst-plugins-base=1.14.0=hbbd80ab_1
- gstreamer=1.14.0=hb453b48_1
- h5py=2.9.0=py27h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- heapdict=1.0.1=py_0
- icu=54.1=0
- imageio=2.6.0=py27_0
- intel-openmp=2019.4=243
- ipaddress=1.0.22=py27_0
- ipykernel=4.10.0=py27_0
- ipython=5.8.0=py27_0
- ipython_genutils=0.2.0=py27_0
- ipywidgets=7.5.1=py_0
- jinja2=2.10.3=py_0
- jpeg=9b=h024ee3a_2
- jsonschema=3.0.2=py27_0
- jupyter_client=5.3.3=py27_1
- jupyter_core=4.5.0=py_0
- libffi=3.2.1=hd88cf55_4
- libgcc=7.2.0=h69d50b8_2
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libiconv=1.14=0
- libpng=1.6.37=hbc83047_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.0.10=h2733197_2
- libxcb=1.13=h1bed415_1
- libxml2=2.9.9=hea5a465_1
- linecache2=1.0.0=py27_0
- locket=0.2.0=py27_1
- markupsafe=1.1.1=py27h7b6447c_0
- matplotlib=2.0.2=np111py27_0
- mistune=0.8.4=py27h7b6447c_0
- mkl=2019.4=243
- mkl-service=2.3.0=py27he904b0f_0
- mkl_fft=1.0.14=py27ha843d7b_0
- mkl_random=1.0.2=py27hd81dba3_0
- msgpack-python=0.6.1=py27hfd86e86_1
- nbconvert=5.6.0=py27_1
- nbformat=4.4.0=py27_0
- networkx=2.2=py27_1
- ninja=1.9.0=py27hfd86e86_0
- notebook=5.7.8=py27_0
- numpy=1.11.3=py27h7e9f1db_12
- numpy-base=1.11.3=py27hde5b4d6_12
- olefile=0.46=py27_0
- openssl=1.0.2u=h7b6447c_0
- packaging=19.2=py_0
- pandas=0.23.4=py27h04863e7_0
- pandoc=2.2.3.2=0
- pandocfilters=1.4.2=py27_1
- partd=1.0.0=py_0
- pathlib2=2.3.5=py27_0
- pcre=8.43=he6710b0_0
- pexpect=4.7.0=py27_0
- pickleshare=0.7.5=py27_0
- pillow=4.2.1=py27_0
- pip=19.2.3=py27_0
- pixman=0.34.0=hceecf20_3
- prometheus_client=0.7.1=py_0
- prompt_toolkit=1.0.15=py27_0
- psutil=5.6.3=py27h7b6447c_0
- ptyprocess=0.6.0=py27_0
- pycairo=1.13.3=py27hea6d626_0
- pycparser=2.19=py27_0
- pygments=2.4.2=py_0
- pyparsing=2.4.2=py_0
- pyqt=5.6.0=py27h22d08a2_6
- pyrsistent=0.15.4=py27h7b6447c_0
- python=2.7.12=1
- python-dateutil=2.8.0=py27_0
- pytz=2019.3=py_0
- pywavelets=1.0.3=py27hdd07704_1
- pyyaml=5.1.2=py27h7b6447c_0
- pyzmq=18.1.0=py27he6710b0_0
- qt=5.6.2=5
- readline=6.2=2
- scandir=1.10.0=py27h7b6447c_0
- scikit-image=0.13.1=py27h14c3975_1
- scikit-learn=0.20.3=py27hd81dba3_0
- scipy=1.2.1=py27h7c811a0_0
- send2trash=1.5.0=py27_0
- setuptools=41.4.0=py27_0
- simplegeneric=0.8.1=py27_2
- singledispatch=3.4.0.3=py27_0
- sip=4.18.1=py27hf484d3e_2
- six=1.12.0=py27_0
- sortedcontainers=2.1.0=py27_0
- sqlite=3.13.0=0
- subprocess32=3.5.4=py27h7b6447c_0
- tblib=1.4.0=py_0
- terminado=0.8.2=py27_0
- testpath=0.4.2=py27_0
- tk=8.5.18=0
- toolz=0.10.0=py_0
- tornado=5.1.1=py27h7b6447c_0
- traceback2=1.4.0=py27_0
- traitlets=4.3.3=py27_0
- unittest2=1.1.0=py27_0
- wcwidth=0.1.7=py27_0
- webencodings=0.5.1=py27_1
- wheel=0.33.6=py27_0
- widgetsnbextension=3.5.1=py27_0
- xz=5.2.4=h14c3975_4
- yaml=0.1.7=had09818_2
- zeromq=4.3.1=he6710b0_3
- zict=1.0.0=py_0
- zlib=1.2.11=h7b6447c_3
- zstd=1.3.7=h0b5b093_0
- open3d=0.8.0.0=py27_0
- pytorch=1.3.0=py2.7_cuda9.2.148_cudnn7.6.3_0
- torchvision=0.4.1=py27_cu92
- pip:
- asn1crypto==0.24.0
- auto3dmemo==0.0.1
- auto3dnbv==0.0.1
- backports.functools-lru-cache==1.5
- cryptography==2.6.1
- future-fstrings==0.4.5
- kiwisolver==1.1.0
- msgpack==0.6.1
- multiprocessing==2.6.2.1
- open-3d==0.3.0.0
- open3d-official==0.3.0.0
- opencv-python==4.0.0.21
- pyopenssl==19.0.0
- tokenize-rt==2.2.0
- torch==1.3.0
prefix: /home/yiming/.miniconda3/envs/memo_nbv

25 changes: 25 additions & 0 deletions network/MLP.py
@@ -0,0 +1,25 @@
import torch
import torch.nn as nn
import torch.nn.functional as func


class MLP(nn.Module):
    def __init__(self, input_length=10240):
        super(MLP, self).__init__()
        self.input_length = input_length
        # apply dropout between the fully connected layers
        self.fc_dropout = nn.Dropout(p=0.5)  # https://arxiv.org/pdf/1207.0580.pdf
        # fully connected layers, reducing the flattened input to 4 classes
        self.fc1 = nn.Linear(input_length, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, 256)
        self.fc4 = nn.Linear(256, 4)  # 4 outputs, one per candidate direction

    def forward(self, x):
        x = x.float()
        x = x.view(-1, self.input_length)  # flatten each sample to a vector
        x = self.fc_dropout(func.relu(self.fc1(x)))
        x = self.fc_dropout(func.relu(self.fc2(x)))
        x = self.fc_dropout(func.relu(self.fc3(x)))
        x = self.fc4(x)
        return x
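
As a quick sanity check, a minimal usage sketch of this module (the random batch is a placeholder for real feature vectors):

```python
import torch

from network.MLP import MLP

model = MLP()
model.eval()  # disable dropout for inference
with torch.no_grad():
    logits = model(torch.randn(2, 10240))
print(logits.shape)  # torch.Size([2, 4])
```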
Binary file added network/MLP.pyc
