# Large Spatial Model: End-to-end Unposed Images to Semantic 3D
Demo videos: `output_fmap_video.mp4`, `output_images_video.mp4`.
## Installation

- Download the repo:

```bash
git clone --recurse-submodules https://github.com/NVlabs/LSM.git
```
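If the repo was cloned without `--recurse-submodules`, the submodules can still be fetched afterwards (standard git, nothing repo-specific):

```bash
git submodule update --init --recursive
```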
- Create and activate the conda environment:

```bash
conda create -n lsm python=3.10
conda activate lsm
```
- Install PyTorch and related packages:

```bash
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
conda install pytorch-cluster pytorch-scatter pytorch-sparse -c pyg -y
```
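Before building the CUDA extensions below, it is worth confirming that the GPU build of PyTorch is active (an optional sanity check):

```bash
# Should print 2.1.0 and True on a machine with a working CUDA 12.1 setup
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```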
- Install other Python dependencies:

```bash
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
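Note that `pip` may compile flash-attn from source, which requires an `nvcc` matching your PyTorch CUDA version on the PATH; to confirm the install succeeded:

```bash
python -c "import flash_attn; print(flash_attn.__version__)"
```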
- Install PointTransformerV3:

```bash
cd submodules/PointTransformerV3/Pointcept/libs/pointops
python setup.py install
cd ../../../../..
```
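To verify the CUDA ops built correctly (the library is imported as `pointops`, following Pointcept's convention):

```bash
python -c "import pointops; print('pointops OK')"
```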
- Install the 3D Gaussian Splatting modules:

```bash
pip install submodules/3d_gaussian_splatting/diff-gaussian-rasterization
pip install submodules/3d_gaussian_splatting/simple-knn
```
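Both are CUDA extensions; a quick import check (module names as exposed by the upstream 3DGS codebase):

```bash
python -c "from diff_gaussian_rasterization import GaussianRasterizer; from simple_knn._C import distCUDA2; print('3DGS modules OK')"
```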
- Install OpenAI CLIP:

```bash
pip install git+https://github.com/openai/CLIP.git
```
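CLIP downloads its weights on first use; to confirm the package itself is importable:

```bash
python -c "import clip; print(clip.available_models())"
```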
- Build the croco model:

```bash
cd submodules/dust3r/croco/models/curope
python setup.py build_ext --inplace
cd ../../../../..
```
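`build_ext --inplace` leaves the compiled extension next to its sources, so a simple file check confirms the build:

```bash
ls submodules/dust3r/croco/models/curope/*.so
```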
- Download the pre-trained models. The following three model weights need to be downloaded:

```bash
# 1. Create directory for checkpoints
mkdir -p checkpoints/pretrained_models

# 2. DUSt3R model weights
wget -P checkpoints/pretrained_models https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth

# 3. LSeg demo model weights
gdown 1FTuHY1xPUkM-5gaDtMfgCl3D0gR89WV7 -O checkpoints/pretrained_models/demo_e200.ckpt

# 4. LSM final checkpoint
gdown 1q57nbRJpPhrdf1m7XZTkBfUIskpgnbri -O checkpoints/pretrained_models/checkpoint-final.pth
```
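The `gdown` tool handles the Google Drive downloads (install it with `pip install gdown` if it is not already pulled in by `requirements.txt`). Afterwards the checkpoint directory should contain all three files:

```bash
ls -lh checkpoints/pretrained_models
# Expect: DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth, demo_e200.ckpt, checkpoint-final.pth
```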
## Data Preparation
- Prepare any two images of a scene (preferably indoor, as the model is trained on indoor scene datasets).
- Place your images in a directory of your choice.
Example directory structure:
```
demo_images/
└── indoor/
    ├── scene1/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── scene2/
        ├── room1.png
        └── room2.png
```
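For instance, to stage your own pair (paths below are placeholders):

```bash
mkdir -p demo_images/indoor/my_scene
cp /path/to/first_image.jpg /path/to/second_image.jpg demo_images/indoor/my_scene/
```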
## Commands

```bash
# Reconstruct 3D scene and generate video using two images
bash scripts/infer.sh
```
Optional parameters in `scripts/infer.sh` (default settings recommended):

```bash
# Path to your input images
--file_list "demo_images/indoor/scene2/image1.jpg" "demo_images/indoor/scene2/image2.jpg"

# Output directory for Gaussian points and rendered video
--output_path "outputs/indoor/scene2"

# Image resolution for processing
--resolution "256"
```
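To run on your own images, edit those flags inside `scripts/infer.sh` and re-run it; results are written under `--output_path` (the listing below assumes the default paths above):

```bash
bash scripts/infer.sh
ls outputs/indoor/scene2   # Gaussian points and the rendered video
```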
## Acknowledgements

This work builds on many amazing research works and open-source projects; many thanks to all the authors for sharing!
- Gaussian-Splatting and diff-gaussian-rasterization
- DUSt3R
- Language-Driven Semantic Segmentation (LSeg)
- Point Transformer V3
- pixelSplat
- Feature 3DGS
## Citation

If you find our work useful in your research, please consider giving a star ⭐ and citing the following paper 📝.
```bibtex
@misc{fan2024largespatialmodelendtoend,
  title={Large Spatial Model: End-to-end Unposed Images to Semantic 3D},
  author={Zhiwen Fan and Jian Zhang and Wenyan Cong and Peihao Wang and Renjie Li and Kairun Wen and Shijie Zhou and Achuta Kadambi and Zhangyang Wang and Danfei Xu and Boris Ivanovic and Marco Pavone and Yue Wang},
  year={2024},
  eprint={2410.18956},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.18956},
}
```