[NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D

This repository is the official implementation of the Large Spatial Model (LSM). LSM reconstructs explicit radiance fields from two unposed images in real time, capturing geometry, appearance, and semantics.

Feature and RGB Rendering

Feature Visualization

output_fmap_video.mp4

RGB Color Rendering

output_images_video.mp4

Get Started

Installation

  1. Download repo:

    git clone --recurse-submodules https://github.com/NVlabs/LSM.git
    
  2. Create and activate conda environment:

    conda create -n lsm python=3.10
    conda activate lsm
  3. Install PyTorch and related packages:

    conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
    conda install pytorch-cluster pytorch-scatter pytorch-sparse -c pyg -y
  4. Install other Python dependencies:

    pip install -r requirements.txt
    pip install flash-attn --no-build-isolation
  5. Install PointTransformerV3:

    cd submodules/PointTransformerV3/Pointcept/libs/pointops
    python setup.py install
    cd ../../../../..
  6. Install 3D Gaussian Splatting modules:

    pip install submodules/3d_gaussian_splatting/diff-gaussian-rasterization
    pip install submodules/3d_gaussian_splatting/simple-knn
  7. Install OpenAI CLIP:

    pip install git+https://github.com/openai/CLIP.git
  8. Build croco model:

    cd submodules/dust3r/croco/models/curope
    python setup.py build_ext --inplace
    cd ../../../../..
  9. Download pre-trained models:

    The following three model weights need to be downloaded:

    # 1. Create directory for checkpoints
    mkdir -p checkpoints/pretrained_models
    
    # 2. DUSt3R model weights
    wget -P checkpoints/pretrained_models https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
    
    # 3. LSEG demo model weights
    gdown 1FTuHY1xPUkM-5gaDtMfgCl3D0gR89WV7 -O checkpoints/pretrained_models/demo_e200.ckpt
    
    # 4. LSM final checkpoint
    gdown 1q57nbRJpPhrdf1m7XZTkBfUIskpgnbri -O checkpoints/pretrained_models/checkpoint-final.pth
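
After the steps above, a quick sanity check can catch a failed download or a package that did not install. The following is a minimal sketch, not part of the repository: the checkpoint file names are taken from the download commands above, while the helper names and the list of Python import names (`ASSUMED_MODULES`) are assumptions made for illustration and may need adjusting for your environment.

```python
import importlib.util
from pathlib import Path

# File names taken from the download commands above.
EXPECTED_CHECKPOINTS = (
    "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth",
    "demo_e200.ckpt",
    "checkpoint-final.pth",
)

# Assumption: top-level import names of the packages installed in steps 3-7.
ASSUMED_MODULES = (
    "torch",
    "torchvision",
    "flash_attn",
    "clip",
    "diff_gaussian_rasterization",
    "simple_knn",
)


def missing_checkpoints(ckpt_dir="checkpoints/pretrained_models"):
    """Return the expected checkpoint files that are absent or empty in ckpt_dir."""
    root = Path(ckpt_dir)
    missing = []
    for name in EXPECTED_CHECKPOINTS:
        path = root / name
        # An empty file usually points to an interrupted or failed download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def missing_modules(names=ASSUMED_MODULES):
    """Return the top-level module names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing checkpoints:", missing_checkpoints() or "none")
    print("Missing modules:", missing_modules() or "none")
```

Run it from the repository root with the `lsm` environment active. It only checks that files and modules can be found; it does not validate checkpoint contents or CUDA builds.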

Usage

  1. Data preparation

    • Prepare any two images of a scene, preferably indoor, as the model is trained on indoor scene datasets.
    • Place your images in a directory of your choice.

    Example directory structure:

    demo_images/
    └── indoor/
        ├── scene1/
        │   ├── image1.jpg
        │   └── image2.jpg
        └── scene2/
            ├── room1.png
            └── room2.png
  2. Commands

    # Reconstruct 3D scene and generate video using two images
    bash scripts/infer.sh

    Optional parameters in scripts/infer.sh (default settings recommended):

    # Path to your input images
    --file_list "demo_images/indoor/scene2/image1.jpg" "demo_images/indoor/scene2/image2.jpg"
    
    # Output directory for Gaussian points and rendered video
    --output_path "outputs/indoor/scene2"
    
    # Image resolution for processing
    --resolution "256"
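
When several scene folders are laid out as in the example directory structure, it can help to list which ones contain exactly two images and to print a matching `--file_list` line to copy into `scripts/infer.sh`. The sketch below is illustrative only: `find_image_pairs`, `file_list_argument`, and the accepted suffixes are hypothetical helpers and assumptions, not part of the repository, and the script does not run inference itself.

```python
from pathlib import Path

# Assumption: the image formats shown in the example directory structure.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_image_pairs(root):
    """Map each directory under root that holds exactly two images to that pair."""
    root = Path(root)
    if not root.is_dir():
        return {}
    pairs = {}
    for scene_dir in sorted(p for p in root.rglob("*") if p.is_dir()):
        images = sorted(
            f for f in scene_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        # Only folders with exactly two images qualify as an input pair.
        if len(images) == 2:
            pairs[scene_dir] = (images[0], images[1])
    return pairs


def file_list_argument(pair):
    """Format a pair in the same style as the --file_list line shown above."""
    return '--file_list "{}" "{}"'.format(pair[0].as_posix(), pair[1].as_posix())


if __name__ == "__main__":
    for scene, pair in find_image_pairs("demo_images").items():
        print(scene.as_posix(), "->", file_list_argument(pair))
```

Folders with fewer or more than two images are skipped rather than guessed at, so the printed lines always name a complete pair.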

Acknowledgement

This work builds on many amazing research works and open-source projects. Thanks a lot to all the authors for sharing!

Citation

If you find our work useful in your research, please consider giving a star ⭐ and citing the following paper 📝.

@misc{fan2024largespatialmodelendtoend,
      title={Large Spatial Model: End-to-end Unposed Images to Semantic 3D},
      author={Zhiwen Fan and Jian Zhang and Wenyan Cong and Peihao Wang and Renjie Li and Kairun Wen and Shijie Zhou and Achuta Kadambi and Zhangyang Wang and Danfei Xu and Boris Ivanovic and Marco Pavone and Yue Wang},
      year={2024},
      eprint={2410.18956},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.18956},
}