We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method casts rays from the camera center and records geometric and texture details (depth, normal, and color) at every surface each ray intersects. This process efficiently condenses the whole 3D object into a multi-frame video format, which motivates the use of a network architecture similar to those in video diffusion models. Because it stores only surface information, the representation is efficient. We demonstrate the practicality and adaptability of X-Ray by synthesizing the complete visible and hidden surfaces of a 3D object from a single input image, which paves the way for new 3D representation research and practical applications.
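The following toy sketch makes the ray-casting idea concrete. It is not the repository's preprocessing code. It assumes an analytic sphere in place of a mesh, a pinhole camera on the +z axis, distance along the ray as the depth value, and the 1 + 3 + 3 (depth, normal, color) channel layout used by `load_xray` further down. Each pixel ray can cross the sphere twice: the nearest hit fills layer 0, the hidden back surface fills layer 1, and the remaining layers stay empty (zero).

```python
import numpy as np

def xray_of_sphere(H=64, W=64, num_layers=4, radius=1.0, cam_dist=3.0,
                   fov_deg=40.0, color=(0.8, 0.2, 0.2)):
    """Toy X-Ray of a sphere at the origin (hypothetical helper, for illustration).

    Returns shape (num_layers, 1 + 3 + 3, H, W): per layer, depth (assumed to be
    distance along the ray), normal (xyz) and color (rgb). Empty entries are 0.
    """
    # Pinhole rays through pixel centres; the camera sits at +z and looks down -z.
    f = 0.5 * W / np.tan(np.deg2rad(fov_deg) / 2)
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)  # each (H, W)
    dirs = np.stack([(u - W / 2) / f, -(v - H / 2) / f, -np.ones_like(u)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origin = np.array([0.0, 0.0, cam_dist])

    # Ray-sphere intersection with unit d: t^2 + 2 (o.d) t + (|o|^2 - r^2) = 0.
    b = dirs @ origin                      # o.d per pixel, shape (H, W)
    c = origin @ origin - radius ** 2
    disc = b ** 2 - c
    hit = disc > 0
    sq = np.sqrt(np.where(hit, disc, 0.0))
    # Both roots ordered near-to-far: visible entry surface, then hidden exit surface.
    ts = np.stack([-b - sq, -b + sq])      # (2, H, W)

    xray = np.zeros((num_layers, 7, H, W))
    for k in range(min(2, num_layers)):
        t = ts[k]
        pts = origin + t[..., None] * dirs             # (H, W, 3) hit points
        normals = pts / radius                         # outward sphere normal
        xray[k, 0] = np.where(hit, t, 0.0)
        xray[k, 1:4] = np.where(hit, normals.transpose(2, 0, 1), 0.0)
        xray[k, 4:7] = np.where(hit, np.asarray(color)[:, None, None], 0.0)
    return xray
```

With a real mesh, a ray-mesh intersector can return any number of hits per ray. Those hits would then be sorted by distance and packed into the fixed number of layers; that packing policy is an assumption here.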
Overview of 3D synthesis via X-Ray.
$ conda create -n xray python=3.10
$ conda activate xray
$ pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
$ pip install -U xformers==v0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118
$ pip install -r requirements.txt
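After installation, a quick check that the pinned versions actually resolved can save debugging time later. The snippet below is a hypothetical helper, not part of the repository. It uses only Python's standard `importlib.metadata` and compares each public version (the part before any local tag such as `+cu118`) against the pins in the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install commands above.
PINNED = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "torchaudio": "2.1.2",
    "xformers": "0.0.23.post1",
}

def check_pins(pins=PINNED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            got = version(name)
        except PackageNotFoundError:
            got = None
        # CUDA wheels may report e.g. "2.1.2+cu118"; compare the part before "+".
        matches = got is not None and got.split("+")[0] == wanted
        report[name] = (got, matches)
    return report

if __name__ == "__main__":
    for name, (got, ok) in check_pins().items():
        print(f"{name:12s} installed={got} ok={ok}")
```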
Download the dataset from Hugging Face.
$ cat 0*.zip > Objaverse_XRay.zip
$ unzip Objaverse_XRay.zip
$ ln -s /path/to/Objaverse_XRay Data/Objaverse_XRay
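The shell commands above join the split archive parts, unpack the result, and link it into `Data/`. For environments without `cat` or `unzip`, a standard-library equivalent of the first two steps might look like the sketch below. As the `cat` command implies, it assumes the parts are plain byte-level splits of a single zip file. `join_parts` and `extract` are hypothetical names.

```python
import glob
import shutil
import zipfile
from pathlib import Path

def join_parts(pattern="0*.zip", out="Objaverse_XRay.zip"):
    """Concatenate split parts in lexicographic order (like `cat 0*.zip > out`)."""
    parts = sorted(p for p in glob.glob(pattern) if Path(p).name != Path(out).name)
    if not parts:
        raise FileNotFoundError(f"no parts match {pattern!r}")
    with open(out, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)  # stream bytes, no full load into memory
    return parts

def extract(archive="Objaverse_XRay.zip", dest="."):
    """Unpack the joined archive (like `unzip`)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```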
- Render the mesh to obtain the image and camera parameters.
$ cd preprocess/get_image
$ bash custom/render_mesh.sh
- Obtain the X-Ray representation.
$ cd preprocess/get_xray
$ python get_xray.py
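The rendering step produces camera parameters, and an X-Ray records every surface crossed by each pixel ray. Together, these let the depth layers be lifted back into 3D points. The sketch below assumes a 3x3 pinhole intrinsic matrix, a 4x4 camera-to-world pose in the OpenCV convention (+x right, +y down, +z forward), channel 0 holding distance along the ray, and 0 meaning "no surface". The conventions actually used by `preprocess/get_xray` may differ. `pixel_rays` and `xray_to_points` are hypothetical helpers.

```python
import numpy as np

def pixel_rays(K, c2w, H, W):
    """World-space ray origins and unit directions through pixel centres."""
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3) homogeneous pixels
    dirs_cam = pix @ np.linalg.inv(K).T                # back-project into camera space
    dirs = dirs_cam @ c2w[:3, :3].T                    # rotate into world space
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(c2w[:3, 3], dirs.shape)  # every ray starts at the camera centre
    return origins, dirs

def xray_to_points(xray, K, c2w):
    """Turn an (L, 7, H, W) X-Ray into (N, 3) points and (N, 3) colors."""
    _, _, H, W = xray.shape
    origins, dirs = pixel_rays(K, c2w, H, W)
    depth = xray[:, 0]                                 # (L, H, W)
    mask = depth > 0                                   # assumed: 0 marks an empty entry
    pts = origins[None] + depth[..., None] * dirs[None]  # (L, H, W, 3)
    colors = xray[:, 4:7].transpose(0, 2, 3, 1)        # (L, H, W, 3)
    return pts[mask], colors[mask]
```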
- Load an X-Ray from a .npz file:
import numpy as np
from scipy.sparse import csr_matrix

def load_xray(xray_path):
    # Each .npz stores one 2-D CSR sparse matrix as its data/indices/indptr/shape arrays.
    with np.load(xray_path) as loaded_data:
        loaded_sparse_matrix = csr_matrix(
            (loaded_data['data'], loaded_data['indices'], loaded_data['indptr']),
            shape=loaded_data['shape'])
    # 16 layers x (1 depth + 3 normal + 3 color) channels x 256 x 256 pixels.
    original_shape = (16, 1+3+3, 256, 256)
    restored_array = loaded_sparse_matrix.toarray().reshape(original_shape)
    return restored_array

xray = load_xray('example/dataset/xrays/0a0bc2921e5246a28732bf5584c251d1/000.npz')
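The loader implies the on-disk format. Each .npz holds the `data`, `indices`, `indptr` and `shape` arrays of one 2-D CSR sparse matrix, which is densified and reshaped into 16 layers x 7 channels x 256 x 256. Sparse storage presumably pays off because deeper layers are mostly empty. A matching writer could look like the sketch below.

The 2-D layout passed to `csr_matrix` is an assumption. Any 2-D shape with the same element count round-trips, because the loader reshapes afterwards. `save_xray`, `load_xray_with_shape` and `split_channels` are hypothetical helpers, not repository functions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def save_xray(path, xray):
    """Store an (L, C, H, W) X-Ray under the keys load_xray reads."""
    L, C, H, W = xray.shape
    sparse = csr_matrix(xray.reshape(L * C * H, W))  # assumed 2-D layout
    np.savez_compressed(path, data=sparse.data, indices=sparse.indices,
                        indptr=sparse.indptr, shape=sparse.shape)

def load_xray_with_shape(path, shape):
    """Like load_xray, but with the target 4-D shape as a parameter."""
    with np.load(path) as f:
        m = csr_matrix((f["data"], f["indices"], f["indptr"]),
                       shape=tuple(f["shape"]))
    return m.toarray().reshape(shape)

def split_channels(xray):
    """Split the 1 + 3 + 3 channels: depth (L, H, W), normal and color (L, 3, H, W)."""
    return xray[:, 0], xray[:, 1:4], xray[:, 4:7]
```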
- A minimal example dataset is located in ./example/dataset.
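The example path above suggests an `xrays/<object_id>/<view>.npz` layout. A small index over that layout can be handy for inspecting the minimal dataset. The sketch below infers the layout from that single path only. It does not cover other folders the training scripts may expect (for example rendered images), and `index_xrays` is a hypothetical name.

```python
from pathlib import Path

def index_xrays(root="example/dataset"):
    """List (object_id, view_id, path) triples under <root>/xrays/<object_id>/<view>.npz."""
    items = []
    for path in sorted(Path(root, "xrays").glob("*/*.npz")):
        # Parent folder name is the object id; file stem (e.g. "000") is the view id.
        items.append((path.parent.name, path.stem, path))
    return items
```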
$ bash scripts/train_diffusion.sh
$ bash scripts/train_upsampler.sh
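An X-Ray is a stack of frames, so a batch of X-Rays can be handled like a batch of short videos, with all layers noised together at one timestep. The sketch below shows a generic DDPM forward-noising step applied this way. The linear beta schedule (1e-4 to 0.02 over 1000 steps) and the (B, L, C, H, W) layout are assumptions for illustration. The noise schedule actually used by scripts/train_diffusion.sh may differ.

```python
import numpy as np

def linear_alphas_cumprod(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (DDPM-style)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alphas_cumprod, rng):
    """Sample q(x_t | x_0) = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps for one timestep t."""
    a = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One X-Ray read as a 16-frame clip: (batch, layers, channels, height, width).
    xray = rng.uniform(-1.0, 1.0, size=(1, 16, 7, 32, 32))
    x_t, eps = add_noise(xray, t=500, alphas_cumprod=linear_alphas_cumprod(), rng=rng)
    print(x_t.shape)
```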
$ python evaluate_diffusion.py --exp_diffusion Objaverse_XRay --date_root Data/Objaverse_XRay
$ python evaluate_upsampler.py --exp_diffusion Objaverse_XRay --exp_upsampler Objaverse_XRay_upsampler
- Release paper details.
- Release the dataset.
- Release the training and testing source code.
- Release the preprocessing code.
- Release the pre-trained model.
- Release the gradio demo.
Tao Hu et al.
- The model builds on Diffusers and Stability AI.
- The source code is mainly based on SVD Xtend, which can train Stable Video Diffusion from scratch.
If you find this work useful for your research, please cite our paper:
@inproceedings{hu2024xray,
  title={X-Ray: A Sequential 3D Representation For Generation},
  author={Tao Hu and Wenhang Ge and Yuyang Zhao and Gim Hee Lee},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=36tMV15dPO}
}