We propose DepR, a depth-guided single-view scene reconstruction framework that integrates instance-level diffusion within a compositional paradigm. Instead of reconstructing the entire scene holistically, DepR generates individual objects and subsequently composes them into a coherent 3D layout.
Unlike previous methods that use depth solely for object layout estimation during inference, and therefore fail to fully exploit its rich geometric information, DepR leverages depth throughout both training and inference. Specifically, we introduce depth-guided conditioning to effectively encode shape priors into the diffusion model. During inference, depth further guides DDIM sampling and layout optimization, enhancing alignment between the reconstruction and the input image. Despite being trained on limited synthetic data, DepR achieves state-of-the-art performance and demonstrates strong generalization in single-view scene reconstruction, as shown through evaluations on both synthetic and real-world datasets.
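To make the depth-guided sampling step concrete, below is a minimal sketch of how gradients from a depth loss can steer DDIM sampling. It assumes a cumulative alpha-bar schedule, a noise-prediction model taking (latent, timestep, condition), and a hypothetical decode_and_render_depth function that renders a depth map from the tri-plane latent; all names, signatures, and the guidance weight are illustrative, not the released implementation.

import torch

def ddim_step(x_t, eps, alpha_t, alpha_prev):
    # Deterministic DDIM update: recover the x0 estimate, then re-noise to t-1.
    x0_pred = (x_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    return alpha_prev.sqrt() * x0_pred + (1 - alpha_prev).sqrt() * eps

def depth_guided_sample(model, x_T, cond, obs_depth, alphas, guidance_scale=1.0):
    # alphas: cumulative alpha-bar schedule indexed by timestep (1-D tensor).
    x_t = x_T
    for t in range(len(alphas) - 1, 0, -1):
        with torch.no_grad():
            eps = model(x_t, t, cond)                       # predicted noise
            x_next = ddim_step(x_t, eps, alphas[t], alphas[t - 1])

        # Guidance: differentiate a depth loss w.r.t. the current latent and
        # nudge the sample toward depth-consistent geometry.
        x_in = x_next.detach().requires_grad_(True)
        rendered = decode_and_render_depth(x_in)            # hypothetical renderer
        depth_loss = torch.nn.functional.l1_loss(rendered, obs_depth)
        grad = torch.autograd.grad(depth_loss, x_in)[0]
        x_t = (x_in - guidance_scale * grad).detach()
    return x_t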
Overview of DepR. Depth is utilized at three key stages: 1) to back-project image features that condition the latent tri-plane diffusion model for generating complete 3D shapes; 2) to guide the diffusion sampling process via gradients from a depth loss; and 3) to optimize object poses via a layout loss for accurate scene composition.
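The third stage can be pictured as a small pose-refinement loop: per-object scale, rotation, and translation are optimized so that the composed scene's rendered depth matches the observed depth. The sketch below is an assumption-laden illustration; render_scene_depth is a hypothetical differentiable renderer, and the exact layout loss and parameterization may differ from the paper.

import torch

def optimize_layout(object_shapes, obs_depth, masks, n_iters=200, lr=1e-2):
    n = len(object_shapes)
    scale = torch.ones(n, requires_grad=True)
    rot = torch.zeros(n, 3, requires_grad=True)        # axis-angle rotation
    trans = torch.zeros(n, 3, requires_grad=True)
    opt = torch.optim.Adam([scale, rot, trans], lr=lr)

    for _ in range(n_iters):
        opt.zero_grad()
        # Hypothetical differentiable renderer: composes all objects under the
        # current per-instance poses and returns a per-pixel depth map.
        pred_depth = render_scene_depth(object_shapes, scale, rot, trans)
        # Layout loss: depth discrepancy restricted to the instance masks.
        layout_loss = (masks * (pred_depth - obs_depth).abs()).mean()
        layout_loss.backward()
        opt.step()
    return scale.detach(), rot.detach(), trans.detach()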
@article{zhao2025depr,
title={DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion},
author={Zhao, Qingcheng and Zhang, Xiang and Xu, Haiyang and Chen, Zeyuan and Xie, Jianwen and Gao, Yuan and Tu, Zhuowen},
journal={arXiv preprint arXiv:2507.22825},
year={2025}
}