We introduce AmodalGen3D, a generative framework for amodal 3D object reconstruction that infers complete, occlusion-free geometry and appearance from an arbitrary number of sparse, unposed input views. The model integrates 2D amodal completion priors with multi-view stereo geometry conditioning, supported by a View-Wise Cross Attention mechanism for sparse-view feature fusion and a Stereo-Conditioned Cross Attention module for inferring unobserved structure. By jointly modeling visible and hidden regions, AmodalGen3D reconstructs 3D objects that remain faithful to the sparse-view constraints while plausibly hallucinating unseen parts. Experiments on both synthetic and real-world datasets demonstrate that AmodalGen3D achieves superior fidelity and completeness under occlusion-heavy sparse-view settings, addressing a pressing need for object-level 3D scene reconstruction in robotics, AR/VR, and embodied AI applications.
Overview of AmodalGen3D. Given sparse input images together with visibility masks and occlusion masks that indicate the occluded object, AmodalGen3D first generates a sparse structure by aggregating multi-view information and inferring the complete geometric structure from the partial stereo point cloud. Once the sparse structure is obtained, we employ a pretrained amodal SLAT Transformer, using the visibility and occlusion masks to control texture generation, and decode the result into an occlusion-free 3D object with high-quality geometry and appearance.
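To make the two-stage pipeline above concrete, the sketch below assumes a PyTorch-style interface. The model names (structure_model, slat_transformer, slat_decoder), the stereo_points and complete methods, and all tensor shapes are hypothetical placeholders for illustration, not the paper's released API.

def amodal_reconstruct(images, vis_masks, occ_masks,
                       structure_model, slat_transformer, slat_decoder):
    """Hypothetical two-stage flow; all three models are placeholders.

    images:    (V, 3, H, W) sparse input views of the occluded object
    vis_masks: (V, 1, H, W) visibility masks (visible object pixels)
    occ_masks: (V, 1, H, W) occlusion masks (pixels hidden by occluders)
    """
    # Stage 1: aggregate multi-view features and complete the partial
    # stereo point cloud into a full sparse voxel structure.
    partial_points = structure_model.stereo_points(images, vis_masks)
    sparse_structure = structure_model.complete(
        images, vis_masks, occ_masks, partial_points)

    # Stage 2: a pretrained amodal SLAT Transformer generates structured
    # latents on the completed structure, with the masks steering texture
    # generation for visible vs. hallucinated regions; the latents are
    # then decoded into an occlusion-free textured 3D object.
    slat = slat_transformer(sparse_structure, images, vis_masks, occ_masks)
    return slat_decoder(slat)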
A detailed illustration of our proposed View-Wise Cross Attention and Stereo-Conditioned Cross Attention.
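The two modules named in this figure can be sketched with standard multi-head cross-attention. The PyTorch code below is a minimal, hypothetical realization: the token shapes, the mean fusion across per-view attention outputs, and the residual connections are our assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class ViewWiseCrossAttention(nn.Module):
    # Latent structure tokens attend to each input view separately, and the
    # per-view outputs are averaged, so any number of unposed views can be
    # fused without assuming a fixed view ordering.
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens, view_feats):
        # tokens:     (B, N, C) latent structure tokens (queries)
        # view_feats: (B, V, M, C) image features, V views of M tokens each
        outs = []
        for v in range(view_feats.shape[1]):
            kv = view_feats[:, v]               # (B, M, C), one view at a time
            out, _ = self.attn(tokens, kv, kv)  # cross-attend within this view
            outs.append(out)
        return tokens + torch.stack(outs, dim=1).mean(dim=1)

class StereoConditionedCrossAttention(nn.Module):
    # Latent structure tokens attend to features of the partial stereo point
    # cloud, so observed geometry constrains generation while unobserved
    # regions remain free to be hallucinated.
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens, point_feats):
        # point_feats: (B, P, C) per-point features of the partial point cloud
        out, _ = self.attn(tokens, point_feats, point_feats)
        return tokens + out

# Quick shape check with random tensors: 4 views, 196 feature tokens per view.
vwa = ViewWiseCrossAttention(dim=512)
sca = StereoConditionedCrossAttention(dim=512)
x = vwa(torch.randn(1, 1024, 512), torch.randn(1, 4, 196, 512))
x = sca(x, torch.randn(1, 2048, 512))   # x: (1, 1024, 512)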
@misc{zhou2025amodalgen3d,
      title={AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views},
      author={Junwei Zhou and Yu-Wing Tai},
      year={2025},
      eprint={2511.21945},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.21945},
}