GENA3D: Generative Amodal 3D Modeling by Bridging 2D Priors and 3D Coherence

Dartmouth College


What do we do?


Abstract

Generating complete 3D objects under partial occlusions (i.e., amodal scenarios) is a practically important yet challenging problem, as large portions of object geometry are unobserved in real-world scenarios. Existing approaches either operate directly in 3D, which ensures geometric consistency but often lacks generative expressiveness, or rely on 2D amodal completion, which provides strong appearance priors but does not guarantee reliable 3D structure. This raises a key question: how can we achieve both generative plausibility and geometric coherence in amodal 3D modeling? To answer this question, we introduce GENA3D (GENerative Amodal 3D), a framework that integrates learned 2D generative priors with explicit 3D geometric reasoning within a conditional 3D generation paradigm. The 2D priors enable the model to plausibly infer diverse occluded content, while the 3D representation enforces multi-view consistency and spatial validity. Our design incorporates a novel View-Wise Cross-Attention for multi-view alignment and a Stereo-Conditioned Cross-Attention to anchor generative predictions in 3D relationships. By combining generative imagination with structural constraints, GENA3D generates complete and coherent 3D objects from limited observations without sacrificing geometric fidelity. Experiments demonstrate that our method outperforms existing approaches in both synthetic and real-world amodal scenarios, highlighting the effectiveness of bridging 2D priors and 3D coherence in generating plausible and geometrically consistent 3D structures in complex environments.


Proposed Framework & Method

Overview of GENA3D. Given sparse images, visibility masks, and occlusion masks indicating the occluded object, GENA3D first generates a sparse structure by aggregating multi-view information and infers the complete geometric structure from the partial stereo point cloud. Once the sparse structure is obtained, we employ a pretrained amodal SLAT Transformer, controlling texture generation with the visibility and occlusion masks, and then decode the result into an occlusion-free 3D object with high-quality geometry and appearance.
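The three-stage pipeline described above can be sketched as follows. This is a minimal, shape-only illustration: the function names, array shapes, and placeholder returns are hypothetical and stand in for the actual sparse-structure generator, amodal SLAT Transformer, and decoder, which are not released on this page.

```python
import numpy as np

# Hypothetical stage functions; names and shapes are illustrative only,
# not taken from the GENA3D codebase.
def generate_sparse_structure(images, vis_masks, occ_masks, partial_points):
    # Stage 1: aggregate multi-view cues and complete the coarse geometry
    # from the partial stereo point cloud.
    return np.zeros((64, 3))  # placeholder sparse-structure coordinates

def amodal_slat_transform(sparse_structure, vis_masks, occ_masks):
    # Stage 2: a pretrained amodal SLAT Transformer attaches appearance
    # latents, with the masks steering texture generation in occluded regions.
    feats = np.zeros((sparse_structure.shape[0], 8))  # placeholder latents
    return np.concatenate([sparse_structure, feats], axis=1)

def decode_to_3d(latents):
    # Stage 3: decode structured latents into an occlusion-free 3D object.
    return {"vertices": latents[:, :3], "features": latents[:, 3:]}

images = [np.zeros((32, 32, 3))] * 3          # sparse input views
vis_masks = occ_masks = [np.zeros((32, 32))] * 3
partial_points = np.zeros((100, 3))           # observed partial stereo

structure = generate_sparse_structure(images, vis_masks, occ_masks, partial_points)
obj = decode_to_3d(amodal_slat_transform(structure, vis_masks, occ_masks))
print(obj["vertices"].shape, obj["features"].shape)  # (64, 3) (64, 8)
```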


A detailed illustration of our proposed View-Wise Cross-Attention and Stereo-Conditioned Cross-Attention. Specifically, we aggregate sparse-view information and fuse it in a view-wise manner, then infer the unobserved geometric structure with the observed partial stereo as a condition, preserving geometric consistency and plausibility.
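A minimal NumPy sketch of the cross-attention primitive underlying both modules. The view-wise fusion rule shown here (each view attends to the tokens of the other views, then results are averaged) is an assumption for illustration; the paper's exact projection weights and fusion are not specified on this page.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d):
    # queries: (Nq, d) tokens being updated; context: (Nc, d) conditioning
    # tokens (e.g., other views, or partial-stereo tokens in the
    # stereo-conditioned variant). Learned Q/K/V projections omitted.
    scores = queries @ context.T / np.sqrt(d)   # (Nq, Nc) similarities
    return softmax(scores, axis=-1) @ context   # weighted mix of context

def view_wise_cross_attention(view_tokens, d):
    # Assumed fusion: every view queries the concatenation of the other
    # views, and the per-view results are averaged.
    fused = []
    for i, q in enumerate(view_tokens):
        others = np.concatenate(
            [t for j, t in enumerate(view_tokens) if j != i])
        fused.append(cross_attention(q, others, d))
    return np.mean(fused, axis=0)

rng = np.random.default_rng(0)
d = 8
views = [rng.standard_normal((5, d)) for _ in range(3)]  # 3 views, 5 tokens
out = view_wise_cross_attention(views, d)
print(out.shape)  # (5, 8)
```

The stereo-conditioned variant would reuse `cross_attention` with the partial stereo point features as `context`, anchoring the generated tokens in observed 3D structure.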




Qualitative Results

Amodal 3D object generation results on the Google Scanned Objects (GSO) dataset.


In-the-wild & In-the-scene Results

We use in-the-wild real captures from 3D reconstruction datasets to validate our method's performance in real-world scenarios.


BibTeX


@misc{zhou2026gena3dgenerativeamodal3d,
  title={GENA3D: Generative Amodal 3D Modeling by Bridging 2D Priors and 3D Coherence},
  author={Junwei Zhou and Yu-Wing Tai},
  year={2026},
  eprint={2511.21945},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.21945},
}

Website template from DreamFusion. We thank the authors for the open-source code.