Layout-Your-3D: Controllable and Precise 3D Generation with 2D Blueprint

ICLR 2025

Junwei Zhou¹ Xueting Li² Lu Qi³,⁴ Ming-Hsuan Yang⁵,⁶

¹Huazhong University of Science and Technology ²NVIDIA ³Wuhan University

⁴Insta360 Research ⁵UC Merced ⁶Yonsei University


A squirrel standing on a box.

A pigeon having a bagel and beer.

A cute stuffed toy crocodile and a small stuffed toy rabbit on a skateboard.
(User provided 2D layout)

A cute stuffed toy crocodile and a small stuffed toy rabbit on a skateboard.
(LLM generated 2D layout)

Abstract

We present Layout-Your-3D, a framework that allows controllable and compositional 3D generation from text prompts. Existing text-to-3D methods often struggle to generate assets with plausible object interactions or require tedious optimization processes. To address these challenges, our approach leverages 2D layouts as a blueprint to facilitate precise and plausible control over 3D generation. Starting with a 2D layout provided by a user or generated from a text description, we first create a coarse 3D scene using a carefully designed initialization process based on efficient reconstruction models. To enforce coherent global 3D layouts and enhance the quality of instance appearances, we propose a collision-aware layout optimization process followed by instance-wise refinement. Experimental results demonstrate that Layout-Your-3D yields more reasonable and visually appealing compositional 3D assets while significantly reducing the time required for each prompt. Additionally, Layout-Your-3D is readily applicable to downstream tasks, such as 3D editing and object insertion.
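To make the idea of a 2D layout "blueprint" concrete, here is a minimal sketch of how such a layout might be represented. The `LayoutBox` class, its normalized `[0, 1]` coordinates, and the helper functions are illustrative assumptions, not the format used by the released code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LayoutBox:
    """Hypothetical 2D layout entry: a labeled box in normalized image coordinates."""
    label: str
    x: float  # left edge, in [0, 1]
    y: float  # top edge, in [0, 1]
    w: float  # width, in (0, 1]
    h: float  # height, in (0, 1]


def in_bounds(box: LayoutBox) -> bool:
    """Check that the box has positive size and lies inside the unit canvas."""
    return (
        box.w > 0 and box.h > 0
        and box.x >= 0 and box.y >= 0
        and box.x + box.w <= 1 and box.y + box.h <= 1
    )


def to_pixels(box: LayoutBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) pixel coordinates."""
    return (
        round(box.x * width),
        round(box.y * height),
        round((box.x + box.w) * width),
        round((box.y + box.h) * height),
    )


# Assumed example layout for the prompt "A squirrel standing on a box."
layout: List[LayoutBox] = [
    LayoutBox("squirrel", x=0.25, y=0.125, w=0.5, h=0.5),
    LayoutBox("box", x=0.25, y=0.625, w=0.5, h=0.25),
]
```

In this sketch the squirrel's box ends exactly where the box's box begins, which encodes the "standing on" relation in 2D before any 3D generation takes place.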

Proposed Framework & Method

Given a 2D layout and text prompt, our coarse 3D generation stage (green box) generates coarse 3D instances along with roughly reasonable layouts. The disentangled refinement stage then refines the 3D layout and enhances individual instance quality by leveraging a collision-aware layout refinement (blue box) followed by an instance-wise refinement (yellow box).
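This page does not spell out the collision term used in the layout refinement. As a rough illustration only, assuming each instance is approximated by an axis-aligned 3D bounding box, a collision penalty could be the total pairwise overlap volume, which is zero exactly when no two boxes intersect. The names `Box3D`, `overlap_volume`, and `collision_penalty` are hypothetical and do not come from the paper.

```python
from itertools import combinations
from typing import NamedTuple, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class Box3D(NamedTuple):
    """Hypothetical axis-aligned bounding box given by its min and max corners."""
    min_xyz: Vec3
    max_xyz: Vec3


def overlap_volume(a: Box3D, b: Box3D) -> float:
    """Volume of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    volume = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a.min_xyz, a.max_xyz, b.min_xyz, b.max_xyz):
        extent = min(hi_a, hi_b) - max(lo_a, lo_b)
        if extent <= 0.0:
            # The boxes are separated (or only touching) along this axis.
            return 0.0
        volume *= extent
    return volume


def collision_penalty(boxes: Sequence[Box3D]) -> float:
    """Sum of overlap volumes over all unordered pairs of boxes."""
    return sum(overlap_volume(a, b) for a, b in combinations(boxes, 2))
```

A layout optimizer could add such a term to its objective so that instance positions are pushed apart until the penalty vanishes. A differentiable variant would be needed for gradient-based optimization, and boxes are only a coarse proxy for the actual instance geometry.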

Video

Experiments & Main Results

We evaluate the proposed method and compare it with other state-of-the-art (SOTA) text-to-3D methods on our collected Compo20 validation set. Layout-Your-3D generates compositional 3D scenes with higher quality and more reasonable 3D layouts. Note that the first two rows of our results are generated with LLM-grounded 2D layouts, and the last two are generated with user-given 2D layouts.

Comparison of single 3D instance generation quality. We provide results for both the short and the extended optimization strategies. With a longer optimization process, Layout-Your-3D generates results comparable to, or even better than, those of SOTA text-to-3D methods.

Examples of 3D instances refined with custom text prompts. We conduct experiments with both the longer and the shorter refinement strategies to validate the effectiveness of customization.

Since 3D results are closely related to the reference image, incorporating object insertion into our pipeline would be straightforward.

More generated compositional samples

More compositional 3D scenes generated by our Layout-Your-3D.

A teddy bear reading a book wearing blue sunglasses.

A kitten lying next to a flower.

A blue bird standing on top of a hamburger.

A tray on top of a toy pyramid, on the tray is a red bottle, a toy panda and an apple

More visualization results.

Single 3D object generation.

Custom 3D scene generation.

A pigeon having a bagel and a beer. (original text prompt)

[A blue origami pigeon] having [a bagel made out of grass] and [a beer on fire].
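In the customized prompt above, the square brackets appear to mark the per-instance descriptions. As a small sketch under that assumption, the bracketed segments can be extracted with a regular expression. The helper `instance_prompts` is hypothetical and not part of the released code.

```python
import re
from typing import List


def instance_prompts(prompt: str) -> List[str]:
    """Return the text of each [bracketed] segment, in order of appearance."""
    return [segment.strip() for segment in re.findall(r"\[([^\[\]]+)\]", prompt)]


custom = "[A blue origami pigeon] having [a bagel made out of grass] and [a beer on fire]."
print(instance_prompts(custom))
```

A prompt without brackets, such as the original text prompt, yields an empty list, so a caller could fall back to the unmodified description in that case.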

BibTeX

@misc{zhou2024layoutyour3d,
  title={Layout-Your-3D: Controllable and Precise 3D Generation with 2D Blueprint},
  author={Junwei Zhou and Xueting Li and Lu Qi and Ming-Hsuan Yang},
  year={2024},
  eprint={2410.15391},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.15391},
}

Website template from DreamFusion. We thank the authors for the open-source code.
