DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising

PLAN Lab, University of Illinois Urbana-Champaign
TL;DR: We introduce DreamPartGen, a language-grounded collaborative diffusion framework that unifies geometric, visual, and relational reasoning for coherent, interpretable part-level text-to-3D synthesis. It pairs two complementary representations, Duplex Part Latents (DPLs) and Relational Semantic Latents (RSLs), which jointly encode part geometry, appearance, and inter-part relations and are refined together via synchronized co-denoising. We also curate PartRel3D, a large-scale relational dataset with 300K canonicalized functional and spatial triplets that provide explicit language-based supervision of inter-part relations across 175 object categories. DreamPartGen achieves state-of-the-art performance across benchmarks, with significant gains in fidelity, language alignment, and controllable part-aware generation.

DreamPartGen Architecture

Our key novelty is to introduce persistent, language-derived relational semantic latents that remain active throughout the denoising process, rather than using text only as a one-shot condition, and to synchronize them with part-level geometric latents. To this end, we formulate part-based 3D generation as a semantically grounded collaborative diffusion process between two complementary latent representations: Duplex Part Latents (DPLs), which encode geometry and appearance of individual parts in a modular and disentangled manner, and Relational Semantic Latents (RSLs), a compact set of text-derived latent tokens that provide both local refinements and global planning signals. During denoising, DPLs and RSLs are synchronized through intra-part and inter-part attention, enabling consistent part-level geometry-appearance alignment and language-guided part assembly.
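The synchronization pattern described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each DPL is split into a geometry stream and an appearance stream, replaces the learned denoiser with a plain attention-weighted averaging update, and uses hypothetical names throughout (`attend`, `mix`, `co_denoise_step`, `rate`). It only shows the order of information flow: intra-part attention aligns geometry with appearance, inter-part attention lets every part read the other parts and the RSLs, and the RSLs are themselves updated at every step rather than consumed once.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attend(queries, keys):
    """Single-head dot-product attention; keys also serve as values."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return out


def mix(tokens, context, rate):
    """Move each token a fraction `rate` toward its attention context."""
    return [[(1 - rate) * t + rate * c for t, c in zip(tok, ctx)]
            for tok, ctx in zip(tokens, context)]


def co_denoise_step(geo, app, rsl, rate=0.1):
    """One toy update. geo/app: per-part token lists; rsl: text-derived tokens."""
    # Intra-part: geometry and appearance tokens of the same part attend to each other.
    geo1 = [mix(g, attend(g, a), rate) for g, a in zip(geo, app)]
    app1 = [mix(a, attend(a, g), rate) for g, a in zip(geo, app)]
    # Inter-part: every part token attends to all part tokens plus the RSLs.
    context = [t for part in geo1 + app1 for t in part] + rsl
    geo2 = [mix(g, attend(g, context), rate) for g in geo1]
    app2 = [mix(a, attend(a, context), rate) for a in app1]
    # RSLs stay active: they are refined against the updated part tokens.
    parts = [t for part in geo2 + app2 for t in part]
    rsl2 = mix(rsl, attend(rsl, parts), rate)
    return geo2, app2, rsl2


def co_denoise(geo, app, rsl, steps=10, rate=0.1):
    for _ in range(steps):
        geo, app, rsl = co_denoise_step(geo, app, rsl, rate)
    return geo, app, rsl


def toy_tokens(n, dim, rng):
    """Random stand-ins for noisy latent tokens (illustrative only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

For example, with `rng = random.Random(0)`, two parts of three 4-dimensional tokens each can be built with `[toy_tokens(3, 4, rng) for _ in range(2)]` for both streams, plus `toy_tokens(2, 4, rng)` for the RSLs, and passed to `co_denoise`. A real model would predict noise with learned networks at each timestep; the averaging update here is a stand-in chosen so the sketch runs without dependencies.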

PartRel3D Dataset

We introduce PartRel3D, a large-scale, relationally annotated part dataset that links part geometry, appearance, and language through explicit functional and spatial relationships. Each object is augmented with canonicalized triplets that encode both functional dependencies (e.g., support, attach, hinge) and spatial arrangements (e.g., above, touching, aligned-with), providing large-scale supervision of assembly-level semantics in 3D.
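As a rough illustration of what a canonicalized triplet could look like in code, the sketch below stores a relation as (subject, relation, object) and normalizes it. The page does not describe the actual PartRel3D schema or canonicalization procedure, so everything here is an assumption: the class and function names, the alias table, the choice of which relations are treated as symmetric, and the rule of sorting arguments for symmetric relations. Only the example relation names (support, attach, hinge, above, touching, aligned-with) come from the text.

```python
from dataclasses import dataclass

# Relation names taken from the examples in the text; the full vocabulary is not given.
FUNCTIONAL = {"support", "attach", "hinge"}
SPATIAL = {"above", "touching", "aligned-with"}

# Assumptions for illustration only.
SYMMETRIC = {"touching", "aligned-with"}
ALIASES = {"supports": "support", "attached-to": "attach", "on-top-of": "above"}


@dataclass(frozen=True)
class RelationTriplet:
    subject: str
    relation: str
    obj: str

    @property
    def kind(self):
        return "functional" if self.relation in FUNCTIONAL else "spatial"


def canonicalize(subject, relation, obj):
    """Normalize casing and spelling, map aliases, and order symmetric arguments."""
    rel = relation.strip().lower().replace(" ", "-").replace("_", "-")
    rel = ALIASES.get(rel, rel)
    if rel not in FUNCTIONAL | SPATIAL:
        raise ValueError(f"unknown relation: {relation!r}")
    s, o = subject.strip().lower(), obj.strip().lower()
    if rel in SYMMETRIC and o < s:
        s, o = o, s  # a single ordering so duplicates collapse
    return RelationTriplet(s, rel, o)
```

Under these assumptions, `canonicalize("Seat", "Touching", "backrest")` and `canonicalize("backrest", "touching", "seat")` yield the same triplet, while directed relations such as `support` keep their argument order.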

Results

Quantitative evaluation on 3D object generation. Best and second-best are highlighted.

Text-shape alignment comparison. Best and second-best are highlighted.

Qualitative comparison on part-level 3D generation. Across diverse object categories, DreamPartGen yields the most faithful decompositions, preserving clear part boundaries, correct topology, and consistent spatial alignment.

BibTeX

@article{yu2026dreampartgen,
 title = {DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising},
 author = {Yu, Tianjiao and Li, Xinzhuo and Wahed, Muntasir and Xiong, Jerry and Shen, Yifan and Shen, Ying and Lourentzou, Ismini},
 journal = {arXiv preprint arXiv:2603.19216},
 year = {2026}
}