Recent advancements in text-to-3D generation improve the visual quality of Score Distillation Sampling (SDS) and its variants by directly connecting Consistency Distillation (CD) to score distillation. However, due to the imbalance between self-consistency and cross-consistency, these CD-based methods inherently suffer from improper conditional guidance, leading to sub-optimal generation results. To address this issue, we present SegmentDreamer, a novel framework designed to fully unleash the potential of consistency models for high-fidelity text-to-3D generation. Specifically, we reformulate SDS through the proposed Segmented Consistency Trajectory Distillation (SCTD), which mitigates the imbalance by explicitly defining the relationship between self- and cross-consistency. Moreover, SCTD partitions the Probability Flow Ordinary Differential Equation (PF-ODE) trajectory into multiple sub-trajectories and enforces consistency within each segment, which theoretically provides a significantly tighter upper bound on the distillation error. Additionally, we propose a distillation pipeline for swifter and more stable generation. Extensive experiments demonstrate that SegmentDreamer outperforms state-of-the-art methods in visual quality, enabling high-fidelity 3D asset creation through 3D Gaussian Splatting (3DGS).
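The idea of partitioning the PF-ODE trajectory into sub-trajectories can be illustrated with a minimal sketch. It assumes a discrete timestep range `[0, num_timesteps]` split into equally sized segments; the helper names `make_segments` and `segment_of` are hypothetical, and the actual segment schedule used by SCTD may differ.

```python
from bisect import bisect_right


def make_segments(num_timesteps: int, num_segments: int) -> list[int]:
    """Return num_segments + 1 equally spaced boundaries over [0, num_timesteps].

    Equal spacing is an assumption made for illustration only.
    """
    if not 1 <= num_segments <= num_timesteps:
        raise ValueError("num_segments must be in [1, num_timesteps]")
    return [round(i * num_timesteps / num_segments) for i in range(num_segments + 1)]


def segment_of(t: int, boundaries: list[int]) -> tuple[int, int]:
    """Return the boundaries (s_m, s_{m+1}) of the segment that contains t.

    Segments are half-open, s_m <= t < s_{m+1}, except that the final
    timestep is assigned to the last segment.
    """
    if not boundaries[0] <= t <= boundaries[-1]:
        raise ValueError("t lies outside the trajectory")
    idx = min(bisect_right(boundaries, t) - 1, len(boundaries) - 2)
    return boundaries[idx], boundaries[idx + 1]


boundaries = make_segments(num_timesteps=1000, num_segments=4)
start, end = segment_of(300, boundaries)
print(boundaries, (start, end))
```

Under this reading, consistency is only required between points that share a segment, so a sample at `t = 300` would be compared against points in `[250, 500)` rather than against the whole trajectory.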
An overview of SegmentDreamer: We begin by initializing a 3D representation \( \theta \) using a 3D generator, such as Point-E. In each iteration, we randomly render a batch of camera views \( \mathbf{z}_0 \) from \( \theta \) and diffuse them into \( \mathbf{z}_m \) with fixed noise \( \boldsymbol{\epsilon}^* \). Next, we transform \( \mathbf{z}_m \) into \(\tilde{\mathbf{z}}^{\boldsymbol{\Phi}}_t\) using either one-step or two-step unconditional deterministic sampling. During the denoising process, we first estimate \(\hat{\mathbf{z}}^{\boldsymbol{\Phi}}_s\) through one-step conditional deterministic sampling from \(\tilde{\mathbf{z}}^{\boldsymbol{\Phi}}_t\). Subsequently, we compute two consistency functions and use them to derive the loss \(\mathcal{L}_{\text{SCTD}}\), which is ultimately used to optimize \(\theta\).
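The order of operations in the overview can be sketched in NumPy. This is a rough illustration under stated assumptions, not the authors' implementation: the deterministic sampler is assumed to be a DDIM-style update, noise levels are represented by cumulative alpha values (`abar_*`) rather than timestep indices, `eps_model` is a hypothetical noise predictor with signature `(z, abar, cond)`, only the one-step unconditional variant is shown, and the consistency functions and \(\mathcal{L}_{\text{SCTD}}\) are omitted because the overview does not define them.

```python
import numpy as np


def diffuse(z0, eps, abar):
    """Forward-diffuse z0 to the noise level with cumulative alpha `abar`."""
    return np.sqrt(abar) * z0 + np.sqrt(1.0 - abar) * eps


def ddim_step(z_from, eps_pred, abar_from, abar_to):
    """One deterministic DDIM-style step between two noise levels (assumed sampler)."""
    z0_pred = (z_from - np.sqrt(1.0 - abar_from) * eps_pred) / np.sqrt(abar_from)
    return np.sqrt(abar_to) * z0_pred + np.sqrt(1.0 - abar_to) * eps_pred


def sctd_inputs(z0, eps_star, eps_model, abar_m, abar_t, abar_s, prompt):
    """Produce (z_m, z_tilde_t, z_hat_s) in the order described in the overview."""
    # Diffuse the rendered views with the fixed noise eps_star.
    z_m = diffuse(z0, eps_star, abar_m)
    # One-step unconditional deterministic sampling: z_m -> z_tilde_t.
    z_tilde_t = ddim_step(z_m, eps_model(z_m, abar_m, None), abar_m, abar_t)
    # One-step conditional deterministic sampling: z_tilde_t -> z_hat_s.
    z_hat_s = ddim_step(z_tilde_t, eps_model(z_tilde_t, abar_t, prompt), abar_t, abar_s)
    return z_m, z_tilde_t, z_hat_s


rng = np.random.default_rng(0)
z0 = rng.standard_normal((2, 4, 8, 8))      # stand-in for a batch of rendered views
eps_star = rng.standard_normal(z0.shape)    # fixed noise, reused across steps


def oracle_eps_model(z, abar, cond):
    """Toy predictor that always returns the true noise (illustration only)."""
    return eps_star


z_m, z_tilde_t, z_hat_s = sctd_inputs(
    z0, eps_star, oracle_eps_model, abar_m=0.9, abar_t=0.5, abar_s=0.7, prompt="a corgi"
)
```

With the toy oracle predictor, each deterministic step lands exactly on the forward-diffused sample at the target noise level, which is a convenient sanity check for the update rule. A real pipeline would replace `oracle_eps_model` with a pretrained diffusion model and backpropagate the resulting loss to \(\theta\) through the renderer.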
[Figure: qualitative comparison with baseline methods, with the time reported for each method in parentheses.]
Methods: DreamFusion (~1h) | LucidDreamer (35~45) | Consistent3D (~2.4h) | Connect3D (1h~1.4h) | SegmentDreamer (32~38min)
Prompts:
"A DSLR photo of a car made out of cheese."
"A zoomed out DSLR photo of a robot made out of vegetables."
"A DSLR photo of a bald eagle."
"A DSLR photo of a bear dressed as a lumberjack."
"An amigurumi bulldozer."
"A DSLR photo of a corgi wearing a top hat."
"A plush toy of a corgi nurse."
[Figure: comparison of distillation objectives, with the CFG scale in parentheses.]
Methods: Consistent3D (CDS) (CFG: 7.5) | Consistent3D (CDS) (CFG: 20~40) | ConnectCD (GCS) (CFG: 7.5) | ConnectCD (GCS+BEG) (CFG: 7.5) | SCTD (Ours) (CFG: 7.5)
Prompts:
"A DSLR photo of a corgi wearing a top hat."
"A DSLR photo of a pig wearing a backpack."
"A DSLR photo of a tiger made out of yarn."
"A DSLR photo of an astronaut riding a horse."
"A capybara wearing a top hat, low poly style."
"A baby dragon is spraying flames."
"A DSLR photo of the Mount Fuji, aerial view."
"A steampunk owl with mechanical wings."
"A DSLR photo of a LV handbag." |
"A zoomed out DSLR photo of an origami hippo in a river." |
"A delicious hamburger." |
"A DSLR photo of a peacock on a surfboard." |
"A DSLR photo of a robot dinosaur." |
"A DSLR photo of an erupting volcano, aerial view." |
"An airplane made out of wood." |
"A portrait of IRONMAN, white hair, head, photorealistic, 8K, HDR." |
"A portrait of Captain America, white hair, head, photorealistic, 8K, HDR." |
"A portrait of Kid Spiderman, blue hair, head, photorealistic, 8K, HDR." |
"A portrait of white marble bust of BATMAN, head, 8K, HDR." |
"A portrait of Hulk, head, photorealistic, 8K, HDR." |
"An armored green-skin orc riding a vicious hog." |
"Mulan, Anime, full body, with armor." |
"black dragonborn, solo, red eyes, male, full body." |
"A soldier, riding a tiger." |
"A warrior with a red cape riding a horse." |
@article{zhu2025segmentdreamer,
title={SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation},
author={Zhu, Jiahao and Chen, Zixuan and Wang, Guangcong and Xie, Xiaohua and Zhou, Yi},
journal={arXiv preprint arXiv:xxxx.xxxxx},
year={2025}
}
This project is supported by the Natural Science Foundation of China (No. 62072482) and by the Project of Guangdong Provincial Key Laboratory of Information Security Technology (Grant No. 2023B1212060026).
We also thank Lior Yariv for the website template.