SegmentDreamer

SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis
with Segmented Consistency Trajectory Distillation

¹Jiahao Zhu ¹ZiXuan Chen ²Guangcong Wang

¹Xiaohua Xie ¹Yi Zhou

¹Sun Yat-Sen University

²Great Bay University


Abstract

Recent advancements in text-to-3D generation improve the visual quality of Score Distillation Sampling (SDS) and its variants by directly connecting Consistency Distillation (CD) to score distillation. However, due to the imbalance between self-consistency and cross-consistency, these CD-based methods inherently suffer from improper conditional guidance, leading to sub-optimal generation results. To address this issue, we present SegmentDreamer, a novel framework designed to fully unleash the potential of consistency models for high-fidelity text-to-3D generation. Specifically, we reformulate SDS through the proposed Segmented Consistency Trajectory Distillation (SCTD), which mitigates the imbalance by explicitly defining the relationship between self- and cross-consistency. Moreover, SCTD partitions the Probability Flow Ordinary Differential Equation (PF-ODE) trajectory into multiple sub-trajectories and enforces consistency within each segment, which theoretically provides a significantly tighter upper bound on distillation error. Additionally, we propose a distillation pipeline for swifter and more stable generation. Extensive experiments demonstrate that SegmentDreamer outperforms state-of-the-art methods in visual quality, enabling high-fidelity 3D asset creation through 3D Gaussian Splatting (3DGS).
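To make the idea of partitioning the trajectory into sub-trajectories concrete, here is a minimal sketch of how a discrete timestep range could be split into segments and how the segment containing a given timestep could be looked up. The uniform spacing, the function names `make_segments` and `segment_of`, and the values 1000 and 4 are assumptions for illustration only; the page does not state how the segment boundaries are chosen.

```python
from bisect import bisect_right


def make_segments(num_timesteps: int, num_segments: int) -> list[int]:
    """Return boundaries s_0 < s_1 < ... < s_M that cover [0, num_timesteps].

    Uniform spacing is an assumption made for this sketch.
    """
    if not 1 <= num_segments <= num_timesteps:
        raise ValueError("need 1 <= num_segments <= num_timesteps")
    return [i * num_timesteps // num_segments for i in range(num_segments + 1)]


def segment_of(t: int, boundaries: list[int]) -> tuple[int, int]:
    """Return the pair (s_m, s_{m+1}) such that s_m <= t < s_{m+1}."""
    if not boundaries[0] <= t < boundaries[-1]:
        raise ValueError("timestep lies outside the trajectory")
    m = bisect_right(boundaries, t) - 1
    return boundaries[m], boundaries[m + 1]


# Hypothetical setup: 1000 timesteps split into 4 sub-trajectories.
boundaries = make_segments(1000, 4)
start, end = segment_of(600, boundaries)
```

Under this sketch, consistency would only be enforced between points that fall between `start` and `end`, rather than along the whole trajectory.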

Method

An overview of SegmentDreamer: We begin by initializing a 3D representation \( \theta \) using a 3D generator, such as Point-E. In each iteration, we randomly render a batch of camera views \( \mathbf{z}_0 \) from \( \theta \) and diffuse them into \( \mathbf{z}_m \) with fixed noise \( \boldsymbol{\epsilon}^* \). Next, we transform \( \mathbf{z}_{s_m} \) into \(\tilde{\mathbf{z}}^{\boldsymbol{\Phi}}_t\) using either one-step or two-step unconditional deterministic sampling. During the denoising process, we first estimate \(\hat{\mathbf{z}}^{\boldsymbol{\Phi}}_s\) through one-step conditional deterministic sampling from \(\tilde{\mathbf{z}}^{\boldsymbol{\Phi}}_t\). Subsequently, we compute two consistency functions and utilize them to derive the loss \(\mathcal{L}_{\text{SCTD}}\), which is ultimately employed to optimize \(\theta\).
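The diffusion and deterministic sampling steps in this overview can be illustrated with a toy sketch. It assumes a DDIM-style deterministic update and replaces the pretrained diffusion model with an oracle that returns the fixed noise exactly; the page does not name the sampler, so both are assumptions. The function names, the toy latent values, and the cumulative signal levels (`abar_start`, `abar_t`, `abar_s`) are hypothetical, and the consistency functions and the loss itself are not shown.

```python
import math


def diffuse(z0, eps, alpha_bar):
    """Forward-diffuse clean latents z0 with a given noise sample eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, eps)]


def deterministic_step(z_from, eps_hat, alpha_bar_from, alpha_bar_to):
    """DDIM-style deterministic move between two noise levels (assumed sampler)."""
    out = []
    for z, e in zip(z_from, eps_hat):
        # Predict the clean latent, then re-noise it to the target level.
        z0_pred = (z - math.sqrt(1.0 - alpha_bar_from) * e) / math.sqrt(alpha_bar_from)
        out.append(math.sqrt(alpha_bar_to) * z0_pred + math.sqrt(1.0 - alpha_bar_to) * e)
    return out


# Toy rendered latents and a fixed noise sample (illustrative values only).
z0 = [0.5, -1.0, 2.0]
eps_fixed = [0.3, -0.7, 1.1]

# Hypothetical cumulative signal levels; a smaller value means more noise.
abar_start, abar_t, abar_s = 0.9, 0.5, 0.7

z_start = diffuse(z0, eps_fixed, abar_start)

# Stand-in for the unconditional model: an oracle that returns the fixed noise.
z_t = deterministic_step(z_start, eps_fixed, abar_start, abar_t)

# Stand-in for the conditional model: the same oracle, one step from t to s.
z_s = deterministic_step(z_t, eps_fixed, abar_t, abar_s)
```

With the oracle noise prediction, each deterministic step lands exactly where forward diffusion with the same fixed noise would. That makes the sketch easy to check, but a real noise predictor would not reproduce this behaviour.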

Visual Comparisons

Methods compared, with the time reported for each:

DreamFusion (~1h)
LucidDreamer (35~45min)
Consistent3D (~2.4h)
Connect3D (1h~1.4h)
SegmentDreamer (32~38min)

"A DSLR photo of a car made out of cheese."
"A zoomed out DSLR photo of a robot made out of vegetables."
"A DSLR photo of a bald eagle."
"A DSLR photo of a bear dressed as a lumberjack."
"An amigurumi bulldozer."
"A DSLR photo of a corgi wearing a top hat."
"A plush toy of a corgi nurse."

Consistency Distillation Loss Comparisons

Losses compared, with the CFG scale used for each:

Consistent3D (CDS), CFG: 7.5
Consistent3D (CDS), CFG: 20~40
ConnectCD (GCS), CFG: 7.5
ConnectCD (GCS+BEG), CFG: 7.5
SCTD (Ours), CFG: 7.5

"A DSLR photo of a corgi wearing a top hat."
"A DSLR photo of a pig wearing a backpack."
"A DSLR photo of a tiger made out of yarn."

More Generated Results

"A DSLR photo of an astronaut riding a horse."

"A capybara wearing a top hat, low poly style."

"A baby dragon is spraying flames."

"A DSLR photo of the Mount Fuji, aerial view."

"A steampunk owl with mechanical wings."

"A DSLR photo of a LV handbag."

"A zoomed out DSLR photo of an origami hippo in a river."

"A delicious hamburger."

"A DSLR photo of a peacock on a surfboard."

"A DSLR photo of a robot dinosaur."

"A DSLR photo of an erupting volcano, aerial view."

"An airplane made out of wood."
