Barbie: Text to Barbie-Style 3D Avatars

1Nanjing University 2Peking University
*Corresponding author
Teaser Image

Our method generates Barbie-style 3D avatars from textual input. Barbie-style refers to the following key characteristics: (1) High-Quality Geometry and Realistic Appearance, ensuring visually lifelike avatars; (2) Fine-Grained Decoupling, separating body, clothing, shoes, and accessories to enable flexible apparel combination and editing; (3) Expressive Animation, supporting a wide range of body movements, facial expressions, and hand gestures; (4) Simulation Compatibility, enabling modeling of non-watertight garments and seamless integration into existing physical simulation pipelines.

Demo Video

Results Overview

Barbie-Style Avatar Gallery


Abstract

To integrate digital humans into everyday life, there is a strong demand for generating high-quality, fine-grained disentangled 3D avatars that support expressive animation and simulation capabilities, ideally from low-cost textual inputs. Although text-driven 3D avatar generation has made significant progress by leveraging 2D generative priors, existing methods still struggle to fulfill all these requirements simultaneously. To address this challenge, we propose Barbie, a novel text-driven framework for generating animatable 3D avatars with separable shoes, accessories, and simulation-ready garments, truly capturing the iconic "Barbie doll" aesthetic. The core of our framework lies in an expressive 3D representation combined with appropriate modeling constraints. Unlike previous methods, we innovatively employ G-Shell to uniformly model both watertight components (e.g., bodies, shoes, and accessories) and non-watertight garments compatible with simulation. Furthermore, we introduce a well-designed initialization and a hole regularization loss to ensure clean open-surface modeling. These disentangled 3D representations are then optimized by specialized expert diffusion models tailored to each domain, ensuring high-fidelity outputs. To mitigate geometric artifacts and texture conflicts when combining different expert models, we further propose several effective geometric losses and strategies. Extensive experiments demonstrate that Barbie outperforms existing methods in both dressed-human and outfit generation. Our framework further enables diverse applications, including apparel combination, editing, expressive animation, and physical simulation.

Method Highlights


Compared to existing methods, the avatars generated by Barbie not only exhibit exquisite geometry and realistic appearance but also wear multiple separable, realistic outfits, while supporting expressive animation and physical simulation.

Methodology

Pipeline overview

Our framework consists of three stages: (1) Human Body Generation: this stage generates a plausible, realistic base human body by leveraging human-specific generative priors and a novel SMPL-X-evolving prior loss; (2) Apparel Generation: this stage models high-quality garments, shoes, and accessories piece by piece, utilizing object-specific diffusion models together with several initialization strategies and geometric losses; and (3) Unified Texture Refinement: this stage enhances visual harmony and consistency by jointly fine-tuning the composed avatar.
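The staged structure above can be sketched as a simple sequential pipeline. The following is a minimal illustrative sketch in Python; every function name, argument, and data structure here is a hypothetical placeholder (the actual method optimizes G-Shell representations with guidance from expert diffusion models, which is stubbed out entirely):

```python
# Hypothetical sketch of the three-stage Barbie pipeline.
# All names and structures are illustrative placeholders, not the authors' code.

def generate_body(body_prompt: str) -> dict:
    """Stage 1: produce a base human body.

    In the real method this optimizes a watertight G-Shell body using
    human-specific generative priors and an SMPL-X-evolving prior loss.
    """
    return {"body": f"body geometry+texture for '{body_prompt}'"}

def generate_apparel(avatar: dict, apparel_prompts: list[str]) -> dict:
    """Stage 2: model each garment, shoe, or accessory piece by piece.

    The real method uses object-specific diffusion models plus
    initialization strategies and geometric losses per asset.
    """
    for prompt in apparel_prompts:
        avatar[prompt] = f"asset geometry+texture for '{prompt}'"
    return avatar

def refine_textures(avatar: dict) -> dict:
    """Stage 3: jointly fine-tune the composed avatar for visual harmony."""
    avatar["refined"] = True
    return avatar

def barbie_pipeline(body_prompt: str, apparel_prompts: list[str]) -> dict:
    avatar = generate_body(body_prompt)
    avatar = generate_apparel(avatar, apparel_prompts)
    return refine_textures(avatar)

avatar = barbie_pipeline("a young woman", ["red dress", "white sneakers", "sun hat"])
print(sorted(avatar))
```

Because each apparel piece is a separate entry in the composed avatar, assets remain disentangled, which is what later enables the apparel combination, editing, and simulation applications shown below.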

Comparisons with Text-to-Avatar Methods

Comparisons with Text-to-Decoupled-Avatar Methods

Comparisons with Text-to-Apparel Methods

Apparel Composition

Apparel Editing

Expressive Animation

Physical Simulation

BibTeX

@article{sun2024barbie,
  title={Barbie: Text to Barbie-Style 3D Avatars},
  author={Sun, Xiaokun and Zhang, Zhenyu and Tai, Ying and Tang, Hao and Yi, Zili and Yang, Jian},
  journal={arXiv preprint arXiv:2408.09126},
  year={2024}
}