ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts

CVPR 2025

Dmitry Petrov¹, Pradyumn Goyal¹, Divyansh Shivashok¹, Yuanming Tao¹, Melinos Averkiou²,³, Evangelos Kalogerakis¹,²,⁴

¹UMass Amherst ²CYENS CoE ³University of Cyprus ⁴TU Crete

1️⃣ Select a shape category from the dropdown menu (55 categories in total). We recommend trying the chair (default), car, lamp, and bottle categories.

2️⃣ Create a text prompt using [CATEGORY] as a placeholder, or use the "Random prompt" button to pick one from a small set of predefined prompts.

3️⃣ Adjust the guidance strength to control the influence of the shape. The default value of 0.9 gives the best balance between prompt and shape adherence. A value of 0.0 yields the unguided result, which is based on the input prompt alone.

4️⃣ (optional) Choose a random seed. For a fixed combination of input prompt and random seed, the unguided image will always be the same.

5️⃣ Choose the guidance 3D shape using the slider, navigation, or random shape buttons. Shapes come from the ShapeNet dataset (~55K shapes across all categories).

6️⃣ Click the Generate Images button at the bottom to create images that follow both your text prompt and the geometry of the selected 3D shape.

📝 Prompt Design

2️⃣ Text Prompt - Use [CATEGORY] as a placeholder, e.g. 'a [CATEGORY] under a tree'
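
The placeholder mechanism can be pictured as a simple string substitution. The sketch below is illustrative only: the function name `fill_prompt` and the choice to reject prompts without the placeholder are assumptions, not the demo's actual code.

```python
PLACEHOLDER = "[CATEGORY]"  # placeholder token named in the instructions


def fill_prompt(template: str, category: str) -> str:
    """Replace every [CATEGORY] placeholder with the chosen category name.

    Raising on a missing placeholder is an assumption of this sketch;
    the demo may handle that case differently.
    """
    if PLACEHOLDER not in template:
        raise ValueError(f"prompt must contain {PLACEHOLDER}")
    return template.replace(PLACEHOLDER, category)


example = fill_prompt("a [CATEGORY] under a tree", "chair")
print(example)
```

With the default chair category, the example prompt from the label above becomes "a chair under a tree".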

⚙️ Generation Settings

3️⃣ Guidance Strength (λ) - Higher λ = stronger shape adherence (slider range: 0 to 1)
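
One way to read the λ slider is as a scale factor on a shape-dependent offset that is added to the prompt embedding. This is an assumption made for illustration; this page does not spell out the mechanism, and `apply_guidance`, `text_emb`, and `shape_delta` are hypothetical names. The sketch only demonstrates the property stated in the instructions: λ = 0.0 leaves the prompt embedding untouched, so the result is unguided.

```python
from typing import List, Sequence


def apply_guidance(
    text_emb: Sequence[float],
    shape_delta: Sequence[float],
    lam: float,
) -> List[float]:
    """Blend a prompt embedding with a shape offset, scaled by lam in [0, 1].

    Hypothetical model: guided = text_emb + lam * shape_delta.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1], matching the slider range")
    if len(text_emb) != len(shape_delta):
        raise ValueError("embedding and offset must have the same length")
    return [t + lam * d for t, d in zip(text_emb, shape_delta)]


unguided = apply_guidance([1.0, 2.0], [0.5, -1.0], lam=0.0)  # prompt only
guided = apply_guidance([1.0, 2.0], [0.5, -1.0], lam=1.0)    # full offset
```

Under this reading, the default of 0.9 applies most of the shape offset while keeping a small margin in favor of the text prompt.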

4️⃣ Random Seed - (optional) Change for different variations (slider range: 0 to 10000)
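
The role of the seed can be illustrated with a seeded random number generator: the same seed always reproduces the same starting noise, which is why a fixed prompt and seed give the same unguided image. This is a minimal sketch using Python's standard library; the demo itself presumably seeds its diffusion pipeline's generator, and `sample_noise` is a hypothetical helper.

```python
import random
from typing import List


def sample_noise(seed: int, n: int = 4) -> List[float]:
    """Draw n Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, no global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


same_a = sample_noise(42)
same_b = sample_noise(42)   # identical to same_a: same seed, same noise
other = sample_noise(43)    # a different seed gives a different draw
```

Changing only the seed therefore produces a different variation for the same prompt and shape.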

🔍 Shape Selection

5️⃣ Shape Index - Choose a 3D shape to guide image generation (slider range: 0 to 6777)
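
The navigation and random shape controls can be thought of as simple operations on the shape index. The sketch below assumes that navigation clamps at the ends of the slider range (0 to 6777) rather than wrapping around; that behavior, and the names `step_index` and `random_index`, are assumptions and not taken from the demo.

```python
import random

MAX_INDEX = 6777  # upper end of the shape slider shown on this page


def step_index(current: int, delta: int, max_index: int = MAX_INDEX) -> int:
    """Move the shape index by delta, clamped to [0, max_index]."""
    return max(0, min(max_index, current + delta))


def random_index(rng: random.Random, max_index: int = MAX_INDEX) -> int:
    """Pick a uniformly random valid shape index."""
    return rng.randint(0, max_index)  # randint includes both endpoints


next_shape = step_index(10, +1)          # one step forward
first_shape = step_index(0, -1)          # stays at the first shape
picked = random_index(random.Random(0))  # some index in [0, 6777]
```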

🖼️ Generated Results Preview

About ShapeWords

ShapeWords incorporates target 3D shape information into text prompts to guide image synthesis.

Citation

@misc{petrov2024shapewords,
      title={ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts}, 
      author={Dmitry Petrov and Pradyumn Goyal and Divyansh Shivashok and Yuanming Tao and Melinos Averkiou and Evangelos Kalogerakis},
      year={2024},
      eprint={2412.02912},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.02912}, 
}