Notes on Learning Generative Models of 3D Structures

In this presentation, Tao Wu provides a comprehensive overview of generative models for 3D structures, exploring various 3D representations, generative methodologies, and applications.

Author

Christian Mills

Published

December 9, 2021

Introduction

Presenter: Tao Wu
Topic: Generative Modeling for 3D Content
Source: 2020 survey titled “Learning Generative Models of 3D Structures”
Focus: Concepts in 3D representations and generative networks.
Importance of 3D Graphics:
- Critical to industries like gaming, animation, architecture, and interior design.
Problem: Training data is scarce because capturing and hand-labeling 3D data is expensive.
Solution: Generative models can help address this data scarcity.
Generative vs. Discriminative Models:
- \[ \text{Generative: } P(X) \quad \text{vs.} \quad \text{Discriminative: } P(Y \mid X) \]
- Generative models learn the probability distribution P(X) over the input space, so new objects can be sampled directly from the learned distribution.
- Discriminative models learn to predict an attribute (y) given an input (x).
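
The contrast above can be illustrated with a toy 1D sketch. Everything here is an assumption made for illustration and is not from the presentation: the seat-height data, the labels, the single-Gaussian model of P(X), and the midpoint threshold rule standing in for a discriminative model.

```python
import random
import statistics

# Hypothetical toy data: x is a seat height in meters, y is a label.
data = [(0.42, "chair"), (0.45, "chair"), (0.47, "chair"),
        (0.72, "stool"), (0.75, "stool"), (0.78, "stool")]
xs = [x for x, _ in data]

# Generative view: estimate P(X) (here, one Gaussian) and sample new inputs.
mu = statistics.mean(xs)
sigma = statistics.stdev(xs)

def sample_x(rng):
    """Draw a new x from the fitted distribution P(X)."""
    return rng.gauss(mu, sigma)

# Discriminative view: only learn a rule that maps x to y.
chair_mean = statistics.mean([x for x, y in data if y == "chair"])
stool_mean = statistics.mean([x for x, y in data if y == "stool"])
threshold = (chair_mean + stool_mean) / 2

def predict_y(x):
    """Predict the label for x; this rule cannot generate new x values."""
    return "chair" if x < threshold else "stool"

print(sample_x(random.Random(0)))
print(predict_y(0.40), predict_y(0.80))
```

The generative half can produce new seat heights, while the discriminative half can only label heights it is given.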
Benefits of Generative Models:
- Simulating real-world environments.
- Synthetically generating training data.
Target Audience: New graduate students in computer science.
Survey Scope:
- Large range of historical work.
- Recent progress on generative 3D modeling.

Structure-Aware Representation

Focus: Learned generative models of structured 3D content.
Learned Model: Trained with data instead of manual creation or rule-based systems.
Structured 3D Shapes and Scenes: Collections of substructures, which can be further decomposed.
- Example:
  - An indoor scene (room) consists of a chair, desk, and bed.
  - A chair can be decomposed into a base, seat, and back.
Structure-Aware: Expressing 3D entities while allowing manipulation of their high-level structure.
Two Aspects of Structure-Aware Representation:
- Geometry of atomic elements.
- Structural patterns.

Representations of Low-Level Geometry

Point Clouds: (Covered in previous lectures)
Triangle Mesh: (Covered in previous lectures)
Implicit Surface: A function determining whether a point is inside or outside a surface.
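
As a minimal sketch of an implicit surface, the function below uses a signed distance to a sphere. The sphere, its default center and radius, and the function names are assumptions chosen for illustration.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface,
    positive outside."""
    return math.dist(point, center) - radius

def is_inside(point):
    """Inside/outside test derived from the sign of the implicit function."""
    return sphere_sdf(point) < 0.0

print(is_inside((0.0, 0.0, 0.0)))  # center of the sphere
print(is_inside((2.0, 0.0, 0.0)))  # well outside the sphere
```

The surface itself is never stored explicitly; it is the set of points where the function equals zero.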

Representations of 3D Structures

Segmented Geometry: Linking a label to each part of the entity’s geometry.
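Part Sets: An unordered set of atoms (parts), with no relationships between them.
Relationship Graphs: Nodes are parts, and edges encode relationships between them (e.g., adjacency or support).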
Hierarchies (Trees): Representing parent-child relationships between parts.
Hierarchical Graphs: Combining relationship graphs and hierarchies.
Deterministic Program:
- Most general way to represent 3D structures.
- Can output any of the previous representations.
- Beneficial for making patterns clear and allowing easy editing.
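
Several of these representations can be sketched with one small data structure. The following is an illustrative sketch only: the `Part` class, the `leaves` helper, and the edge labels are hypothetical names, and the room and chair decomposition follows the example given earlier in these notes.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    """A node in a part hierarchy; leaves are atomic parts."""
    name: str
    children: list = field(default_factory=list)

def leaves(part):
    """Flatten a hierarchy into its part set (the atomic parts)."""
    if not part.children:
        return [part.name]
    names = []
    for child in part.children:
        names.extend(leaves(child))
    return names

# Hierarchy (tree): parent-child decomposition.
chair = Part("chair", [Part("base"), Part("seat"), Part("back")])
room = Part("room", [chair, Part("desk"), Part("bed")])

# Relationship graph edges between parts (hypothetical relation labels).
# A hierarchy plus such edges gives a hierarchical graph.
edges = [("seat", "base", "supported_by"), ("back", "seat", "attached_to")]

print(leaves(room))
```

Flattening the tree discards the parent-child structure and leaves only the part set, which is why the later representations in the list are strictly more expressive.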

Methodologies for Learning Generative Models

Synthesis Methods Flowchart

Constraint-Based Program Synthesizer: Best when only a few training examples are available. Finds the minimum-cost program that satisfies the given constraints.
Classic Probabilistic Models: Suitable for larger datasets (but not large enough for deep learning).
- Probabilistic Graphical Models (e.g., Bayesian network, Markov random field): Best for content with a fixed structure.
- Probabilistic Context-Free Grammar (PCFG): Better for varying structures.
  - Context-Free Grammar (CFG): Used in Natural Language Processing (NLP).
    - Consists of a start symbol, a set of terminals, a set of non-terminals, and rules that rewrite each non-terminal into a sequence of terminals and non-terminals.
    - Example:
      - Non-terminal: F
      - Terminals: Left arrow, Right arrow, Leaf node
      - Derived tree (sentence) contains only terminals.
  - PCFG: Augments CFG with probabilities for each rule.
    - Probability of a derived tree is the product of applied rule probabilities.
  - Suitability: Well suited to content whose structure varies, because rules can be applied recursively.
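
A minimal sketch of sampling from a PCFG is shown below. It reuses the symbols from the example above (non-terminal F; terminals left arrow, right arrow, leaf node), but the specific rules and their probabilities are assumptions made for illustration.

```python
import random

# Hypothetical rules: F -> left F (0.3) | right F (0.3) | leaf (0.4).
RULES = {
    "F": [
        (("left", "F"), 0.3),
        (("right", "F"), 0.3),
        (("leaf",), 0.4),
    ]
}

def derive(symbol, rng):
    """Expand a symbol; return (terminals, probability of the derivation)."""
    if symbol not in RULES:
        # Terminals are emitted as-is and contribute probability 1.
        return [symbol], 1.0
    options = RULES[symbol]
    rhs, rule_prob = rng.choices(options, weights=[w for _, w in options])[0]
    terminals, prob = [], rule_prob
    for child in rhs:
        child_terminals, child_prob = derive(child, rng)
        terminals.extend(child_terminals)
        prob *= child_prob  # product of all applied rule probabilities
    return terminals, prob

sentence, probability = derive("F", random.Random(0))
print(sentence, probability)
```

Each call produces a sentence of a different length, which is the sense in which a grammar handles varying structure where a fixed-structure graphical model does not.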
Deep Neural Networks (DNNs): Often the best choice when a large amount of training data is available.
- Autoregressive Model: Iteratively consumes its output from one iteration as input to the next.
  - Example: Inserting one object at a time to generate an indoor scene.
  - Weakness: Prone to drift; errors in one step can cause subsequent outputs to diverge.
- Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs):
  - Popular in recent years.
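  - Sample from a low-dimensional latent space.
  - Learn a generator that maps latent vectors to 3D shapes.
  - Use a global latent variable to control generation.
  - Training: VAEs use a reconstruction loss between the input and the generated output, plus a regularization term on the latent distribution. GANs instead use an adversarial loss supplied by a discriminator.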
  - Advantages: Often outperform autoregressive models in global coherence.
- Network Suitability: Different neural networks perform better with specific structured data representations.
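
The autoregressive scene-generation loop described above can be sketched as follows. This is a toy under stated assumptions: the hand-written table `NEXT` is a hypothetical stand-in for a learned network, and for brevity the next-object distribution looks only at the most recently placed object rather than the whole partial scene.

```python
import random

# Hypothetical conditional table: P(next object | most recent object).
NEXT = {
    "<start>": [("bed", 0.5), ("desk", 0.5)],
    "bed": [("nightstand", 0.6), ("desk", 0.3), ("<stop>", 0.1)],
    "nightstand": [("desk", 0.5), ("<stop>", 0.5)],
    "desk": [("chair", 0.7), ("<stop>", 0.3)],
    "chair": [("<stop>", 1.0)],
}

def next_distribution(scene):
    """Distribution over the next object given the partial scene."""
    last = scene[-1] if scene else "<start>"
    return NEXT[last]

def generate_scene(rng, max_objects=10):
    """Insert one object at a time, feeding each output back in as input."""
    scene = []
    while len(scene) < max_objects:
        options = next_distribution(scene)
        names = [name for name, _ in options]
        weights = [weight for _, weight in options]
        choice = rng.choices(names, weights=weights)[0]
        if choice == "<stop>":
            break
        scene.append(choice)
    return scene

print(generate_scene(random.Random(0)))
```

Because every step depends on earlier outputs, an unlikely early choice shapes everything after it, which is the drift weakness noted above.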

Application: Visual Program Induction

Definition: Synthesizing a plausible program that creates 3D content.
Process: Recovering the generator program from existing 3D shapes.
Early Examples: Reconstructing 3D shapes via simple geometric primitives.
- Example (2017 work): Decomposing shapes into primitives and using chamfer distance as a loss function.
  - https://github.com/shubhtuls/volumetricPrimitives
  - Learning Shape Abstractions by Assembling Volumetric Primitives
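
For reference, a minimal sketch of a chamfer distance between two point sets is given below. Conventions vary (squared or unsquared distances, sums or means), and this version, which averages squared nearest-neighbor distances in both directions, is one common choice rather than the exact loss used in the 2017 work.

```python
import math

def chamfer_distance(points_a, points_b):
    """Symmetric chamfer distance: mean squared nearest-neighbor distance
    from A to B plus the same from B to A."""
    def one_way(source, target):
        total = 0.0
        for p in source:
            total += min(math.dist(p, q) ** 2 for q in target)
        return total / len(source)
    return one_way(points_a, points_b) + one_way(points_b, points_a)

shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(chamfer_distance(shape, shape))  # identical point sets
print(chamfer_distance([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]))
```

As a loss, the distance would be computed between points sampled from the predicted primitives and points sampled from the target shape.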
More Recent Work:
- Outputting 3D shape programs with loops and high-level structures.
- Executing the program to reconstruct shapes.
- https://github.com/HobbitLong/shape2prog
- Learning to Infer and Execute 3D Shape Programs
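
To make the idea of a shape program with loops concrete, here is a small hand-written sketch. The function `chair_program`, its parameters, and the cuboid tuple layout are all hypothetical and do not reproduce the program language of the shape2prog work; the point is only that a loop captures the repeated legs and that executing the program yields the geometry.

```python
def chair_program(seat_size=0.5, seat_height=0.45, leg_thickness=0.05):
    """Execute a tiny shape program; return (name, center, size) cuboids."""
    parts = []
    # Seat: a thin slab at the seat height.
    parts.append(("seat", (0.0, seat_height, 0.0),
                  (seat_size, 0.05, seat_size)))
    # Legs: one loop expresses the four-fold repetition.
    offset = seat_size / 2 - leg_thickness / 2
    for sign_x in (-1, 1):
        for sign_z in (-1, 1):
            center = (sign_x * offset, seat_height / 2, sign_z * offset)
            size = (leg_thickness, seat_height, leg_thickness)
            parts.append(("leg", center, size))
    # Back: a vertical slab along the rear edge of the seat.
    parts.append(("back", (0.0, seat_height + 0.25, -offset),
                  (seat_size, 0.5, leg_thickness)))
    return parts

for part in chair_program():
    print(part)
```

Editing a single parameter such as `seat_height` moves every leg consistently, which illustrates why a program makes structural edits easy.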
Visual Program Induction from 2D Images:
- Inferring programs that generate 2D diagrams from hand-drawn sketches.
- Using inferred programs for downstream tasks like image editing.
- https://github.com/paschalidoud/superquadric_parsing
Benefits: Efficient and flexible scene manipulation.
