Notes on Learning Generative Models of 3D Structures

Tags: ai, notes
In this presentation, Tao Wu provides a comprehensive overview of generative models for 3D structures, exploring various 3D representations, generative methodologies, and applications.
Author: Christian Mills

Published: December 9, 2021

Introduction

  • Presenter: Tao Wu
  • Topic: Generative Modeling of 3D Content
  • Source: 2020 survey titled “Learning Generative Models of 3D Structures”
  • Focus: Concepts in 3D representations and generative networks.
  • Importance of 3D Graphics:
    • Critical to industries like gaming, animation, architecture, and interior design.
  • Problem: Lack of training data, due to the high cost of data capture and human labeling.
  • Solution: Generative models can help address this data scarcity.
  • Generative vs. Discriminative Models:
    • Generative: \(P(X)\); discriminative: \(P(Y \mid X)\).
    • Generative models learn a probability distribution over the input space \(X\) and can sample new objects directly from that distribution.
    • Discriminative models learn to predict an attribute \(Y\) given an input \(X\). (A toy sketch of the distinction follows this list.)
  • Benefits of Generative Models:
    • Simulating real-world environments.
    • Synthetically generating training data.
  • Target Audience: New graduate students in computer science.
  • Survey Scope:
    • A broad range of historical work.
    • Recent progress on generative 3D modeling.
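
A minimal 1-D sketch of the distinction (the toy data and the hand-set logistic rule are purely illustrative, not from the talk): the generative side models \(P(X)\) and can draw brand-new samples from it, while the discriminative side only scores \(P(Y \mid X)\) for inputs it is given.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: feature x with binary label y.
x0 = rng.normal(-2.0, 1.0, 500)   # class y = 0
x1 = rng.normal(+2.0, 1.0, 500)   # class y = 1

# Generative view: model P(X) itself (here a two-component Gaussian
# mixture estimated from the data), then sample new objects from it.
component = rng.integers(0, 2, size=10)            # pick a mixture component
new_samples = np.where(component == 0,
                       rng.normal(x0.mean(), x0.std(), 10),
                       rng.normal(x1.mean(), x1.std(), 10))

# Discriminative view: model P(Y|X) only -- an (untrained, hand-set)
# logistic rule that scores a label for a given x but cannot
# synthesize new x values.
def p_y_given_x(query, w=2.0, b=0.0):
    return 1.0 / (1.0 + np.exp(-(w * query + b)))

print("new samples from P(X):", np.round(new_samples, 2))
print("P(y=1 | x=1.5):", round(p_y_given_x(1.5), 3))
```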

Structure-Aware Representation

  • Focus: Learned generative models of structured 3D content.
  • Learned Model: Trained with data instead of manual creation or rule-based systems.
  • Structured 3D Shapes and Scenes: Collections of substructures, which can be further decomposed.
    • Example:
      • An indoor scene (room) consists of a chair, desk, and bed.
      • A chair can be decomposed into a base, seat, and back.
  • Structure-Aware: Expressing 3D entities while allowing manipulation of their high-level structure.
  • Two Aspects of Structure-Aware Representation:
    • Geometry of atomic elements.
    • Structural patterns.

Representations of Low-Level Geometry

  • Point Clouds: (Covered in previous lectures)
  • Triangle Mesh: (Covered in previous lectures)
  • Implicit Surface: A function that determines whether a query point lies inside or outside a surface; the surface itself is the function's zero level set (a minimal sketch follows).
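
A minimal sketch of an implicit surface as a signed distance function (the sphere and the NumPy implementation are my own illustration, not from the talk):

```python
import numpy as np

# Signed distance function (SDF), one common implicit surface:
# f(p) < 0 inside, f(p) > 0 outside, f(p) == 0 exactly on the surface.
def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance from each 3-D point to a sphere's surface."""
    return np.linalg.norm(points - center, axis=-1) - radius

pts = np.array([[0.0, 0.0, 0.0],    # inside     -> negative
                [1.0, 0.0, 0.0],    # on surface -> zero
                [2.0, 0.0, 0.0]])   # outside    -> positive
print(sphere_sdf(pts))              # [-1.  0.  1.]
```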

Representations of 3D Structures

  • Segmented Geometry: Linking a label to each part of the entity’s geometry.
  • Part Sets: A set of atomic parts, without explicit relationships between them.
  • Relationship Graphs: Graphs whose nodes are parts and whose edges encode relationships between parts.
  • Hierarchies (Trees): Representing parent-child relationships between parts.
  • Hierarchical Graphs: Combining relationship graphs and hierarchies.
  • Deterministic Programs:
    • The most general way to represent 3D structures.
    • Can output any of the previous representations.
    • Make patterns explicit and allow easy editing (see the sketch after this list).
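
A minimal sketch tying two of these representations together, using the room/chair example above (the `Part` class and function names are my own illustration): a hierarchy of labeled parts, produced by a small deterministic program.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    """A node in a part hierarchy: a label plus parent-child edges."""
    label: str
    children: list = field(default_factory=list)

def make_chair() -> Part:
    # A chair decomposes into a base, a seat, and a back.
    return Part("chair", [Part("base"), Part("seat"), Part("back")])

def make_room() -> Part:
    # A deterministic "program" whose output is a structured 3D scene;
    # editing this code edits the structure it generates.
    return Part("room", [make_chair(), Part("desk"), Part("bed")])

def print_tree(part: Part, depth: int = 0):
    print("  " * depth + part.label)
    for child in part.children:
        print_tree(child, depth + 1)

print_tree(make_room())
```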

Methodologies for Learning Generative Models

Synthesis Methods Flowchart

  • Constraint-Based Program Synthesizer: Best when only a few training examples are available. Finds the minimum-cost program satisfying a set of constraints.
  • Classic Probabilistic Models: Suitable for larger datasets (but not large enough for deep learning).
    • Probabilistic Graphical Models (e.g., Bayesian networks, Markov random fields): Best for content with a fixed structure.
    • Probabilistic Context-Free Grammar (PCFG): Better for varying structures.
      • Context-Free Grammar (CFG): Used in Natural Language Processing (NLP).
        • Consists of a start symbol, a set of terminal symbols, a set of non-terminal symbols, and production rules that map each non-terminal to a sequence of terminals and/or non-terminals.
        • Example:
          • Non-terminal: F
          • Terminals: Left arrow, Right arrow, Leaf node
          • Derived tree (sentence) contains only terminals.
      • PCFG: Augments CFG with probabilities for each rule.
        • Probability of a derived tree is the product of applied rule probabilities.
      • Suitability: Well-suited to content with varying structure, since derivations are recursive and can grow to arbitrary depth (a sampler sketch follows this list).
  • Deep Neural Networks (DNNs): Often the best choice when a large amount of training data is available.
    • Autoregressive Models: Generate content step by step, feeding each step's output back in as input to the next (a toy loop follows this list).
      • Example: Inserting one object at a time to generate an indoor scene.
      • Weakness: Prone to drift; an error at one step can cause all subsequent outputs to diverge.
    • Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs):
      • Popular in recent years.
      • Sample over a low-dimensional latent space.
      • Learn a generator that maps latent vectors to 3D shapes.
      • Use a global latent variable to control generation.
      • VAEs are trained with a reconstruction loss between the input and the generated output; GANs instead train the generator against a discriminator.
      • Advantages: Often outperform autoregressive models in global coherence (a skeletal VAE sketch follows this list).
    • Network Suitability: Different neural networks perform better with specific structured data representations.
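
A minimal PCFG sampler, echoing the F / left / right / leaf example above (the specific rules and probabilities are my own illustration, not from the survey). The probability it returns is the product of the applied rule probabilities, as described.

```python
import random

# Non-terminal "F" expands with the given probabilities; lowercase
# strings are terminal symbols.
RULES = {
    "F": [
        (0.50, ["leaf"]),             # terminate with a leaf node
        (0.25, ["left", "F"]),        # recurse into one subtree
        (0.25, ["right", "F", "F"]),  # recurse into two subtrees
    ],
}

def sample(symbol="F"):
    """Derive a sentence from `symbol`; return (terminals, derivation prob)."""
    if symbol not in RULES:           # terminal symbol: emit as-is
        return [symbol], 1.0
    probs = [p for p, _ in RULES[symbol]]
    p, expansion = random.choices(RULES[symbol], weights=probs)[0]
    terminals, total_p = [], p
    for s in expansion:
        sub_terminals, sub_p = sample(s)
        terminals += sub_terminals
        total_p *= sub_p              # tree probability = product of
    return terminals, total_p         # all applied rule probabilities

sentence, prob = sample()
print(sentence, prob)
```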
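
A toy autoregressive loop for the indoor-scene example. `propose_next_object` is a hypothetical stand-in for a learned conditional model; the point is only the feedback structure, where each step's output becomes part of the next step's input.

```python
import random

OBJECTS = ["bed", "desk", "chair", "lamp", "STOP"]

def propose_next_object(scene):
    # Hypothetical stand-in for a learned P(next object | scene so far):
    # avoid duplicates, and make stopping likelier as the room fills up.
    candidates = [o for o in OBJECTS if o not in scene]
    weights = [3.0 if (o == "STOP" and len(scene) >= 3) else
               0.2 if o == "STOP" else 1.0 for o in candidates]
    return random.choices(candidates, weights=weights)[0]

scene = []
while True:
    obj = propose_next_object(scene)  # output of step t ...
    if obj == "STOP":
        break
    scene.append(obj)                 # ... feeds into step t + 1
print(scene)                          # e.g. ['desk', 'bed', 'chair']
```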
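
And a skeletal VAE in PyTorch following the recipe above: encode a shape (here a flattened point cloud) into a low-dimensional latent distribution, decode latent vectors back to shapes with a learned generator, and train with a reconstruction loss plus a KL term. The architecture and dimensions are assumptions chosen for brevity, not from the survey.

```python
import torch
import torch.nn as nn

N_POINTS, LATENT = 1024, 64  # assumed sizes, chosen for illustration

class ShapeVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_POINTS * 3, 512), nn.ReLU())
        self.mu = nn.Linear(512, LATENT)        # latent mean
        self.logvar = nn.Linear(512, LATENT)    # latent log-variance
        self.decoder = nn.Sequential(           # the learned generator:
            nn.Linear(LATENT, 512), nn.ReLU(),  # latent vector -> shape
            nn.Linear(512, N_POINTS * 3))

    def forward(self, x):
        h = self.encoder(x.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.decoder(z).view(-1, N_POINTS, 3), mu, logvar

model = ShapeVAE()
x = torch.rand(8, N_POINTS, 3)               # a batch of toy point clouds
recon, mu, logvar = model(x)
rec_loss = ((recon - x) ** 2).mean()         # reconstruction loss
kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).mean()
(rec_loss + kl).backward()                   # at test time, sampling z
                                             # yields brand-new shapes
```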

Application: Visual Program Induction

  • Definition: Synthesizing a plausible program that creates 3D content.
  • Process: Recovering the generator program from existing 3D shapes.
  • Early Examples: Reconstructing 3D shapes via simple geometric primitives.
  • More Recent Work: Visual program induction from 2D images:
    • Inferring programs that generate 2D diagrams from hand-drawn sketches.
    • Using the inferred programs for downstream tasks like image editing.
    • Related code: https://github.com/paschalidoud/superquadric_parsing
  • Benefits: Efficient and flexible scene manipulation. (A toy induction sketch follows.)
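
A toy illustration of the idea (the "program space" here, rows of unit cubes on a 1-D grid, is entirely my own construction): given a target shape, search for the program whose output best explains it. Real systems search far richer program spaces, often with neural guidance.

```python
import numpy as np

def run_program(k, s, size=32):
    """A tiny 'program': place k cubes, two cells wide, with spacing s."""
    grid = np.zeros(size, dtype=bool)      # 1-D occupancy grid for brevity
    for i in range(k):
        grid[i * s : i * s + 2] = True
    return grid

target = run_program(k=4, s=6)             # pretend this came from a scan

# Program induction as search: recover the (k, s) that explains the target.
best, best_err = None, float("inf")
for k in range(1, 6):
    for s in range(2, 9):
        err = int(np.sum(run_program(k, s) != target))
        if err < best_err:
            best, best_err = (k, s), err
print("recovered program:", best)          # -> (4, 6)
```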
