Visual GenAI | Exam Topics


1. Divergences

  • KL divergence and Fisher divergence
  • Forward KL vs Reverse KL: mode-covering vs mode-seeking behavior
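
For reference, a sketch of the standard definitions. The notation is an assumption made here, not taken from the course: $p$ is the data distribution and $q_\theta$ is the model.

```latex
\begin{aligned}
\text{Forward KL:}\quad & D_{\mathrm{KL}}(p \,\|\, q_\theta) = \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q_\theta(x)}\right] \\
\text{Reverse KL:}\quad & D_{\mathrm{KL}}(q_\theta \,\|\, p) = \mathbb{E}_{x \sim q_\theta}\!\left[\log \frac{q_\theta(x)}{p(x)}\right] \\
\text{Fisher:}\quad & D_{\mathrm{F}}(p \,\|\, q_\theta) = \mathbb{E}_{x \sim p}\!\left[\left\| \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \right\|^2\right]
\end{aligned}
```

Forward KL is large wherever $p(x) > 0$ but $q_\theta(x) \approx 0$, which pushes the model to cover every mode. Reverse KL is large wherever $q_\theta$ puts mass on regions with $p(x) \approx 0$, which pushes the model to concentrate on a subset of modes. Some texts put a factor of $\tfrac{1}{2}$ in front of the Fisher divergence.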

2. Diffusion Model Formulations

2.1 Discrete-Time Variational Diffusion (DDPM)

  • Denoising Diffusion Probabilistic Models formulation
  • Training objective and ELBO derivation
  • Conditional trick
  • Sampling procedure
  • Variance-preserving (VP) schedules
  • Connection to VAEs

2.2 Discrete-Time Score-Based Diffusion (DSM)

  • Denoising Score Matching formulation
  • Training objective and optimal solution
  • Conditional trick
  • Tweedie’s formula. Ideal Denoiser
  • Sampling procedure
  • Motivation for multiple noise levels
  • Variance-exploding (VE) schedules
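
As a compact reminder, the DSM objective and Tweedie's formula can be written as follows. This assumes the VE corruption $x_\sigma = x_0 + \sigma\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$; the symbols $s_\theta$ and $p_\sigma$ are notation chosen here.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{DSM}}(\theta)
  &= \mathbb{E}_{x_0,\,\varepsilon}\!\left[\left\| s_\theta(x_\sigma, \sigma) + \frac{\varepsilon}{\sigma} \right\|^2\right],
  \qquad x_\sigma = x_0 + \sigma \varepsilon, \\
s^\ast(x, \sigma) &= \nabla_x \log p_\sigma(x), \\
\mathbb{E}\left[x_0 \mid x_\sigma\right] &= x_\sigma + \sigma^2\, \nabla_{x_\sigma} \log p_\sigma(x_\sigma)
  \quad \text{(Tweedie's formula)}.
\end{aligned}
```

The last line is what links the score to the ideal denoiser: the MSE-optimal denoiser is the posterior mean, so knowing either one gives the other.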

2.3 Continuous-Time Diffusion (SDE/ODE)

  • SDE and ODE formulations
  • Wiener process (Brownian motion)
  • Fokker–Planck and continuity equations. Why do we need them?
  • Continuous-time schedule derivation

2.4 Flow Matching (FM)

  • FM formulation
  • Training objective and optimal solution
  • Conditional trick
  • Sampling procedure
  • Linear schedule
  • Advantages over continuous diffusion formulations
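
A sketch of the conditional flow matching objective with the linear schedule. The time convention is an assumption: data $x_0$ at $t = 0$ and noise $\varepsilon \sim \mathcal{N}(0, I)$ at $t = 1$. The course may use the opposite direction.

```latex
\begin{aligned}
x_t &= (1 - t)\, x_0 + t\, \varepsilon, \qquad t \in [0, 1], \\
\mathcal{L}_{\mathrm{CFM}}(\theta)
  &= \mathbb{E}_{t,\, x_0,\, \varepsilon}\!\left[\left\| v_\theta(x_t, t) - (\varepsilon - x_0) \right\|^2\right], \\
v^\ast(x, t) &= \mathbb{E}\left[\varepsilon - x_0 \mid x_t = x\right].
\end{aligned}
```

The conditional target $\varepsilon - x_0$ is available in closed form for every training pair, and the minimizer $v^\ast$ is the marginal velocity field. Sampling then integrates $\mathrm{d}x/\mathrm{d}t = v_\theta(x, t)$ from $t = 1$ down to $t = 0$.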

3. Sampling Methods

3.1 Numerical Solvers

  • Euler, Heun
  • DDIM, DPM-Solver
  • Single-step vs multi-step (Adams–Bashforth)
  • Connections across diffusion formulations
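
To make the Euler vs Heun comparison concrete, here is a minimal sketch on a toy ODE. The field `toy_field` (dx/dt = -x) is a stand-in chosen for illustration, not a trained model, and all function names are hypothetical.

```python
import math

def euler_step(f, x, t, dt):
    """First-order step: one evaluation of f per step."""
    return x + dt * f(x, t)

def heun_step(f, x, t, dt):
    """Second-order step: Euler predictor, then trapezoidal corrector."""
    k1 = f(x, t)
    x_pred = x + dt * k1
    k2 = f(x_pred, t + dt)
    return x + 0.5 * dt * (k1 + k2)

def integrate(step, f, x0, t0, t1, n_steps):
    """Apply `step` n_steps times on a uniform grid from t0 to t1."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        x = step(f, x, t, dt)
        t += dt
    return x

def toy_field(x, t):
    # Assumed toy velocity field with exact solution x(t) = x0 * exp(-t).
    return -x

exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, toy_field, 1.0, 0.0, 1.0, 10) - exact)
heun_err = abs(integrate(heun_step, toy_field, 1.0, 0.0, 1.0, 10) - exact)
```

With the same number of steps Heun is more accurate but uses two function evaluations per step, so the fair comparison for diffusion sampling is at equal network evaluations rather than equal step counts.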

3.2 Timestep Schedules

  • Linear and cosine schedules
  • EDM and SD3 schedules
  • Shift selection strategies

3.3 Guidance Methods

4. Training Design Choices

4.1 Parameterizations and Losses

  • ε-, x₀-, and v-prediction
  • Conversion between parameterizations
  • Corresponding loss functions

(Table 1 of JiT, but for the 1 → 0 process)
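
The conversions can be sketched as below. This assumes the linear interpolation x_t = (1 - t) x0 + t eps with v = eps - x0, reading the 1 → 0 convention as noise at t = 1 and data at t = 0. That reading and all function names are assumptions made for illustration.

```python
def interpolate(x0, eps, t):
    """Noisy sample under the assumed linear schedule."""
    return (1.0 - t) * x0 + t * eps

def v_from(x0, eps):
    """Velocity target d x_t / d t under the assumed schedule."""
    return eps - x0

def x0_from_v(x_t, v, t):
    return x_t - t * v

def eps_from_v(x_t, v, t):
    return x_t + (1.0 - t) * v

def x0_from_eps(x_t, eps, t):
    # Division by (1 - t): ill-conditioned as t approaches 1 (pure noise).
    return (x_t - t * eps) / (1.0 - t)

def eps_from_x0(x_t, x0, t):
    # Division by t: ill-conditioned as t approaches 0 (clean data).
    return (x_t - (1.0 - t) * x0) / t

# Round trip on scalars; the same arithmetic applies elementwise to arrays.
x0, eps, t = 2.0, -0.5, 0.3
x_t = interpolate(x0, eps, t)
v = v_from(x0, eps)
```

The divisions show why the parameterizations are not interchangeable in practice: converting an ε-prediction to x₀ amplifies errors near t = 1, and converting an x₀-prediction to ε amplifies them near t = 0.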

4.2 Timestep Sampling

  • Training timestep distributions (Uniform, Logit-normal)
  • Resolution-dependent shift selection
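
A minimal sketch of logit-normal timestep sampling followed by a timestep shift. The shift map t' = s·t / (1 + (s - 1)·t) is the form commonly used with SD3-style models; the function names, default parameters, and the shift value 3.0 are illustrative assumptions.

```python
import math
import random

def sample_logit_normal(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) as the sigmoid of a Gaussian sample."""
    z = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))

def shift_timestep(t, shift):
    """Map t in [0, 1] to [0, 1]; shift > 1 moves mass toward t = 1."""
    return shift * t / (1.0 + (shift - 1.0) * t)

rng = random.Random(0)
samples = [sample_logit_normal(rng) for _ in range(1000)]
shifted = [shift_timestep(t, 3.0) for t in samples]
```

If t = 1 is the noise end, a shift above 1 spends more of training (or sampling) at high noise levels, which is the usual motivation for increasing the shift with resolution.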

4.3 Representation Alignment

5. Architectures and Conditioning

5.1 Core Architectures

  • UNet (SDXL)
  • Diffusion Transformers (DiT)
  • Multimodal DiT (MM-DiT)

5.2 Conditioning Mechanisms

  • Timestep conditioning
  • Class and text conditioning
  • Adapter-based conditioning

5.3 Latent vs Pixel-Space Diffusion

  • Latent diffusion models
  • Trade-offs: learnability vs reconstruction quality
  • Representation Autoencoders (RAE)

6. Few-step Models

Expectations:

  • Training and sampling procedures
  • Pros and cons for each variant
  • Training from scratch vs distillation
  • High-level connections between flow-map approaches
  • Key difference between distribution matching and flow-map approaches

6.1 Flow-Map Models

6.2 Distribution Matching

7. Autoregressive Models

Expectations: understanding training and sampling procedures, and pros and cons for each variant.

8. Extensions Beyond Images

Expectations: understanding high-level model designs and ideas, and their pros and cons.

8.1 Video Diffusion

8.2 Multimodal Models

  • Multi-Modal Large Language Models (MLLM), aka pure AR models for text and images
  • Unified AR for text + diffusion for images (BAGEL, Transfusion)
  • VLM encoder + diffusion decoder (Qwen-Image)

8.3 3D Generative Models

  • 2D diffusion for 3D training (DMD-like for 3D):
    • Score Distillation Sampling (SDS)
    • Variational Score Distillation (VSD)
  • Multi-view diffusion architectures (SEVA)

Materials