ZMIC Journal Club

Generative Image Dynamics

张杨 (Zhang Yang) // School of Data Science, Fudan University
2024-09-21

Paper Info

  • CVPR 2024 Best Paper (one of 2 best papers among 2,719 accepted papers)
  • Zhengqi Li, Richard Tucker, Noah Snavely and Aleksander Holynski from Google Research
  • Link to Live Demo

Related Works

  • Animating Pictures with Stochastic Motion Textures (Yung-Yu Chuang et al., 2005)
  • Image-space Modal Bases for Plausible Manipulation of Objects in Video (Abe Davis, Justin G. Chen and Frédo Durand, 2015)
  • Visual Vibration Analysis (Abe Davis, 2016)

How Are They Related?

  • Abe Davis, Noah Snavely and Zhengqi Li are all from the Cornell Graphics and Vision Group.

  • A demo of Plausible Manipulation of Objects in Video:

[Demo video]

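  • Generative Image Dynamics (GID) can be seen as a deep-learning counterpart of the work above: instead of extracting the image-space motion representation from an input video, it predicts it from a single image with a generative model.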

Framework

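  • Input: a single still image.
  • M1: Generate a spectral volume with a latent diffusion model (LDM), then convert it to a motion texture with an inverse FFT (IFFT).
  • M2: Render a seamless looping video with softmax splatting.
  • M3: Set up an interactive dynamic scene, following Davis et al. (2015).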

Optical Flow

  • The apparent motion of brightness patterns in an image.
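One common way to make this definition precise is the classical brightness-constancy assumption (standard optical-flow background, not something specific to GID): a pixel keeps its intensity $I$ while it moves by $(u, v)$ between consecutive frames. A first-order Taylor expansion then gives the optical flow constraint, where $I_x$, $I_y$, $I_t$ are the partial derivatives of $I$:

```latex
I(x + u,\; y + v,\; t + 1) = I(x, y, t)
\quad\Longrightarrow\quad
I_x\, u + I_y\, v + I_t \approx 0
```

This is one equation in two unknowns per pixel, which is why flow estimators add further assumptions such as local smoothness.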

Background

Donguri wave illusion


Optical flow ≠ motion field: the apparent motion of brightness patterns need not match the true projected motion of the scene.

Donguri (団栗, ドングリ) is Japanese for "acorn". For more details, check this link.


Quasi-periodic Motion

We only consider videos featuring oscillatory motions such as those of trees, flowers, or candle flames swaying in the breeze.


Spectral Volume

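The per-pixel temporal FFT of quasi-periodic optical flow.

Formulation:

  • Video: frames $I_t$, $t = 0, \dots, T$, where $I_t(\mathbf{p})$ is the color at pixel $\mathbf{p}$.
  • Motion texture:
    • Displacement (optical flow) in the time domain: $\mathcal{F}(\mathbf{p}) = \{F_t(\mathbf{p}) \mid t = 1, \dots, T\}$, where $F_t(\mathbf{p})$ moves pixel $\mathbf{p}$ of $I_0$ to its position at time $t$, i.e. $I_t(\mathbf{p} + F_t(\mathbf{p})) = I_0(\mathbf{p})$.
  • Spectral volume:
    • In the frequency domain: $S(\mathbf{p}) = \{S_{f_0}(\mathbf{p}), \dots, S_{f_{K-1}}(\mathbf{p})\}$, the first $K$ coefficients of $\mathrm{FFT}(\mathcal{F}(\mathbf{p}))$.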
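A minimal NumPy sketch of how a spectral volume relates to a motion texture: take the FFT of every pixel's displacement trajectory along time, keep only the K lowest frequencies, and invert with an IFFT. The array layout (T, H, W, 2), the default K = 16 and the function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def spectral_volume(motion_texture: np.ndarray, K: int = 16) -> np.ndarray:
    """Per-pixel temporal FFT of a motion texture, truncated to K frequencies.

    motion_texture: real array (T, H, W, 2) holding displacements F_t(p).
    Returns a complex array of shape (K, H, W, 2).
    """
    coeffs = np.fft.rfft(motion_texture, axis=0)  # FFT along the time axis
    return coeffs[:K]  # keep only the K lowest frequencies

def motion_texture_from_spectrum(S: np.ndarray, T: int) -> np.ndarray:
    """Inverse FFT back to T displacement maps; dropped frequencies are zero."""
    full = np.zeros((T // 2 + 1,) + S.shape[1:], dtype=complex)
    full[: S.shape[0]] = S
    return np.fft.irfft(full, n=T, axis=0)

# Toy motion texture: a 4x4 "image" whose pixels sway 3 times per clip
T, H, W = 150, 4, 4
t = np.arange(T)[:, None, None]
phase = np.linspace(0, np.pi, H * W).reshape(1, H, W)
dx = 0.5 * np.sin(2 * np.pi * 3 * t / T + phase)
F = np.stack([dx, np.zeros_like(dx)], axis=-1)  # shape (T, H, W, 2)

S = spectral_volume(F, K=16)
F_rec = motion_texture_from_spectrum(S, T)
print(S.shape)
print(np.abs(F - F_rec).max())
```

Because this toy motion only contains one low frequency, the truncated spectrum reproduces it up to floating-point error; the K complex coefficients per pixel replace T displacement vectors.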
Methods

M1: Generate Spectral Volume

  • Use a latent diffusion model (LDM) as the backbone of the motion prediction module
  • Techniques:
    • Frequency-adaptive normalization
    • Frequency-coordinated denoising (multi-channel output)
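A minimal sketch of the frequency-adaptive normalization idea, under our assumptions: each frequency is divided by a per-frequency statistic of coefficient magnitudes over the training set (the 95th percentile here), and each real channel is then compressed with a signed square root, so that small high-frequency coefficients are not swamped by large low-frequency ones. The statistic, the transform and all function names are illustrative and should be checked against the paper.

```python
import numpy as np

def _signed_sqrt(v):
    return np.sign(v) * np.sqrt(np.abs(v))

def _signed_square(v):
    return np.sign(v) * v**2

def fit_frequency_scales(train_S, q=95.0):
    """Per-frequency scale s_f: the q-th percentile of |S_f| over the training set.

    train_S: complex array (N, K, H, W, 2); axis 1 indexes frequency.
    Returns a positive array of shape (K,).
    """
    mags = np.abs(train_S)
    per_freq = np.moveaxis(mags, 1, 0).reshape(mags.shape[1], -1)
    return np.maximum(np.percentile(per_freq, q, axis=1), 1e-8)

def normalize(S, scales):
    """Divide each frequency by its scale, then signed-sqrt the real/imag channels."""
    x = S / scales[:, None, None, None]  # S: (K, H, W, 2)
    return _signed_sqrt(x.real) + 1j * _signed_sqrt(x.imag)

def denormalize(Z, scales):
    """Exact inverse of normalize()."""
    x = _signed_square(Z.real) + 1j * _signed_square(Z.imag)
    return x * scales[:, None, None, None]

# Toy training set whose magnitudes fall off with frequency
rng = np.random.default_rng(0)
N, K, H, W = 32, 16, 8, 8
decay = (1.0 / (1.0 + np.arange(K)))[None, :, None, None, None]
noise = rng.normal(size=(N, K, H, W, 2)) + 1j * rng.normal(size=(N, K, H, W, 2))
train = decay * noise
scales = fit_frequency_scales(train)
Z = normalize(train[0], scales)
```

The transform is invertible, so predictions made in the normalized space can be mapped back to a spectral volume with `denormalize`.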

M2: Render Video

  • Refer to: Niklaus, Simon, and Feng Liu. "Softmax splatting for video frame interpolation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020.
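Softmax splatting forward-warps each source pixel to its displaced location with a bilinear kernel. When several pixels land on the same target, they are blended with weights exp(Z), so pixels with a larger importance Z win. The NumPy sketch below assumes the splatted quantity is the raw image and that Z is given. GID splats learned features and its choice of Z is not stated on this slide, so `Z` and `softmax_splat` are hypothetical names.

```python
import numpy as np

def softmax_splat(image, flow, Z):
    """Forward-warp `image` by `flow`, blending collisions with weights exp(Z).

    image: (H, W, C) float; flow: (H, W, 2) as (dx, dy) in pixels;
    Z: (H, W) importance metric (larger values dominate where pixels collide).
    Returns the warped image and a boolean mask of filled target pixels.
    """
    H, W, C = image.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    tx = xs + flow[..., 0]
    ty = ys + flow[..., 1]
    x0 = np.floor(tx).astype(int)
    y0 = np.floor(ty).astype(int)
    w = np.exp(Z - Z.max())  # shifting by max(Z) avoids overflow, ratios unchanged
    num = np.zeros((H, W, C))
    den = np.zeros((H, W))
    for dx in (0, 1):
        for dy in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            # Bilinear splatting weight for this neighboring target pixel
            bil = (1 - np.abs(tx - xi)) * (1 - np.abs(ty - yi))
            ok = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H) & (bil > 0)
            wk = (w * bil)[ok]
            np.add.at(num, (yi[ok], xi[ok]), wk[:, None] * image[ok])
            np.add.at(den, (yi[ok], xi[ok]), wk)
    out = np.zeros_like(num)
    filled = den > 1e-12
    out[filled] = num[filled] / den[filled][:, None]
    return out, filled

# Shift a small image one pixel to the right
img = np.arange(12, dtype=float).reshape(3, 4, 1)
shift_right = np.zeros((3, 4, 2))
shift_right[..., 0] = 1.0
out, filled = softmax_splat(img, shift_right, Z=np.zeros((3, 4)))
```

Target pixels that receive no contribution stay empty (`filled` is False). In a full renderer, a synthesis network would have to inpaint these disocclusion holes.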


M3: Interactive Dynamics

  • Refer to: Davis, Abe, Justin G. Chen, and Frédo Durand. "Image-space modal bases for plausible manipulation of objects in video." ACM Transactions on Graphics (TOG) 34.6 (2015): 1-7.

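  • The spectral volume can be interpreted as an image-space modal basis, i.e. a projection of the vibration modes of the underlying scene, which can be used to simulate an object's response to a user-defined force:

$$F_t(\mathbf{p}) = \sum_{j} \operatorname{Re}\big[S_{f_j}(\mathbf{p})\, q_{f_j}(t)\big]$$

where $F_t(\mathbf{p})$ is the displacement of pixel $\mathbf{p}$ at time $t$, $S_{f_j}(\mathbf{p})$ is its spectral-volume coefficient at frequency $f_j$, and $q_{f_j}(t)$ are complex modal coordinates.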
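A minimal sketch of driving the modal coordinates with a user "poke", in the spirit of Davis et al. (2015). It makes several assumptions of our own. Each mode behaves like a damped harmonic oscillator with angular frequency $2\pi f_j$ and unit modal mass. The damping ratio `xi` is hypothetical. The poke is projected onto each mode's shape at the poked pixel and used as an initial modal velocity. Time stepping uses semi-implicit Euler. The displacement is then formed as $\operatorname{Re}\big[\sum_j S_{f_j}(\mathbf{p})\, q_{f_j}(t)\big]$.

```python
import numpy as np

def simulate_modal_response(S, freqs, poke_pixel, poke_dir,
                            steps=300, dt=1 / 30, xi=0.05):
    """Displacement maps after an instantaneous poke, via damped modal dynamics.

    S: complex (K, H, W, 2) image-space modal basis (e.g. a spectral volume).
    freqs: (K,) positive mode frequencies in Hz (leave out the DC term).
    poke_pixel: (row, col) where the user pokes the scene.
    poke_dir: (dx, dy) poke direction and strength, in pixels.
    Returns real displacement maps F_t of shape (steps, H, W, 2).
    """
    omega = 2 * np.pi * np.asarray(freqs, dtype=float)
    r, c = poke_pixel
    # Project the poke onto each mode's shape at the poked pixel and treat
    # the result as an initial modal velocity (an impulse at t = 0).
    q_dot = np.conj(S[:, r, c, :]) @ np.asarray(poke_dir, dtype=float)
    q = np.zeros_like(q_dot)
    out = np.empty((steps,) + S.shape[1:])
    for n in range(steps):
        # Damped oscillator per mode: q'' + 2*xi*omega*q' + omega^2*q = 0
        q_ddot = -2 * xi * omega * q_dot - omega**2 * q
        q_dot = q_dot + dt * q_ddot  # semi-implicit Euler: velocity first,
        q = q + dt * q_dot           # then position with the updated velocity
        # F_t(p) = Re(sum_j S_{f_j}(p) * q_{f_j}(t))
        out[n] = np.real(np.tensordot(q, S, axes=(0, 0)))
    return out

rng = np.random.default_rng(0)
K, H, W = 3, 8, 8
S = rng.normal(size=(K, H, W, 2)) + 1j * rng.normal(size=(K, H, W, 2))
F = simulate_modal_response(S, freqs=[0.5, 1.0, 2.0], poke_pixel=(4, 4),
                            poke_dir=(1.0, 0.0), steps=600, xi=0.3)
```

Semi-implicit Euler is used here because plain explicit Euler slowly injects energy into undamped oscillators. With damping, the simulated motion dies out after the poke, as a released branch would.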


Comparison Experiments (quantitative)

Experiments

Comparison Experiments (qualitative)


Comparison Experiments (with large video models)

  • On 30 randomly selected videos from the test set, users were asked "Which video is more realistic?" and preferred GID's results over those of the other methods 80.9% of the time.


Ablation Experiments


Limitations

  • Can fail to model non-oscillatory motions or high-frequency vibrations
  • May degrade in scenes with thin moving objects or objects with large displacements
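A toy illustration of why non-oscillatory and high-frequency motion is hard for a truncated Fourier representation. This assumes, for illustration only, that each 1D trajectory spans 150 frames and keeps its K = 16 lowest temporal FFT coefficients. Smooth swaying survives truncation. A steady drift does not: periodizing it creates a sawtooth whose energy spreads over all frequencies. A vibration above the cutoff is erased completely. These are synthetic signals, not the paper's data.

```python
import numpy as np

def truncation_rmse(signal: np.ndarray, K: int = 16) -> float:
    """RMS error after keeping only the K lowest temporal FFT coefficients."""
    T = len(signal)
    coeffs = np.fft.rfft(signal)
    coeffs[K:] = 0  # discard all higher frequencies
    recon = np.fft.irfft(coeffs, n=T)
    return float(np.sqrt(np.mean((signal - recon) ** 2)))

T = 150
t = np.arange(T)
swaying = 0.8 * np.sin(2 * np.pi * 2 * t / T) + 0.3 * np.sin(2 * np.pi * 5 * t / T)
drifting = t / T  # steady, non-oscillatory motion
fluttering = 0.3 * np.sin(2 * np.pi * 40 * t / T)  # vibration above the cutoff

for name, x in [("swaying", swaying), ("drifting", drifting), ("fluttering", fluttering)]:
    print(name, truncation_rmse(x))
```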


THANKS
