
Generative Adversarial Networks


GANs #

Overview #

(Figure: overview of GANs)
Other resources:

Original GAN #

Discriminator update (ascend its stochastic gradient): \(\nabla_{\theta_d}\frac{1}{m}\displaystyle\sum_{i=1}^{m}[\log{D(x^{(i)})} + \log{(1-{D(G(z^{(i)}))})}]\)
Generator update (descend its stochastic gradient): \(\nabla_{\theta_g}\frac{1}{m}\displaystyle\sum_{i=1}^{m}\log{(1-{D(G(z^{(i)}))})}\)
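
To make these updates concrete, here is a minimal PyTorch sketch of one training step; it is illustrative only, and assumes the networks `D` and `G`, the latent dimension, and the two optimizers are defined elsewhere, with `D` outputting probabilities in \((0,1)\).

```python
# Minimal sketch of one original-GAN training step (illustrative; D, G,
# latent_dim and the optimizers are assumed to be defined elsewhere, and
# D is assumed to output probabilities in (0, 1)).
import torch

def gan_step(D, G, real_x, latent_dim, opt_d, opt_g):
    m = real_x.size(0)

    # Discriminator: ascend log D(x) + log(1 - D(G(z)))
    z = torch.randn(m, latent_dim)
    fake_x = G(z).detach()                         # do not backprop into G here
    d_loss = -(torch.log(D(real_x)).mean()
               + torch.log(1 - D(fake_x)).mean())  # minimize the negative
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: descend log(1 - D(G(z))) (the saturating form above)
    z = torch.randn(m, latent_dim)
    g_loss = torch.log(1 - D(G(z))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```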

Objective Function: #

The discriminator \(D\) is trained to maximize the value function \(V(D,G)\), while the generator \(G\) is trained to minimize it.

  1. Original minimax (zero-sum)
    \begin{align} \displaystyle\min_G \displaystyle\max_D {V(D,G)} &= E_{x\thicksim{p_{data}(x)}}[\log{D(x)}] + E_{z\thicksim{p_z(z)}}[\log{(1-D(G(z)))}] \end{align}

    When the discriminator is optimal, the value function can be rewritten in terms of the JS divergence:
    \begin{align} C(G) = 2JS(p_{data}\parallel p_g)-2\log2 \end{align} Early in training, when D confidently rejects generated samples, this objective does not provide a sufficient gradient for G to learn well.

  2. Non-saturating
    The non-saturating generator loss \(-\log{D(G(z))}\) yields larger gradients early in training, when the discriminator easily rejects generated samples. With an optimal discriminator \(D_G^*\), it relates to the original loss through a reverse KL divergence:
    \begin{align} E_{x\thicksim{p_g(x)}}[-\log{D_G^*(x)}] + E_{x\thicksim{p_g(x)}}[\log{(1-D_G^*(x))}] = KL(p_g\parallel p_{data}) \end{align}

    Mode collapse: because the reverse KL penalizes samples that fall outside the data distribution far more than it penalizes missing modes, G prefers to produce repetitive but safe examples.

  3. Maximum Likelihood
    Under the assumption that the discriminator is optimal, the generator minimizes:
    \begin{align} J^{(G)} &= E_{z\thicksim p_z(z)}[-\exp(\sigma^{-1}(D(G(z))))] \\ &= E_{z\thicksim p_z(z)}\left[\frac{-D(G(z))}{1-D(G(z))}\right] \end{align}

    A small sketch comparing these three generator losses follows this list.
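
For a quick comparison of the three generator costs above, here is a small sketch that only evaluates each cost for a given discriminator output \(D(G(z))\in(0,1)\); the values are illustrative, not a training loop.

```python
# Sketch comparing the three generator costs as functions of d = D(G(z)).
import torch

def generator_loss(d, variant):
    if variant == "minimax":             # saturating: log(1 - D(G(z)))
        return torch.log(1 - d).mean()
    if variant == "non_saturating":      # -log D(G(z))
        return -torch.log(d).mean()
    if variant == "maximum_likelihood":  # -exp(sigma^{-1}(D)) = -D / (1 - D)
        return (-d / (1 - d)).mean()
    raise ValueError(variant)

d = torch.tensor([0.05, 0.5, 0.95])      # from easily rejected to convincing fakes
for v in ("minimax", "non_saturating", "maximum_likelihood"):
    print(v, generator_loss(d, v).item())
```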

(Figure: objective function comparison)

Loss Variant #

  1. WGAN - Wasserstein GANs

    • use the Wasserstein (earth mover's) distance as the optimization criterion
    • objective function: \(W(p_{data},p_g) = \displaystyle\inf_{\gamma \in \prod(p_{data},p_g)}E_{(x,y)\thicksim\gamma}[\parallel x-y\parallel]\), where \(\prod(p_{data}, p_g)\) denotes the set of all joint distributions \(\gamma(x,y)\) whose marginals are \(p_{data}\) and \(p_g\)
    • innovation:
      the distance measures the cost of moving probability mass from the generated distribution to the real data distribution
      the discriminator changes from a binary classifier to a critic solving a regression task
    • key features:
      remove the last sigmoid activation layer
      remove the 'log' function in the loss
      weight clipping: enforce the discriminator to be a 1-Lipschitz function by clipping its weights to a fixed range
      replace the momentum-based optimizer Adam with RMSProp or SGD
  2. WGAN-GP - Wasserstein GAN with Gradient Penalty

    • use a gradient penalty instead of weight clipping to achieve Lipschitz continuity.
    • Objective Function: \(L = -E_{x\thicksim p_{data}}[D(x)]+E_{\tilde{x}\thicksim p_g}[D(\tilde{x})] + \lambda E_{\hat{x}\thicksim p_{\hat{x}}}[(\parallel\nabla_{\hat{x}}D(\hat{x})\parallel_2-1)^2]\) The last term is the gradient penalty, where \(\hat{x}\) is sampled along straight lines between real and generated examples (a minimal sketch of this loss follows this list).
    • Innovation: the penalty encourages the gradients of the critic with respect to its inputs to have a norm of 1, which helps stabilize the training process and prevent mode collapse.
  3. RGAN - Relativistic GANs

    • The idea is to endow GANs with the property that the probability of real data being real \(D(x_r)\) should decrease as the probability of fake data being real \(D(x_f)\) increases
    • Relativistic discriminator: \(D(\tilde{x}) = \sigma(C(x_r)-C(x_f))\), where \(C\) is the critic output before the sigmoid
    • Innovation: make the output of D depend on both real and generated examples
    • Key features:
      • provide a continuous measure of the quality of the generated data, by using the relativistic discriminator as the loss function
      • improve the stability and robustness of the training process, by avoiding the problems of vanishing gradients, mode collapse, and non-convergence
  4. f-GAN - f-divergences

    • Use a general class of divergence functions as the optimization criterion.
    • Innovation: provide a unified framework for comparing GAN variants (the original GAN, LSGAN, and others) by showing that their objectives correspond to different f-divergences.
    • Key features:
      • can use any f-divergence: KL-divergence, JS-divergence, Total Variation Distance and so on.
      • improve the quality and diversity of the generated data by choosing the appropriate f-divergence that suits the data characteristics and the model objectives
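
As referenced in the WGAN-GP item above, here is a minimal PyTorch sketch of the critic loss with gradient penalty; the `critic` network, the data batches, and the penalty weight are assumptions of this sketch, not a reference implementation.

```python
# Sketch of the WGAN-GP critic loss (illustrative; critic, real_x, fake_x
# and lambda_gp are assumed to be defined elsewhere).
import torch

def wgan_gp_critic_loss(critic, real_x, fake_x, lambda_gp=10.0):
    # Wasserstein term: E[D(fake)] - E[D(real)]; the critic has no sigmoid/log
    w_loss = critic(fake_x).mean() - critic(real_x).mean()

    # Gradient penalty on random interpolates between real and generated samples
    eps = torch.rand(real_x.size(0), *([1] * (real_x.dim() - 1)))
    x_hat = (eps * real_x + (1 - eps) * fake_x).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

    return w_loss + lambda_gp * gp
```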

Architecture Variant #

Network Architecture #

Original GANs use multi-layer perceptrons (MLPs), so they only work well on small, simple datasets.

  1. DCGAN - deep convolutional generative adversarial networks

    • innovation: G and D are defined by deep convolutional neural networks (CNNs); a minimal generator sketch follows this list
    • key features:
      all-convolutional net
      batch normalization in all layers except the last
      use the Adam optimizer instead of SGD
  2. PROGAN - Progressive Growing of GANs

    • The idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses.
    • innovation: progressively growing training approach
    • key features:
      Gradually increasing the resolution
      Minibatch Discrimination
      Pixel-wise normalization, spectral normalization
      orthogonal regularization
  3. SAGAN: self-attention GAN

    • SAGAN (Self-Attention Generative Adversarial Network) is a GAN variant that uses a self-attention mechanism to capture long-range dependencies in images and generate high-quality, high-resolution samples.
    • innovation: Self-attention mechanism
    • key features:
      spectral normalization: stabilizes training
      orthogonal regularization: stability
      conditional batch normalization: adjust the style and features
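
As referenced in the DCGAN item above, here is a minimal PyTorch sketch of a DCGAN-style generator for 64x64 RGB images, illustrating the all-convolutional design and the batch-norm placement; the layer sizes are illustrative defaults, not the paper's exact configuration.

```python
# Sketch of a DCGAN-style generator: all-convolutional, batch norm on every
# layer except the output. Expects input noise of shape (N, latent_dim, 1, 1).
import torch.nn as nn

def dcgan_generator(latent_dim=100, base_channels=64):
    def block(c_in, c_out, stride, padding):
        return [nn.ConvTranspose2d(c_in, c_out, 4, stride, padding, bias=False),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True)]
    return nn.Sequential(
        *block(latent_dim, base_channels * 8, 1, 0),          # 1x1   -> 4x4
        *block(base_channels * 8, base_channels * 4, 2, 1),   # 4x4   -> 8x8
        *block(base_channels * 4, base_channels * 2, 2, 1),   # 8x8   -> 16x16
        *block(base_channels * 2, base_channels, 2, 1),       # 16x16 -> 32x32
        nn.ConvTranspose2d(base_channels, 3, 4, 2, 1),        # 32x32 -> 64x64
        nn.Tanh(),                           # no batch norm on the output layer
    )
```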

Latent Space #

  1. CGAN - Conditional GANs

    • D and G are conditioned on extra information y: \(\displaystyle\min_G \displaystyle\max_D V(D,G) = E_{x\thicksim{p_{data}(x)}}[\log{D(x|y)}] + E_{z\thicksim{p_z(z)}}[\log{(1-D(G(z|y)))}]\)
    • innovation: both G and D must match a given condition y
    • key features:
      generate data that matches a given condition (a small conditioning sketch follows this list)
      learn a conditional distribution, which is more informative and useful
      applies to various types of tasks
  2. INFOGAN - information maximizing GAN

    • \( \displaystyle\min_G \displaystyle\max_D V_I(D,G) = V(D,G) -\lambda I(c;G(z,c))\), where \(G(z,c)\) is the generated example and \(I\) is the mutual information. Maximizing \(I(c;G(z,c))\) forces the latent code \(c\) to capture as many important and meaningful features of the real examples as possible.
    • innovation: disentangled, interpretable latent codes
    • key features:
      unsupervised manner
      discover and manipulate latent factors of semantic attributes
      use a variational lower bound to approximate \(P(c|x)\)
  3. ACGAN - Auxiliary Classifier GANs

    • incorporate a classifier as part of the discriminator to produce recognizable examples
    • Objective functions:
      the log-likelihood of the correct source \(L_s\) and of the correct class \(L_c\); D is trained to maximize \(L_s+L_c\), while G is trained to maximize \(L_c-L_s\)
      \(L_s = E[\log P(S=real|X_{real})] + E[\log P(S=fake|X_{fake})]\)
      \(L_c = E[\log P(C=c|X_{real})] + E[\log P(C=c|X_{fake})]\)
    • innovation: generate data that matches a given class label
    • key features:
      the discriminator additionally predicts the class label y
      learn the conditional distribution of the data
      leverage the supervised information from the class labels and the unsupervised information from the GAN objective
  4. SGAN - Stacked GAN

    • use a top-down stack of GANs to generate data from hierarchical representations
    • innovation: progressively add finer details at each layer of the stack
    • key features:
      capture different levels of abstraction and variation (more diverse representations)
      can leverage a pre-trained discriminative network (VGG, ResNet) without additional supervision
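
As referenced in the CGAN item above, here is a minimal PyTorch sketch of the conditioning idea: an embedding of the label y is concatenated with the noise vector z before the generator's layers. The network sizes and names are hypothetical.

```python
# Sketch of a conditional generator G(z | y): concatenate a label embedding
# with the noise vector (MLP version; sizes are illustrative).
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, num_classes=10, out_dim=784):
        super().__init__()
        self.embed = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z, y):
        # condition the generator by concatenating the label embedding with z
        return self.net(torch.cat([z, self.embed(y)], dim=1))

g = ConditionalGenerator()
x = g(torch.randn(8, 100), torch.randint(0, 10, (8,)))  # 8 samples with class labels
```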

Application Focus #

  1. SRGAN - Super-Resolution GAN

    • use a perceptual loss function to generate high-resolution images from low-resolution images
    • innovation: infer photo-realistic natural images for large upscaling factors
    • key features:
      • The perceptual loss function consists of an adversarial loss and a content loss, based on the pre-trained VGG network.
      • capture the fine textures and details of the natural images
      • upscaling: use a deep residual network with skip connections and sub-pixel convolution layers
    • further improvements: ESRGAN (Enhanced SRGAN)
      cycle-in-cycle GANs (unsupervised image SR)
      SRDGAN (learns noise)
      TGAN (explores tensor structure)
  2. CycleGAN

    • image-to-image translation between unpaired domains
      (Figure: CycleGAN example)
    • innovation: use a cycle-consistency loss that forces the generator to reconstruct the original image from the translated image (a small sketch of this loss appears at the end of the post)
    • key features:
      (Figure: CycleGAN)
      • no paired examples needed
      • preserve the key attributes and structures of the input images
      • learn a mapping function that is approximately bijective (invertible)
    • application:
      • Style Transfer, object transfiguration, photo enhancement
  3. StyleGAN

    • use adaptive instance normalization to control the style and features of the generated images at different scales
    • innovation: learn an unsupervised separation of high-level attributes
    • key features:
      • It can enable intuitive and scale-specific control of the synthesis, by manipulating the style vectors that correspond to different levels of detail.
      • adding per-layer noise produces finer stochastic detail
        (Figure: StyleGAN structure)
  4. Pix2Pix

    • use a conditional GAN objective combined with an L1 reconstruction loss on the output image.
    • innovation: translate images from one domain to another using paired examples, such as edges-to-photo or map-to-aerial
    • key features:
      • preserve the key attributes and structures of the input images
      • use a U-Net-based generator with skip connections
        (Figure: pix2pix U-Net)
      • use a PatchGAN-based discriminator
        (Figure: pix2pix PatchGAN)
    • applications:
      • style transfer, object transfiguration, photo enhancement, and more
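
As referenced in the CycleGAN item above, here is a minimal PyTorch sketch of the cycle-consistency term: translate A→B→A and B→A→B and penalize the L1 reconstruction error. The generators `G_AB` and `G_BA`, the batches, and the weight are assumptions of this sketch.

```python
# Sketch of the CycleGAN cycle-consistency loss (illustrative; G_AB, G_BA,
# real_a and real_b are assumed to be defined elsewhere).
import torch.nn.functional as F

def cycle_consistency_loss(G_AB, G_BA, real_a, real_b, lambda_cyc=10.0):
    rec_a = G_BA(G_AB(real_a))   # A -> B -> A
    rec_b = G_AB(G_BA(real_b))   # B -> A -> B
    return lambda_cyc * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```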