1. Generative Adversarial Networks
Generative Adversarial Networks (GANs) are a class of neural network frameworks designed for generative modeling by pitting two neural networks against each other in a game-theoretic setting. The two networks are the generator and the discriminator. The generator creates fake data samples, while the discriminator evaluates them against real data samples, providing feedback to the generator. This adversarial process drives both networks to improve, resulting in the generation of high-quality synthetic data.
1.1. Adversarial Training
The training process of GANs involves alternating between training the discriminator and the generator. The discriminator is trained to maximize its ability to distinguish between real and fake samples, while the generator is trained to minimize the discriminator’s ability to classify its samples as fake.
- Discriminator Training: The discriminator is presented with a batch of real samples from the training dataset and a batch of fake samples generated by the generator. It learns to output high probabilities for real samples and low probabilities for fake samples.
- Generator Training: The generator is then trained to produce samples that the discriminator classifies as real. This is done by feeding random noise into the generator and updating its weights based on the discriminator’s feedback.
This adversarial process continues until the generator produces high-quality samples that are indistinguishable from real data, or until a predefined number of training iterations is reached.
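To make this alternating loop concrete, here is a minimal PyTorch sketch. The `generator`, `discriminator`, optimizers, and dimensions are hypothetical placeholders, and the discriminator is assumed to end in a sigmoid so it outputs probabilities; this is an illustrative sketch, not a prescribed implementation.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # binary cross-entropy mirrors the log terms in the GAN objective

def gan_train_step(generator, discriminator, opt_g, opt_d, real_batch, latent_dim=100):
    """One alternating update: first the discriminator, then the generator."""
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: push D(x) toward 1 on real data, D(G(z)) toward 0 on fakes.
    opt_d.zero_grad()
    z = torch.randn(batch_size, latent_dim)     # noise from the prior p_z
    fake_batch = generator(z).detach()          # block gradients into G on this step
    d_loss = bce(discriminator(real_batch), real_labels) \
           + bce(discriminator(fake_batch), fake_labels)
    d_loss.backward()
    opt_d.step()

    # Generator step: update G so that D classifies its samples as real.
    # (This is the widely used non-saturating variant of minimizing log(1 - D(G(z))).)
    opt_g.zero_grad()
    z = torch.randn(batch_size, latent_dim)
    g_loss = bce(discriminator(generator(z)), real_labels)
    g_loss.backward()
    opt_g.step()

    return d_loss.item(), g_loss.item()
```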
The training objective is as follows:
\[\min_{G} \max_{D} \left\{\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]\right\}\]

2. Understanding GAN (Minimizing Jensen-Shannon Divergence)
From a general perspective, generative models aim to approximate the data distribution $p(x)$ and then sample points from it to generate new data. The key idea is to use a parameterized neural network (the generator) to approximate $p(x)$: the generator can be modeled as a distribution-mapping function, which maps samples from a simple distribution (e.g., a standard Gaussian) to a complex distribution $q(x)$. We can then approximate the data distribution by minimizing the ‘difference’ between $q(x)$ and $p(x)$, where the ‘difference’ between two distributions is measured by a divergence, such as the Kullback–Leibler divergence.
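As a minimal illustration of this mapping view (the architecture and dimensions below are illustrative assumptions, not a prescribed design), a generator is simply a network that pushes Gaussian noise forward into data space:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps z ~ N(0, I) in latent space to samples in data space."""
    def __init__(self, latent_dim=100, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Tanh(),  # outputs scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

# Drawing samples from q(x): push simple noise through the network.
g = Generator()
z = torch.randn(16, 100)   # simple distribution: standard Gaussian
x_fake = g(z)              # samples from the complex distribution q(x), shape (16, 784)
```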
In the case of GANs, we will prove that optimizing the training objective is equivalent to minimizing the Jensen–Shannon divergence between $q(x)$ (the distribution of the generator) and $p(x)$ (the data distribution).
Proof:
We rewrite the training objective as follows:
\[\begin{aligned} \mathcal{L} &= \mathbb{E}_{x \sim p(x)}[\log D(x)] + \mathbb{E}_{x \sim q(x)}[\log(1 - D(x))] \\ &= \int_x p(x) \log(D(x)) + q(x) \log(1-D(x))\,dx. \end{aligned}\]

Since $\mathcal{L}$ is concave in $D$, we calculate the first-order derivative:
\[\frac{\partial \mathcal{L}}{\partial D} = \int_x \frac{p(x)}{D(x)} - \frac{q(x)}{1-D(x)}\,dx\]

Setting $\frac{\partial \mathcal{L}}{\partial D} = 0$ pointwise for each $x$, we obtain the optimal discriminator $D^*(x) = \frac{p(x)}{p(x) + q(x)}$.
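As a quick numerical sanity check (a hypothetical example, not part of the proof), we can verify at a single point $x$ that $D^* = \frac{p}{p+q}$ maximizes $p\log D + q\log(1-D)$:

```python
import numpy as np

p, q = 0.7, 0.2                            # example densities p(x), q(x) at a fixed x
d = np.linspace(1e-6, 1 - 1e-6, 100001)    # candidate discriminator outputs
objective = p * np.log(d) + q * np.log(1 - d)
d_best = d[np.argmax(objective)]

print(d_best)        # ~0.7778, numerically found maximizer
print(p / (p + q))   # 0.7778..., the closed-form optimum D* = p / (p + q)
```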
Plugging $D^*$ back into the training objective:

\[\begin{aligned} \mathcal{L} &= \int_x p(x) \log\left(\frac{p(x)}{p(x) + q(x)}\right) + q(x) \log\left(1-\frac{p(x)}{p(x) + q(x)}\right)dx\\ &= \int_x p(x) \log\left(\frac{p(x)}{p(x) + q(x)}\right) + p(x) \log(2) + q(x) \log\left(\frac{q(x)}{p(x) + q(x)}\right) + q(x) \log(2)\,dx - \log(4)\\ &= D_{KL}\left(p \,\middle\|\, \tfrac{p+q}{2}\right) + D_{KL}\left(q \,\middle\|\, \tfrac{p+q}{2}\right) - \log(4)\\ &= 2 \cdot D_{JS}(p(x)\|q(x)) - \log(4). \end{aligned}\]

We can see that training a GAN can be interpreted as minimizing the Jensen-Shannon divergence between the real data distribution and the generated data distribution. More specifically, maximizing the objective over $D$ trains a discriminator that estimates the JS divergence between the real and fake distributions (as accurately as possible), while minimizing over $G$ trains the generator to reduce this JS divergence (as estimated by the discriminator).
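To illustrate this identity numerically (a hypothetical check using small discrete distributions in place of densities), we can compare the objective evaluated at $D^*$ against $2 \cdot D_{JS}(p\|q) - \log(4)$ computed directly:

```python
import numpy as np

def kl(a, b):
    """Kullback-Leibler divergence between discrete distributions a and b."""
    return np.sum(a * np.log(a / b))

# Two arbitrary discrete distributions standing in for p(x) and q(x).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
m = 0.5 * (p + q)
jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)      # Jensen-Shannon divergence

d_star = p / (p + q)                       # optimal discriminator D*
objective = np.sum(p * np.log(d_star) + q * np.log(1 - d_star))

print(objective)             # value of the objective at D*
print(2 * jsd - np.log(4))   # matches: 2 * D_JS(p||q) - log(4)
```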