
# Improved Techniques for Training GANs

I think this paper is mostly known for Virtual Batch Normalization. Ian Goodfellow motivates the need for that in his tutorial:

> Batch normalization is very helpful, but for GANs has a few unfortunate side effects. The use of a different minibatch of data to compute the normalization statistics on each step of training results in fluctuation of these normalizing constants. When minibatch sizes are small (as is often the case when trying to fit a large generative model into limited GPU memory) these fluctuations can become large enough that they have more effect on the image generated by the GAN than the input z has.

As evidence, he shows two minibatches of samples generated by the GAN: one minibatch is orange-tinted, the other green-tinted. Within each minibatch the generated samples are thus correlated, whereas they should be independent.

The fix, virtual batch normalization, is to choose a reference minibatch of examples once, at the start of training, and keep it fixed. Each new example is then normalized using the mean and variance of each feature computed over the reference batch together with that example itself, so the statistics no longer depend on which minibatch the example happens to land in. I see — does this mean we can reduce the fluctuation just by keeping a larger reference minibatch? We could presumably smooth things further by adjusting the relative weight given to the current example versus the reference data.

I don't think it would be too challenging to implement virtual batch normalization in TensorFlow, as long as batch normalization is already there: I assume we mainly have to change which data the normalization statistics are computed from.
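
Here is a minimal NumPy sketch of that statistics computation as I understand it; the function name, the flat per-feature layout, and the toy sizes are my own assumptions, not anything from the paper's code.

```python
import numpy as np

def virtual_batch_norm(x, ref_batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one example x using statistics from a fixed reference batch
    combined with x itself, per the paper's description of VBN.

    x:         (num_features,) current example's activations
    ref_batch: (ref_size, num_features) activations of the fixed reference batch
    """
    ref_size = ref_batch.shape[0]
    # Treat the current example as one extra "virtual" member of the reference batch.
    combined_mean = (ref_batch.sum(axis=0) + x) / (ref_size + 1)
    combined_var = (((ref_batch - combined_mean) ** 2).sum(axis=0)
                    + (x - combined_mean) ** 2) / (ref_size + 1)
    x_hat = (x - combined_mean) / np.sqrt(combined_var + eps)
    return gamma * x_hat + beta

# The reference batch is chosen once at the start of training and never changes,
# so the normalization of a given x no longer depends on its current minibatch.
rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 128))   # fixed reference batch of activations
x = rng.normal(size=(128,))        # a single new example's activations
out = virtual_batch_norm(x, ref)
```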

The paper has other contributions as well, such as:

  - **Feature Matching**: This changes the generator's objective. Instead of directly trying to fool the discriminator, the generator tries to match the expected activations of an intermediate layer of the discriminator on real data (see the first sketch after this list).

  - **Minibatch Discrimination**: I'm not sure I fully understand this, to be honest. It's basically about letting the discriminator look at multiple samples jointly rather than in isolation, so it can notice when the generator collapses to very similar outputs? I'll re-read it; my attempt at the computation is in the second sketch after this list.

  - **Historical Averaging**: This adds an extra term to each player's cost, penalizing the difference between the current weights and the running average of the weights over past training. I think this serves to prevent the weights from swinging around too much, kind of like TRPO's KL divergence constraint.

  - **One-Sided Label Smoothing**: This was also in Ian Goodfellow's tutorial. The positive (real) labels for the discriminator are smoothed from 1 to, say, 0.9, while the labels for generated samples are left at 0, hence "one-sided".

(Note: the four techniques above, plus virtual batch normalization, are the five major "training techniques" proposed. They are heuristically motivated, so they are not guaranteed to work for all problems.)
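
For feature matching, here is a minimal sketch of the generator's loss, assuming we already have the discriminator's intermediate-layer activations on a real and a generated minibatch (all names here are placeholders of mine):

```python
import numpy as np

def feature_matching_loss(f_real, f_fake):
    """Generator loss for feature matching: match the mean activations of an
    intermediate discriminator layer on real vs. generated minibatches.

    f_real: (batch, feat_dim) discriminator features on real data
    f_fake: (batch, feat_dim) discriminator features on generated data
    """
    mean_real = f_real.mean(axis=0)
    mean_fake = f_fake.mean(axis=0)
    # Squared L2 distance between the two mean feature vectors.
    return np.sum((mean_real - mean_fake) ** 2)
```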
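
And here is my attempt at the minibatch discrimination computation, following the paper's description of the learned tensor T and the L1-based similarities; treat it as a sketch rather than the reference implementation:

```python
import numpy as np

def minibatch_discrimination(features, T):
    """Minibatch discrimination as I understand it from the paper.

    features: (n, A)     per-example features from an intermediate layer
    T:        (A, B, C)  learned tensor
    Returns   (n, B)     per-example similarity statistics o(x_i), which get
                         concatenated with the discriminator's usual features.
    """
    # M_i = features_i * T is a (B, C) matrix for each example.
    M = np.tensordot(features, T, axes=([1], [0]))                    # (n, B, C)
    # L1 distance between rows of M_i and M_j, for each "kernel" b.
    dists = np.abs(M[:, None, :, :] - M[None, :, :, :]).sum(axis=3)   # (n, n, B)
    c = np.exp(-dists)                                                # c_b(x_i, x_j)
    # The paper sums the similarities over all examples j in the minibatch
    # (including j = i, whose contribution is exp(0) = 1).
    return c.sum(axis=1)                                              # (n, B)
```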

In addition, they present a better evaluation metric, the Inception score. This is very important; I remember when I was learning about GANs and wondering how to evaluate output beyond an eyeball test. It relies on an already-existing, pretrained Inception network (I really need to learn more about network architectures beyond AlexNet) to predict class probabilities p(y|x) for generated images. The score is based on an entropy-like quantity: each generated image should be classified confidently (low-entropy p(y|x)), while the marginal label distribution p(y) over all generated images should be diverse (high entropy).
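
Concretely, the score is exp(E_x[KL(p(y|x) || p(y))]). A small NumPy sketch of that computation, taking a matrix of predicted class probabilities as input (in practice the matrix would come from the Inception network; everything below is just my own illustration):

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Inception score: exp( E_x [ KL( p(y|x) || p(y) ) ] ).

    p_yx: (num_samples, num_classes) softmax outputs of a pretrained Inception
          network on generated images. The score is high when each image is
          confidently classified and the classes are diverse overall.
    """
    p_y = p_yx.mean(axis=0, keepdims=True)   # marginal label distribution p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```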

Note the focus on semi-supervised learning here, where they improve a supervised classifier by also training on unlabeled and generated data. The K-class cross entropy formulation is augmented with a new class of index K+1 for samples produced by the GAN; unlabeled real data then just needs to be classified as anything other than class K+1.
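
Here is a hedged sketch of how I would write those losses given (K+1)-dimensional classifier logits, with the last column standing for the "generated" class; the helper names and reductions are my assumptions:

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def semi_supervised_losses(logits_labeled, labels, logits_unlabeled, logits_fake):
    """Sketch of the K+1-class losses. Each logits array has K+1 columns,
    where column K (the last one) is the "generated / fake" class.

    - Labeled real data: cross entropy over the first K (real) classes.
    - Unlabeled real data: should NOT be classified as fake (class K).
    - Generated data: SHOULD be classified as fake (class K).
    """
    K = logits_labeled.shape[1] - 1

    # Supervised term: cross entropy restricted to the K real classes.
    logp_real_classes = log_softmax(logits_labeled[:, :K])
    sup_loss = -logp_real_classes[np.arange(len(labels)), labels].mean()

    # Unsupervised terms, written in terms of p(y = K+1 | x).
    logp_unlabeled = log_softmax(logits_unlabeled)
    logp_fake = log_softmax(logits_fake)
    p_fake_given_unlabeled = np.exp(logp_unlabeled[:, K])
    unsup_loss = (-np.log(1.0 - p_fake_given_unlabeled + 1e-12).mean()
                  - logp_fake[:, K].mean())
    return sup_loss, unsup_loss
```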

Experiments: MNIST, CIFAR-10, SVHN, and ImageNet.

Very nice!