"We would like to make parameter updates using small mini-batches or even single data-points"
"In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100"
There's no indication in the original paper or your blog of what to do when the size of the batch is as small as 1, despite that being stated as a desideratum of the paper.
Am I missing something? What is the strategy when, for various reasons, you wish not to mini-batch?
My intuition is that only the combination of the samples from the batch needs to form a normal distribution, not the individual points. So I'm not sure there's a way to do this without batching.
I've found that this is actually an incredibly important hyperparameter for success. Because the combination of samples in the batch needs to form a normal distribution, the more complex the data, the larger the batch needs to be.
Smaller batches on more complex datasets result in generating content with only the most common attributes of the training documents. In the case of text, smaller batches produce mostly whitespace if you manage to minimize the KL term, or else an inability to minimize the KL at all while minimizing the reconstruction cost.
This question plagued me for weeks and no one addressed it, but from what I've seen, batch size relative to the dataset is absolutely key to success with VAEs.
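One way to see why the paper suggests L = 1 only when M is large: the per-datapoint one-sample ELBO estimate is noisy, and averaging over a minibatch of M points is what tames the variance of the gradient signal. Below is a minimal numpy sketch (not from the paper's code; the `mu`/`sigma` values are made-up stand-ins for encoder outputs) that estimates the KL term with a single sample per datapoint via the reparameterization trick, and shows how the spread of the batch-averaged estimate shrinks as M grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_one_sample(mu, sigma):
    """Single-sample (L = 1) Monte Carlo estimate of KL[q(z|x) || N(0, I)]
    for each of M datapoints, averaged over the minibatch.
    mu, sigma: (M, d) arrays standing in for encoder outputs."""
    eps = rng.standard_normal(mu.shape)
    z = mu + sigma * eps  # reparameterization trick: z ~ N(mu, sigma^2)
    # log q(z|x) under the diagonal Gaussian posterior
    log_q = -0.5 * np.sum(np.log(2 * np.pi * sigma**2) + eps**2, axis=1)
    # log p(z) under the standard normal prior
    log_p = -0.5 * np.sum(np.log(2 * np.pi) + z**2, axis=1)
    return np.mean(log_q - log_p)

# Hypothetical encoder outputs, latent dimension d = 2
mu = np.full((1, 2), 0.5)
sigma = np.full((1, 2), 0.8)

for M in (1, 100):
    estimates = [kl_one_sample(np.tile(mu, (M, 1)), np.tile(sigma, (M, 1)))
                 for _ in range(500)]
    print(M, np.std(estimates))  # spread shrinks roughly as 1/sqrt(M)
```

With M = 1 the one-sample estimate swings widely from step to step, which matches the experience above that tiny batches make the KL hard to minimize cleanly. This is a toy illustration of the variance argument only, not a claim about what batch size any particular dataset needs.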
https://arxiv.org/pdf/1312.6114.pdf
http://blog.fastforwardlabs.com/2016/08/22/under-the-hood-of-the-variational-autoencoder-in.html