
Number of samples when minibatch is less than 100? #7

Open
speakerjohnash opened this issue Apr 18, 2017 · 2 comments

@speakerjohnash

https://arxiv.org/pdf/1312.6114.pdf

http://blog.fastforwardlabs.com/2016/08/22/under-the-hood-of-the-variational-autoencoder-in.html

"We would like to make parameter updates using small mini-batches or even single data-points"

"In our experiments we found that the number of samples L per datapoint can be set to 1 as long as the minibatch size M was large enough, e.g. M = 100"

There's no indication in the original paper or in your blog post of what to do when the size of the batch is as small as 1, despite single-datapoint updates being cited as a desideratum of the paper.

Am I missing something? What is the strategy when, for whatever reason, you wish not to mini-batch?
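
For concreteness, here's my reading of the estimator in question as a minimal sketch (mine, not from the paper; `encoder` and `decoder` are placeholder modules): the reconstruction term is averaged over `L` reparameterized samples per datapoint, then over the minibatch of size `M`:

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, encoder, decoder, L=1):
    """Monte Carlo negative ELBO for a minibatch x of shape (M, D).

    encoder(x) -> (mu, logvar) of a Gaussian q(z|x); decoder(z) -> Bernoulli
    logits over the D input dimensions. Both are hypothetical placeholders.
    """
    mu, logvar = encoder(x)
    # Analytic KL between q(z|x) = N(mu, sigma^2 I) and the prior N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    recon = torch.zeros_like(kl)
    for _ in range(L):  # L reparameterized samples per datapoint
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
        recon = recon + F.binary_cross_entropy_with_logits(
            decoder(z), x, reduction='none').sum(dim=1)
    return (recon / L + kl).mean()  # average over the minibatch
```

If M = 1, the batch average disappears and the only variance reduction left is over L, so my best guess is that L has to grow as M shrinks (e.g. something like L = 100 for M = 1); the paper only states the converse, that L = 1 suffices when M is around 100.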

@speakerjohnash
Author

My intuition is this: only the combination of the samples from the batch should form a normal distribution, not the individual points. So I'm not sure there's a way to do this without batching. A toy check of that intuition follows below.
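
Here is a quick sanity check of the idea (a toy sketch, nothing rigorous; the `mu`/`logvar` values are made up rather than coming from a trained encoder):

```python
import torch

# Pool one z per datapoint across a batch of size M and compare the pooled
# moments to the N(0, I) prior. mu/logvar are made-up stand-ins for an
# encoder's output on a batch.
M, latent_dim = 100, 2
mu = torch.randn(M, latent_dim) * 0.8        # per-datapoint means spread out
logvar = torch.full((M, latent_dim), -1.0)   # narrow per-datapoint variances
z = mu + (0.5 * logvar).exp() * torch.randn(M, latent_dim)
print(z.mean(0), z.var(0))  # pooled batch stats; compare to 0 and 1
# Each individual q(z|x_i) is narrow and off-center, yet the pooled batch
# can still look roughly standard normal -- the sense in which the
# "combination of samples" matches the prior while individual points don't.
```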

@speakerjohnash
Author

I've found that this is actually an incredibly important hyperparameter for success. Because the combination of samples in the batch needs to form a normal distribution, the more complex the data, the larger the batch needs to be.

Smaller batches on more complex datasets result in generating content with only the most common attributes of every training document. In the case of text, smaller batches produce either mostly white space (if you do manage to minimize the KL) or an inability to minimize the KL at all while the reconstruction cost keeps falling.
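
A quick way to tell which of those two regimes you're in (my own diagnostic sketch, assuming the usual Gaussian encoder): log the KL term separately from the reconstruction loss instead of only watching their sum:

```python
import torch

def kl_per_datapoint(mu, logvar):
    # Analytic KL between q(z|x) = N(mu, diag(sigma^2)) and the N(0, I) prior
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)

# During training, log kl_per_datapoint(mu, logvar).mean() next to the
# reconstruction loss. KL falling to ~0 while reconstruction stays poor is
# the collapsed, "mostly white space" regime; KL stuck high while
# reconstruction keeps improving is the other failure mode above.
```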

This question plagued me for weeks and no one addressed it, but from what I've seen, batch size, relative to the complexity of the dataset, is absolutely key to success with VAEs.
