BucketingSampler more randomness? #364
https://tensorboard.dev/experiment/h0VXvnqfR8aabI3HJxHxYA/
OK, just a sanity check — did you pass in the arguments?
Yes. The related code is shown below:

    if self.args.bucketing_sampler:
        logging.info("Using BucketingSampler.")
        train_sampler = BucketingSampler(
            cuts_train,
            max_duration=self.args.max_duration,
            shuffle=True,
            num_buckets=self.args.num_buckets,
        )

    for epoch in range(params.start_epoch, params.num_epochs):
        train_dl.sampler.set_epoch(epoch)

The training options from the log file are:
In that case, I will double-check if the shuffling works as intended in each bucket of the BucketingSampler. BTW I didn’t notice the drop_last=True option for BucketingSampler in what you posted — was it used?
There is another factor that could cause this sawtooth pattern, which is that the buckets have the same number of samples, but you consume them in batches that depend on the duration, so the shorter buckets will be consumed first. It might be better to compute the cumulative sum of the durations, and split at percentiles of that, if that is not already what split() there does.
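A rough sketch of that idea (a hypothetical helper, not Lhotse's actual split()): take the cuts sorted by duration, compute the cumulative sum of their durations, and cut at equal fractions of the total, so that each bucket holds roughly the same amount of speech rather than the same number of cuts.

```python
# Hypothetical sketch, not Lhotse's implementation: split (cut_id, duration)
# pairs, already sorted by duration, into buckets of roughly equal *total*
# duration by cutting at percentiles of the cumulative duration.
from itertools import accumulate

def split_by_total_duration(cuts, num_buckets):
    cum = list(accumulate(dur for _, dur in cuts))
    total = cum[-1]
    buckets = [[] for _ in range(num_buckets)]
    for (cut_id, _), c in zip(cuts, cum):
        # which percentile band of the total duration this cut falls into
        idx = min(int(num_buckets * c / total), num_buckets - 1)
        buckets[idx].append(cut_id)
    return buckets

# Example: buckets of short cuts end up with more cuts than buckets of long ones.
cuts = [("c1", 2.0), ("c2", 2.0), ("c3", 2.0), ("c4", 6.0), ("c5", 8.0)]
print([len(b) for b in split_by_total_duration(cuts, 2)])  # -> [3, 2]
```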
It doesn’t do it. I can make a PR with that option later and let’s see then.
I think it is unused and is left to the default value, which is False.
It is. See: https://github.com/lhotse-speech/lhotse/blob/master/test/dataset/test_sampling.py#L658
… Message written by Fangjun Kuang ***@***.***> on 8/10/21 at 08:16:
BTW I didn’t notice the drop_last=True option for BucketingSampler in what you posted — was it used?
I think it is unused and is left to the default value, which is False.
By the way, the drop_last is not exposed by lhotse, I think.
BTW, if each bucket keeps track of the amount of data it has left (however we define that, e.g. duration or samples), a relatively easy way to implement approximate proportional sampling would be to, whenever the next batch is requested, choose 2 nonempty buckets a and b, and pick from bucket a with probability a.dur() / (a.dur() + b.dur()), else bucket b. That would ensure that all the buckets would become empty at exactly the same time. Otherwise, some of the buckets will start to deplete approximately [sqrt(num_batches_in_a_bucket) * num_buckets] batches before the end of the epoch even if they have been balanced at the start.
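A minimal sketch of that selection rule (hypothetical interface: dur() is just an assumed method returning a bucket's remaining duration, not Lhotse's API):

```python
import random

def pick_bucket(buckets):
    # Sketch of the rule described above: pick 2 nonempty buckets a and b,
    # then choose a with probability a.dur() / (a.dur() + b.dur()), else b.
    nonempty = [b for b in buckets if b.dur() > 0]
    if not nonempty:
        raise RuntimeError("all buckets are depleted")
    if len(nonempty) == 1:
        return nonempty[0]
    a, b = random.sample(nonempty, 2)
    p_a = a.dur() / (a.dur() + b.dur())
    return a if random.random() < p_a else b
```

Buckets with more data left get drawn more often, so the remaining durations shrink roughly in proportion and the buckets tend to finish together.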
I'm not sure I understand why some buckets would deplete sooner. With the new method, each bucket has the same duration of speech -- since the batches are gathered to satisfy a max total duration, it implies that each bucket should yield the same number of batches. And since the buckets are chosen with uniform probabilities, they should all deplete at the same time. Still, the method you're proposing could be useful as a source selection strategy for a "mux" sampler (that I'm planning to add soon) so that it avoids depleting its source samplers prematurely.
My training crashed with a batch size mismatch; there may be some bugs with … The log showed that the returned … If I did not set …
Okay, thanks for reporting. I’m not sure what might have gone wrong at the moment — I’ll try to reproduce this issue. Is this LibriSpeech 100h / 960h?
Piotr: my formula was just taking into account the statistics of how random counts with equal probability are not quite the same in practice.
I had a feeling that I’m missing something… that’s a cool observation. Let me see if I got it right. Since at every step the bucket selection probabilities are uniform, there is a relatively „high” probability that at least one of them will deplete „early” (for some definition of „high” and „early”). And in expectation, „early” is (sqrt(num_bucket_batches) * num_buckets).
LibriSpeech 960h. You can reproduce this issue by running the training script in the icefall conformer_ctc folder.
Yes. The factor of num_buckets is simply because I want to measure the time in minibatches before end-of-epoch for the training process, not minibatches per bucket.
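For illustration only (a standalone simulation with made-up bucket and batch counts, not Lhotse code), a quick way to measure how many minibatches before the end of an epoch the first bucket runs dry when buckets are chosen uniformly:

```python
import random

def batches_left_at_first_depletion(num_buckets=30, batches_per_bucket=100, trials=200):
    # Uniformly pick a bucket at each step; stop when the first bucket empties
    # and record how many minibatches of the epoch were still left.
    total = num_buckets * batches_per_bucket
    left = 0
    for _ in range(trials):
        remaining = [batches_per_bucket] * num_buckets
        drawn = 0
        while min(remaining) > 0:  # all buckets still nonempty
            remaining[random.randrange(num_buckets)] -= 1
            drawn += 1
        left += total - drawn
    return left / trials

# The rough estimate above, sqrt(batches_per_bucket) * num_buckets, is 300 here.
print(batches_left_at_first_depletion())
```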
@pzelasko Fangjun told me the batch size mismatching is a bug in masking (posted here: k2-fsa/snowfall#240). It won't trigger the bug when setting … We will try to fix the bug in masking.
Thanks for letting me know!
After adding these two options, the sawtooth patterns in the losses disappeared. The picture below shows the loss value for the first 3 epochs. The training is ongoing; we will see if this helps to reduce the WER and will post results later.
I am using … The information of the batch is printed below:
[EDITED]
It is likely due to padding: the sampler counts the duration of non-padded cuts (i.e., speech-only duration for LibriSpeech). Then, inside the dataset, padding happens and adds extra duration (this is also true with transforms such as cut concatenation).
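A tiny illustration of that accounting gap (plain Python with made-up durations, no Lhotse calls): the sampler budgets against the sum of raw cut durations, but after collation every cut in the batch occupies as many frames as the longest one.

```python
# Made-up durations, for illustration only.
cut_durations = [3.2, 5.1, 7.8, 14.6]  # seconds of actual speech per cut

# What the sampler counts toward max_duration: the raw speech durations.
sampler_view = sum(cut_durations)

# What the collated batch actually costs: every cut is padded to the longest.
padded_view = max(cut_durations) * len(cut_durations)

print(f"sampler counts {sampler_view:.1f}s, padded batch spans {padded_view:.1f}s")
# -> sampler counts 30.7s, padded batch spans 58.4s
```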
I see. Thanks! BTW: I use …
Can I close this issue?
Yes, I think so. There are no oscillations anymore after using this fix.
@pzelasko we are still seeing sawtooth patterns in losses when we use BucketingSampler, even with fewer buckets.
I think it's because the individual buckets are sorted by length. Is it possible to shuffle somehow within the buckets, or, say, always randomly pick a batch from the front or back of the bucket? Or perhaps the individual buckets could be either reversed or not reversed, randomly or alternately.
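A rough sketch of the kind of within-bucket randomness being asked for (a hypothetical helper, not Lhotse's actual implementation): either shuffle a length-sorted bucket outright, or keep the sorted order but randomly flip it each epoch.

```python
import random

def randomize_bucket(bucket, epoch, mode="shuffle"):
    # Hypothetical helper, not part of Lhotse. `bucket` is a list of cuts
    # that was originally sorted by length.
    rng = random.Random(epoch)      # reproducible, epoch-dependent randomness
    bucket = list(bucket)
    if mode == "shuffle":
        rng.shuffle(bucket)         # full shuffle within the bucket
    elif mode == "maybe_reverse":
        if rng.random() < 0.5:      # keep the sorted order, flip it half the time
            bucket.reverse()
    return bucket
```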