Watermark model slow training (cross-posted from facebookresearch/audioseal) #484
Comments
Hi! Can you paste your run command here so I can make sure you are running it correctly?
This seems normal to me: the batch_size you pass as an argument is the effective batch size, which is internally divided across all GPUs. If I understand correctly, it is also normal for steps/sec to drop when you increase the batch size, because each step now has more samples to compute. Have you tried plotting convergence curves for the different batch sizes?
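(A quick illustration of this, assuming an 8-GPU node; `<watermark_solver>` and `<your_dset>` are placeholders for your own config names.)

```bash
# Illustration only: <watermark_solver> and <your_dset> are placeholders.
# dataset.batch_size is the effective (global) batch size, split across GPUs.
#   dataset.batch_size=8   on 8 GPUs -> 1 sample per GPU per step
#   dataset.batch_size=128 on 8 GPUs -> 16 samples per GPU per step
dora run solver=<watermark_solver> dset=<your_dset> dataset.batch_size=128
```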
The original training took 3-10 days to obtain good results on a 4-GPU machine, but after 20-40 hours you could already see it converging.
@hadyelsahar
It would help a lot if you could share your evaluation metrics; you can find them in the Dora log directory under ./history.json
Note that in AudioCraft an epoch is just a predefined number of steps, not a pass over the whole training data; we set the default to 2000 steps. So the size of your training data basically doesn't affect the time taken per epoch, it just affects the pool of samples that your training draws from.
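(For reference, a sketch of how the epoch length could be overridden at launch; the key name `optim.updates_per_epoch` is assumed from other AudioCraft solver configs, so double-check it against the watermark solver's config.)

```bash
# Assumed key name (optim.updates_per_epoch) -- verify against your solver config.
# An "epoch" here is just this many optimizer steps, independent of dataset size.
dora run solver=<watermark_solver> dset=<your_dset> optim.updates_per_epoch=2000
```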
We don't use the full 400k hours of VoxPopuli; we select 5k hours, with which you can get good performance in about 80-100 epochs. We let our runs go until 200-300 epochs.
I think the training could be made a bit more efficient indeed, but we have not focused on it that much...
@Comedian1926, if you want to study the watermark training at a smaller scale, what you can do is focus on a subset of the augmentations and remove the compression ones -- for those, we need to transfer to CPU, save in the new format, load, and transfer back to GPU, so they take a lot of time. What we observed during training is that the detection (and localization) accuracy increases very fast, in 10 epochs or even less. For the rest of the epochs, all metrics increase at a steady rate (notably the audio quality metrics).
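(A sketch of the kind of override meant here; the augmentation-weight key names below are hypothetical, so locate the actual ones in the watermark solver config before using them.)

```bash
# Hypothetical key names for illustration -- find the real augmentation weights
# in the watermark solver config. Zeroing the compression augmentations avoids
# the GPU -> CPU -> encode/decode -> GPU round trip during training.
dora run solver=<watermark_solver> dset=<your_dset> \
  aug_weights.mp3_compression=0 aug_weights.aac_compression=0
```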
@pierrefdz @hadyelsahar Thank you very much for your replies, they are very useful to me. In my previous training the d_loss mostly stayed between 1.98 and 2, so I don't think it converged. I am currently restarting the training and will share the logs with you, hoping it succeeds. Thank you again for your work!
@hadyelsahar @pierrefdz |
I found that the PESQ computation in audiocraft/solvers/watermark.py is very time-consuming, so I skipped it.
I have encountered a similar issue where my training is not converging. Has anyone found a suitable solution? I would be extremely grateful for any useful suggestions.
I'd suggest first trying to make things work without any perceptual losses, and seeing if you manage to get the bit accuracy and detection accuracy to go up. Something like:
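(A sketch of the kind of override meant here, with assumed loss-weight key names -- check them against the watermark solver config: zero out the perceptual and adversarial loss weights and keep only the watermark detection and message losses.)

```bash
# Assumed loss-weight key names -- verify against the watermark solver config.
# Keep only the watermark detection / decoding losses to check that bit accuracy
# and detection accuracy go up before re-enabling the perceptual losses.
dora run solver=<watermark_solver> dset=<your_dset> \
  losses.adv=0 losses.feat=0 losses.l1=0 losses.msspec=0 losses.sisnr=0 \
  losses.wm_detection=1 losses.wm_mb=1
```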
Then add the rest little by little and adapt the optimization parameters to ensure that the training is able to start. |
Hi!
(This was cross-posted at facebookresearch/audioseal, but wanted to also put here for visibility--thanks!)
Thanks so much for the helpful training code and documentation. Apologies in advance for the naive question--I'm pretty new to machine learning.
I'm trying to train my own watermarking model at 48kHz with my own dataset on an H100 node with 8 GPUs (H100 80GB HBM3) on a remote SLURM cluster, but as I scale the batch size, the training speed appears to drop proportionally. There also appears to be unexpected behavior where I specify `dataset.batch_size=k`, but the submitted config (logged by wandb) shows `dataset.batch_size=k/8`.
As an example, I ran experiments setting `dataset.batch_size=8`, which became `dataset.batch_size=1`, yielding a max training speed of about 1.67 steps/second and GPU utilization averaging around 25%. When I set `dataset.batch_size=128` (which became `dataset.batch_size=16`), training speed dropped to around 0.3 steps/second. It seems to me that parallelization isn't working the way it should, based on these results?
I've tried preprocessing my dataset to one-second clips and removing some of the augmentations (even running an experiment with only noise augmentations) to try to increase GPU utilization, but nothing I've tried has improved the training speed.
Is this to be expected? Roughly how long did the original AudioSeal model take to train, using what amount of compute?
Thank you so much!