
Learning rate for training #21

Open
jacksonsc007 opened this issue Jul 10, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@jacksonsc007

Question

Hi, @xiuqhou
Thanks for your enlightening work. I ran into some questions while reproducing it.

  1. How many GPUs did you use to train the model?

  2. Do I need to change the initial learning rate if I use a different total batch size (num_gpus * batchsize_per_gpu)? Is there a policy that makes the final performance insensitive to the total batch size?

In my own experiments, model performance is not consistent across different values of total_batch_size. I experimented with 1x2 (1 GPU, 2 images per GPU) and 4x4 (4 GPUs, 4 images per GPU) settings with the same initial learning rate, and the results show a non-trivial gap between them (the 4x4 setting lags behind the 1x2 setting by 2 AP).

Best regards


@jacksonsc007 jacksonsc007 added the question Further information is requested label Jul 10, 2024
@xiuqhou
Owner

xiuqhou commented Jul 10, 2024

Hello @jacksonsc007, thank you for your question.

  1. We use 2 * A800 GPUs to train the model. The batch_size on each GPU is 5, so the total batch_size is 10. The learning rate is set to 1e-4.
  2. There are two common policies for adjusting the learning rate according to the total batch size. If the batch_size increases by a factor of K, you can multiply lr by sqrt(K) to keep the gradient variance unchanged, or multiply it by K according to the linear scaling rule. In practice, the latter is more commonly used.

We use lr=1e-4 for total_batch_size=10, so with the linear scaling rule you should use lr=1.6e-4 for total_batch_size=16 and lr=2e-5 for total_batch_size=2 to achieve similar performance.
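For concreteness, here is a minimal sketch of how the two scaling policies translate into numbers, using the base setting above (lr=1e-4 at total_batch_size=10). The helper `scaled_lr` is just an illustration, not something from this repository:

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Scale a base learning rate to a new total batch size.

    rule="linear": lr grows proportionally to the batch-size ratio (linear scaling rule).
    rule="sqrt":   lr grows with the square root of the ratio (keeps gradient variance roughly constant).
    """
    k = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * k ** 0.5
    raise ValueError(f"unknown rule: {rule!r}")


# Base setting from this thread: lr=1e-4 at total_batch_size=10 (2 GPUs x 5 images).
print(f"{scaled_lr(1e-4, 10, 16):.1e}")  # 1.6e-04 (4 GPUs x 4 images per GPU)
print(f"{scaled_lr(1e-4, 10, 2):.1e}")   # 2.0e-05 (1 GPU x 2 images per GPU)
```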

@jacksonsc007
Author

Thanks for your prompt reply. I will try your suggestions and report the results later.

By the way, could you point me to the relevant papers for the learning rate rules you just mentioned?

@xiuqhou
Owner

xiuqhou commented Jul 10, 2024

The linear scaling rule is described in this paper: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (https://arxiv.org/abs/1706.02677)
