allow specifying LR schedule in terms of tokens #411

Merged: 9 commits into main from epwalsh/lr-schedule-tokens on Jan 18, 2024

Conversation

@epwalsh (Member) commented Jan 17, 2024

This PR allows us to specify the LR schedule in terms of tokens instead of steps. For example, just change this:

scheduler:
  name: linear_with_warmup
  t_warmup: 5000
  t_max: 476837
  alpha_f: 0.1
  grad_clip_warmup_steps: 1000
  grad_clip_warmup_factor: 10.0

To this:

 scheduler:
   name: linear_with_warmup
+  units: tokens
-  t_warmup: 5000
-  t_max: 476837
+  t_warmup: 2e10
+  t_max: 2e12
   alpha_f: 0.1
-  grad_clip_warmup_steps: 1000
+  grad_clip_warmup_steps: 4e9
   grad_clip_warmup_factor: 10.0

The two configurations above are equivalent as long as the batch size stays constant, but the latter lets us continue the same LR schedule across restarts while changing the batch size, without any additional config changes.
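
For intuition, here is a minimal sketch (not OLMo's actual scheduler code) of a linear-with-warmup schedule evaluated in token units. lr_max stands in for the peak LR configured elsewhere, and tokens_seen for the trainer's running token count (e.g. global_train_tokens_seen); both names are assumptions for illustration:

def linear_with_warmup_lr(lr_max: float, tokens_seen: float, t_warmup: float, t_max: float, alpha_f: float) -> float:
    # Warm up linearly from 0 to lr_max over the first t_warmup tokens.
    if tokens_seen < t_warmup:
        return lr_max * tokens_seen / t_warmup
    # Then decay linearly so the LR reaches alpha_f * lr_max at t_max tokens.
    progress = min(1.0, (tokens_seen - t_warmup) / (t_max - t_warmup))
    return lr_max * (1.0 - progress * (1.0 - alpha_f))

Because the schedule depends only on tokens seen, doubling the batch size after a restart just advances tokens_seen twice as fast per step; the LR-vs-tokens curve itself is unchanged.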

This is backwards compatible, so you can make this change to your config and still restart from an older checkpoint.

@epwalsh requested a review from dirkgr on January 17, 2024 at 19:24
    return int(float(self.cfg.max_duration[:-1].strip()))
elif self.cfg.max_duration.endswith("ep"):
    max_epochs = int(self.cfg.max_duration[:-2].strip())
    return max_epochs * self.batches_per_epoch * self.tokens_per_batch
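
(The snippet above parses the max_duration config string. Judging from the slicing, a one-character suffix marks a raw token count and "ep" an epoch count; the "T" suffix below is an inference from the [:-1] slice, not confirmed by the diff shown here. A hedged, self-contained sketch of the convention:)

def parse_max_duration_to_tokens(max_duration: str, batches_per_epoch: int, tokens_per_batch: int) -> int:
    # e.g. "2e12T" -> 2e12 tokens; "1ep" -> one epoch's worth of tokens.
    if max_duration.endswith("T"):
        return int(float(max_duration[:-1].strip()))
    elif max_duration.endswith("ep"):
        max_epochs = int(max_duration[:-2].strip())
        return max_epochs * batches_per_epoch * tokens_per_batch
    raise ValueError(f"Unrecognized max_duration: {max_duration!r}")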
@2015aroras (Collaborator) commented Jan 17, 2024

Correct me if I'm wrong, but (maybe barring some weird edge cases) it seems that the conversion from steps to tokens (and from self.global_step to self.global_train_tokens_seen) is multiplying by self.tokens_per_batch. If that is the case, then it may be more readable if you make self.max_tokens return self.max_steps * self.tokens_per_batch, or some equivalent code.

@epwalsh (Member, Author) replied:

I don't think that works if batch size has changed at some point.
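
A quick sketch with hypothetical numbers showing why the step-based shortcut drifts once the batch size changes mid-run:

# Made-up numbers: a run restarted with a doubled batch size.
steps_1, batch_1 = 1_000, 4_000_000  # steps and tokens/batch before restart
steps_2, batch_2 = 500, 8_000_000    # steps and tokens/batch after restart

true_tokens = steps_1 * batch_1 + steps_2 * batch_2  # 8_000_000_000 actually seen
naive_tokens = (steps_1 + steps_2) * batch_2         # 12_000_000_000, overshoots by 50%

Tracking global_train_tokens_seen directly sidesteps this, at the cost of the slightly less uniform conversion code the reviewer is pointing at.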

@dirkgr (Member) left a comment:

Looks good. Did you test this in any way?

@epwalsh (Member, Author) commented Jan 17, 2024

> Looks good. Did you test this in any way?

@dirkgr, no, but I will before I merge.

@epwalsh (Member, Author) commented Jan 18, 2024

Confirmed it's working as expected after a restart with 2x batch size. https://wandb.ai/ai2-llm/olmo-small-test?workspace=user-epwalsh

@epwalsh merged commit dcae8e8 into main on Jan 18, 2024 (9 of 10 checks passed).
@epwalsh deleted the epwalsh/lr-schedule-tokens branch on January 18, 2024 at 19:40.