Olmo tiny scripts #628

Merged: 46 commits, Jun 28, 2024

Commits
All 46 commits are by ananyahjha93.

f530361  olmo tiny with ddp configs (Jun 18, 2024)
2fa2551  . (Jun 18, 2024)
62968ca  final scripts for OLMo tiny (Jun 18, 2024)
3a3c64f  final scripts for OLMo tiny (Jun 18, 2024)
6f766da  updated config based on comments (Jun 19, 2024)
18a9592  updated config based on comments (Jun 19, 2024)
c3483f0  updated config based on comments (Jun 19, 2024)
fd32714  updated config based on comments (Jun 19, 2024)
d4d39c1  updated config based on comments (Jun 19, 2024)
c5c8100  updated config based on comments (Jun 19, 2024)
76ce0f8  . (Jun 19, 2024)
2d07c10  . (Jun 19, 2024)
564902c  script (Jun 19, 2024)
aec289b  script (Jun 19, 2024)
5d225f4  . (Jun 19, 2024)
7d0e4f5  . (Jun 19, 2024)
cac1430  . (Jun 19, 2024)
4f4bc74  fix n_layers in 300M (Jun 19, 2024)
a0f4663  grad norm 2, 20M (Jun 20, 2024)
e51c0a9  added new config for tiny runs (Jun 24, 2024)
89f588a  added FLOPs logging (Jun 24, 2024)
00e4a65  changelog (Jun 24, 2024)
fc47a4d  black (Jun 24, 2024)
6a00cf4  config (Jun 24, 2024)
d719277  added config for 750M (Jun 24, 2024)
252f470  type (Jun 24, 2024)
25e1704  updated config to run on pluto (Jun 24, 2024)
e512582  Update olmo/train.py (Jun 24, 2024)
6effc3e  fixed bug where flops count will not accumulate after resume (Jun 24, 2024)
2caad7e  60M run (Jun 24, 2024)
ed6b9a6  updated model and config from Pete's branch (Jun 25, 2024)
08a5f91  60M run (Jun 25, 2024)
db23930  added back flops calc (Jun 25, 2024)
dbd0a87  20M run on pluto (Jun 25, 2024)
13a4d52  60M run (Jun 25, 2024)
c8e65f8  20M run on pluto (Jun 25, 2024)
c9768da  60M run (Jun 25, 2024)
1b52914  20M run on pluto (Jun 25, 2024)
5dbe772  60M run (Jun 25, 2024)
71621a1  150M run on jupiter (Jun 26, 2024)
8dba3b3  . (Jun 27, 2024)
07b6d4a  300M (Jun 27, 2024)
3330468  . (Jun 28, 2024)
943e090  Merge branch 'main' into olmo-tiny (Jun 28, 2024)
ce582c9  Merge branch 'olmo-tiny' of ssh://github.com/allenai/OLMo into olmo-tiny (Jun 28, 2024)
b988efb  added back first_batch (Jun 28, 2024)
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- Expose memmap dtype in data config
- Added support for DDP training.
- Added caching to disk of HF datasets used in downstream evals
- Added FLOPs logging
- Added configs for OLMo tiny set of models

### Changed

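The changelog entry "Added FLOPs logging" (and the later commit "fixed bug where flops count will not accumulate after resume") suggests the trainer keeps a running total of training FLOPs and restores it from checkpoint state. A minimal sketch of that idea; the class, its names, and the 6N-per-token approximation are illustrative assumptions, not the repo's actual API:

```python
# Hedged sketch of cumulative FLOPs logging that survives a training resume.
class FlopsCounter:
    def __init__(self, flops_per_token: float):
        # A common rough estimate is flops_per_token ~= 6 * n_params.
        self.flops_per_token = flops_per_token
        self.total_flops = 0.0  # cumulative over the whole run, including pre-resume steps

    def update(self, tokens_this_step: int) -> float:
        self.total_flops += self.flops_per_token * tokens_this_step
        return self.total_flops

    # The resume bug mentioned in the commit history would appear if this state
    # were not saved and restored, resetting the counter to zero on restart.
    def state_dict(self) -> dict:
        return {"total_flops": self.total_flops}

    def load_state_dict(self, state: dict) -> None:
        self.total_flops = state.get("total_flops", 0.0)
```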
1,269 changes: 1,269 additions & 0 deletions configs/tiny/OLMo-150M.yaml

Large diffs are not rendered by default.

1,269 changes: 1,269 additions & 0 deletions configs/tiny/OLMo-20M.yaml

Large diffs are not rendered by default.

103 changes: 70 additions & 33 deletions configs/tiny/OLMo-300M.yaml
@@ -4,7 +4,7 @@ dry_run: false

wandb:
name: ${run_name}
- project: tiny_olmo
+ project: olmo-tiny

model:
Review comment (Member): No DDP section in this file?

Reply (Author): there is!
d_model: 1024
@@ -14,80 +14,85 @@ model:
weight_tying: false
alibi: false
rope: true
- flash_attention: true # not available on AMD
+ flash_attention: true
attention_dropout: 0.0
attention_layer_norm: false
multi_query_attention: true
n_kv_heads: 1
- clip_qkv: 8.0
+ clip_qkv: null
include_bias: false
block_type: sequential
- layer_norm_type: default
- layer_norm_with_affine: false
+ layer_norm_type: rms
+ layer_norm_with_affine: true
layer_norm_eps: 1e-6
bias_for_layer_norm: false
attention_layer_norm_with_affine: false
activation_type: swiglu
residual_dropout: 0.0
embedding_dropout: 0.0
- max_sequence_length: 2048
+ max_sequence_length: 4096
vocab_size: 50280
embedding_size: 50304
- eos_token_id: 50279
+ eos_token_id: 0
pad_token_id: 1
init_device: cuda
init_fn: normal
init_std: 0.02
init_cutoff_factor: 3

ddp:
grad_sync_mode: batch
find_unused_params: false

- compile: null # causes instability on AMD GPUs
+ compile: null

optimizer:
name: adamw
learning_rate: 6.0e-4
weight_decay: 0.1
eps: 1e-8
decay_norm_and_bias: true
decay_embeddings: false
betas:
- 0.9
- 0.95
metrics_log_interval: 10

scheduler:
name: cosine_with_warmup
- t_warmup: 2000
+ t_warmup: 5000
alpha_f: 0.1
warmup_min_lr: 0
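The scheduler block configures a cosine schedule with warmup: the LR ramps from `warmup_min_lr` to the peak `learning_rate` over `t_warmup` steps, then decays along a cosine toward `alpha_f` times the peak. A hedged sketch of that shape; the repo's exact formula (e.g. how the decay horizon is defined) may differ:

```python
import math


def cosine_with_warmup_lr(step: int, peak_lr: float, t_warmup: int, t_max: int,
                          alpha_f: float, warmup_min_lr: float = 0.0) -> float:
    """Illustrative cosine-with-warmup schedule; names mirror the config keys."""
    if step < t_warmup:
        # Linear warmup from warmup_min_lr up to the peak learning rate.
        return warmup_min_lr + (peak_lr - warmup_min_lr) * step / max(1, t_warmup)
    # Cosine decay from peak_lr down to alpha_f * peak_lr at t_max.
    progress = min(1.0, (step - t_warmup) / max(1, t_max - t_warmup))
    min_lr = alpha_f * peak_lr
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```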

tokenizer:
- identifier: tokenizers/allenai_eleuther-ai-gpt-neox-20b-pii-special.json
+ identifier: tokenizers/allenai_gpt-neox-olmo-dolma-v1_5.json
truncate_direction: right

save_folder: workspace/${run_name} # doesn't matter since we'll upload to S3
- remote_save_folder: s3://allennlp-ananyaj/olmo-tiny/300M/${run_name}
+ remote_save_folder: s3://ai2-llm/checkpoints/olmo-tiny/${run_name}
save_overwrite: false
# Sharded checkpoints (best for restarts)
save_interval: 5000
save_num_checkpoints_to_keep: 3

# Unsharded checkpoints (for ddp)
save_interval_unsharded: 5000
- save_num_unsharded_checkpoints_to_keep: 3
+ save_num_unsharded_checkpoints_to_keep: -1
Review comment (Member): What does -1 do?

Reply (Author): -1 is for keeping all checkpoints, but I'll double check
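As the author notes, -1 is conventionally a "keep everything" sentinel for checkpoint retention. A minimal sketch of how such a pruning rule is typically applied; the helper, the glob pattern, and the folder layout are hypothetical, not the repo's checkpoint code:

```python
from pathlib import Path


def prune_unsharded_checkpoints(folder: Path, num_to_keep: int) -> None:
    """Delete the oldest unsharded checkpoints, keeping the newest `num_to_keep`.

    A negative value (e.g. -1) disables pruning entirely, matching the
    "keep all checkpoints" reading of -1 in the config above.
    """
    if num_to_keep < 0:
        return  # keep everything
    checkpoints = sorted(folder.glob("step*-unsharded"), key=lambda p: p.stat().st_mtime)
    to_remove = checkpoints[:-num_to_keep] if num_to_keep > 0 else checkpoints
    for old in to_remove:
        # shutil.rmtree(old)  # destructive step left as a comment in this sketch
        print(f"would remove {old}")
```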


load_path: null

- max_duration: 100_000 # 419B tokens, this is for the scheduler
- stop_at: 100_000
- global_train_batch_size: 2048
- device_train_microbatch_size: 8
+ max_duration: 1ep
+ stop_at: 406_934
+ global_train_batch_size: 1024
+ device_train_microbatch_size: 4
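For orientation, the old comment ("419B tokens") follows from 100_000 steps at a global batch of 2048 sequences of 2048 tokens; the new settings imply a much longer run, roughly 1.7T tokens, assuming stop_at counts global optimizer steps. A quick back-of-the-envelope check:

```python
# Hedged token-count arithmetic for the old and new 300M settings,
# assuming stop_at is measured in global optimizer steps.
old_tokens = 100_000 * 2048 * 2048  # ~4.19e11 (~419B), matching the old comment
new_tokens = 406_934 * 1024 * 4096  # ~1.71e12 (~1.7T)
print(f"{old_tokens:.3g} {new_tokens:.3g}")
```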

precision: amp_bf16
distributed_strategy: ddp

gen1_gc_interval: 1

max_grad_norm: 1.0
max_grad_norm_ratio: null

speed_monitor:
window_size: 20

- eval_interval: 1000
+ eval_interval: 5000
eval_subset_num_batches: -1
device_eval_batch_size: ${device_train_microbatch_size}
evaluators:
@@ -133,30 +138,22 @@ evaluators:

- label: openbook_qa
type: downstream

- label: boolq
type: downstream

- label: sciq
type: downstream

- label: arc_easy
type: downstream

- label: arc_challenge
type: downstream

- label: copa
type: downstream

- label: commonsense_qa
type: downstream

- label: social_iqa
type: downstream

- - label: basic_arithmetic
-   type: downstream
+ # Doesn't work from cache.
+ # - label: basic_arithmetic
+ #   type: downstream

Review comment (Member): What's wrong with these?

Reply (Author): ah, basic_arithmetic should be in, others don't provide any signal based on my experience

Reply (Author): ah this was commented out saying "# Doesn't work from cache."

Reply (Contributor): Should work with cache v4
- label: mmlu_stem_var
type: downstream
@@ -170,6 +167,42 @@ evaluators:
- label: mmlu_other_var
type: downstream

- label: mmlu_stem_mc_5shot
type: downstream

- label: mmlu_humanities_mc_5shot
type: downstream

- label: mmlu_social_sciences_mc_5shot
type: downstream

- label: mmlu_other_mc_5shot
type: downstream

- label: mmlu_stem_mc_5shot_test
type: downstream

- label: mmlu_humanities_mc_5shot_test
type: downstream

- label: mmlu_social_sciences_mc_5shot_test
type: downstream

- label: mmlu_other_mc_5shot_test
type: downstream

- label: basic_arithmetic
type: downstream

- label: trivia_qa_wiki_ppl
type: downstream

- label: natural_qs_open_ppl
type: downstream

- label: arc_easy_ppl
type: downstream

data:
pad_direction: right
num_workers: 32
@@ -178,6 +211,10 @@ data:
prefetch_factor: 8
persistent_workers: true
timeout: 0
instance_filter:
repetition_max_period: 13
repetition_min_period: 1
repetition_max_count: 32
paths:
######### NON WEB DATA #########
# ~> GUTENBERG BOOKS (5.256 GT)
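The new `instance_filter` block above (repetition_max_period, repetition_min_period, repetition_max_count) suggests training instances are dropped when they contain too many consecutive short-period token repetitions. A hedged sketch of one way such a filter could work; the function and its exact thresholding are illustrative, not the repo's actual implementation:

```python
import numpy as np


def keep_instance(token_ids: np.ndarray,
                  repetition_min_period: int = 1,
                  repetition_max_period: int = 13,
                  repetition_max_count: int = 32) -> bool:
    """Return False if any period p in [min_period, max_period] repeats
    consecutively more than repetition_max_count times.

    Parameter names mirror the config keys above; the detection logic itself
    is an assumption for illustration.
    """
    for period in range(repetition_min_period, repetition_max_period + 1):
        # token i matches the token one period earlier -> part of a repeating run
        matches = token_ids[period:] == token_ids[:-period]
        run = 0
        for m in matches:
            run = run + 1 if m else 0
            if run >= repetition_max_count * period:
                return False
    return True
```

Filtering highly repetitive instances is a common data-quality guard for web-scale corpora, since degenerate repeats can destabilize the training loss; these configs would then presumably be launched through the repo's usual training entry point.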