Owner

@garrett361 garrett361 commented Jun 27, 2025

Adds a sync_each_batch argument for optionally synchronizing gradients after every batch when using gradient accumulation. Specifying --sync_each_batch True can dramatically reduce memory costs when using FSDP/DeepSpeed. The default is False, preserving the upstream open-instruct behavior.
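To illustrate the trade-off described above, here is a toy, framework-free sketch of the two accumulation modes. The function `accumulate` and the data `GRADS` are hypothetical and not code from this PR. Scalars stand in for gradient tensors and a plain average stands in for the all-reduce. In Hugging Face Accelerate the analogous knob is, to my knowledge, `GradientAccumulationPlugin(sync_each_batch=...)`; whether this PR wires the flag through that plugin is an assumption.

```python
from typing import List, Tuple


def accumulate(grads: List[List[float]], sync_each_batch: bool) -> Tuple[float, int]:
    """Toy gradient accumulation; grads[r][m] is rank r's gradient on micro-batch m.

    Returns the rank-averaged accumulated gradient and the number of
    simulated all-reduce calls. Hypothetical helper, for illustration only.
    """
    num_ranks = len(grads)
    num_micro = len(grads[0])

    if sync_each_batch:
        # Reduce across ranks after every micro-batch, so only the
        # already-reduced running sum has to be kept around.
        accumulated = 0.0
        for m in range(num_micro):
            accumulated += sum(grads[r][m] for r in range(num_ranks)) / num_ranks
        return accumulated, num_micro

    # Default behavior: each rank accumulates locally and the ranks
    # synchronize once, on the last micro-batch.
    local = [sum(rank_grads) for rank_grads in grads]
    return sum(local) / num_ranks, 1


# Two ranks, four micro-batches each (made-up values).
GRADS = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
deferred = accumulate(GRADS, sync_each_batch=False)
eager = accumulate(GRADS, sync_each_batch=True)
print(deferred, eager)
```

Both modes yield the same accumulated gradient; they differ only in how often the ranks communicate. The memory saving comes from what each rank must hold between micro-batches: with deferred synchronization under FSDP, each rank typically keeps un-reduced (unsharded) gradients until the final micro-batch, whereas synchronizing every batch lets them be reduced and sharded immediately, at the cost of more communication.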

|   | PR  | Title |
|---|-----|-------|
| 1 | #15 | padding-free |
| 2 | #16 | clean_checkpoints_at_end |
| 3 | #17 | final_lr_ratio |
| 4 | #18 | add_seed_and_date_to_run_name |
| 5 | #19 | additional_model_arguments |
| 6 | #20 (this PR) | sync_each_batch=True grad acc |
| 7 | #21 | no grad acc averaging for sum losses |
| 8 | #22 | extra reporting |
| 9 | #23 | local_main_process_first when building dataset |

@garrett361 garrett361 changed the title from "[6/9] WIP: sync_each_batch=True grad acc" to "[6/9] sync_each_batch=True grad acc" on Jun 27, 2025
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from cd865ba to 30fb7a6 on June 27, 2025 19:23
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 99016c3 to bfd3b00 on June 27, 2025 19:23
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 30fb7a6 to d97e9da on June 27, 2025 20:21
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from bfd3b00 to 342d32a on June 27, 2025 20:22
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from d97e9da to ea160f0 on June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 342d32a to 0ea1143 on June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from ea160f0 to 7efb8d3 on June 27, 2025 20:54
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 0ea1143 to b866596 on June 27, 2025 20:54
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 7efb8d3 to 45a23bb on June 27, 2025 21:17
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from b866596 to 2d8996b on June 27, 2025 21:17
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 45a23bb to acae0d7 on June 27, 2025 21:19
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 2d8996b to 053c928 on June 27, 2025 21:19
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from acae0d7 to 525b935 on June 28, 2025 01:41
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 053c928 to ab7a54b on June 28, 2025 01:41
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 525b935 to fd7d49d on June 28, 2025 01:50
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from ab7a54b to f5e19fa on June 28, 2025 01:50
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch 4 times, most recently from c807b6a to 1d85428 on June 30, 2025 20:49
@garrett361 garrett361 changed the base branch from padding-free-squashing-5 to main on July 1, 2025 13:08
prev-branch: padding-free-squashing-5
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from f5e19fa to 7a3d3dd on July 1, 2025 13:11
@garrett361
Owner Author

I'm going to make this configurable, with the previous behavior as the default.
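A minimal sketch of what such an opt-in flag could look like, assuming plain `argparse` (the actual training script may parse its arguments differently, for example from a dataclass). The helpers `str2bool` and `build_parser` are hypothetical names used only for illustration; the flag name and its `False` default follow the PR description.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings, so `--sync_each_batch True` works."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the flag discussed in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--sync_each_batch",
        type=str2bool,
        default=False,  # previous behavior stays the default
        help="Synchronize gradients after every batch during gradient accumulation.",
    )
    return parser


# Omitting the flag keeps the previous behavior; passing it opts in.
default_args = build_parser().parse_args([])
enabled_args = build_parser().parse_args(["--sync_each_batch", "True"])
print(default_args.sync_each_batch, enabled_args.sync_each_batch)
```

Defaulting to `False` means existing launch commands behave exactly as before, and only runs that pass `--sync_each_batch True` pay the extra communication cost.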

@garrett361 garrett361 changed the title from "[6/9] sync_each_batch=True grad acc" to "[6/9] sync_each_batch option for grad accumulation" on Jul 1, 2025
@fabianlim fabianlim merged commit 670e9d4 into main Jul 1, 2025
2 checks passed
3 participants