

@garrett361
Owner

@garrett361 garrett361 commented Jun 27, 2025

Adds proper scaling for the backward pass when using gradient accumulation with a "sum" loss: the loss is no longer averaged over the accumulation steps.
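A minimal pure-Python sketch of why averaging over accumulation steps is wrong for a sum loss. It assumes a toy one-parameter model with squared-error loss and a trainer that divides each micro-batch loss by the number of accumulation steps before the backward pass. The names `grad_sum_loss`, `grad_mean_loss` and `accumulate` are hypothetical and do not come from this PR.

```python
# Toy model (hypothetical): prediction = w * x, per-example loss = (w*x - y)**2.
# d/dw of one example's loss is 2 * x * (w*x - y).

def grad_sum_loss(w, batch):
    """Gradient w.r.t. w of the SUM of squared errors over a batch."""
    return sum(2 * x * (w * x - y) for x, y in batch)


def grad_mean_loss(w, batch):
    """Gradient w.r.t. w of the MEAN of squared errors over a batch."""
    return grad_sum_loss(w, batch) / len(batch)


def accumulate(w, micro_batches, grad_fn, average_over_steps):
    """Sum gradients over micro-batches, as repeated backward() calls would.

    If average_over_steps is True, each micro-batch contribution is divided
    by the number of accumulation steps (assumed trainer behaviour).
    """
    k = len(micro_batches)
    scale = 1.0 / k if average_over_steps else 1.0
    return sum(scale * grad_fn(w, mb) for mb in micro_batches)


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
micro = [data[:2], data[2:]]  # two accumulation steps of equal size
w = 0.5

full_sum = grad_sum_loss(w, data)
averaged_sum = accumulate(w, micro, grad_sum_loss, average_over_steps=True)
unscaled_sum = accumulate(w, micro, grad_sum_loss, average_over_steps=False)

full_mean = grad_mean_loss(w, data)
averaged_mean = accumulate(w, micro, grad_mean_loss, average_over_steps=True)

print(full_sum, averaged_sum, unscaled_sum)  # -88.0 -44.0 -88.0
print(full_mean, averaged_mean)              # -22.0 -22.0
```

With two steps, averaging halves the sum-loss gradient relative to the full batch, while skipping the averaging reproduces it exactly. For a mean loss over equal-sized micro-batches, the same averaging is what recovers the full-batch gradient, which is why the scaling has to depend on the loss reduction.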

PR Title
1 #15 padding-free
2 #16 clean_checkpoints_at_end
3 #17 final_lr_ratio
4 #18 add_seed_and_date_to_run_name
5 #19 additional_model_arguments
6 #20 sync_each_batch=True grad acc
7 #21 no grad acc averaging for sum losses (this PR)
8 #22 extra reporting
9 #23 local_main_process_first when building dataset

@garrett361 garrett361 changed the title [7/9] WIP: no grad acc averaging for sum losses [7/9] no grad acc averaging for sum losses Jun 27, 2025
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 99016c3 to bfd3b00 June 27, 2025 19:23
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from 02cef67 to a873266 June 27, 2025 19:23
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from bfd3b00 to 342d32a June 27, 2025 20:22
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from a873266 to dfb0866 June 27, 2025 20:22
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 342d32a to 0ea1143 June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from dfb0866 to bb745f7 June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 0ea1143 to b866596 June 27, 2025 20:54
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from bb745f7 to 7e3a6cd June 27, 2025 20:54
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from b866596 to 2d8996b June 27, 2025 21:17
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from 7e3a6cd to 95f1240 June 27, 2025 21:17
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 2d8996b to 053c928 June 27, 2025 21:19
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from 95f1240 to 7985fce June 27, 2025 21:19
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from 053c928 to ab7a54b June 28, 2025 01:41
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from 7985fce to 8d5c39b June 28, 2025 01:41
prev-branch: padding-free-squashing-3
prev-branch: padding-free-squashing-4
prev-branch: padding-free-squashing-5
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from ab7a54b to f5e19fa June 28, 2025 01:50
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from 8d5c39b to a4d460b June 28, 2025 01:50
prev-branch: padding-free-squashing-6
@garrett361 garrett361 force-pushed the padding-free-squashing-7 branch from a4d460b to 96c734b June 28, 2025 02:23
@garrett361 garrett361 force-pushed the padding-free-squashing-6 branch from f5e19fa to 7a3d3dd July 1, 2025 13:11
@garrett361
Owner Author

I'm a little skeptical about this commit. It's needed for strict correctness, but it would change the dynamics of upstream's training. Not sure what is best.

@fabianlim fabianlim changed the base branch from padding-free-squashing-6 to main July 1, 2025 15:06
@fabianlim
Collaborator

Ok, let's hold off on this for now.

@garrett361 garrett361 marked this pull request as draft July 1, 2025 16:13
@garrett361 garrett361 closed this Jul 24, 2025
3 participants