Conversation

@garrett361
Owner

@garrett361 garrett361 commented Jun 27, 2025

Adds --additional_model_arguments for passing additional args to models (e.g. for z-loss or NoPE)

|   | PR  | Title |
|---|-----|-------|
| 1 | #15 | padding-free |
| 2 | #16 | clean_checkpoints_at_end |
| 3 | #17 | final_lr_ratio |
| 4 | #18 | add_seed_and_date_to_run_name |
| 5 | >19 | additional_model_arguments |
| 6 | #20 | sync_each_batch=True grad acc |
| 7 | #21 | no grad acc averaging for sum losses |
| 8 | #22 | extra reporting |
| 9 | #23 | local_main_process_first when building dataset |

@garrett361
Owner Author

garrett361 commented Jun 27, 2025

@fabianlim we currently have this as `additional_model_arguments: Optional[list[str]]`, which is then turned into a dict via regex. I think we could just use `additional_model_arguments: Optional[dict]` instead?

And then pass it like `--additional_model_arguments '{"key": value}'`? Not sure.
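
For illustration, here is a minimal sketch of the dict-based option, assuming a plain `argparse` parser rather than whatever argument parser the training script actually uses. The helper names `parse_json_dict` and `parse_cli` and the keys in the usage example are hypothetical, not taken from this PR.

```python
import argparse
import json
from typing import Optional


def parse_json_dict(raw: str) -> dict:
    """Parse a CLI string as a JSON object, rejecting anything that is not a dict."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"invalid JSON: {exc}") from exc
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object, e.g. '{\"key\": 1}'")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumption: the flag takes a single JSON string instead of a list of strings.
    parser.add_argument(
        "--additional_model_arguments",
        type=parse_json_dict,
        default=None,
        help="Extra model kwargs as a JSON object.",
    )
    return parser


def parse_cli(argv: Optional[list[str]] = None) -> Optional[dict]:
    """Return the parsed dict, or None if the flag was not given."""
    return build_parser().parse_args(argv).additional_model_arguments


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    extra = parse_cli(["--additional_model_arguments", '{"example_flag": true, "example_coeff": 0.0001}'])
    print(extra)
```

One caveat with this approach: the value has to be valid JSON, so booleans are written `true`/`false` rather than `True`/`False`, and the whole object needs shell quoting.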

@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from f205d24 to 780f13a Compare June 27, 2025 20:21
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 30fb7a6 to d97e9da Compare June 27, 2025 20:21
@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from 780f13a to 8bd01d8 Compare June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from d97e9da to ea160f0 Compare June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from 8bd01d8 to aa6e4b7 Compare June 27, 2025 20:54
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from ea160f0 to 7efb8d3 Compare June 27, 2025 20:54
@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from aa6e4b7 to 380cd14 Compare June 27, 2025 21:17
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 7efb8d3 to 45a23bb Compare June 27, 2025 21:17
@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from 380cd14 to f5460c2 Compare June 27, 2025 21:19
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 45a23bb to acae0d7 Compare June 27, 2025 21:19
@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from f5460c2 to d9a59b6 Compare June 28, 2025 01:41
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from acae0d7 to 525b935 Compare June 28, 2025 01:41
@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from d9a59b6 to 5b827c0 Compare June 28, 2025 01:48
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 525b935 to fd7d49d Compare June 28, 2025 01:50
@garrett361 garrett361 force-pushed the padding-free-squashing-4 branch from 5b827c0 to 16a9336 Compare June 30, 2025 18:33
@garrett361 garrett361 changed the base branch from padding-free-squashing-4 to main June 30, 2025 19:29
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from fd7d49d to fc3c0b3 Compare June 30, 2025 19:29
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch 2 times, most recently from ec1fec9 to c807b6a Compare June 30, 2025 20:49
prev-branch: padding-free-squashing-4
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from c807b6a to 1d85428 Compare June 30, 2025 20:49
@garrett361 garrett361 force-pushed the padding-free-squashing-5 branch from 070042a to 47c512b Compare June 30, 2025 21:12
@garrett361
Owner Author

Also fixed an import that had to be corrected before the tests could run.

@fabianlim fabianlim merged commit a9a6dd0 into main Jun 30, 2025
2 checks passed