Skip to content

Fix squared optimization steps bug#280

Merged
tomaarsen merged 4 commits intohuggingface:mainfrom
twerkmeister:squared-steps-bug
Jan 23, 2023
Merged

Fix squared optimization steps bug#280
tomaarsen merged 4 commits intohuggingface:mainfrom
twerkmeister:squared-steps-bug

Conversation

@twerkmeister
Copy link
Contributor

Currently, when increasing the number of epochs, the number of steps per epoch is also increased, leading to a number of optimization steps thats roughly (examples / batch size) * epochs * epochs instead of (examples / batch size) * epochs. The proposed change should fix this.

@tomaarsen
Copy link
Member

Hello!

This bug was discovered #268 (comment), and a fix was proposed. It involves simply removing this line:

steps_per_epoch=train_steps,

When steps_per_epoch is not specified, the number of steps per epoch will be equivalent to the number of datapoints in the dataloader. See the docs on the fit method for more info.

The remainder of the code would not need any changes.
Would you agree that this is a preferable solution over the changes that you have proposed? I think it's best if we make a new PR for this.

  • Tom Aarsen

@tomaarsen tomaarsen added the bug Something isn't working label Jan 18, 2023
@twerkmeister
Copy link
Contributor Author

Hey @tomaarsen, I removed the argument as you mentioned. Beyond that, this version also fixes the fact that for the triplet losses self.num_epochs was used to calculate total number of optimization steps for reporting when in fact num_epochs should be used because self.num_epochs can be shadowed by an argument to the train method

@tomaarsen
Copy link
Member

You're very right! Thanks for catching that. I'll merge this once the tests go green 🎉

logger.info(f" Total optimization steps = {len(train_dataloader) * num_epochs}")
logger.info(f" Total train batch size = {batch_size}")

warmup_steps = math.ceil(train_steps * self.warmup_proportion)
Copy link
Member

@tomaarsen tomaarsen Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
warmup_steps = math.ceil(train_steps * self.warmup_proportion)
warmup_steps = math.ceil(train_steps * self.warmup_proportion)

This still uses the train_steps variable that you removed.

@tomaarsen tomaarsen merged commit 29c0348 into huggingface:main Jan 23, 2023
@tomaarsen
Copy link
Member

tomaarsen commented Jan 23, 2023

Wonderful! Thank you for this fix, and for the edits on the PR @twerkmeister!

tomaarsen added a commit to tomaarsen/setfit that referenced this pull request Jan 23, 2023
tomaarsen added a commit that referenced this pull request Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants