@runame runame commented Jul 31, 2025

Currently, validation metrics are first computed when `step == job_config.validation.freq`. I think it is preferable to always compute them at the first step as well.
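For illustration, the old and proposed schedules can be written as two small predicates. This is a hypothetical sketch: `old_should_validate`, `should_validate`, and the `freq` argument (standing in for `job_config.validation.freq`) are names assumed here, not necessarily the code that was merged.

```python
def old_should_validate(step: int, freq: int) -> bool:
    """Hypothetical old schedule: validate only every `freq` steps."""
    return step % freq == 0


def should_validate(step: int, freq: int) -> bool:
    """Hypothetical proposed schedule: also validate at step 1."""
    if freq <= 0:
        raise ValueError("freq must be positive")
    return step == 1 or step % freq == 0


# Steps are 1-indexed, as in the training loop being discussed.
old_validated = [s for s in range(1, 1501) if old_should_validate(s, 500)]
validated = [s for s in range(1, 1501) if should_validate(s, 500)]
print(old_validated)  # [500, 1000, 1500]
print(validated)      # [1, 500, 1000, 1500]
```

With `freq=500`, the only difference is the extra validation run at step 1, which gives an early data point on the validation curve without changing the later cadence.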

@meta-cla bot added the CLA Signed label Jul 31, 2025

@tianyu-l tianyu-l left a comment

I thought about this before. I think the most desirable behavior is to run validation before step 1 is trained, but that would require a bigger change to train.py. The change here runs validation after step 1 is trained. Is this what you want? (I'm OK with this.)

runame commented Aug 1, 2025

@tianyu-l I agree that validation before the first step might in principle be desirable, but I don't think it's worth a larger change to train.py. In practice, I think running validation after the first step should almost always be sufficient.
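To make the difference between the two options concrete, the sketch below records how many training steps the model has completed each time validation runs. It is a toy, hypothetical loop (`validation_points` and its arguments are assumed names, and no actual training happens), not torchtitan's train.py.

```python
from typing import List


def validation_points(num_steps: int, freq: int, before_first_step: bool) -> List[int]:
    """Return the number of completed training steps at each validation run.

    before_first_step=False: validate right after step 1 is trained (the merged approach).
    before_first_step=True: validate the untrained model once before the loop
    (the alternative raised in review).
    """
    points: List[int] = []
    if before_first_step:
        points.append(0)  # the model has completed zero optimizer steps
    for step in range(1, num_steps + 1):
        # ... one training step (forward, backward, optimizer update) would run here ...
        if step % freq == 0 or (step == 1 and not before_first_step):
            points.append(step)
    return points


print(validation_points(1000, 500, before_first_step=False))  # [1, 500, 1000]
print(validation_points(1000, 500, before_first_step=True))   # [0, 500, 1000]
```

The only difference is whether the first point reflects the initial weights or the weights after a single update; for tracking the overall validation curve, both give an early reference point.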

@tianyu-l tianyu-l merged commit 2429e0b into pytorch:main Aug 1, 2025
7 of 8 checks passed
bentherien pushed a commit to bentherien/torchtitan_ that referenced this pull request Aug 5, 2025
Currently, the first time validation metrics are computed is when `step == job_config.validation.freq`. I think it is preferable to always compute them for the first step as well.
@runame runame deleted the step-1-validation branch August 8, 2025 06:23
joellidin pushed a commit to one-covenant/torchtitan that referenced this pull request Aug 8, 2025
joellidin pushed a commit to one-covenant/torchtitan that referenced this pull request Aug 8, 2025