support quant ckpt limit strategy #9494
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9494     +/-   ##
===========================================
+ Coverage    52.91%   53.10%   +0.19%
===========================================
  Files          688      694       +6
  Lines       109331   110989    +1658
===========================================
+ Hits         57848    58940    +1092
- Misses       51483    52049     +566
# Quantization times exceeds the limit. Turn off the quantization strategy.
if quant_ckpt_resume_times > MAX_QUANTIZATION_TIMES:
    ckpt_quant_stage = "O0"
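Changing this switch here doesn't feel right. MAX_QUANTIZATION_TIMES mainly limits how many times you save a compressed checkpoint, so the change to ckpt_quant_stage should be applied in the save logic. Changing it here in the load path has no effect, does it?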
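I still have a question about this part. This is the optimizer-loading logic, so ckpt_quant_stage = "O0" should not be changed from outside; the stage should be read directly from the index saved with the checkpoint.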
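The reviewer's suggestion (take the stage from the checkpoint's own index at load time, not from a runtime override) could be sketched roughly as below. This is only an illustration: the helper name `read_quant_stage` and the idea of passing a path to the index file are assumptions, not code from this PR. The `ckpt_quant_stage` key mirrors the `sharded_optim_index` entry written on the save side.

```python
import json
import os


def read_quant_stage(index_path, default="O0"):
    """Return the quantization stage recorded in a checkpoint index file.

    Hypothetical helper: falls back to `default` (no quantization) when the
    index file is missing or does not record a stage.
    """
    if not os.path.isfile(index_path):
        return default
    with open(index_path, "r", encoding="utf-8") as f:
        index = json.load(f)
    # The save path writes {"ckpt_quant_stage": ...} only when quantization is on.
    return index.get("ckpt_quant_stage", default)
```

With this shape, the load path never needs to know whether the resume limit was hit; it only decodes whatever format the checkpoint says it was saved in.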
# save opt index json if checkpoint quantization is on.
if self.args.ckpt_quant_stage != "O0":
    sharded_optim_index = {"ckpt_quant_stage": self.args.ckpt_quant_stage}
if self.args.ckpt_quant_stage != "O0" and "quant_reach_limit" not in infohub:
This comment was marked as resolved.
@@ -257,7 +267,7 @@ def save_non_merge_optimizer(self, model, optim_state_dict, master_weights, outp
     signal_path=signal_dir,
     is_sync=is_sync_save,
     state_dict_type="optimizer_weight",
-    ckpt_quant_stage=self.args.ckpt_quant_stage,
+    ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",
This comment was marked as resolved.
@@ -379,7 +389,7 @@ def save_unified_optimizer(self, model, optimizer, output_dir, signal_dir):
     signal_path=signal_dir,
     is_sync=is_sync_save,
     state_dict_type="optimizer_weight",
-    ckpt_quant_stage=self.args.ckpt_quant_stage,
+    ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",
This comment was marked as resolved.
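The save-side pattern in the two hunks above (keep the configured stage until a "quant_reach_limit" flag appears in the shared infohub, then fall back to "O0") can be illustrated in isolation. This is a minimal sketch under stated assumptions: a plain dict stands in for infohub, the helper names `note_resume` and `effective_quant_stage` are hypothetical, and the limit value of 1 is a placeholder, since the real value of MAX_QUANTIZATION_TIMES is not shown in this conversation.

```python
# Placeholder value for illustration only; the real constant is defined in the PR.
MAX_QUANTIZATION_TIMES = 1


def note_resume(quant_ckpt_resume_times, infohub, limit=MAX_QUANTIZATION_TIMES):
    """Record in the shared registry that the quantized-resume limit was exceeded."""
    if quant_ckpt_resume_times > limit:
        infohub["quant_reach_limit"] = True


def effective_quant_stage(configured_stage, infohub):
    """Return the stage to save with: the configured one, or "O0" once the limit is hit."""
    return configured_stage if "quant_reach_limit" not in infohub else "O0"


infohub = {}  # stand-in for the shared registry (assumption: supports `in`)
note_resume(1, infohub)
print(effective_quant_stage("O1", infohub))  # limit not exceeded, stage unchanged
note_resume(2, infohub)
print(effective_quant_stage("O1", infohub))  # limit exceeded, quantization off
```

Keeping the decision in one helper would avoid repeating the conditional expression in both save_non_merge_optimizer and save_unified_optimizer, though the PR as merged inlines it at each call site.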
LGTM
* support quant ckpt limit strategy
* bug fix
* bug fix
* fix bug
* add log, fix bug

Conflicts: paddlenlp/utils/env.py
PR types
PR changes
Description
Support an upper limit on how many times training can resume from a compressed (quantized) checkpoint.