
support quant ckpt limit strategy #9494

Merged
5 commits merged into PaddlePaddle:develop on Nov 29, 2024

Conversation

@wtmlon wtmlon (Collaborator) commented Nov 26, 2024

PR types

PR changes

Description

Support an upper limit on how many times a compressed (quantized) checkpoint is produced across training resumes.
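For context, a minimal sketch of the idea, not the PR's actual code: count how many times training has resumed from a quantized checkpoint and, once the count exceeds MAX_QUANTIZATION_TIMES, record that fact so later checkpoint saves fall back to the uncompressed "O0" stage. MAX_QUANTIZATION_TIMES, ckpt_quant_stage, quant_ckpt_resume_times, infohub and the "quant_reach_limit" key appear in the PR; the helper name, index file name and index fields below are hypothetical.

# Illustrative sketch only; everything except MAX_QUANTIZATION_TIMES,
# ckpt_quant_stage, quant_ckpt_resume_times, infohub and "quant_reach_limit"
# is a hypothetical name.
import json
import os

MAX_QUANTIZATION_TIMES = 5  # assumed value for illustration

def resolve_quant_stage_on_resume(checkpoint_dir, requested_stage, infohub):
    """Count resumes from a quantized checkpoint and disable quantization once the limit is hit."""
    index_path = os.path.join(checkpoint_dir, "optimizer_index.json")  # hypothetical file name
    quant_ckpt_resume_times = 0
    if os.path.exists(index_path):
        with open(index_path, "r", encoding="utf-8") as f:
            index = json.load(f)
        # How many times this checkpoint lineage has already been resumed while quantized.
        quant_ckpt_resume_times = index.get("quant_ckpt_resume_times", 0) + 1

    # Quantization times exceed the limit. Turn off the quantization strategy.
    if quant_ckpt_resume_times > MAX_QUANTIZATION_TIMES:
        infohub["quant_reach_limit"] = True  # the save path checks for this key
        return "O0"
    return requested_stage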


paddle-bot bot commented Nov 26, 2024

Thanks for your contribution!


codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 11.42857% with 31 lines in your changes missing coverage. Please review.

Project coverage is 53.10%. Comparing base (8fd33a9) to head (12d84c6).
Report is 17 commits behind head on develop.

Files with missing lines                                 Patch %   Lines
...p/trainer/unified_checkpoint/unified_checkpoint.py      5.00%   19 Missing ⚠️
...lp/quantization/unified_checkpoint_quantization.py     15.38%   11 Missing ⚠️
paddlenlp/transformers/model_utils.py                      0.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9494      +/-   ##
===========================================
+ Coverage    52.91%   53.10%   +0.19%     
===========================================
  Files          688      694       +6     
  Lines       109331   110989    +1658     
===========================================
+ Hits         57848    58940    +1092     
- Misses       51483    52049     +566     



# Quantization times exceed the limit. Turn off the quantization strategy.
if quant_ckpt_resume_times > MAX_QUANTIZATION_TIMES:
    ckpt_quant_stage = "O0"
@DesmonDay DesmonDay (Contributor) commented Nov 26, 2024

Writing the change to this switch here doesn't feel quite right. MAX_QUANTIZATION_TIMES is mainly meant to limit the number of times you save a compressed checkpoint, so the change to ckpt_quant_stage should be synced into the save logic; changing it here in the loading path has no effect, does it?

Contributor

One remaining question here: this is the optimizer loading logic, so ckpt_quant_stage = "O0" should not be changed from the outside; it should be read directly from the index saved with the checkpoint.
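For context, a minimal sketch of the load-side behaviour being asked for here, assuming the save path stores ckpt_quant_stage in the sharded optimizer index as shown below; the index file name and helper are hypothetical, not PaddleNLP API.

import json
import os

def read_ckpt_quant_stage(resume_from_checkpoint):
    """Read the quantization stage recorded in the checkpoint's own optimizer index,
    instead of relying on an externally mutated ckpt_quant_stage flag."""
    # Hypothetical index file name; the unified checkpoint code has its own naming.
    index_path = os.path.join(resume_from_checkpoint, "optimizer_index.json")
    if not os.path.exists(index_path):
        return "O0"  # no index means the optimizer state was saved unquantized
    with open(index_path, "r", encoding="utf-8") as f:
        sharded_optim_index = json.load(f)
    # The save path only records ckpt_quant_stage when quantization is on,
    # so a missing key is also treated as "O0".
    return sharded_optim_index.get("ckpt_quant_stage", "O0")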

        # save opt index json if checkpoint quantization is on.
-       if self.args.ckpt_quant_stage != "O0":
+       if self.args.ckpt_quant_stage != "O0" and "quant_reach_limit" not in infohub:
            sharded_optim_index = {"ckpt_quant_stage": self.args.ckpt_quant_stage}

This comment was marked as resolved.

@@ -257,7 +267,7 @@ def save_non_merge_optimizer(self, model, optim_state_dict, master_weights, outp
            signal_path=signal_dir,
            is_sync=is_sync_save,
            state_dict_type="optimizer_weight",
-           ckpt_quant_stage=self.args.ckpt_quant_stage,
+           ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",

This comment was marked as resolved.

@@ -379,7 +389,7 @@ def save_unified_optimizer(self, model, optimizer, output_dir, signal_dir):
            signal_path=signal_dir,
            is_sync=is_sync_save,
            state_dict_type="optimizer_weight",
-           ckpt_quant_stage=self.args.ckpt_quant_stage,
+           ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",

This comment was marked as resolved.
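Both hunks above apply the same pattern: once "quant_reach_limit" has been recorded in infohub, the optimizer state is written with stage "O0" so no further compressed checkpoints are produced. A small helper could express that decision once instead of repeating the conditional expression; this is a refactoring sketch, not code from the PR.

def effective_ckpt_quant_stage(args, infohub):
    """Quantization stage to use when saving optimizer state: fall back to "O0"
    (no compression) once the resume-count limit has been recorded in infohub."""
    if "quant_reach_limit" in infohub:
        return "O0"
    return args.ckpt_quant_stage

The two call sites above would then pass ckpt_quant_stage=effective_ckpt_quant_stage(self.args, infohub).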

@DesmonDay DesmonDay (Contributor) left a comment

LGTM

@wawltor wawltor merged commit 2985f90 into PaddlePaddle:develop Nov 29, 2024
9 of 12 checks passed
wtmlon added a commit to wtmlon/PaddleNLP that referenced this pull request Nov 29, 2024
* support quant ckpt limit strategy

* bug fix

* bug fix

* fix bug

* add log, fix bug
Conflicts:
	paddlenlp/utils/env.py