From 4c0c1cb68b7d97949236b45dcdb8d99b5feadede Mon Sep 17 00:00:00 2001
From: DesmonDay <908660116@qq.com>
Date: Tue, 2 Jan 2024 16:18:04 +0800
Subject: [PATCH] add unified checkpoint training args doc

---
 docs/trainer.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/docs/trainer.md b/docs/trainer.md
index b6db8eb215b1..82110d52ab7e 100644
--- a/docs/trainer.md
+++ b/docs/trainer.md
@@ -661,6 +661,26 @@ Trainer 是一个简单,但功能完整的 Paddle训练和评估模块,并
                         The path to a folder with a valid checkpoint for your model.
                         (default: None)
 
+  --unified_checkpoint
+                        是否统一混合并行训练的Checkpoint,(可选,默认为False)
+                        Whether to unify checkpoints from hybrid-parallel training. (default: False)
+
+  --unified_checkpoint_config
+                        与Unified Checkpoint相关的一些优化配置项,以str形式传入配置.
+                        支持如下选项:
+                            skip_save_model_weight: 当master_weights存在时,是否要保存模型权重.
+                            master_weight_compatible: 1. 仅当optimizer需要master_weights时,才进行加载; 2. 如果checkpoint中不存在master_weights,则将model weight作为master_weights进行加载.
+                            async_save: 在保存Checkpoint至磁盘时做异步保存,不影响训练过程,提高训练效率.
+                            enable_all_options: 上述参数全部开启.
+
+                        Optimization options for Unified Checkpoint, passed as a string.
+                        The following options are supported:
+                            skip_save_model_weight: skip saving model weights when master_weights exist.
+                            master_weight_compatible: 1. load master_weights only when the optimizer needs them.
+                                                      2. if the checkpoint has no master_weights, load the model weights as master_weights.
+                            async_save: save checkpoints to disk asynchronously, without blocking training.
+                            enable_all_options: enable all of the options above.
+
   --skip_memory_metrics
                         是否跳过内存profiler检测。(可选,默认为True,跳过)
                         Whether or not to skip adding of memory profiler reports