Pretraining on 3 GB of data: the loss decreases very slowly, and checkpoint tests show the model has not learned any new knowledge
#!/bin/bash
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32

GPUS_PER_NODE=1
NNODES=1
MASTER_ADDR="localhost"
MASTER_PORT=12345

OPTS=""
# model and dataset settings
OPTS+=" --model-config config/cpm-bee-10b.json"
OPTS+=" --dataset ../datasets/datasets.json"
# training settings
OPTS+=" --train-iters 20000"
OPTS+=" --batch-size 8"
OPTS+=" --max-length 2048"
OPTS+=" --lr 0.001"
OPTS+=" --warmup-iters 2000"
OPTS+=" --lr-decay-style noam"
OPTS+=" --weight-decay 0.01"
OPTS+=" --clip-grad 1.0"
OPTS+=" --loss-scale 1048576"
OPTS+=" --loss-scale-factor 2"
OPTS+=" --loss-scale-steps 128"
# log settings
OPTS+=" --inspect-iters 100"
OPTS+=" --log-dir ../logs/train/"
OPTS+=" --tensorboard ../logs/tensorboard/cpm_live_48_4096/"
# saving ckpts
OPTS+=" --save-iters 500"
OPTS+=" --save ../pretrain_results_0920/"
OPTS+=" --save-name cpm_live_checkpoint"
# loading ckpts
MODEL_STEPS="0"
OPTS+=" --start-step ${MODEL_STEPS}"
OPTS+=" --load /data/models/cpm-bee-10b/pytorch_model.bin"
#OPTS+=" --load-grad "
#OPTS+=" --deepspeed ds_config.json "

CMD="torchrun --nnodes=${NNODES} --nproc_per_node=${GPUS_PER_NODE} --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} pretrain_cpm_bee.py ${OPTS} "

echo ${CMD}
$CMD
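The pretraining data is IT-related content scraped from Baidu Baike and Wikipedia.
Test results:
Question: What is a system?
Answer: 'What is a system? . What is a system?'
Question: Who is Jia Baoyu?
Answer: Who is Jia Baoyu? . Who is Jia Baoyu?
Question: Introduce Romance of the Three Kingdoms
Answer: Introduce Romance of the Three Kingdoms\n"Romance of the Three Kingdoms" (《三国演义》), also known as 《三国志通俗演义》, 《三国志平话》, and 《三国志传》, is one of the Four Great Classical Novels of China. Its author is Luo Guanzhong, and it was written in the late Yuan and early Ming period.
With a small dataset of about 200 MB, the model answers normally.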
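For reference, here is a rough sketch of how many tokens the settings in the script can cover at most. It only multiplies values copied from the script (`--batch-size`, `--max-length`, `--train-iters`, `GPUS_PER_NODE`, `NNODES`). It assumes that `--batch-size` is per GPU, that every sequence is packed to the full `--max-length`, and that there is no gradient accumulation. I have not confirmed these assumptions against `pretrain_cpm_bee.py`, so treat the result as an upper bound only.

```bash
#!/bin/bash
# Values copied from the training script in this issue.
BATCH_SIZE=8        # --batch-size (assumed to be per GPU)
MAX_LENGTH=2048     # --max-length
TRAIN_ITERS=20000   # --train-iters
GPUS_PER_NODE=1
NNODES=1

# Upper bound: assumes every sequence is filled to MAX_LENGTH
# and that there is no gradient accumulation.
TOKENS_PER_ITER=$((BATCH_SIZE * MAX_LENGTH * GPUS_PER_NODE * NNODES))
TOTAL_TOKENS=$((TOKENS_PER_ITER * TRAIN_ITERS))

echo "tokens per iteration (upper bound): ${TOKENS_PER_ITER}"
echo "tokens over the whole run (upper bound): ${TOTAL_TOKENS}"
```

Comparing this bound with the token count of the 3 GB dataset (once it is measured with the model's tokenizer) would show whether 20000 iterations can cover the data even once.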