[BUG/Help] <title>After building Q&A data following the official multi-turn Q&A dataset format, running the script command raises Traceback (most recent call last): File "/mnt/ChatGLM2-6B/ptuning/main.py", line 411, in <module> main() File "/mnt/ChatGLM2-6B/ptuning/main.py", line 229, in main train_dataset = train_dataset.map( File "/root/anaconda3/envs/GLM2/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 592, in wrapper out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) File "/root/anaconda3/envs/GLM2/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 557, in wrapper out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) File "/root/anaconda3/envs/GLM2/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3180, in map with Pool(len(kwargs_per_job)) as pool:
#666
Open
1 task done
nevesaynever1 opened this issue on Feb 22, 2024 · 0 comments
Environment
- OS: CentOS
- Python: 3.9
- Transformers: 4.30.2
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): CU118
Anything else?
No response
Is there an existing issue for this?
Current Behavior
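The dataset is built in the following format:
{"content": "The fan on my Great Wall H3 won't spin. The relay is fine. The fuse is fine, the sensor is new, and the fan is new too. Why is this? The relay is just missing a signal wire.", "summaty": "Can you read the data stream with a computer? What is the water temperature?", "history": []}
{"content": "95", "summaty": "What is the temperature difference between the upper and lower hoses? Has all the air been bled out?", "history": [["The fan on my Great Wall H3 won't spin. The relay is fine. The fuse is fine, the sensor is new, and the fan is new too. Why is this? The relay is just missing a signal wire.", "Can you read the data stream with a computer? What is the water temperature?"]]}
{"content": "Yes. Both the upper and lower hoses are fine.", "summaty": "Then you need to check the wiring. The fan relay is normally engaged by the computer; if there is an open circuit, or the computer is faulty, the relay will not engage!", "history": [["The fan on my Great Wall H3 won't spin. The relay is fine. The fuse is fine, the sensor is new, and the fan is new too. Why is this? The relay is just missing a signal wire.", "Can you read the data stream with a computer? What is the water temperature?"], ["95", "What is the temperature difference between the upper and lower hoses? Has all the air been bled out?"]]}
{"content": "Hello", "summaty": "Hello", "history": []}
{"content": "May I ask who you are?", "summaty": "I am chatGLm2-6B", "history": [["Hello", "Hello"]]}
{"content": "Are you an excellent AI assistant?", "summaty": "Yes, I am.", "history": [["Hello", "Hello"], ["May I ask who you are?", "I am chatGLm2-6B"]]}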
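The records above follow a simple pattern: one JSON object per turn, with `history` holding every earlier [prompt, response] pair of the same conversation. A minimal sketch of how such records could be generated is shown here. `build_records` is a hypothetical helper, not part of the ChatGLM2-6B repository, and the default key names are assumptions taken from the `--prompt_column`, `--response_column` and `--history_column` values in the script.

```python
import json


def build_records(turns, prompt_key="content", response_key="summary",
                  history_key="history"):
    """Turn one conversation, given as (prompt, response) pairs,
    into one record per turn.

    Each record's history holds every earlier [prompt, response] pair.
    The key names are assumptions; they should match the column names
    passed to the training script.
    """
    records = []
    history = []
    for prompt, response in turns:
        records.append({
            prompt_key: prompt,
            response_key: response,
            # Copy the pairs so later appends do not mutate earlier records.
            history_key: [list(pair) for pair in history],
        })
        history.append([prompt, response])
    return records


conversation = [
    ("Hello", "Hello"),
    ("May I ask who you are?", "I am chatGLm2-6B"),
]

# One JSON object per line, keeping non-ASCII text readable.
for record in build_records(conversation):
    print(json.dumps(record, ensure_ascii=False))
```

Passing the key names as parameters keeps the generated keys and the script's column arguments in one place, which makes a spelling mismatch between the two less likely.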
The train.sh script is:
PRE_SEQ_LEN=128
LR=1e-2

CUDA_VISIBLE_DEVICES=0 python main.py \
    --do_train \
    --train_file train.json \
    --validation_file dev.json \
    --preprocessing_num_workers 10 \
    --prompt_column content \
    --response_column summary \
    --history_column history \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm2-6b \
    --output_dir ../output \
    --overwrite_output_dir \
    --max_source_length 256 \
    --max_target_length 256 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
Running the script above produces the following error:
Traceback (most recent call last):
  File "/mnt/ChatGLM2-6B/ptuning/main.py", line 411, in <module>
    main()
  File "/mnt/ChatGLM2-6B/ptuning/main.py", line 229, in main
    train_dataset = train_dataset.map(
  File "/root/anaconda3/envs/GLM2/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 592, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/root/anaconda3/envs/GLM2/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 557, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/root/anaconda3/envs/GLM2/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3180, in map
    with Pool(len(kwargs_per_job)) as pool:
Is this caused by the data format? I checked, and it matches the data format of the official dataset.
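Whether the data format is at fault can be checked without starting training: confirm that every record contains the column names passed to the script (`--prompt_column content`, `--response_column summary`, `--history_column history`). In the records as pasted, the response key is spelled `summaty`, while the script passes `summary`. The traceback is cut off before the actual exception, so whether this mismatch is the cause is not confirmed here. A minimal sketch of such a check follows; `find_missing_columns` is a hypothetical helper, not part of the ChatGLM2-6B repository, and it assumes one JSON object per line.

```python
import json


def find_missing_columns(jsonl_text,
                         required=("content", "summary", "history")):
    """Return (line_number, missing_keys) for each record that lacks
    one of the required keys. Assumes one JSON object per line."""
    problems = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = [key for key in required if key not in record]
        if missing:
            problems.append((line_number, missing))
    return problems


# Two illustrative records: the first uses the misspelled key from the
# pasted data, the second uses the key the script asks for.
sample = "\n".join([
    '{"content": "Hello", "summaty": "Hello", "history": []}',
    '{"content": "Hello", "summary": "Hello", "history": []}',
])

print(find_missing_columns(sample))
```

To run it against the real files, read `train.json` and `dev.json` as text and pass the contents in; an empty list means every record has all three keys.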
Expected Behavior
No response