[Question]: msra dataset cannot be loaded #9842

zgrennn · 2025-02-11T08:59:04Z

Please ask your question

I ran into a problem while fine-tuning the Chinese named entity recognition model in slm/examples/information_extraction/msra_ner.
Fine-tuning command:
!python -u /home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py \
    --model_type bert \
    --model_name_or_path bert-base-multilingual-uncased \
    --dataset msra_ner \
    --max_seq_length 128 \
    --batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --logging_steps 1 \
    --save_steps 500 \
    --output_dir ./tmp/msra_ner/ \
    --device gpu
Error message:
[2025-02-11 16:58:21,671] [ WARNING] - Detected that datasets module was imported before paddlenlp. This may cause PaddleNLP datasets to be unavalible in intranet. Please import paddlenlp before datasets module to avoid download issues
[2025-02-11 16:58:21,937] [ WARNING] - if you run ring_flash_attention.py, please ensure you install the paddlenlp_ops by following the instructions provided at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/csrc/README.md
[2025-02-11 16:58:23,287] [ INFO] - model_type :bert
[2025-02-11 16:58:23,288] [ INFO] - model_name_or_path :bert-base-multilingual-uncased
[2025-02-11 16:58:23,288] [ INFO] - dataset :msra_ner
[2025-02-11 16:58:23,288] [ INFO] - output_dir :./tmp/msra_ner/
[2025-02-11 16:58:23,288] [ INFO] - max_seq_length :128
[2025-02-11 16:58:23,288] [ INFO] - batch_size :32
[2025-02-11 16:58:23,288] [ INFO] - learning_rate :2e-05
[2025-02-11 16:58:23,288] [ INFO] - weight_decay :0.0
[2025-02-11 16:58:23,288] [ INFO] - adam_epsilon :1e-08
[2025-02-11 16:58:23,288] [ INFO] - max_grad_norm :1.0
[2025-02-11 16:58:23,288] [ INFO] - num_train_epochs :3
[2025-02-11 16:58:23,288] [ INFO] - max_steps :-1
[2025-02-11 16:58:23,288] [ INFO] - warmup_steps :0
[2025-02-11 16:58:23,288] [ INFO] - logging_steps :1
[2025-02-11 16:58:23,289] [ INFO] - save_steps :500
[2025-02-11 16:58:23,289] [ INFO] - seed :42
[2025-02-11 16:58:23,289] [ INFO] - device :gpu
Traceback (most recent call last):
  File "/home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py", line 216, in <module>
    do_train(args)
  File "/home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py", line 93, in do_train
    raw_datasets = load_dataset(args.dataset)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1849, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1731, in dataset_module_factory
    raise e1 from None
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1618, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e
ConnectionError: Couldn't reach 'msra_ner' on the Hub (LocalEntryNotFoundError)
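
For anyone trying to reproduce this outside train.py, the failing call can probably be isolated with something like the sketch below. It is only a sketch under assumptions: `load_or_report` and `reproduce` are hypothetical helper names (not part of PaddleNLP or `datasets`), the `ConnectionError` in the traceback is assumed to be Python's built-in exception, and importing paddlenlp first simply follows the WARNING in the log. I have not confirmed that the import order resolves the error.

```python
"""Minimal reproduction sketch for the failing load_dataset call."""


def load_or_report(name, loader):
    """Call loader(name); return (dataset, None) or (None, error text)."""
    try:
        return loader(name), None
    except ConnectionError as exc:
        # Assumption: this is the same built-in exception type that
        # appears at the bottom of the traceback above.
        return None, f"{type(exc).__name__}: {exc}"


def reproduce(name="msra_ner"):
    """Run the same call as train.py line 93, with paddlenlp imported first."""
    # Import order follows the WARNING in the log: paddlenlp before datasets.
    import paddlenlp  # noqa: F401
    from datasets import load_dataset

    dataset, error = load_or_report(name, load_dataset)
    print(error if error else dataset)
    return dataset, error
```

Calling `reproduce()` in the same environment should show whether the error comes from the dataset download itself rather than from anything specific to train.py.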

zgrennn added the "question" label (Further information is requested) Feb 11, 2025

paddle-bot bot assigned ZHUI Feb 11, 2025
