[Question]: msra dataset fails to load #9842

Open
zgrennn opened this issue Feb 11, 2025 · 0 comments
Labels
question Further information is requested

zgrennn commented Feb 11, 2025

Please describe your question

I ran into a problem while fine-tuning the Chinese named-entity recognition (NER) model from slm/examples/information_extraction/msra_ner.
Fine-tuning command:
!python -u /home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py \
    --model_type bert \
    --model_name_or_path bert-base-multilingual-uncased \
    --dataset msra_ner \
    --max_seq_length 128 \
    --batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --logging_steps 1 \
    --save_steps 500 \
    --output_dir ./tmp/msra_ner/ \
    --device gpu
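The same invocation can also be launched from a Python cell with `subprocess`, which avoids depending on shell line continuations inside a notebook `!` command. This is a minimal sketch: the script path and flag values are copied from the command above, while the names `TRAIN_SCRIPT`, `DEFAULT_ARGS` and `build_train_argv` are hypothetical helpers introduced only for illustration.

```python
import sys

# Script path and flags copied from the command in this issue.
TRAIN_SCRIPT = "/home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py"

# Hypothetical defaults mirroring the command-line flags above.
DEFAULT_ARGS = {
    "model_type": "bert",
    "model_name_or_path": "bert-base-multilingual-uncased",
    "dataset": "msra_ner",
    "max_seq_length": 128,
    "batch_size": 32,
    "learning_rate": "2e-5",  # kept as a string so it matches the original flag verbatim
    "num_train_epochs": 3,
    "logging_steps": 1,
    "save_steps": 500,
    "output_dir": "./tmp/msra_ner/",
    "device": "gpu",
}


def build_train_argv(script=TRAIN_SCRIPT, **overrides):
    """Return an argv list equivalent to the shell command above.

    Keyword overrides replace individual flag values while keeping flag order.
    """
    args = {**DEFAULT_ARGS, **overrides}
    argv = [sys.executable, "-u", script]
    for key, value in args.items():
        argv += [f"--{key}", str(value)]
    return argv
```

Running `subprocess.run(build_train_argv(), check=True)` would then start training and raise `CalledProcessError` if `train.py` exits with a non-zero status.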
Error output:
[2025-02-11 16:58:21,671] [ WARNING] - Detected that datasets module was imported before paddlenlp. This may cause PaddleNLP datasets to be unavalible in intranet. Please import paddlenlp before datasets module to avoid download issues
[2025-02-11 16:58:21,937] [ WARNING] - if you run ring_flash_attention.py, please ensure you install the paddlenlp_ops by following the instructions provided at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/csrc/README.md
[2025-02-11 16:58:23,287] [ INFO] - model_type :bert
[2025-02-11 16:58:23,288] [ INFO] - model_name_or_path :bert-base-multilingual-uncased
[2025-02-11 16:58:23,288] [ INFO] - dataset :msra_ner
[2025-02-11 16:58:23,288] [ INFO] - output_dir :./tmp/msra_ner/
[2025-02-11 16:58:23,288] [ INFO] - max_seq_length :128
[2025-02-11 16:58:23,288] [ INFO] - batch_size :32
[2025-02-11 16:58:23,288] [ INFO] - learning_rate :2e-05
[2025-02-11 16:58:23,288] [ INFO] - weight_decay :0.0
[2025-02-11 16:58:23,288] [ INFO] - adam_epsilon :1e-08
[2025-02-11 16:58:23,288] [ INFO] - max_grad_norm :1.0
[2025-02-11 16:58:23,288] [ INFO] - num_train_epochs :3
[2025-02-11 16:58:23,288] [ INFO] - max_steps :-1
[2025-02-11 16:58:23,288] [ INFO] - warmup_steps :0
[2025-02-11 16:58:23,288] [ INFO] - logging_steps :1
[2025-02-11 16:58:23,289] [ INFO] - save_steps :500
[2025-02-11 16:58:23,289] [ INFO] - seed :42
[2025-02-11 16:58:23,289] [ INFO] - device :gpu
Traceback (most recent call last):
  File "/home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py", line 216, in <module>
    do_train(args)
  File "/home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py", line 93, in do_train
    raw_datasets = load_dataset(args.dataset)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1849, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1731, in dataset_module_factory
    raise e1 from None
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1618, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e
ConnectionError: Couldn't reach 'msra_ner' on the Hub (LocalEntryNotFoundError)
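The traceback shows that `load_dataset` in train.py resolves to the Hugging Face `datasets` package (`site-packages/datasets/load.py`), which tries to look `msra_ner` up on the Hub and fails with `LocalEntryNotFoundError`. One way to narrow this down is to check whether the environment is in Hugging Face offline mode or simply cannot open a connection to the Hub. The sketch below uses only the standard library; the host `huggingface.co`, the list of offline variables and the helper names `offline_flags` and `hub_reachable` are assumptions for illustration, not something stated in the issue.

```python
import os
import socket

# Assumed set of environment variables that switch Hugging Face libraries to offline mode.
OFFLINE_VARS = ("HF_DATASETS_OFFLINE", "HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
TRUTHY = {"1", "true", "yes", "on"}


def offline_flags(environ=None):
    """Return the offline-mode variables that are set to a truthy value."""
    env = os.environ if environ is None else environ
    return {
        name: env[name]
        for name in OFFLINE_VARS
        if env.get(name, "").strip().lower() in TRUTHY
    }


def hub_reachable(host="huggingface.co", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False
```

Printing `offline_flags()` and `hub_reachable()` in the same notebook before running train.py would indicate which case applies; if neither offline mode is set nor the Hub is reachable, the dataset presumably has to be provided locally instead.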

zgrennn added the question label Feb 11, 2025