Describe the bug
I used the fine-tuning code from https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-self-cognition/self-cognition-sft.ipynb and got an error when loading the dataset from a local path.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, load_dataset, get_template, EncodePreprocessor, get_model_arch,
    get_multimodal_target_regex, LazyLLMDataset
)
from swift.utils import get_logger, get_model_parameter_info, plot_images, seed_everything
from swift.tuners import Swift, LoraConfig
from swift.trainers import Seq2SeqTrainer, Seq2SeqTrainingArguments
from functools import partial

logger = get_logger()
seed_everything(42)

# model
model_id_or_path = './ms-swift/Internvl25_1B'
model_type = 'internvl2_5'
system = None  # use the default system defined in the template
output_dir = 'output/InternVL2_5-1B'

# dataset
dataset = ['./ms-swift/Datasets/Jsonfile/train__swift.jsonl']  # dataset_id or dataset_path
data_seed = 42
max_length = 8192
split_dataset_ratio = 0.01  # fraction of the data split off as the validation set
num_proc = 4  # number of processes used for data preprocessing
strict = False

# lora
lora_rank = 8
lora_alpha = 8
freeze_llm = False
freeze_vit = True
freeze_aligner = True
.................
.................
Your hardware and system info
ms-swift version 3.1
Additional context
The error traceback:
File "./ms-swift/internvl_251B.py", line 88, in <module>
  train_dataset, val_dataset = load_dataset(dataset, split_dataset_ratio=split_dataset_ratio, num_proc=num_proc,
File "./ms-swift/swift/llm/dataset/loader.py", line 468, in load_dataset
  train_dataset = load_function(dataset_syntax, dataset_meta, **load_kwargs)
File "./ms-swift/swift/llm/dataset/loader.py", line 363, in load
  dataset = DatasetLoader._load_dataset_path(
File "./ms-swift/swift/llm/dataset/loader.py", line 197, in _load_dataset_path
  dataset = hf_load_dataset(file_type, data_files=dataset_path, **kwargs)
File "./anaconda3/envs/swift/lib/python3.10/site-packages/datasets/load.py", line 2151, in load_dataset
  builder_instance.download_and_prepare(
File "./anaconda3/envs/swift/lib/python3.10/site-packages/datasets/builder.py", line 924, in download_and_prepare
  self._download_and_prepare(
File "./anaconda3/envs/swift/lib/python3.10/site-packages/datasets/builder.py", line 1000, in _download_and_prepare
  self._prepare_split(split_generator, **prepare_split_kwargs)
File "./anaconda3/envs/swift/lib/python3.10/site-packages/datasets/builder.py", line 1741, in _prepare_split
  for job_id, done, content in self._prepare_split_single(
File "./anaconda3/envs/swift/lib/python3.10/site-packages/datasets/builder.py", line 1897, in _prepare_split_single
  raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
That is the error I get. The records in my JSONL file look like this:
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>检测出图像中的<ref-object>,并提供每个物体的目标框坐标。"}, {"role": "assistant", "content": "<bbox></bbox>"}], "images": ["./ms-swift/Datasets/Traffic/15/1555.jpg"], "objects": {"ref": ["某物体"], "bbox": [[371, 648, 450, 758]]}}
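The traceback ends in a generic DatasetGenerationError and does not show the underlying cause, so one way to narrow it down is to check the JSONL file line by line before passing it to load_dataset. The following is a minimal diagnostic sketch, not part of ms-swift: the helper name `check_jsonl` is hypothetical, and the checks it runs (each line parses as JSON, each record is an object with a list-valued `messages` field, all records share the same top-level keys) are assumptions about what might go wrong, not a confirmed diagnosis of this error.

```python
import json
from collections import Counter
from pathlib import Path


def check_jsonl(path):
    """Return a list of (line_number, problem) pairs for a JSONL file.

    Hypothetical helper for debugging; line number 0 marks a file-level problem.
    """
    problems = []
    key_sets = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore empty lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "record is not a JSON object"))
                continue
            if not isinstance(record.get("messages"), list):
                problems.append((lineno, "missing or non-list 'messages'"))
            # Track which top-level keys each record uses.
            key_sets[tuple(sorted(record))] += 1
    if len(key_sets) > 1:
        problems.append((0, f"{len(key_sets)} different key sets across records"))
    return problems
```

Calling `check_jsonl('./ms-swift/Datasets/Jsonfile/train__swift.jsonl')` and printing the returned pairs would list any line that fails these checks. An empty list only means these particular checks passed. It does not rule out other causes, such as a field whose value type varies between records.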