@kajyuuen
Contributor

The negative_passages list of a sample in CohereMiracljaqueries2212Dataset may be empty (length 0), and get_prompt_and_rejected needs to handle that case.
Below is an example:
https://huggingface.co/datasets/Cohere/miracl-ja-queries-22-12/viewer/Cohere--miracl-ja-queries-22-12/train?row=2
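Here is a minimal sketch of the kind of guard this needs. It assumes the sample layout of the Hugging Face dataset: a query string plus negative_passages, a list of dicts with a text field. The standalone function and the " Human: ... Assistant: " prompt format are illustrative assumptions, not the exact code merged in this PR. Returning None gives the caller a way to drop samples that have no rejected passage.

```python
from typing import Any, Dict, List, Optional


def get_prompt_and_rejected(sample: Dict[str, Any]) -> Optional[str]:
    """Hypothetical standalone version of the prompt+rejected builder.

    Assumes each sample has a 'query' string and a 'negative_passages'
    list whose items are dicts with a 'text' key.
    """
    negatives: List[Dict[str, Any]] = sample.get("negative_passages") or []
    if len(negatives) == 0:
        # No rejected passage exists for this query: signal "skip this sample"
        return None
    return " Human: " + sample["query"] + " Assistant: " + negatives[0]["text"]


samples = [
    {"query": "q1", "negative_passages": [{"text": "wrong answer"}]},
    {"query": "q2", "negative_passages": []},  # empty, as in the linked row
]

# Keep only samples that actually have a rejected passage
pairs = [p for p in (get_prompt_and_rejected(s) for s in samples) if p is not None]
print(pairs)
```

Filtering at the call site keeps the builder simple. The alternative, substituting a placeholder rejected text, would inject artificial negatives into reward-model training.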

@conglongli
Contributor

Hi @kajyuuen, thank you for the contribution. The formatting test is failing; please see my comment at #597 (comment) on how to fix it. Thank you.

Also, regarding these Japanese datasets: when I found them, I ran some basic step-1 training tests with them (https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/training_scripts/other_language/run_japanese.sh). However, the generated answers were not very good, possibly because of the quality of the model, the data, or both. Please keep that caveat in mind. We welcome your contributions if you find a better model, dataset, or training recipe.

@kajyuuen
Contributor Author

@conglongli
Thank you for your comment and advice! I have reformatted the code.

@conglongli conglongli merged commit c884346 into deepspeedai:master Jun 26, 2023
LeetJoe pushed a commit to LeetJoe/DeepSpeedExamples that referenced this pull request Sep 15, 2023
…speedai#608)

* Fix get_prompt_and_rejected in CohereMiracljaqueries2212Dataset

* Reformat code using yapf
hwchen2017 pushed a commit that referenced this pull request Jun 8, 2025
* Fix get_prompt_and_rejected in CohereMiracljaqueries2212Dataset

* Reformat code using yapf
