feat: add OpenAI format dataset for SFT#485
Conversation
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
ashors1
left a comment
There was a problem hiding this comment.
Thanks for the PR! Could we add some documentation on this class and how it differs from prompt_response_dataset.py? Do you think it would make sense to consider merging this class with PromptResponseDataset?
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
|
Hi @ashors1, added docstrings and unit tests.
The |
ashors1
left a comment
There was a problem hiding this comment.
Looks good! Thank you for the contribution!
|
@AtsunoriFujita could you run pre-commit on your change? It fails our linter job |
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
|
Hi @terrykong, applied pre-commit. |
|
@AtsunoriFujita do you mind putting an example run command in the description so that users finding this PR can learn how to use this? |
|
@terrykong, thank you. I added it. |
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
What does this PR do ?
This PR enables using the OpenAI format dataset from a
json/jsonlwhen running SFT.Issues
List issues that this PR closes (syntax):
Usage
Modify
examples/configs/sft.yamlRun SFT job
Before your PR is "Ready for review"
Pre checks:
Additional Information