Add Customization Dataset Preparation Tool #6029
Allows users to read data into the prompt-and-completion .jsonl format expected by the Customization service/NeMo LLM p-tuning service.
Signed-off-by: Zhilin Wang [email protected]
Please add license headers, and add docstrings to the functions.
tools/customization_dataset_preparation/customization_dataset_preparation.py
@@ -0,0 +1,151 @@
{
Should this be added as a subsection in docs/source/nlp/nemo_megatron/prompt_learning.rst? Also, there is no dataset_validation.py file.
The run instructions are currently in the header of customization_dataset_preparation.py. That makes them much easier to find and read for people who use the .py file on its own, without downloading NeMo or going to the NeMo docs.
lgtm
* Add Customization Dataset Preparation Tool
  Allows users to read data into prompt-and-completion format .jsonl as expected by the Customization service/NeMo LLM P tuning service
  Signed-off-by: Zhilin Wang [email protected]
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add license and usage examples, remove tutorial
  Signed-off-by: Zhilin Wang [email protected]
* Fix typo
  Signed-off-by: Zhilin Wang [email protected]
* Fix some more typos
---------
Signed-off-by: Zhilin Wang [email protected]
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
* Add Customization Dataset Preparation Tool
  Allows users to read data into prompt-and-completion format .jsonl as expected by the Customization service/NeMo LLM P tuning service
  Signed-off-by: Zhilin Wang [email protected]
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add license and usage examples, remove tutorial
  Signed-off-by: Zhilin Wang [email protected]
* Fix typo
  Signed-off-by: Zhilin Wang [email protected]
* Fix some more typos
---------
Signed-off-by: Zhilin Wang [email protected]
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: hsiehjackson <[email protected]>
What does this PR do?
Allows users to read data into the prompt-and-completion .jsonl format expected by the Customization service/NeMo LLM p-tuning service.
Collection: NLP
Changelog
Usage
See tutorial.ipynb
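For readers who only want to see the target shape, here is a minimal sketch of what "prompt-and-completion .jsonl" means: one JSON object per line with a `prompt` field and a `completion` field. This is not the code in customization_dataset_preparation.py; the function names (`format_records`, `to_jsonl`, `write_jsonl`) and the input column names (`question`, `answer`) are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def format_records(records, prompt_key, completion_key):
    """Map input dicts to {"prompt": ..., "completion": ...} rows.

    Hypothetical helper: prompt_key and completion_key name the input
    columns to read. Rows with an empty or missing value are skipped.
    """
    rows = []
    for record in records:
        prompt = str(record.get(prompt_key) or "").strip()
        completion = str(record.get(completion_key) or "").strip()
        if prompt and completion:
            rows.append({"prompt": prompt, "completion": completion})
    return rows


def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def write_jsonl(rows, output_path):
    """Write the JSON Lines text to output_path as UTF-8."""
    Path(output_path).write_text(to_jsonl(rows), encoding="utf-8")


if __name__ == "__main__":
    # Illustrative input; the "question"/"answer" column names are assumptions.
    data = [{"question": "What is NeMo?", "answer": "A toolkit."}]
    print(to_jsonl(format_records(data, "question", "answer")), end="")
```

Skipping rows with an empty prompt or completion is a design choice of this sketch; the actual tool may handle such rows differently.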
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information