Enable datasets with custom formats #615
Signed-off-by: Anuradha Karuppiah <[email protected]>
Co-authored-by: Copilot <[email protected]> Signed-off-by: Anuradha Karuppiah <[email protected]>
…iq_simple_calculator_eval/scripts/custom_dataset_parser.py Co-authored-by: Copilot <[email protected]> Signed-off-by: Anuradha Karuppiah <[email protected]>
yczhang-nv left a comment
LGTM except an empty file in the PR
This reverts commit 685c96a.
Pull Request Overview
This PR enables custom dataset formats by adding a new EvalDatasetCustomConfig class that allows users to specify custom Python functions for dataset transformation. The implementation provides a standardized interface for custom parsers while maintaining compatibility with existing functionality.
Key changes:
- Added EvalDatasetCustomConfig class for custom dataset handling with function loading and execution
- Implemented preprocessing utilities to apply standard filters and transformations to custom datasets
- Updated EvalInputItem model to provide default values for optional fields
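To make the shape of a custom parser concrete, here is a minimal sketch of a function that flattens a nested JSON file into evaluation items. It is an illustration under stated assumptions, not the toolkit's API: `SketchEvalItem` is a hypothetical stand-in for `EvalInputItem`, and `parse_nested_json`, the `"questions"` layout, and the `max_rows` argument are invented for this example. The defaulted fields mirror the idea that workflow-populated fields need not be supplied by the parser.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, List, Optional


@dataclass
class SketchEvalItem:
    """Hypothetical stand-in for EvalInputItem (not the toolkit's model)."""
    id: str
    input_obj: Any
    expected_output_obj: Any
    # Fields that a workflow would fill in later get defaults,
    # so a parser only has to supply the three fields above.
    output_obj: Any = None
    trajectory: List[Any] = field(default_factory=list)


def parse_nested_json(file_path: Path, max_rows: Optional[int] = None) -> List[SketchEvalItem]:
    """Flatten an assumed {"questions": [{"id", "question", "answer"}, ...]} layout."""
    with open(file_path, encoding="utf-8") as f:
        data = json.load(f)
    rows = data.get("questions", [])
    if max_rows is not None:
        rows = rows[:max_rows]
    return [
        SketchEvalItem(
            id=str(row["id"]),
            input_obj=row["question"],
            expected_output_obj=row["answer"],
        )
        for row in rows
    ]
```

Extra keyword arguments such as `max_rows` are one plausible way to let a configuration pass parser-specific options; whether the actual class does this is not stated in this overview.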
Reviewed Changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/nat/data_models/dataset_handler.py | Adds EvalDatasetCustomConfig class with custom function loading capabilities |
| src/nat/eval/dataset_handler/dataset_handler.py | Implements custom dataset handling logic and preprocessing utilities |
| src/nat/eval/evaluator/evaluator_model.py | Updates EvalInputItem to provide default values for workflow-populated fields |
| tests/nat/eval/dataset_handler/test_dataset_handler.py | Adds comprehensive tests for custom dataset functionality |
| examples/evaluation_and_profiling/simple_calculator_eval/ | Provides example implementation with nested JSON parser and configuration |
| docs/source/reference/evaluate.md | Documents custom dataset format usage and API |
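The preprocessing utilities mentioned for `dataset_handler.py` are not detailed on this page. As a rough sketch of what "standard filters" applied after a custom parser could look like, the following assumes rows are plain dictionaries and that filters are expressed as an allowlist and a denylist of field values. The function name `apply_filters` and both parameters are hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]
FieldFilter = Dict[str, List[Any]]


def apply_filters(
    rows: List[Row],
    allowlist: Optional[FieldFilter] = None,
    denylist: Optional[FieldFilter] = None,
) -> List[Row]:
    """Keep rows that match every allowlist field and no denylist field."""
    allowlist = allowlist or {}
    denylist = denylist or {}
    kept = []
    for row in rows:
        # A row passes the allowlist only if each listed field has an allowed value.
        allowed = all(row.get(name) in values for name, values in allowlist.items())
        # A row is rejected if any listed field has a denied value.
        denied = any(row.get(name) in values for name, values in denylist.items())
        if allowed and not denied:
            kept.append(row)
    return kept
```

Running the same filter step on parsed rows regardless of the source format is what would keep custom datasets consistent with the built-in ones.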
/merge
Closes: NVIDIA#352

This PR enables custom dataset formats by adding a new EvalDatasetCustomConfig class that allows users to specify custom Python functions for dataset transformation. The implementation provides a standardized interface for custom parsers while maintaining compatibility with existing functionality.

- Adds EvalDatasetCustomConfig class for custom dataset handling
- Implements custom function loading and execution with error handling
- Adds preprocessing utilities to apply standard filters and transformations
- Includes example implementation and documentation

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Yuchen Zhang (https://github.com/yczhang-nv)

URL: NVIDIA#615

Signed-off-by: Sangharsh Aglave <[email protected]>
Description
Closes: #352
This PR enables custom dataset formats by adding a new EvalDatasetCustomConfig class that allows users to specify custom Python functions for dataset transformation. The implementation provides a standardized interface for custom parsers while maintaining compatibility with existing functionality.
- Adds EvalDatasetCustomConfig class for custom dataset handling
- Implements custom function loading and execution with error handling
- Adds preprocessing utilities to apply standard filters and transformations
- Includes example implementation and documentation
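The "custom function loading and execution with error handling" item can be illustrated with a small sketch. It assumes the function is named by a dotted import path such as `package.module.function`; the helpers `load_function` and `run_custom_parser` are hypothetical names for this example and are not taken from the PR.

```python
import importlib
from typing import Any, Callable


def load_function(dotted_path: str) -> Callable[..., Any]:
    """Resolve 'package.module.function' to a callable, with clear errors."""
    module_name, sep, func_name = dotted_path.rpartition(".")
    if not sep or not module_name or not func_name:
        raise ValueError(f"Expected 'package.module.function', got {dotted_path!r}")
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ValueError(f"Cannot import module {module_name!r}") from exc
    func = getattr(module, func_name, None)
    if not callable(func):
        raise ValueError(f"{func_name!r} is not a callable in {module_name!r}")
    return func


def run_custom_parser(dotted_path: str, file_path: str, **kwargs: Any) -> Any:
    """Load the parser and call it, wrapping any failure with context."""
    func = load_function(dotted_path)
    try:
        return func(file_path, **kwargs)
    except Exception as exc:
        raise RuntimeError(
            f"Custom parser {dotted_path!r} failed on {file_path!r}"
        ) from exc
```

Separating configuration errors (`ValueError` at load time) from parser failures (`RuntimeError` at call time) is a design choice of this sketch; it lets a caller tell a mistyped path apart from a dataset the parser cannot handle.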
By Submitting this PR I confirm: