Enable datasets with custom formats #615

Merged: rapids-bot merged 53 commits into `NVIDIA:develop` from `AnuradhaKaruppiah:ak-customize-input-2` on Aug 14, 2025.

The diff below shows changes from 39 commits.

Commits (all by AnuradhaKaruppiah):

- `68c0aa2` Add a ping option to aiq info mcp
- `4c4b001` Add a note in the README for "aiq info mcp ping"
- `1e5c7bc` Simplify error handling
- `08f557d` Apply user specified timeout to the ping
- `525b1c1` Make the timeout int every where
- `02e4619` Minor comments cleanup
- `13103c4` Simplify output printing
- `9d5cae1` Move all anyio calls to asyncio for consistency
- `3693af6` Remove unnecessary imports
- `fcad99a` Add a temporary health check route
- `f04584c` Drop unnecessary ping status
- `9b0f768` Add defaults to the description strings for CLI help
- `e6e0dd2` WIP
- `3e9c347` Provide a MCP front-end plugin worker that can be used for customization
- `7b374c6` Unit tests for custom MCP routes
- `f727aff` Add doc for the health route
- `1020f9a` Merge remote-tracking branch 'upstream/develop' into mcp-health-check
- `d9c4154` Merge remote-tracking branch 'upstream/develop' into mcp-health-check
- `459287c` Minor fixup
- `267e8d3` Fix unit test failures
- `3416bbb` Merge remote-tracking branch 'upstream/develop' into mcp-health-check
- `29a7ab4` Fix test link failures
- `9f1fe9f` More pylint fixes
- `1038b59` Fix pylint errors
- `5bf3ea9` Fix imcorrect import
- `369b3dc` More pyline fixes
- `184d679` Unit tests fixup
- `3afc321` Add support for custom dataset parser
- `f9baf33` Merge remote-tracking branch 'upstream/develop' into ak-customize-inp…
- `4e0879b` Update evaluation_and_profiling simple_calculator for custom datasets
- `b110742` example fixups
- `5f6a1ba` Update examples/evaluation_and_profiling/simple_calculator_eval/src/a…
- `71efd34` Update docs/source/reference/evaluate.md
- `bf545bd` Update docs/source/reference/evaluate.md
- `68976fb` Remove dup cpyright header
- `f7dffa4` Update src/aiq/eval/dataset_handler/dataset_handler.py
- `229bc0c` Limit the fields that needed to be filled by the custom dataset parser
- `d6bd553` Update docs/source/reference/evaluate.md
- `583f1bb` Update examples/evaluation_and_profiling/simple_calculator_eval/src/a…
- `eb21a09` Add unit tests
- `b5fdc4e` Update in response to review comments
- `f3c5dc6` Docs updates
- `de86306` Fix CI checks
- `f5cbee9` More CI check failures
- `c76bade` Fix path CI checks
- `685c96a` More CI fixes
- `d44310d` Revert "More CI fixes"
- `11a654e` Merge remote-tracking branch 'upstream/develop' into ak-customize-inp…
- `6c69b68` aiq to nat updates
- `2ad28c3` aiq to nat renaming
- `4aa0915` More nat renaming fixes
- `a9a9f76` Update symlinks
- `c1ee3e6` Merge remote-tracking branch 'upstream/develop' into ak-customize-inp…

`examples/evaluation_and_profiling/simple_calculator_eval/configs` (1 addition, 0 deletions; symlink target):

```
src/aiq_simple_calculator_eval/configs
```

A second symlink (1 addition, 0 deletions) targets:

```
src/aiq_simple_calculator_eval/data
```

`...valuation_and_profiling/simple_calculator_eval/src/aiq_simple_calculator_eval/__init__.py` (14 additions, 0 deletions):

```python
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

`...e_calculator_eval/src/aiq_simple_calculator_eval/configs/config-custom-dataset-format.yml` (85 additions, 0 deletions):

```yaml
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

general:
  use_uvloop: true

functions:
  calculator_multiply:
    _type: calculator_multiply
  calculator_inequality:
    _type: calculator_inequality
  calculator_divide:
    _type: aiq_simple_calculator/calculator_divide
  current_datetime:
    _type: current_datetime
  calculator_subtract:
    _type: calculator_subtract

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.2
    max_tokens: 2048
  eval_llm:
    _type: nim
    model_name: mistralai/mixtral-8x22b-instruct-v0.1
    temperature: 0.0
    max_tokens: 1024

workflow:
  _type: react_agent
  tool_names:
    - calculator_multiply
    - calculator_inequality
    - current_datetime
    - calculator_divide
    - calculator_subtract
  llm_name: nim_llm
  verbose: true
  retry_agent_response_parsing_errors: true
  parse_agent_response_max_retries: 3


eval:
  general:
    output_dir: .tmp/aiq/examples/simple_calculator/eval
    dataset:
      _type: custom
      file_path: examples/evaluation_and_profiling/simple_calculator_eval/data/simple_calculator_nested.json
      function: aiq_simple_calculator_eval.scripts.custom_dataset_parser.extract_nested_questions
      kwargs:
        difficulty: "medium"
        max_rows: 5

  evaluators:
    tuneable_eval:
      _type: tunable_rag_evaluator
      llm_name: eval_llm
      default_scoring: true
      default_score_weights:
        coverage: 0.5
        correctness: 0.3
        relevance: 0.2
      judge_llm_prompt: >
        You are an intelligent evaluator that scores the generated answer based on the description of the expected answer.
        The score is a measure of how well the generated answer matches the description of the expected answer based on the question.
        Take into account the question, the relevance of the answer to the question and the quality compared to the description of the expected answer.

        Rules:
        - The score must be a float of any value between 0.0 and 1.0 on a sliding scale.
        - The reasoning string must be concise and to the point. It should be 1 sentence and 2 only if extra description is needed. It must explain why the score was given and what is different between the generated answer and the expected answer.
        - The tags <image> and <chart> are real images and charts.
```
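
The `default_score_weights` under `tuneable_eval` sum to 1.0. Assuming the evaluator folds its per-dimension scores into one number as a plain weighted sum (an assumption based on the field name, not something this diff confirms), a single item's score could be computed as in the sketch below. The helper `weighted_score` is hypothetical and the per-dimension scores are made-up inputs.

```python
import math

# Weights copied from the config's default_score_weights.
WEIGHTS = {"coverage": 0.5, "correctness": 0.3, "relevance": 0.2}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Combine per-dimension scores in [0.0, 1.0] into one score.

    Assumption: a plain weighted sum; the real evaluator may combine
    scores differently.
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights should sum to 1.0")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical judge output for one question.
example = {"coverage": 0.8, "correctness": 1.0, "relevance": 0.6}
print(round(weighted_score(example), 2))  # 0.5*0.8 + 0.3*1.0 + 0.2*0.6 = 0.82
```

Under this reading, coverage dominates: a fully correct but half-complete answer loses more than an equally flawed answer that is merely off-topic in places.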
File renamed without changes.
File renamed without changes.
`.../simple_calculator_eval/src/aiq_simple_calculator_eval/data/simple_calculator_nested.json` (3 additions, 0 deletions): Git LFS file not shown.
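
Because the dataset is stored in Git LFS and not rendered, the sketch below writes a small hypothetical file in the nested shape that the `extract_nested_questions` docstring describes: top-level `metadata` and `configuration` objects plus a `questions` array whose entries carry `id`, `question`, `answer`, `category`, and `difficulty`. Every value here, including the file name and the `metadata`/`configuration` contents, is invented for illustration and says nothing about the real file's contents.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data; only the field names follow the parser docstring.
SAMPLE = {
    "metadata": {"name": "simple_calculator_nested_sample", "version": "0.1"},
    "configuration": {"source": "hand-written example"},
    "questions": [
        {"id": 1, "question": "What is 12 * 4?", "answer": "48",
         "category": "multiplication", "difficulty": "easy"},
        {"id": 2, "question": "Is 7 / 2 greater than 3?", "answer": "Yes, 3.5 is greater than 3.",
         "category": "inequality", "difficulty": "medium"},
        {"id": 3, "question": "What is 100 - 37, divided by 9?", "answer": "7",
         "category": "arithmetic", "difficulty": "hard"},
    ],
}


def write_sample_dataset(directory: Path) -> Path:
    """Write SAMPLE as pretty-printed JSON and return the file path."""
    path = directory / "simple_calculator_nested_sample.json"
    path.write_text(json.dumps(SAMPLE, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_sample_dataset(Path(tmp))
        loaded = json.loads(path.read_text(encoding="utf-8"))
        print(len(loaded["questions"]))  # 3
```

Given the parser's filtering logic, pointing `file_path` at this file with `difficulty: "medium"` would keep only question 2; `max_rows: 5` would then have no further effect.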
`...n_and_profiling/simple_calculator_eval/src/aiq_simple_calculator_eval/scripts/__init__.py` (14 additions, 0 deletions):

```python
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

`...ng/simple_calculator_eval/src/aiq_simple_calculator_eval/scripts/custom_dataset_parser.py` (80 additions, 0 deletions):

```python
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from pathlib import Path

from aiq.eval.evaluator.evaluator_model import EvalInput
from aiq.eval.evaluator.evaluator_model import EvalInputItem


def extract_nested_questions(file_path: Path, difficulty: str | None = None, max_rows: int | None = None) -> EvalInput:
    """
    This is a sample custom dataset parser that:
    1. Loads a nested JSON file
    2. Extracts the questions array from the nested structure
    3. Applies optional filtering by difficulty (hard, medium, easy)
    4. Applies an optional maximum number of questions to return
    5. Creates an EvalInput object with the extracted questions and returns it

    Expects JSON format:
    {
        "metadata": {...},
        "configuration": {...},
        "questions": [
            {"id": 1, "question": "...", "answer": "...", "category": "...", "difficulty": "...", ...},
            ...
        ]
    }

    Args:
        file_path: Path to the nested JSON file
        difficulty: Optional difficulty to filter questions by
        max_rows: Optional maximum number of questions to return

    Returns:
        EvalInput object containing the extracted questions
    """

    # Load the nested JSON
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Extract questions array from the nested structure
    questions = data.get('questions', [])

    # Apply filtering if specified
    if difficulty:
        filtered_questions = []
        for question in questions:
            # Keep questions whose difficulty (hard, medium, easy) matches, ignoring case
            if question.get('difficulty', '').lower() == difficulty.lower():
                filtered_questions.append(question)
        questions = filtered_questions

    # Apply max_rows limit if specified
    if max_rows and max_rows > 0:
        questions = questions[:max_rows]

    eval_items = []

    for item in questions:
        eval_item = EvalInputItem(id=item['id'],
                                  input_obj=item['question'],
                                  expected_output_obj=item['answer'],
                                  full_dataset_entry=item)
        eval_items.append(eval_item)

    return EvalInput(eval_input_items=eval_items)
```
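
In the config, the `_type: custom` dataset entry names this parser by dotted path (`function`) and supplies extra keyword arguments (`kwargs`) alongside `file_path`. The sketch below shows one plausible way such an entry could be resolved and called using `importlib`; the helpers `resolve_parser` and `load_custom_dataset` are hypothetical, and this is not the toolkit's actual dataset handler.

```python
import importlib
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def resolve_parser(dotted_path: str) -> Callable[..., Any]:
    """Import 'package.module.function' and return the function object."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr_name)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} is not callable")
    return func


def load_custom_dataset(function: str, file_path: str, kwargs: Optional[Dict[str, Any]] = None) -> Any:
    """Call the configured parser as parser(Path(file_path), **kwargs).

    Hypothetical helper mirroring the `function`, `file_path`, and `kwargs`
    keys of the `_type: custom` dataset block.
    """
    parser = resolve_parser(function)
    return parser(Path(file_path), **(kwargs or {}))


# With the example package installed, the config above would correspond to:
# load_custom_dataset(
#     "aiq_simple_calculator_eval.scripts.custom_dataset_parser.extract_nested_questions",
#     "examples/evaluation_and_profiling/simple_calculator_eval/data/simple_calculator_nested.json",
#     {"difficulty": "medium", "max_rows": 5},
# )
```

Passing `kwargs` through unchanged keeps the loader generic: any parser that accepts the file path as its first argument can add its own options without changes to the loader.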