Update the Judge LLM settings in the examples to avoid retries #204

AnuradhaKaruppiah · 2025-05-02T19:58:05Z

The ragas nv_metrics require 3-8 tokens; temperature can be left at the default of 0.1.
Also adjusted the LLM model based on the leaderboard.

Description

Closes #202

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.


Copilot

Pull Request Overview

This PR updates the judge LLM settings used across various example configurations and documentation to align with the new leaderboard recommendations. Key changes include updating the model name from meta/llama-3.3-70b-instruct to meta/llama-3.1-70b-instruct, removing the explicit temperature and top_p parameters from the nim_rag_eval_llm configuration, and increasing the max_tokens value from 2–6 tokens to 8 tokens.
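As a rough illustration of the settings described above, a judge LLM entry after this change might look like the sketch below. Only the `nim_rag_eval_llm` name, the model name, and the `max_tokens` value come from this conversation; the `llms` parent key and the `_type: nim` line are assumptions about the surrounding config layout and may differ in the actual files.

```yaml
# Sketch of a judge LLM entry; the `llms` key and `_type` value are assumed, not quoted from the PR.
llms:
  nim_rag_eval_llm:
    _type: nim                                # assumed provider type
    model_name: meta/llama-3.1-70b-instruct   # model name stated in this PR
    max_tokens: 8                             # ragas nv_metrics need 3-8 tokens
    # temperature and top_p are intentionally omitted so the defaults apply
    # (the PR description gives the default temperature as 0.1)
```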

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.


| File | Description |
| --- | --- |
| examples/simple/src/aiq_simple/configs/eval_upload_config.yml | Updated nim_rag_eval_llm configuration to use the new model and token count, removing unneeded temperature and top_p settings. |
| examples/simple/src/aiq_simple/configs/eval_config.yml | Adjusted nim_rag_eval_llm parameters to match the new standard. |
| examples/email_phishing_analyzer/configs/config.yml | Consistent update of nim_rag_eval_llm settings for email phishing analyzer. |
| examples/email_phishing_analyzer/configs/config-reasoning.yml | Similar update to nim_rag_eval_llm configuration. |
| examples/email_phishing_analyzer/configs/config-phi-3-mini-4k-instruct.yml | Updated nim_rag_eval_llm settings to reflect the new token count and model. |
| examples/email_phishing_analyzer/configs/config-phi-3-medium-4k-instruct.yml | Modified nim_rag_eval_llm configuration accordingly. |
| examples/email_phishing_analyzer/configs/config-mixtral-8x22b-instruct-v0.1.yml | Updated nim_rag_eval_llm to the new model name and token count. |
| examples/email_phishing_analyzer/configs/config-llama-3.3-70b-instruct.yml | Changed model references and removed explicit temperature and top_p parameters. |
| examples/email_phishing_analyzer/configs/config-llama-3.1-8b-instruct.yml | Updates mirror other nim_rag_eval_llm configurations. |
| examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/configs/config.yml | Adjusted nim_rag_eval_llm settings for consistency with overall configuration changes. |
| docs/source/guides/evaluate.md | Documentation updated to reflect the new judge LLM model and token configuration, along with guidance on the recommended settings. |

Comments suppressed due to low confidence (2)

examples/simple/src/aiq_simple/configs/eval_upload_config.yml:42

Ensure that the removal of explicit 'temperature' and 'top_p' entries in the nim_rag_eval_llm configuration is intentional and that the defaults (e.g., a temperature of 0.1) are correctly applied across all environments.

max_tokens: 8

docs/source/guides/evaluate.md:115

Confirm that the updated judge LLM model name in the documentation aligns with the configuration changes across the project and reflects the intended leaderboard update.

model_name: meta/llama-3.1-70b-instruct


AnuradhaKaruppiah · 2025-05-02T20:08:42Z

/merge

…A#204)

Closes NVIDIA#202

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/advanced/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Eric Evans II (https://github.com/ericevans-nv)

URL: NVIDIA#204

Signed-off-by: Yuchen Zhang <[email protected]>


Update the Judge LLM settings in the examples to avoid retries

6e328fd

The ragas nv_metrics require 3-8 tokens; temperature can be left at the default of 0.1. Also adjusted the LLM model based on the leaderboard. Signed-off-by: Anuradha Karuppiah <[email protected]>

AnuradhaKaruppiah added improvement Improvement to existing functionality non-breaking Non-breaking change labels May 2, 2025

AnuradhaKaruppiah self-assigned this May 2, 2025

AnuradhaKaruppiah requested a review from Copilot May 2, 2025 19:58

Copilot AI reviewed May 2, 2025


Fix vale warning

0942259

Signed-off-by: Anuradha Karuppiah <[email protected]>

ericevans-nv approved these changes May 2, 2025


rapids-bot bot merged commit 06c8aeb into NVIDIA:develop May 2, 2025
10 checks passed

AnuradhaKaruppiah deleted the eval-config branch May 6, 2025 00:48
