Merged
16 changes: 12 additions & 4 deletions docs/source/guides/evaluate.md
@@ -112,11 +112,19 @@ These metrics use a judge LLM for evaluating the generated output and retrieved
 llms:
   nim_rag_eval_llm:
     _type: nim
-    model_name: meta/llama-3.3-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    model_name: meta/llama-3.1-70b-instruct
+    max_tokens: 8
 ```
For these metrics, a `max_tokens` value of 8 is recommended for the judge LLM.

Evaluation quality depends on the judge LLM's ability to accurately assess the generated output and retrieved context. The current leaderboard for judge LLMs is:
```
1. mistralai/mixtral-8x22b-instruct-v0.1
2. mistralai/mixtral-8x7b-instruct-v0.1
3. meta/llama-3.1-70b-instruct
4. meta/llama-3.3-70b-instruct
```
For a complete list of up-to-date judge LLMs, refer to the [RAGAS NV metrics leaderboard](https://github.com/explodinggradients/ragas/blob/main/src/ragas/metrics/_nv_metrics.py).
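
Putting the recommended settings from this diff together, a minimal judge-LLM section looks like the following. This is a sketch assembled from the values changed in this PR; the section name `nim_rag_eval_llm` is the example name used in these configs and should match the evaluator's `llm_name` in your own workflow config:

```yaml
llms:
  nim_rag_eval_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    # 8 tokens is enough for the judge's short scoring output;
    # temperature and top_p are left at their defaults.
    max_tokens: 8
```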

### Trajectory Evaluator
This evaluator uses the intermediate steps generated by the workflow to evaluate the workflow trajectory. The evaluator configuration includes the evaluator type and any additional parameters required by the evaluator.
@@ -35,9 +35,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_rag_eval_large_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
@@ -50,9 +50,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
@@ -60,9 +60,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   r1_model:
     _type: nim
     model_name: deepseek-ai/deepseek-r1
4 changes: 1 addition & 3 deletions examples/email_phishing_analyzer/configs/config.yml
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
4 changes: 1 addition & 3 deletions examples/simple/src/aiq_simple/configs/eval_config.yml
@@ -34,9 +34,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 6
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
@@ -38,10 +38,8 @@ llms:
     temperature: 0.0
   nim_rag_eval_llm:
     _type: nim
-    model_name: meta/llama-3.3-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    model_name: meta/llama-3.1-70b-instruct
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct