From 4bd81838d085f8b21cec0819553f28e76672638a Mon Sep 17 00:00:00 2001
From: Hsin Chen
Date: Tue, 3 Jun 2025 17:01:54 -0700
Subject: [PATCH] Update trajectory eval doc

Signed-off-by: Hsin Chen
---
 docs/source/workflows/evaluate.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/source/workflows/evaluate.md b/docs/source/workflows/evaluate.md
index 52fdae247..d717feb72 100644
--- a/docs/source/workflows/evaluate.md
+++ b/docs/source/workflows/evaluate.md
@@ -142,9 +142,18 @@ eval:
 A judge LLM is used to evaluate the trajectory produced by the workflow, taking into account the tools available during execution. It returns a floating-point score between 0 and 1, where 1.0 indicates a perfect trajectory.
 
+To configure the judge LLM, define it in the `llms` section of the configuration file and reference it in the evaluator configuration using the `llm_name` key.
+
 It is recommended to set `max_tokens` to 1024 for the judge LLM to ensure sufficient context for evaluation.
 
-To configure the judge LLM, define it in the `llms` section of the configuration file, and reference it in the evaluator configuration using the `llm_name` key.
+Note: Trajectory evaluation can generate a high volume of LLM API calls. If you encounter rate-limiting errors (such as a `[429] Too Many Requests` error), you can reduce the number of concurrent requests by adjusting the `max_concurrency` parameter in your config. For example:
+
+```yaml
+eval:
+  general:
+    max_concurrency: 2
+```
+This caps evaluation at two concurrent requests, reducing the load on the LLM endpoint.
 
 ## Workflow Output
 
 The `aiq eval` command runs the workflow on all the entries in the `dataset`. The output of these runs is stored in a file named `workflow_output.json` under the `output_dir` specified in the configuration file.
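
For context, here is a minimal sketch of the judge-LLM wiring that the relocated `llm_name` paragraph describes. The evaluator name `trajectory_eval`, the LLM name `nim_trajectory_eval_llm`, the `nim` type, and the model name are illustrative placeholders, not part of this patch:

```yaml
llms:
  # Judge LLM for trajectory evaluation; max_tokens is set to 1024
  # per the recommendation in the doc.
  nim_trajectory_eval_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
    max_tokens: 1024

eval:
  general:
    max_concurrency: 2  # throttle concurrent requests to avoid 429 errors
  evaluators:
    trajectory_eval:
      _type: trajectory
      llm_name: nim_trajectory_eval_llm  # references the `llms` entry above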