@andrej1991
Collaborator

@andrej1991 andrej1991 commented Nov 11, 2025

In the last 100 CI runs of the assisted-chat eval tests, the issue appeared 4 times. The description of the issue:
The LLM asks for the cluster ID even though the cluster was created by the previous message in the same conversation.
Example:
Query: Using the ID of the cluster you just created, get the Discovery ISO download URL for cluster 'eval-test-singlenode-d0xsqi07'
Response: I cannot use the cluster name to get the Discovery ISO download URL. I need the cluster ID. The cluster ID is 12e392cb-82e3-43e1-9923-d627b6476f43. Would you like me to get the Discovery ISO download URL for you?

Root cause:
Some test cases identified the cluster redundantly: they contained both an indirect reference ('the cluster you just created') and a direct reference ('cluster named xyz'). When a query contains such redundant identifiers, LLMs tend to ask for confirmation of which one is the subject of the query.
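
As an illustration of the root cause above, the following minimal sketch flags a query that mixes an indirect and a direct cluster reference. The helper name `has_redundant_cluster_reference` and the phrase patterns are hypothetical and not part of the repository; they only approximate the wording used in the example query.

```python
import re

# Hypothetical phrases that refer to a cluster through conversation context.
INDIRECT_PATTERNS = [
    re.compile(r"\bthe cluster you just created\b", re.IGNORECASE),
]

# Direct reference: "cluster" (optionally "named") followed by a single-quoted name.
DIRECT_PATTERN = re.compile(r"\bcluster(?:\s+named)?\s+'[^']+'", re.IGNORECASE)


def has_redundant_cluster_reference(query: str) -> bool:
    """Return True if the query uses both an indirect and a direct cluster reference."""
    indirect = any(p.search(query) for p in INDIRECT_PATTERNS)
    direct = DIRECT_PATTERN.search(query) is not None
    return indirect and direct


if __name__ == "__main__":
    failing_query = (
        "Using the ID of the cluster you just created, get the Discovery ISO "
        "download URL for cluster 'eval-test-singlenode-d0xsqi07'"
    )
    print(has_redundant_cluster_reference(failing_query))
```

A check like this could be run over the eval queries to find other entries with the same redundancy, assuming they follow similar phrasing.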

Summary by CodeRabbit

  • Tests
    • Updated evaluation test queries to use generic cluster references instead of specific cluster identifiers. This change simplifies test data across SNO and multinode configuration test cases while maintaining existing test structure and behavior.

@openshift-ci openshift-ci bot requested review from carbonin and omertuc November 11, 2025 14:41
@coderabbitai

coderabbitai bot commented Nov 11, 2025

Walkthrough

Updates three eval query strings in test configuration to replace specific cluster name references with generic phrasing. Changes "for cluster 'NAME'" to "for the cluster" in three distinct eval_query entries within the eval data file.
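
The rewrite described in this walkthrough can be sketched as a simple substitution. This is only an illustration under assumptions: the function `make_generic` and the regular expression are hypothetical, and the actual change was made by editing the query strings in the eval data file directly.

```python
import re

# Matches the explicit form "for cluster 'NAME'" with a single-quoted name.
NAMED_CLUSTER = re.compile(r"\bfor cluster '[^']+'")


def make_generic(query: str) -> str:
    """Replace "for cluster 'NAME'" with the generic "for the cluster"."""
    return NAMED_CLUSTER.sub("for the cluster", query)


if __name__ == "__main__":
    before = (
        "Using the ID of the cluster you just created, get the Discovery ISO "
        "download URL for cluster 'eval-test-singlenode-d0xsqi07'"
    )
    print(make_generic(before))
```

Queries that do not contain the explicit form pass through unchanged, so the substitution is safe to apply to every entry.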

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test evaluation data updates: test/evals/eval_data.yaml | Updated three eval_query strings to replace cluster name references with generic phrasing; modified get_iso_eval_test_sno, get_iso_eval_test_multinode, and cluster ISO retrieval query entries |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • eranco74
  • keitwb
  • maorfr

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Resolves asking for cluster ID when it's known' directly matches the main objective of fixing redundant cluster identification in test cases to prevent unnecessary LLM queries. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b98f900 and c6bd664.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (2 hunks)
🔇 Additional comments (2)
test/evals/eval_data.yaml (2)

62-62: Aligns with PR objective to remove redundant cluster identification.

The change from an explicit cluster name reference to a generic "the cluster you just created" eliminates redundancy that could confuse the LLM. The query now relies solely on indirect context reference, which should help the assistant extract the cluster ID from the conversation history without unnecessary disambiguation.

Ensure the evaluation test passes with this indirect reference approach—the LLM should correctly identify the cluster ID from the conversation context created in the previous eval step.


90-90: Consistent application of redundancy removal pattern.

This change mirrors line 62, applying the same principle to the multinode workflow: removing the explicit cluster name to rely on implicit context. The expected tool call remains unchanged, confirming the LLM should still extract the cluster ID correctly.

Verify that the evaluation test correctly executes the cluster_iso_download_url tool call with the inferred cluster ID from "the cluster you just created" reference in the multinode workflow context.


@andrej1991
Collaborator Author

/retest-required

@carbonin
Collaborator

/retest

@openshift-ci

openshift-ci bot commented Nov 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrej1991, carbonin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 7b68e95 into rh-ecosystem-edge:main Nov 11, 2025
8 checks passed