[SRW] LLM Judge Dynamic Template Backend #264

chloewqg · 2025-10-13T20:22:54Z

Description

LLM Judge Dynamic Template Backend

Overall Description of changes

Implements the LLM Judge Dynamic Template Backend feature, enabling customizable prompt templates and multiple rating types for LLM-based search relevance judgments.

Key Changes

Customizable Prompt Templates

Split the monolithic prompt into modular components (PROMPT_SEARCH_RELEVANCE_SCORE_1_5_START, PROMPT_SEARCH_RELEVANCE_SCORE_0_1_START, PROMPT_SEARCH_RELEVANCE_SCORE_BINARY, PROMPT_SEARCH_RELEVANCE_SCORE_END) in MLConstants.java:42-74
Users can now provide custom prompt templates via API
Default templates support three rating types: 0-1 scale, 1-5 scale, and binary (RELEVANT/IRRELEVANT)
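
A minimal sketch of how `{{placeholder}}` substitution could work for a custom template, using the field names from the end-to-end test later in this description. The class name `PromptTemplateSketch`, the `render` method, and the choice to leave unknown placeholders untouched are assumptions for illustration, not the plugin's actual implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the plugin. */
final class PromptTemplateSketch {
    // Matches {{name}} with optional whitespace inside the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    private PromptTemplateSketch() {}

    /**
     * Replaces each {{name}} with the matching field value.
     * Unknown names are left as-is (an assumption; the real behavior may differ).
     */
    static String render(String template, Map<String, String> fields) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = fields.get(matcher.group(1));
            String replacement = value != null ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in field values literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, rendering `"Query: {{queryText}}"` with a `queryText` of `coffee machine` yields `Query: coffee machine`.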

New Rating Type System

Added LLMJudgmentRatingType enum with three types: SCORE0_1, SCORE1_5, RELEVANT_IRRELEVANT
Created RatingOutputProcessor class for rating sanitization and validation with type-specific handling
Automatic rating clamping and normalization based on configured type
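
The clamping and normalization described above might look roughly like the following. This is a sketch under stated assumptions: the nested `RatingType` enum stands in for LLMJudgmentRatingType, and the class name, the `sanitize` method, and the mapping of IRRELEVANT to 0.0 are hypothetical (the RELEVANT to 1.0 mapping is consistent with the Test 2 response shown later).

```java
import java.util.Locale;

/** Hypothetical stand-in for the rating sanitization step; not the plugin's code. */
final class RatingSanitizerSketch {
    /** Mirrors the three rating types named in this PR. */
    enum RatingType { SCORE0_1, SCORE1_5, RELEVANT_IRRELEVANT }

    private RatingSanitizerSketch() {}

    /** Parses the raw LLM output and forces it into the range of the given type. */
    static double sanitize(String raw, RatingType type) {
        String text = raw.trim();
        switch (type) {
            case SCORE0_1:
                return clamp(Double.parseDouble(text), 0.0, 1.0);
            case SCORE1_5:
                return clamp(Double.parseDouble(text), 1.0, 5.0);
            case RELEVANT_IRRELEVANT: {
                String label = text.toUpperCase(Locale.ROOT);
                if (label.equals("RELEVANT")) {
                    return 1.0;
                }
                if (label.equals("IRRELEVANT")) {
                    return 0.0; // assumed numeric value for the negative label
                }
                throw new IllegalArgumentException("Unexpected label: " + raw);
            }
            default:
                throw new IllegalArgumentException("Unknown rating type: " + type);
        }
    }

    private static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }
}
```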

Enhanced Caching System

Added promptTemplateCode field to JudgmentCache model to differentiate cached results by template
Updated JudgmentCacheDao.getJudgmentCache() to include prompt template in cache lookup
Introduced overwriteCache parameter to force cache refresh when needed
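
To illustrate why including the template in the cache lookup matters, here is a small sketch that hashes the template with SHA-256 (the encoding mentioned later in this thread) and folds it into a lookup key. The class name, the `cacheKey` layout, and the `|` separator are assumptions for illustration only; the actual lookup lives in JudgmentCacheDao and may be structured differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration of a template-aware cache key; not the plugin's code. */
final class PromptTemplateCodeSketch {
    private PromptTemplateCodeSketch() {}

    /** Returns the lowercase hex SHA-256 digest of the template (always 64 characters). */
    static String sha256Hex(String template) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(template.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Two requests share a cache entry only if the template hash also matches. */
    static String cacheKey(String queryText, String documentId, String contextFields, String template) {
        return String.join("|", queryText, documentId, contextFields, sha256Hex(template));
    }
}
```

With a key like this, changing only the prompt template produces a different key, so a rating cached for one template is not reused for another.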

API Enhancements

Updated PutLlmJudgmentRequest to accept promptTemplate, ratingType, and overwriteCache parameters
Extended REST APIs (RestPutJudgmentAction, RestPutQuerySetAction) to support new fields
Backward compatible - existing APIs work without new parameters

Refactoring & Code Quality

Renamed queryTextWithReference → queryTextWithCustomInput throughout codebase for clarity
Deprecated old sanitizeLLMResponse() methods in favor of RatingOutputProcessor
Added utility method generatePromptTemplateCode()

End to End Testing Procedure

Step 1: Enable Workbench

Request:

curl -X PUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{
  "persistent": {
    "plugins.search_relevance.workbench_enabled": true
  }
}'

Response:

{
  "acknowledged": true,
  "persistent": {
    "plugins": {
      "search_relevance": {
        "workbench_enabled": "true"
      }
    }
  },
  "transient": {}
}

Status: ✅ Success

Step 2: Create Test Products Index

Request:

curl -X PUT "http://localhost:9200/test_products" -H 'Content-Type: application/json' -d'{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "description": {"type": "text"},
      "category": {"type": "keyword"},
      "price": {"type": "float"}
    }
  }
}'

Response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "test_products"
}

Status: ✅ Success

Step 3: Load Sample Product Data

Request:

curl -X POST "http://localhost:9200/test_products/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name":"Dell Laptop","description":"High performance laptop for professionals","category":"electronics","price":1200.00}
{"index":{"_id":"2"}}
{"name":"Office Chair","description":"Ergonomic office chair with lumbar support","category":"furniture","price":299.99}
{"index":{"_id":"3"}}
{"name":"Espresso Machine","description":"Premium coffee maker for home baristas","category":"kitchen","price":499.99}
{"index":{"_id":"4"}}
{"name":"Running Shoes","description":"Comfortable athletic shoes for runners","category":"sports","price":129.99}
{"index":{"_id":"5"}}
{"name":"MacBook Pro","description":"Apple laptop with M3 chip for developers","category":"electronics","price":2499.00}
'

Response:

{
  "took": 25,
  "errors": false,
  "items": [
    {"index": {"_index": "test_products", "_id": "1", "result": "created", "status": 201}},
    {"index": {"_index": "test_products", "_id": "2", "result": "created", "status": 201}},
    {"index": {"_index": "test_products", "_id": "3", "result": "created", "status": 201}},
    {"index": {"_index": "test_products", "_id": "4", "result": "created", "status": 201}},
    {"index": {"_index": "test_products", "_id": "5", "result": "created", "status": 201}}
  ]
}

Status: ✅ Success - 5 documents indexed

Step 4: Create Query Set with Custom Fields

This query set includes custom fields (category, targetAudience, referenceAnswer) that can be used in prompt template placeholders.

Request:

curl -X PUT "http://localhost:9200/_plugins/_search_relevance/query_sets" -H 'Content-Type: application/json' -d'{
  "name": "E2E Test Query Set",
  "description": "Query set for testing LLM judgment with custom fields",
  "querySetQueries": [
    {
      "queryText": "laptop for developers",
      "category": "electronics",
      "targetAudience": "professionals",
      "referenceAnswer": "A portable computer suitable for software development"
    },
    {
      "queryText": "coffee machine",
      "category": "kitchen",
      "targetAudience": "home users",
      "referenceAnswer": "An appliance for brewing coffee at home"
    }
  ]
}'

Response:

{
  "query_set_id": "2550758e-c346-4c9b-b6fd-52ff33a40ae0",
  "query_set_result": "CREATED"
}

Status: ✅ Success
Query Set ID: 2550758e-c346-4c9b-b6fd-52ff33a40ae0

Step 5: Create Multi-Field Search Configuration

Request:

curl -X PUT "http://localhost:9200/_plugins/_search_relevance/search_configurations" -H 'Content-Type: application/json' -d'{
  "name": "Products Multi-Field Search",
  "description": "Search both name and description fields",
  "index": "test_products",
  "query": "{\"query\": {\"multi_match\": {\"query\": \"%SearchText%\", \"fields\": [\"name\", \"description\"]}}}"
}'

Response:

{
  "search_configuration_id": "a1ce8022-1a9c-48ec-ab36-c9850680d9c2",
  "search_configuration_result": "CREATED"
}

Status: ✅ Success
Search Configuration ID: a1ce8022-1a9c-48ec-ab36-c9850680d9c2

Test 1: GPT-4 with SCORE0_1 Rating Type

Create Judgment with Custom Prompt Template

Request:

curl -X PUT "http://localhost:9200/_plugins/_search_relevance/judgments" -H 'Content-Type: application/json' -d'{
  "name": "Test 1: GPT-4 SCORE0_1 Custom Template",
  "type": "LLM_JUDGMENT",
  "querySetId": "2550758e-c346-4c9b-b6fd-52ff33a40ae0",
  "searchConfigurationList": ["a1ce8022-1a9c-48ec-ab36-c9850680d9c2"],
  "modelId": "ycmnTZoBJMvqPc66Lqqh",
  "size": 5,
  "tokenLimit": 4000,
  "contextFields": ["name", "description"],
  "ignoreFailure": false,
  "llmJudgmentRatingType": "SCORE0_1",
  "promptTemplate": "Given the query: {{queryText}}\nCategory: {{category}}\nTarget audience: {{targetAudience}}\nReference: {{referenceAnswer}}\n\nRate the relevance of this document on a scale of 0.0 to 1.0, where 0.0 is completely irrelevant and 1.0 is perfectly relevant.",
  "overwriteCache": false
}'

Response:

{
  "judgment_id": "5d91e3d8-0aed-4ab0-a4f5-38637fc41134"
}

Verify Results (after 15 seconds)

Request:

curl -s "http://localhost:9200/_plugins/_search_relevance/judgments/5d91e3d8-0aed-4ab0-a4f5-38637fc41134" | python3 -m json.tool

Response Summary:

{
  "status": "COMPLETED",
  "metadata": {
    "llmJudgmentRatingType": "SCORE0_1",
    "promptTemplate": "Given the query: {{queryText}}\nCategory: {{category}}\nTarget audience: {{targetAudience}}\nReference: {{referenceAnswer}}\n\nRate the relevance of this document on a scale of 0.0 to 1.0...",
    "overwriteCache": false
  },
  "judgmentRatings": [
    {
      "query": "laptop for developers#\ntargetAudience: professionals\nreferenceAnswer: A portable computer suitable for software development\ncategory: electronics",
      "ratings": [
        {"rating": "0.9", "docId": "1"}
      ]
    },
    {
      "query": "coffee machine#\ntargetAudience: home users\nreferenceAnswer: An appliance for brewing coffee at home\ncategory: kitchen",
      "ratings": [
        {"rating": "1.0", "docId": "1"}
      ]
    }
  ]
}
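
Rather than waiting a fixed 15 seconds, a test harness could poll the judgment until its status is COMPLETED. The sketch below is a generic polling helper; the class name, the `pollUntil` signature, and the idea of passing the HTTP call in as a `Supplier` are assumptions, and any intermediate status value is hypothetical since only COMPLETED appears in the responses above.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical polling helper for test code; not part of the plugin. */
final class JudgmentPollerSketch {
    private JudgmentPollerSketch() {}

    /**
     * Calls fetch up to maxAttempts times, sleeping delayMillis between calls,
     * and returns the first result accepted by done.
     */
    static <T> T pollUntil(Supplier<T> fetch, Predicate<T> done, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = fetch.get();
            if (done.test(result)) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted while polling", e);
                }
            }
        }
        throw new IllegalStateException("Not done after " + maxAttempts + " attempts");
    }
}
```

In a real test, `fetch` would issue the GET request shown above and extract the `status` field from the response.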

Test 2: GPT-4 with RELEVANT_IRRELEVANT Rating Type

Create Judgment with Binary Rating

Request:

curl -X PUT "http://localhost:9200/_plugins/_search_relevance/judgments" -H 'Content-Type: application/json' -d'{
  "name": "Test 2: GPT-4 RELEVANT_IRRELEVANT",
  "type": "LLM_JUDGMENT",
  "querySetId": "2550758e-c346-4c9b-b6fd-52ff33a40ae0",
  "searchConfigurationList": ["a1ce8022-1a9c-48ec-ab36-c9850680d9c2"],
  "modelId": "ycmnTZoBJMvqPc66Lqqh",
  "size": 5,
  "tokenLimit": 4000,
  "contextFields": ["name", "description", "category"],
  "ignoreFailure": false,
  "llmJudgmentRatingType": "RELEVANT_IRRELEVANT",
  "promptTemplate": "Search query: {{queryText}}\nCategory: {{category}}\nFor: {{targetAudience}}\nExpected: {{referenceAnswer}}\n\nDetermine if this document is RELEVANT or IRRELEVANT to the query.",
  "overwriteCache": false
}'

Response:

{
  "judgment_id": "95077971-d0ca-4191-affa-b2c704ede066"
}

Verify Results

Request:

curl -s "http://localhost:9200/_plugins/_search_relevance/judgments/95077971-d0ca-4191-affa-b2c704ede066" | python3 -m json.tool

Response Summary:

{
  "status": "COMPLETED",
  "metadata": {
    "llmJudgmentRatingType": "RELEVANT_IRRELEVANT",
    "promptTemplate": "Search query: {{queryText}}\nCategory: {{category}}\nFor: {{targetAudience}}\nExpected: {{referenceAnswer}}\n\nDetermine if this document is RELEVANT or IRRELEVANT to the query."
  },
  "judgmentRatings": [
    {
      "query": "laptop for developers#\ntargetAudience: professionals\nreferenceAnswer: A portable computer suitable for software development\ncategory: electronics",
      "ratings": [
        {"rating": "1.0", "docId": "1"}
      ]
    },
    {
      "query": "coffee machine#\ntargetAudience: home users\nreferenceAnswer: An appliance for brewing coffee at home\ncategory: kitchen",
      "ratings": [
        {"rating": "1.0", "docId": "1"}
      ]
    }
  ]
}
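
The `query` strings in these responses carry the query text followed by `#` and one `key: value` line per custom field. A minimal parsing sketch, inferred only from the responses shown above: the class and method names are hypothetical, and it assumes the query text itself contains no `#`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the stored query format seen in the responses above. */
final class CustomInputQuerySketch {
    private CustomInputQuerySketch() {}

    /** Returns the part before the first '#', or the whole string if there is none. */
    static String queryText(String stored) {
        int hash = stored.indexOf('#');
        return hash < 0 ? stored : stored.substring(0, hash);
    }

    /** Returns the "key: value" lines after the first '#', in their original order. */
    static Map<String, String> customFields(String stored) {
        Map<String, String> fields = new LinkedHashMap<>();
        int hash = stored.indexOf('#');
        if (hash < 0) {
            return fields;
        }
        for (String line : stored.substring(hash + 1).split("\n")) {
            int colon = line.indexOf(": ");
            if (colon > 0) {
                fields.put(line.substring(0, colon), line.substring(colon + 2));
            }
        }
        return fields;
    }
}
```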

Issues Resolved

List any issues this PR will resolve, e.g. Closes [...].

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java

src/main/java/org/opensearch/searchrelevance/utils/RatingOutputProcessor.java

src/main/java/org/opensearch/searchrelevance/common/RatingOutputProcessor.java

src/main/java/org/opensearch/searchrelevance/rest/RestPutJudgmentAction.java

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java

wrigleyDan · 2025-10-17T12:19:46Z

Thanks for this PR. I believe that more flexibility when generating LLM-assisted judgments hugely improves the chances of this feature being useful.

Personally, I find a scale from 0-3 (or generally a 4-point scale) more useful than a 5-point scale or even more granular scales for explicit judgments, and it's what I have seen most often in practice. It typically increases consistency, and it forces you to make a choice (it's either more on the relevant side or more on the irrelevant side, not in between).
So while I appreciate being able to add custom prompts I am wondering if the three rating types support what is used in the industry.

Most of the time, when I see metrics (not judgments) in the range from 0 to 1, it is when the similarity of a document to a reference answer is calculated, or for other metrics in use cases that go beyond retrieval (for example, faithfulness or response relevance). I would regard these as too granular for an LLM to apply consistently.

That being said, I would recommend supporting three scales:

binary judgments: relevant/irrelevant as suggested is fine. Users should be able to use 0 and 1 instead of the words relevant/irrelevant.
4-point scale: 0-3 as the default is what I would consider most widely used. However, there are also scales that use four classes, like Amazon's ESCI dataset.
5-point scale: I think there are use cases where you'd want more than a 4-point scale, so offering that does make sense.

fen-qin

Overall,

I would like to understand the backward compatibility impact, since this PR includes index and REST API interface changes.
Can you update the mappings at https://github.com/opensearch-project/search-relevance/tree/main/src/main/resources/mappings if the index schema changed?
Can you check the LLM API-level output schema constraints? I've put more details in the comments.

src/main/java/org/opensearch/searchrelevance/utils/RatingOutputProcessor.java

src/main/java/org/opensearch/searchrelevance/common/RatingOutputProcessor.java

src/test/java/org/opensearch/searchrelevance/common/RatingOutputProcessorTests.java

src/main/java/org/opensearch/searchrelevance/rest/RestPutJudgmentAction.java

src/main/java/org/opensearch/searchrelevance/rest/RestPutQuerySetAction.java

...n/java/org/opensearch/searchrelevance/transport/experiment/PutExperimentTransportAction.java

src/main/java/org/opensearch/searchrelevance/model/JudgmentCache.java

heemin32 · 2025-10-20T21:36:53Z

Default templates support three rating types: 0-1 scale, 1-5 scale, and binary (RELEVANT/IRRELEVANT)

As @wrigleyDan mentioned, I don't think we need both the 0-1 scale and the 1-5 scale when both are 5-point scales.

heemin32 · 2025-10-20T21:41:53Z

Introduced overwriteCache parameter to force cache refresh when needed

Shouldn't we just increase the version of the judgment on every update and evict the cache when the version does not match, instead of asking the user to decide whether to evict the cache?
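
The version-based alternative suggested here could be sketched as follows. This is only an illustration of the idea in the comment: the class, its methods, and the in-memory map are hypothetical and do not reflect how the judgment cache index is actually stored or queried.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory model of version-checked cache eviction. */
final class VersionedCacheSketch {
    /** A cached rating together with the judgment version that produced it. */
    private static final class Entry {
        final String rating;
        final long version;

        Entry(String rating, long version) {
            this.rating = rating;
            this.version = version;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String key, String rating, long version) {
        entries.put(key, new Entry(rating, version));
    }

    /** Returns the rating only if it was cached under the current version; stale entries are evicted. */
    Optional<String> get(String key, long currentVersion) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (entry.version != currentVersion) {
            entries.remove(key); // version mismatch: evict without asking the user
            return Optional.empty();
        }
        return Optional.of(entry.rating);
    }
}
```

With this approach the caller never passes a flag such as overwriteCache; bumping the version is enough to invalidate older entries.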

src/main/java/org/opensearch/searchrelevance/common/MLConstants.java

src/main/java/org/opensearch/searchrelevance/common/RatingOutputProcessor.java

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java

src/main/java/org/opensearch/searchrelevance/common/MLConstants.java

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java

src/main/java/org/opensearch/searchrelevance/model/Judgment.java

src/main/java/org/opensearch/searchrelevance/model/QueryWithReference.java

src/main/java/org/opensearch/searchrelevance/rest/RestPutJudgmentAction.java

src/main/java/org/opensearch/searchrelevance/rest/RestPutQuerySetAction.java

src/main/java/org/opensearch/searchrelevance/transport/judgment/PutLlmJudgmentRequest.java

src/main/java/org/opensearch/searchrelevance/transport/queryset/PutQuerySetTransportAction.java

src/main/java/org/opensearch/searchrelevance/utils/TextValidationUtil.java

src/main/java/org/opensearch/searchrelevance/executors/ExperimentTaskContext.java

src/main/java/org/opensearch/searchrelevance/utils/RatingOutputProcessor.java

src/main/java/org/opensearch/searchrelevance/dao/JudgmentCacheDao.java

src/main/java/org/opensearch/searchrelevance/ml/MLInputOutputTransformer.java

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java

epugh

Worked through some of the files. The qa tests did run for me. Right now I worry about the amount of plumbing in the qa/bwc stuff. (Should the dir be named just bwc instead of qa?) Also, and maybe it's too late for this, are there any well-maintained Java projects that handle LLM integration that we should be leveraging? LangChain4j, etc.? We are definitely integrating in a very low-level, direct manner! Though maybe that is our style?

formatter/formatting.gradle

gradle.properties

qa/README.md

qa/build.gradle

src/main/java/org/opensearch/searchrelevance/common/MLConstants.java

Signed-off-by: Chloe Gao <[email protected]>

fen-qin

Overall looks great. Please add the index mapping.

src/main/java/org/opensearch/searchrelevance/model/JudgmentCache.java

src/main/java/org/opensearch/searchrelevance/common/RatingOutputProcessor.java

src/main/java/org/opensearch/searchrelevance/model/QueryWithReference.java

Signed-off-by: Chloe Gao <[email protected]>

martin-gaievski

Overall the code looks good, with a few comments. Special thanks for adding BWC test capabilities.

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java

src/main/java/org/opensearch/searchrelevance/utils/RatingOutputProcessor.java

.github/workflows/backwards_compatibility_tests_workflow.yml

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java

src/main/java/org/opensearch/searchrelevance/utils/ParserUtils.java

src/main/java/org/opensearch/searchrelevance/utils/RatingOutputProcessor.java

src/main/java/org/opensearch/searchrelevance/ml/MLInputOutputTransformer.java

src/main/java/org/opensearch/searchrelevance/transport/queryset/PutQuerySetTransportAction.java

src/main/java/org/opensearch/searchrelevance/ml/MLAccessor.java

Signed-off-by: Chloe Gao <[email protected]>

src/main/java/org/opensearch/searchrelevance/rest/RestPutJudgmentAction.java

src/main/java/org/opensearch/searchrelevance/utils/ParserUtils.java

Signed-off-by: Chloe Gao <[email protected]>

martin-gaievski

Good job, thank you for addressing my comments.

One ask from my side - please open a new issue for improving logic for doing truncation.

chloewqg · 2025-11-13T23:49:10Z

Good job, thank you for addressing my comments.

One ask from my side - please open a new issue for improving logic for doing truncation.

Yeah issue created. #314

fen-qin

LGTM. Would you like to link the follow-up issues to a centralized issue?

epugh · 2025-11-15T13:28:21Z

This will be fantastic to have.

chloewqg · 2025-11-18T19:00:46Z

Investigation for Index Mapping Update:

We have two new fields added here, modelId and encodedPromptTemplate. I investigated locally by reverting the judgment cache JSON to the old version and then running a judgment. Here's the resulting judgment cache index mapping:

curl -s "http://localhost:9200/.plugins-search-relevance-judgment-cache/_mapping?pretty" | python3 -m json.tool 2>/dev/null
{
    ".plugins-search-relevance-judgment-cache": {
        "mappings": {
            "properties": {
                "contextFieldsStr": {
                    "type": "keyword"
                },
                "documentId": {
                    "type": "keyword"
                },
                "encodedPromptTemplate": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "id": {
                    "type": "keyword"
                },
                "modelId": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "querySet": {
                    "type": "keyword"
                },
                "queryText": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "rating": {
                    "type": "keyword"
                },
                "timestamp": {
                    "type": "date",
                    "format": "strict_date_time"
                }
            }
        }
    }
}

The difference between keyword and text comes into play when a value contains uppercase characters. A keyword field preserves uppercase characters, while a text field is analyzed and, with the default analyzer, converts all characters to lowercase.

For encodedPromptTemplate, since we use SHA-256 to encode, there are no uppercase characters, so there is no risk.
For modelId, there could be uppercase characters, which would create an issue if we searched the judgment cache by modelId. However, right now we don't fetch the cache by modelId (see JudgmentCacheDao in this PR).

The conclusion is that there is no issue for now, but there will be one if we want to use modelId as a condition in judgment cache lookups.
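
The case-sensitivity concern can be illustrated with a toy model. This is not OpenSearch code: it assumes a text field is indexed through a lowercasing analyzer and a keyword field is indexed verbatim, that an exact term lookup compares against the indexed form without analyzing the query value, and that the value contains no whitespace or punctuation that would split it into several tokens. All names here are hypothetical.

```java
import java.util.Locale;

/** Toy model of keyword vs. text indexing for a single-token value; not OpenSearch code. */
final class FieldTypeSketch {
    private FieldTypeSketch() {}

    /** Keyword field: the value is indexed exactly as given. */
    static String keywordTerm(String value) {
        return value;
    }

    /** Text field with a lowercasing analyzer: the indexed term is lowercase. */
    static String textTerm(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Exact term lookup: the query value is compared to the indexed term as-is. */
    static boolean termMatches(String indexedTerm, String queryValue) {
        return indexedTerm.equals(queryValue);
    }
}
```

Under this model, a mixed-case modelId such as `ycmnTZoBJMvqPc66Lqqh` matches an exact lookup on a keyword field but not on a text field, while an all-lowercase hex digest matches on both.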

heemin32 · 2025-11-18T19:08:21Z

A field type that differs from what we defined in judgment_cache.json is an issue, IMO. Otherwise, why don't we define that field's type as text instead of keyword?

chloewqg requested review from epugh, fen-qin, heemin32, martin-gaievski and wrigleyDan as code owners October 13, 2025 20:22

chloewqg force-pushed the llm_judge_template branch 3 times, most recently from 831f422 to 0c0500e Compare October 14, 2025 03:22

martin-gaievski reviewed Oct 17, 2025

fen-qin reviewed Oct 20, 2025

heemin32 reviewed Oct 20, 2025

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java (outdated, resolved)

chloewqg force-pushed the llm_judge_template branch from 52e6a2a to eb70bb9 Compare October 22, 2025 17:07

chloewqg force-pushed the llm_judge_template branch from e399c4c to c71ce48 Compare October 29, 2025 19:12

chloewqg mentioned this pull request Oct 30, 2025

[FEATURE] Support multiple LLM connectos with proper rate limit #285

Open

fen-qin reviewed Oct 31, 2025

src/main/java/org/opensearch/searchrelevance/model/Judgment.java (outdated, resolved)

fen-qin reviewed Oct 31, 2025

src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java (outdated, resolved)

epugh reviewed Nov 1, 2025

chloewqg added 8 commits November 4, 2025 00:17

[SRW] LLM Judge Dynamic Template Backend

e2b5816

Signed-off-by: Chloe Gao <[email protected]>

Add Integration Test for LLM Judgement Template

1a9af9b

Signed-off-by: Chloe Gao <[email protected]>

Address Comments

d2d74aa

Signed-off-by: Chloe Gao <[email protected]>

Handle QuerySet Entry in both old and new format

7c161c9

Signed-off-by: Chloe Gao <[email protected]>

Add validation utils for input query set

367a2df

Signed-off-by: Chloe Gao <[email protected]>

llm judgement bwc test

6c0a5a3

Signed-off-by: Chloe Gao <[email protected]>

Fix BWC Tests

ded40f8

Signed-off-by: Chloe Gao <[email protected]>

Fix BWC Test Error Partially

13e1627

Signed-off-by: Chloe Gao <[email protected]>

chloewqg added 9 commits November 4, 2025 00:17

Add Fall back Mechanism for Model that doesn't accept response format

1c7a9c8

Signed-off-by: Chloe Gao <[email protected]>

Fix bwc config

37ebd1e

Signed-off-by: Chloe Gao <[email protected]>

Fix issues

fd131a7

Signed-off-by: Chloe Gao <[email protected]>

fix

ec3b7f6

Signed-off-by: Chloe Gao <[email protected]>

fix

81bc422

Signed-off-by: Chloe Gao <[email protected]>

Address Comments

91ac340

Signed-off-by: Chloe Gao <[email protected]>

Fix few bugs in Prompt Template

746a416

Signed-off-by: Chloe Gao <[email protected]>

Remove QA README

359ecc7

Signed-off-by: Chloe Gao <[email protected]>

Fix Forbidden API failure

a908cb8

Signed-off-by: Chloe Gao <[email protected]>

chloewqg force-pushed the llm_judge_template branch from 1262da7 to a908cb8 Compare November 4, 2025 08:25

chloewqg added 3 commits November 4, 2025 00:49

Fix integ and bwc tests failure

8bb33ed

Signed-off-by: Chloe Gao <[email protected]>

Fix tests

1ca4c5c

Signed-off-by: Chloe Gao <[email protected]>

Fix

af2f55c

Signed-off-by: Chloe Gao <[email protected]>

fen-qin reviewed Nov 11, 2025

chloewqg added 2 commits November 11, 2025 09:40

address comments

153376f

Signed-off-by: Chloe Gao <[email protected]>

Fix GPT 3.5 calling

7f389ef

Signed-off-by: Chloe Gao <[email protected]>

martin-gaievski reviewed Nov 11, 2025

address comments

73a7e90

Signed-off-by: Chloe Gao <[email protected]>

martin-gaievski reviewed Nov 12, 2025

src/main/java/org/opensearch/searchrelevance/rest/RestPutJudgmentAction.java (outdated, resolved)

martin-gaievski reviewed Nov 12, 2025

src/main/java/org/opensearch/searchrelevance/utils/ParserUtils.java (resolved)

chloewqg added 2 commits November 12, 2025 22:30

Address comments

4f275fe

Signed-off-by: Chloe Gao <[email protected]>

fic

1c36fae

Signed-off-by: Chloe Gao <[email protected]>

martin-gaievski approved these changes Nov 13, 2025

fen-qin approved these changes Nov 14, 2025

chloewqg mentioned this pull request Nov 18, 2025

[Enhancement] Add description in Search Configuration #293

Merged

ajleong623 mentioned this pull request Nov 18, 2025

[PROPOSAL] Plugin Indexes Mappings Versioning #310

Open
