[SRW] LLM Judge Dynamic Template Backend #264
Merged: fen-qin merged 36 commits into opensearch-project:main from chloewqg:llm_judge_template (Dec 16, 2025)
Commits
5d337e7 [SRW] LLM Judge Dynamic Template Backend (chloewqg)
eb527e3 Add Integration Test for LLM Judgement Template (chloewqg)
77d4dea Address Comments (chloewqg)
f54b585 Handle QuerySet Entry in both old and new format (chloewqg)
183fb01 Add validation utils for input query set (chloewqg)
11d4f36 llm judgement bwc test (chloewqg)
a324202 Fix BWC Tests (chloewqg)
e036d01 Fix BWC Test Error Partially (chloewqg)
6f4b630 Fix Build Grale (chloewqg)
925202c Fix BWC Cluster Upgrade Issue (chloewqg)
b669046 Fix BWC tests and add judgement creation in bwc tests (chloewqg)
1a0f0eb Fix Errors in Calling GPT with output schema. Remove SCORE 1-5 and co… (chloewqg)
aaeb674 Fix error (chloewqg)
6435106 fix qa (chloewqg)
d2455dd Fix error when upgrading to 3.4.0-SNAPSHOT (chloewqg)
6bc1f83 Add BWC tests to GitHub CI (chloewqg)
d82f442 Add Fall back Mechanism for Model that doesn't accept response format (chloewqg)
ea79c92 Fix bwc config (chloewqg)
74c49be Fix issues (chloewqg)
cbde74f fix (chloewqg)
2941a5f fix (chloewqg)
d08b0e2 Address Comments (chloewqg)
652b010 Fix few bugs in Prompt Template (chloewqg)
da1a03f Remove QA README (chloewqg)
a261178 Fix Forbidden API failure (chloewqg)
a856796 Fix integ and bwc tests failure (chloewqg)
6fa18e4 Fix tests (chloewqg)
dbdad98 Fix (chloewqg)
ece614d address comments (chloewqg)
85d9eda Fix GPT 3.5 calling (chloewqg)
74164dc address comments (chloewqg)
63029a6 Address comments (chloewqg)
6d0c9ca fic (chloewqg)
9467aa8 update judgement cache json version mapping (chloewqg)
ee55301 Fix version (chloewqg)
ad770f1 Remove redundant testIndicesHaveExpectedSchemaVersions test (chloewqg)
49 changes: 49 additions & 0 deletions
.github/workflows/backwards_compatibility_tests_workflow.yml
@@ -0,0 +1,49 @@

```yaml
name: Backwards Compatibility Tests SearchRelevance
on:
  push:
    branches:
      - "*"
      - "feature/**"
  pull_request:
    branches:
      - "*"
      - "feature/**"

jobs:
  Get-CI-Image-Tag:
    uses: opensearch-project/opensearch-build/.github/workflows/get-ci-image-tag.yml@main
    with:
      product: opensearch

  Rolling-Upgrade-BWCTests-SearchRelevance:
    needs: Get-CI-Image-Tag
    strategy:
      matrix:
        java: [21]
        os: [ubuntu-latest]
        bwc_version: ["3.3.0-SNAPSHOT"]
        opensearch_version: ["3.4.0-SNAPSHOT"]

    name: SearchRelevance Rolling-Upgrade BWC Tests
    runs-on: ${{ matrix.os }}
    container:
      image: ${{ needs.Get-CI-Image-Tag.outputs.ci-image-version-linux }}
      options: ${{ needs.Get-CI-Image-Tag.outputs.ci-image-start-options }}
    env:
      BWC_VERSION_ROLLING_UPGRADE: ${{ matrix.bwc_version }}

    steps:
      - name: Run start commands
        run: ${{ needs.Get-CI-Image-Tag.outputs.ci-image-start-command }}
      - name: Checkout search-relevance
        uses: actions/checkout@v4
      - name: Setup Java ${{ matrix.java }}
        uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: ${{ matrix.java }}
      - name: Run SearchRelevance Rolling-Upgrade BWC Tests
        run: |
          chown -R 1000:1000 `pwd`
          echo "Running rolling-upgrade backwards compatibility tests..."
          su `id -un 1000` -c "./gradlew :qa:rolling-upgrade:testRollingUpgrade -Dtests.bwc.version=${{ matrix.bwc_version }} --refresh-dependencies --no-daemon"
```
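The final step of the workflow above is effectively a function of the matrix values. As a rough illustration (not part of the PR), the Python sketch below expands a matrix like the one in the workflow and builds the Gradle command each combination would run. The helper names `expand_matrix` and `gradle_command` are hypothetical, and the sketch assumes GitHub Actions' usual behaviour of running one job per combination of matrix values.

```python
from itertools import product

# Matrix values copied from the workflow above.
MATRIX = {
    "java": [21],
    "os": ["ubuntu-latest"],
    "bwc_version": ["3.3.0-SNAPSHOT"],
    "opensearch_version": ["3.4.0-SNAPSHOT"],
}


def expand_matrix(matrix):
    """Return one dict per combination of matrix values (hypothetical helper)."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def gradle_command(entry):
    """Build the Gradle invocation from the workflow's run step for one entry."""
    return (
        "./gradlew :qa:rolling-upgrade:testRollingUpgrade "
        f"-Dtests.bwc.version={entry['bwc_version']} "
        "--refresh-dependencies --no-daemon"
    )


if __name__ == "__main__":
    for entry in expand_matrix(MATRIX):
        print(gradle_command(entry))
```

Adding a second value to `bwc_version` in this sketch would yield two entries, which mirrors how extending the workflow matrix would add another BWC job.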
@@ -0,0 +1,262 @@

```groovy
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

import org.apache.tools.ant.taskdefs.condition.Os

import java.util.concurrent.Callable
import java.nio.file.Path

apply plugin: 'opensearch.testclusters'
apply plugin: 'opensearch.build'
apply plugin: 'opensearch.rest-test'
apply plugin: 'io.freefair.lombok'
apply plugin: 'opensearch.java-agent'

// Disable a few tasks that come with build
build.enabled = false
integTest.enabled = false
test.enabled = false
assemble.enabled = false
dependenciesInfo.enabled = false
dependencyLicenses.enabled = false
thirdPartyAudit.enabled = false
validateNebulaPom.enabled = false
loggerUsageCheck.enabled = false

java {
    targetCompatibility = JavaVersion.VERSION_21
    sourceCompatibility = JavaVersion.VERSION_21
}

configurations {
    zipArchive
}

repositories {
    mavenLocal()
    maven { url "https://ci.opensearch.org/ci/dbc/snapshots/maven/" }
    mavenCentral()
    maven { url "https://plugins.gradle.org/m2/" }
}

def knnJarDirectory = "$rootDir/build/dependencies/opensearch-knn"

dependencies {
    api "org.opensearch:opensearch:${opensearch_version}"
    zipArchive group: 'org.opensearch.plugin', name:'opensearch-job-scheduler', version: "${opensearch_build}"
    zipArchive group: 'org.opensearch.plugin', name:'opensearch-knn', version: "${opensearch_build}"
    zipArchive group: 'org.opensearch.plugin', name:'opensearch-ml-plugin', version: "${opensearch_build}"
    compileOnly fileTree(dir: knnJarDirectory, include: ["opensearch-knn-${opensearch_build}.jar", "remote-index-build-client-${opensearch_build}.jar"])
    compileOnly group: 'com.google.guava', name: 'guava', version:'33.4.8-jre'
    compileOnly group: 'org.apache.commons', name: 'commons-lang3', version: '3.20.0'
    // json-path 2.10.0 depends on slf4j 2.0.11, which conflicts with the version used by OpenSearch core.
    // Excluding slf4j here since json-path is only used for testing, and logging failures in this context are acceptable.
    testRuntimeOnly('com.jayway.jsonpath:json-path:2.10.0') {
        // OpenSearch core is using slf4j 1.7.36. Therefore, we cannot change the version here.
        exclude group: 'org.slf4j', module: 'slf4j-api'
        exclude group: 'net.minidev', module: 'json-smart'
    }
    testRuntimeOnly group: 'net.minidev', name:'json-smart', version: "${versions.json_smart}"
    api "org.apache.logging.log4j:log4j-api:${versions.log4j}"
    api "org.apache.logging.log4j:log4j-core:${versions.log4j}"
    api "junit:junit:${versions.junit}"
    testImplementation "org.opensearch.test:framework:${opensearch_version}"
    testImplementation(testFixtures(rootProject))
}

ext {
    licenseFile = rootProject.file('LICENSE.txt')
    noticeFile = rootProject.file('NOTICE.txt')
}

def tmp_dir = project.file('build/private/artifact_tmp').absoluteFile
tmp_dir.mkdirs()
String default_bwc_version = System.getProperty("bwc.version", rootProject.ext.default_bwc_version)
String search_relevance_bwc_version = System.getProperty("tests.bwc.version", default_bwc_version)
boolean isSnapshot = search_relevance_bwc_version.contains("-SNAPSHOT")
String search_relevance_bwc_version_no_qualifier = isSnapshot ? search_relevance_bwc_version - "-SNAPSHOT" : search_relevance_bwc_version

String os_platform = "linux"
String artifact_type = "tar"
String file_ext = "tar.gz"

if (Os.isFamily(Os.FAMILY_WINDOWS)) {
    os_platform = "windows"
    artifact_type = "zip"
    file_ext = "zip"
}

ext{
    plugins = [provider(new Callable<RegularFile>(){
        @Override
        RegularFile call() throws Exception {
            return new RegularFile() {
                @Override
                File getAsFile() {
                    return configurations.zipArchive.asFileTree.matching{include "**/opensearch-job-scheduler-${opensearch_build}.zip"}.getSingleFile()
                }
            }
        }
    }), provider(new Callable<RegularFile>(){
        @Override
        RegularFile call() throws Exception {
            return new RegularFile() {
                @Override
                File getAsFile() {
                    return configurations.zipArchive.asFileTree.matching{include "**/opensearch-ml-plugin-${opensearch_build}.zip"}.getSingleFile()
                }
            }
        }
    }), provider(new Callable<RegularFile>(){
        @Override
        RegularFile call() throws Exception {
            return new RegularFile() {
                @Override
                File getAsFile() {
                    return configurations.zipArchive.asFileTree.matching{include "**/opensearch-knn-${opensearch_build}.zip"}.getSingleFile()
                }
            }
        }
    }), rootProject.tasks.bundlePlugin.archiveFile]
}

task deleteTempDirectories {
    doFirst {
        if (tmp_dir.exists()) {
            File[] tempFiles = tmp_dir.listFiles()
            if (tempFiles != null) {
                for (File child : tempFiles) {
                    if (child.exists() && child.toString().contains("opensearch-")) {
                        project.delete(child)
                    }
                }
            }
        }
    }
}

// Task to pull opensearch artifact from archive
task pullOpensearchArtifact {
    dependsOn "deleteTempDirectories"

    doLast{
        ext{
            if (isSnapshot) {
                srcUrl = "https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/${search_relevance_bwc_version_no_qualifier}/latest/${os_platform}/x64/${artifact_type}/dist/opensearch/opensearch-${search_relevance_bwc_version_no_qualifier}-${os_platform}-x64.${file_ext}"
            } else {
                srcUrl = "https://artifacts.opensearch.org/releases/bundle/opensearch/${search_relevance_bwc_version}/opensearch-${search_relevance_bwc_version}-${os_platform}-x64.${file_ext}"
            }
        }
        ant.get(
            src: srcUrl,
            dest: tmp_dir.absolutePath,
            httpusecaches: false
        )
        copy {
            if (Os.isFamily(Os.FAMILY_WINDOWS)) {
                from zipTree(Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}-${os_platform}-x64.${file_ext}"))
            } else {
                from tarTree(Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}-${os_platform}-x64.${file_ext}"))
            }
            into tmp_dir.absolutePath
        }
    }
}

// Task to pull ml plugin from archive
task pullMlCommonsBwcPlugin {
    doLast {
        copy {
            from(Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}", "plugins", "opensearch-ml"))
            into Path.of(tmp_dir.absolutePath, "opensearch-ml")
        }
    }
}

// Task to pull KNN plugin from archive
task pullKnnBwcPlugin {
    dependsOn "pullOpensearchArtifact"

    doLast {
        copy {
            from(Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}", "plugins", "opensearch-knn"))
            into Path.of(tmp_dir.absolutePath, "opensearch-knn")
        }
    }
}

// Task to pull job scheduler plugin from archive
task pullJobSchedulerBwcPlugin {
    dependsOn "pullKnnBwcPlugin"
    doLast {
        copy {
            from(Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}", "plugins", "opensearch-job-scheduler"))
            into Path.of(tmp_dir.absolutePath, "opensearch-job-scheduler")
        }
    }
}

// Task to pull search relevance plugin from archive
task pullBwcPlugin {
    doLast {
        copy {
            from(Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}", "plugins", "opensearch-search-relevance"))
            into Path.of(tmp_dir.absolutePath, "opensearch-search-relevance")
        }
        delete Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}"), java.nio.file.Path.of(tmp_dir.absolutePath, "opensearch-${search_relevance_bwc_version_no_qualifier}-${os_platform}-x64.${file_ext}")
    }
}

// Task to zip opensearch-job-scheduler plugin from archive
task zipBwcJobSchedulerPlugin(type: Zip) {
    dependsOn "pullJobSchedulerBwcPlugin"
    from(Path.of(tmp_dir.absolutePath, "opensearch-job-scheduler"))
    destinationDirectory = tmp_dir
    archiveFileName = "opensearch-job-scheduler-${search_relevance_bwc_version_no_qualifier}.zip"
    doLast {
        delete Path.of(tmp_dir.absolutePath, "opensearch-job-scheduler")
    }
}

// Task to zip ml-commons plugin from archive
task zipBwcMlCommonsPlugin(type: Zip) {
    dependsOn "pullMlCommonsBwcPlugin"
    dependsOn "zipBwcJobSchedulerPlugin"
    from(Path.of(tmp_dir.absolutePath, "opensearch-ml"))
    destinationDirectory = tmp_dir
    archiveFileName = "opensearch-ml-${search_relevance_bwc_version_no_qualifier}.zip"
    doLast {
        delete Path.of(tmp_dir.absolutePath, "opensearch-ml")
    }
}

// Task to zip knn plugin from archive
task zipBwcKnnPlugin(type: Zip) {
    dependsOn "pullKnnBwcPlugin"
    dependsOn "zipBwcMlCommonsPlugin"
    from(Path.of(tmp_dir.absolutePath, "opensearch-knn"))
    destinationDirectory = tmp_dir
    archiveFileName = "opensearch-knn-${search_relevance_bwc_version_no_qualifier}.zip"
    doLast {
        delete Path.of(tmp_dir.absolutePath, "opensearch-knn")
    }
}

// Task to zip search relevance plugin from archive
task zipBwcPlugin(type: Zip) {
    dependsOn "zipBwcKnnPlugin"
    dependsOn "pullBwcPlugin"
    from(Path.of(tmp_dir.absolutePath, "opensearch-search-relevance"))
    destinationDirectory = tmp_dir
    archiveFileName = "opensearch-search-relevance-${search_relevance_bwc_version_no_qualifier}.zip"
    doLast {
        delete Path.of(tmp_dir.absolutePath, "opensearch-search-relevance")
    }
}


task bwcTestSuite {
    dependsOn ":qa:rolling-upgrade:testRollingUpgrade"
}
```
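The `pullOpensearchArtifact` task in the build script above picks a download URL based on whether the BWC version carries a `-SNAPSHOT` qualifier and on the host platform. The Python sketch below restates that selection logic so it can be checked in isolation; it is an illustration only, not code from the PR. The function names `strip_qualifier` and `artifact_url` and the `windows` flag are hypothetical, while the URL templates are copied from the script.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"


def strip_qualifier(version: str) -> str:
    """Drop the first "-SNAPSHOT" occurrence, like Groovy's `version - "-SNAPSHOT"`."""
    return version.replace(SNAPSHOT_SUFFIX, "", 1)


def artifact_url(version: str, windows: bool = False) -> str:
    """Return the distribution URL the build script would download (illustrative)."""
    # The script defaults to a Linux tarball and switches to a zip on Windows.
    if windows:
        os_platform, artifact_type, file_ext = "windows", "zip", "zip"
    else:
        os_platform, artifact_type, file_ext = "linux", "tar", "tar.gz"

    if SNAPSHOT_SUFFIX in version:
        # Snapshot builds come from the CI distribution bucket, keyed by the bare version.
        base = strip_qualifier(version)
        return (
            "https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/"
            f"{base}/latest/{os_platform}/x64/{artifact_type}/dist/opensearch/"
            f"opensearch-{base}-{os_platform}-x64.{file_ext}"
        )

    # Released versions come from the public artifacts site.
    return (
        "https://artifacts.opensearch.org/releases/bundle/opensearch/"
        f"{version}/opensearch-{version}-{os_platform}-x64.{file_ext}"
    )


if __name__ == "__main__":
    print(artifact_url("3.3.0-SNAPSHOT"))
    print(artifact_url("3.3.0", windows=True))
```

With the workflow's `bwc_version` of `3.3.0-SNAPSHOT`, the snapshot branch is taken, so the archive and the extracted directory are both named after the bare `3.3.0` version, which is why the later pull and zip tasks use the no-qualifier variable in their paths.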