Add search relevance stats API #63

q-andy · 2025-06-05T00:03:16Z

Description

Adds stats API framework to search relevance backend. This is largely ported from the design in opensearch-project/neural-search#1196.

The stats API is enabled by default.

Framework should be merged first, then additional stats can be added based on the examples.

API details

API	Method	Status	Mutating or Non-Mutating	Functionality
/_plugins/_search_relevance/stats	GET	New	Non-Mutating	Retrieves stat counters from nodes and returns them in response

Path Parameters

nodes: specify node ids to retrieve stats from (default all)
stats: specify stat names to retrieve (default all)

Query Parameters

include_metadata: boolean, include recent_interval/stat_type/minutes_since_last_event (default false)
flat_stat_paths: boolean, flatten the JSON response (default false)

Example calls

GET /_plugins/_search_relevance/stats
GET /_plugins/_search_relevance/stats?include_metadata=false
GET /_plugins/_search_relevance/<node_id>/stats/<stat_name>?include_metadata=true&flat_stat_paths=true
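The three call shapes above follow one pattern: an optional node segment before `stats`, an optional stat-name segment after it, and boolean query parameters at the end. A minimal sketch of that pattern is below. `StatsPathBuilder` is a hypothetical helper, not part of the plugin, and joining several node ids or stat names with commas is an assumption borrowed from common OpenSearch REST conventions; the PR description only shows single values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the plugin): assembles a stats request path
// from optional node ids, optional stat names, and boolean query parameters.
final class StatsPathBuilder {
    private static final String BASE = "/_plugins/_search_relevance";

    static String build(List<String> nodeIds, List<String> statNames, Map<String, Boolean> queryParams) {
        StringBuilder path = new StringBuilder(BASE);
        if (!nodeIds.isEmpty()) {
            // Assumption: multiple node ids are comma-separated.
            path.append('/').append(String.join(",", nodeIds));
        }
        path.append("/stats");
        if (!statNames.isEmpty()) {
            // Assumption: multiple stat names are comma-separated.
            path.append('/').append(String.join(",", statNames));
        }
        if (!queryParams.isEmpty()) {
            // Query parameters start with '?' and are joined with '&'.
            StringJoiner query = new StringJoiner("&", "?", "");
            queryParams.forEach((name, value) -> query.add(name + "=" + value));
            path.append(query);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, Boolean> params = new LinkedHashMap<>();
        params.put("include_metadata", true);
        params.put("flat_stat_paths", true);

        System.out.println(build(List.of(), List.of(), Map.of()));
        System.out.println(build(List.of(), List.of(), params));
        System.out.println(build(List.of("KXrV8g32RtSzWIenoJQd4g"), List.of("experiment_executions"), params));
    }
}
```

Omitting both path segments yields the plain `GET /_plugins/_search_relevance/stats` call, which per the parameter defaults returns all stats from all nodes.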

Cluster level setting to disable Stats API/Collection

PUT /_cluster/settings
{
    "persistent" : {
        "plugins.search_relevance.stats_enabled" : "false" // default true
    }
}

Example response:

{
	"_nodes": {
		"total": 1,
		"successful": 1,
		"failed": 0
	},
	"cluster_name": "integTest",
	"info": {
		"cluster_version": "3.1.0"
	},
	"all_nodes": {
		"judgments": {
			"import_judgment_rating_generations": 1,
			"llm_judgment_rating_generations": 0,
			"ubi_judgment_rating_generations": 1
		},
		"experiments": {
			"experiment_hybrid_optimizer_executions": 1,
			"experiment_pairwise_comparison_executions": 1,
			"experiment_executions": 3,
			"experiment_pointwise_evaluation_executions": 1
		}
	},
	"nodes": {
		"KXrV8g32RtSzWIenoJQd4g": {
			"judgments": {
				"import_judgment_rating_generations": 1,
				"llm_judgment_rating_generations": 0,
				"ubi_judgment_rating_generations": 1
			},
			"experiments": {
				"experiment_hybrid_optimizer_executions": 1,
				"experiment_pairwise_comparison_executions": 1,
				"experiment_executions": 3,
				"experiment_pointwise_evaluation_executions": 1
			}
		}
	}
}

GET /_plugins/_search_relevance/stats?flat_stat_paths=true
{
	"_nodes": {
		"total": 1,
		"successful": 1,
		"failed": 0
	},
	"cluster_name": "integTest",
	"info": {
		"cluster_version": "3.1.0"
	},
	"all_nodes": {
		"experiments.experiment_pairwise_comparison_executions": 1,
		"experiments.experiment_pointwise_evaluation_executions": 1,
		"judgments.ubi_judgment_rating_generations": 1,
		"experiments.experiment_executions": 3,
		"judgments.llm_judgment_rating_generations": 0,
		"judgments.import_judgment_rating_generations": 1,
		"experiments.experiment_hybrid_optimizer_executions": 1
	},
	"nodes": {
		"KXrV8g32RtSzWIenoJQd4g": {
			"experiments.experiment_pairwise_comparison_executions": 1,
			"experiments.experiment_pointwise_evaluation_executions": 1,
			"judgments.ubi_judgment_rating_generations": 1,
			"experiments.experiment_executions": 3,
			"judgments.llm_judgment_rating_generations": 0,
			"experiments.experiment_hybrid_optimizer_executions": 1,
			"judgments.import_judgment_rating_generations": 1
		}
	}
}
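Comparing the two responses, `flat_stat_paths=true` appears to replace each nested category object with dotted keys such as `experiments.experiment_executions`. The sketch below shows one way to produce that shape from a nested map. `StatPathFlattener` is a hypothetical class written for illustration and is not the plugin's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the flat_stat_paths transformation:
// nested category maps become single-level maps keyed by dotted paths.
final class StatPathFlattener {

    static Map<String, Object> flatten(Map<String, Object> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, Object> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = String.valueOf(entry.getKey());
            String path = prefix.isEmpty() ? key : prefix + "." + key;
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Recurse into a category, extending the dotted prefix.
                flattenInto(path, (Map<?, ?>) value, out);
            } else {
                // Leaf stat value: store it under its full dotted path.
                out.put(path, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> nested = Map.of(
            "judgments", Map.of("llm_judgment_rating_generations", 0),
            "experiments", Map.of("experiment_executions", 3)
        );
        // Prints each stat with its dotted path; key order is not guaranteed,
        // which matches the unordered keys in the flattened response above.
        flatten(nested).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```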

Issues Resolved

Resolves #47

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

q-andy · 2025-06-06T00:30:21Z

Additional stats need to be added after this is merged to make use of this API. I added the following as examples:

import_judgment_score_generations
ubi_judgment_score_generations
llm_judgment_score_generations

There are probably other stats that are more vital to track.

epugh

I can't really speak to how stats work (new to me), but I did review some items specific to search relevance.

We do need to extend this out to the more intensive operations, like issuing queries while running an evaluation.

Lastly, I am a bit surprised how many Java classes are in the org.opensearch.searchrelevance.stats package... Should many of these classes be imported from another project? org.opensearch.stats or something like that? Or be part of the base of a plugin?

Thank you for the contribution and making it possible to ship this in 3.1!

src/main/java/org/opensearch/searchrelevance/judgments/ImportJudgmentsProcessor.java

src/main/java/org/opensearch/searchrelevance/rest/RestSearchRelevanceStatsAction.java

codecov · 2025-06-06T15:34:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (d75648d) to head (79e44b0).
Report is 9 commits behind head on main.


src/main/java/org/opensearch/searchrelevance/stats/events/EventStatName.java

src/main/java/org/opensearch/searchrelevance/stats/info/InfoStatsManager.java

...ain/java/org/opensearch/searchrelevance/transport/stats/SearchRelevanceStatsNodeRequest.java

...in/java/org/opensearch/searchrelevance/transport/stats/SearchRelevanceStatsNodeResponse.java

src/main/java/org/opensearch/searchrelevance/transport/stats/SearchRelevanceStatsRequest.java

...java/org/opensearch/searchrelevance/transport/stats/SearchRelevanceStatsTransportAction.java

src/main/java/org/opensearch/searchrelevance/stats/SearchRelevanceStatsInput.java

q-andy · 2025-06-07T00:25:48Z

Thanks for the comments all.

We do need to extend this out to the more intensive operations, like issuing queries while running an evaluation.

Yes, I'm still catching up on the context of which operations users of this plugin commonly perform. I was planning to implement the framework first so that the other maintainers with more context could implement stats using their best judgment. @epugh do you have some more examples of things that would be useful for understanding how the plugin is being used?

I am a bit surprised how many Java classes are in the org.opensearch.searchrelevance.stats package... Should many of these classes be imported from another project? org.opensearch.stats or something like that? Or be part of the base of a plugin?

We've discussed this before on the neural-search side. Having a common package would be ideal, but there's a lot of refactoring that has to be done to make that happen. Currently each plugin implements its stats APIs independently, with its own business logic. Given the turnaround time for the initial release in 3.1, we can implement it like this for now and refactor in the future.

Signed-off-by: Andy Qin <[email protected]>

# Conflicts:
# src/test/java/org/opensearch/searchrelevance/plugin/SearchRelevancePluginTests.java

# Conflicts:
# src/main/java/org/opensearch/searchrelevance/judgments/ImportJudgmentsProcessor.java
# src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java
# src/main/java/org/opensearch/searchrelevance/judgments/UbiJudgmentsProcessor.java

# Conflicts:
# src/main/java/org/opensearch/searchrelevance/judgments/ImportJudgmentsProcessor.java
# src/main/java/org/opensearch/searchrelevance/judgments/LlmJudgmentsProcessor.java
# src/main/java/org/opensearch/searchrelevance/judgments/UbiJudgmentsProcessor.java

martin-gaievski

Please add an entry to the changelog. Other than that, the PR looks good to me.

Signed-off-by: Andy Qin <[email protected]>

q-andy · 2025-06-10T19:24:26Z

@martin-gaievski @epugh @fen-qin updated changelog. Also added stats to track experiment type executions. Here's the result after running the demo.sh script:

{
	"_nodes": {
		"total": 1,
		"successful": 1,
		"failed": 0
	},
	"cluster_name": "integTest",
	"info": {
		"cluster_version": "3.1.0"
	},
	"all_nodes": {
		"judgments": {
			"import_judgment_rating_generations": 1,
			"llm_judgment_rating_generations": 0,
			"ubi_judgment_rating_generations": 1
		},
		"experiments": {
			"experiment_hybrid_optimizer_executions": 1,
			"experiment_pairwise_comparison_executions": 1,
			"experiment_executions": 3,
			"experiment_pointwise_evaluation_executions": 1
		}
	},
	"nodes": {
		"KXrV8g32RtSzWIenoJQd4g": {
			"judgments": {
				"import_judgment_rating_generations": 1,
				"llm_judgment_rating_generations": 0,
				"ubi_judgment_rating_generations": 1
			},
			"experiments": {
				"experiment_hybrid_optimizer_executions": 1,
				"experiment_pairwise_comparison_executions": 1,
				"experiment_executions": 3,
				"experiment_pointwise_evaluation_executions": 1
			}
		}
	}
}

fen-qin

lgtm

q-andy · 2025-06-10T19:47:44Z

Before merging let me also port the changes from opensearch-project/neural-search#1360

Signed-off-by: Andy Qin <[email protected]>

martin-gaievski · 2025-06-10T19:53:09Z

src/main/java/org/opensearch/searchrelevance/dao/ExperimentDao.java

+
+    private void recordStats(Experiment experiment) {
+        EventStatsManager.increment(EventStatName.EXPERIMENT_EXECUTIONS);
+        Optional.ofNullable(experimentTypeIncrementers.get(experiment.type())).ifPresent(Runnable::run);


Is this the same pattern as in other plugins, where the total number of executions is effectively the sum of the metrics for the individual types? This saves us the effort of aggregating them outside OpenSearch but increases the number of KPIs we're collecting.

It's on a case-by-case basis, but here it's the same pattern. The idea behind this API design is to reduce external aggregations when possible. For hybrid query stats in neural search, for example, we track the total number of normalization processor executions in addition to the normalization/combination technique breakdowns.

In this case the primary way users will interact with SRW is by running experiments, so I think having a coarse-grained metric like this is worthwhile, even if we can aggregate the info in other ways.
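The pattern discussed in this thread, one coarse-grained total plus a per-type breakdown dispatched through a map of incrementers, can be sketched in isolation as follows. Everything here is a simplified stand-in: `ExperimentStatsSketch`, its enums, and the `LongAdder` counters are assumptions made for illustration, while the real PR uses `EventStatsManager` and `EventStatName`, whose internals are not shown in this thread.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.LongAdder;

// Simplified stand-in for the recordStats pattern: every execution bumps a
// total counter, and a per-type incrementer adds the breakdown when one is registered.
final class ExperimentStatsSketch {
    // Hypothetical enums; the names mirror the stats in the example response.
    enum ExperimentType { PAIRWISE_COMPARISON, POINTWISE_EVALUATION, HYBRID_OPTIMIZER }

    enum StatName {
        EXPERIMENT_EXECUTIONS,
        EXPERIMENT_PAIRWISE_COMPARISON_EXECUTIONS,
        EXPERIMENT_POINTWISE_EVALUATION_EXECUTIONS,
        EXPERIMENT_HYBRID_OPTIMIZER_EXECUTIONS
    }

    private final Map<StatName, LongAdder> counters = new EnumMap<>(StatName.class);
    private final Map<ExperimentType, Runnable> experimentTypeIncrementers = new EnumMap<>(ExperimentType.class);

    ExperimentStatsSketch() {
        for (StatName name : StatName.values()) {
            counters.put(name, new LongAdder());
        }
        experimentTypeIncrementers.put(ExperimentType.PAIRWISE_COMPARISON,
            () -> increment(StatName.EXPERIMENT_PAIRWISE_COMPARISON_EXECUTIONS));
        experimentTypeIncrementers.put(ExperimentType.POINTWISE_EVALUATION,
            () -> increment(StatName.EXPERIMENT_POINTWISE_EVALUATION_EXECUTIONS));
        experimentTypeIncrementers.put(ExperimentType.HYBRID_OPTIMIZER,
            () -> increment(StatName.EXPERIMENT_HYBRID_OPTIMIZER_EXECUTIONS));
    }

    void increment(StatName name) {
        counters.get(name).increment();
    }

    long get(StatName name) {
        return counters.get(name).sum();
    }

    void recordStats(ExperimentType type) {
        // The total is always incremented, even for a type with no registered breakdown.
        increment(StatName.EXPERIMENT_EXECUTIONS);
        Optional.ofNullable(experimentTypeIncrementers.get(type)).ifPresent(Runnable::run);
    }

    public static void main(String[] args) {
        ExperimentStatsSketch stats = new ExperimentStatsSketch();
        for (ExperimentType type : ExperimentType.values()) {
            stats.recordStats(type);
        }
        // One execution per type gives a total of 3, as in the example response.
        System.out.println("experiment_executions = " + stats.get(StatName.EXPERIMENT_EXECUTIONS));
    }
}
```

Keeping the total as its own counter means it stays correct even if a new experiment type is added before its breakdown stat is registered, which is one argument for the extra KPI.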

q-andy · 2025-06-10T21:57:59Z

Ready to merge after approvals

martin-gaievski

Looks good, thank you!

* Add stats API
  Signed-off-by: Andy Qin <[email protected]>
* Rename all stats references to search relevance
  Signed-off-by: Andy Qin <[email protected]>
* Add experiment stats
  Signed-off-by: Andy Qin <[email protected]>
* Add include category parameters
  Signed-off-by: Andy Qin <[email protected]>
---------
Signed-off-by: Andy Qin <[email protected]>

q-andy force-pushed the stats branch 2 times, most recently from 7dc0410 to 6ec1481 Compare June 5, 2025 23:58

q-andy marked this pull request as ready for review June 5, 2025 23:59

epugh reviewed Jun 6, 2025

epugh added the v3.1.0 label Jun 6, 2025