
[RFC]: Query Rewriting, Logical Planning, and Cost-Based Execution in OpenSearch Core #18906

@atris

Description


Is your feature request related to a problem? Please describe

The current execution path is very straightforward: we take the DSL, run it through SearchModule’s registered QueryParsers, and end up with a QueryBuilder tree. We then call toQuery() on that tree to build Lucene Query objects, hand them to Lucene, and Lucene decides execution order at the scorer level.

Lucene’s scorer ordering comes from Weight.scorerSupplier() and ScorerSupplier.cost(), but that cost is local to each scorer and doesn’t give us cross-clause control. There’s no point in the pipeline where we normalize the query tree, estimate cost across clauses, or choose an execution strategy before Lucene starts pulling scorers.

That means:

  • We can’t ensure that the most selective filters run first.
  • We can’t prefer vector-first or term-first for hybrid queries.
  • We have no central place to collapse redundant clauses or apply systematic rewrites. Current rewrites happen in
    QueryPhase.preProcess(), and we should ideally centralise them.
  • We can’t explain to ourselves or to users why a query was executed in a certain order.

Why this is a real problem

The simplest example:

{
  "bool": {
    "filter": [
      {"knn": {"embedding": {"vector": [...], "k": 10}}},
      {"term": {"status": "error"}},
      {"range": {"timestamp": {"gte": "2023"}}}
    ]
  }
}

If Lucene’s scorer chooses to lead with the range, then term, then KNN, the KNN work is done on a huge candidate set. If we flipped it to KNN → term → range, we’d do far less work for the same result.
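To make the intuition concrete, here is a back-of-the-envelope sketch of the work difference. All numbers are illustrative assumptions (a 10M-doc index, the clause selectivities made up), and the work metric is deliberately crude: each clause is charged for scanning the candidate set left by the clauses before it.

```java
// Illustrative only: hypothetical match counts for the bool.filter above.
class OrderingCost {
    // Assumed docs matched by each clause, out of ~10M total.
    static final long RANGE_MATCHES = 5_000_000; // wide timestamp range
    static final long TERM_MATCHES  = 50_000;    // status:error is selective
    static final long KNN_MATCHES   = 10;        // k = 10

    // Crude work metric: each clause pays for the candidate set that
    // survived the previous clauses (running intersection).
    static long work(long... clauseSizes) {
        long work = 0;
        long candidates = Long.MAX_VALUE;
        for (long size : clauseSizes) {
            candidates = Math.min(candidates, size);
            work += candidates;
        }
        return work;
    }
}
```

Under these assumptions, range → term → KNN costs roughly 5M units of work while KNN → term → range costs tens, which is the gap the planner is meant to capture.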

In practice, this problem shows up in:

  • Monitoring/log queries: big time ranges + filters + vector matches for anomaly search.
  • Security queries: IP/network ranges with expensive scripted scoring + selective terms.
  • Multi-field disjunctions where some fields are sparse and others are dense.

Describe the solution you'd like

What we add: a planning layer

The idea is to insert one clear stage between “DSL parsed” and “Lucene query built”:

  1. Rewriting – One place to apply all query simplifications and normalizations. Instead of relying on each QueryBuilder’s ad hoc rewrite() method, we have a QueryRewriteService that walks the tree and applies a sequence of pluggable rewriters.

    Examples:

    • Flatten nested bools:
      {"bool": {"filter": [{"bool": {"filter": [ ... ]}}]}}
      → single bool with all filters at one level.
    • Merge term clauses on the same field into a terms query.
    • Drop no-op match_all in must or filter contexts.
    • Add fallback terms for KNN queries (e.g., centroid tokens) if configured.

    Each rewriter is a small class:

    interface QueryRewriter {
        QueryBuilder rewrite(QueryBuilder qb, RewriteContext ctx);
    }

    Rewriters are chained and order matters. They operate only on the builder tree, no Lucene objects yet.
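A minimal, self-contained sketch of the chaining contract. QueryBuilder and RewriteContext here are stand-in stubs, not the real OpenSearch classes; the point is only that the service applies rewriters in registration order over the builder tree.

```java
import java.util.List;

// Stand-in stubs: in OpenSearch these would be the real QueryBuilder
// hierarchy and a context carrying mappings/settings.
interface QueryBuilder {}
class RewriteContext {}

interface QueryRewriter {
    QueryBuilder rewrite(QueryBuilder qb, RewriteContext ctx);
}

// Applies rewriters in order; order matters, e.g. flattening nested
// bools should run before merging sibling term clauses.
class QueryRewriteService {
    private final List<QueryRewriter> rewriters;

    QueryRewriteService(List<QueryRewriter> rewriters) {
        this.rewriters = rewriters;
    }

    QueryBuilder rewrite(QueryBuilder qb, RewriteContext ctx) {
        for (QueryRewriter r : rewriters) {
            qb = r.rewrite(qb, ctx);
        }
        return qb;
    }
}
```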


  2. Logical plan – After rewrite, we walk the builder tree into our own lightweight plan structure. This is where we identify clause type, field, nesting, and assign hints.

    Example node:

    class LogicalQueryNode {
        QueryBuilder clause;
        ClauseType type; // TERM, RANGE, KNN, SCRIPT, EXISTS, etc.
        String field;
        boolean isNested;
        ExecutionHint hint; // FILTER, VECTOR, SCRIPT
        double estimatedCost;
        List<LogicalQueryNode> children;
    }

    This is not meant for execution — it’s for reasoning about the query before we commit to Lucene’s representation.

    A bool.filter with 5 clauses becomes 5 sibling LogicalQueryNodes. A nested query becomes a parent with its own children. For should clauses we record their grouping but can still estimate each branch’s cost.


  3. Cost estimation – We now attach estimated costs to each logical node. The goal isn’t perfect precision; it’s “cheap vs expensive” in a way that lets us order intelligently.

    Inputs for cost estimation:

    • FieldStats from index mappings/statistics (doc count, cardinality, min/max for numerics).
    • Lucene IndexReader.docFreq() for term queries if stats aren’t available. An alternative is IndexSearcher.collectionStatistics(), since IndexReader.docFreq() can be expensive.
    • Fixed costs for types like KNN or script where we can’t derive from stats.

    Example heuristic table:

    | Type                    | Cost       |
    | ----------------------- | ---------: |
    | term (low-card keyword) | 10_000     |
    | term (high-card text)   | 500_000    |
    | range on timestamp      | 1_000_000  |
    | exists                  | 100_000    |
    | knn                     | 500_000    |
    | script                  | 50_000_000 |

    These numbers are just units for comparison — the scale doesn’t matter as long as it’s consistent.

    We can improve this later with:

    • Per-field dynamic baselines (from real query logs).
    • Cost models that factor in size, from, and aggregation load.
    • ML-predicted costs from query features.
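The heuristic table can start life as a simple lookup. A sketch, with one assumption called out: the low/high-cardinality split for term clauses (here a made-up threshold of 10k distinct values) would come from FieldStats in a real implementation.

```java
import java.util.Map;

// Heuristic cost lookup mirroring the table above. Values are unitless;
// only relative ordering matters.
class HeuristicCostModel {
    private static final Map<String, Double> BASE = Map.of(
        "range",   1_000_000d,
        "exists",    100_000d,
        "knn",       500_000d,
        "script", 50_000_000d
    );

    static double estimate(String type, long fieldCardinality) {
        if (type.equals("term")) {
            // Assumed threshold: <= 10k distinct values counts as a
            // low-cardinality keyword field.
            return fieldCardinality <= 10_000 ? 10_000d : 500_000d;
        }
        Double cost = BASE.get(type);
        return cost != null ? cost : 250_000d; // fallback for unknown types
    }
}
```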

  4. Physical plan – Now we decide:

    • Clause execution order (e.g., sort filters by ascending cost).
    • Strategy:
      • TERM_FIRST — run selective term/range filters, then vector.
      • VECTOR_FIRST — run KNN first, post-filter results.
      • PARALLEL — evaluate branches independently (disjunction-heavy queries).
      • STAGED — KNN → lexical filter → script rerank.
    • Scorer choice: BooleanScorer2 (good for disjunctions), ConjunctionDISI (for conjunction-heavy queries), ImpactScorer (if impact-based skipping helps).
    • Collector choice: TopDocsCollector, TotalHitCountCollector, or multi-collector for staged execution.
    • Early termination: enable if topK and plan predicts low result volume.
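The ordering and strategy decisions above can be sketched as follows. Clause is a stub, and the strategy rule (vector-first only when the KNN clause is the cheapest) is an illustrative heuristic, not a settled policy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the physical planning step: sort clauses by ascending
// estimated cost, then pick a strategy from the resulting order.
class PhysicalPlanner {
    enum Strategy { TERM_FIRST, VECTOR_FIRST }

    record Clause(String field, String type, double cost) {}

    static List<Clause> order(List<Clause> clauses) {
        return clauses.stream()
            .sorted(Comparator.comparingDouble(Clause::cost))
            .collect(Collectors.toList());
    }

    // Illustrative rule: lead with the vector clause only when it is
    // the cheapest thing we could run.
    static Strategy strategy(List<Clause> ordered) {
        return ordered.get(0).type().equals("knn")
            ? Strategy.VECTOR_FIRST
            : Strategy.TERM_FIRST;
    }
}
```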

    Plan example:

    {
      "clauses": [
        {"field": "status", "type": "term", "cost": 10000},
        {"field": "embedding", "type": "knn", "cost": 500000},
        {"field": "timestamp", "type": "range", "cost": 1000000}
      ],
      "execution_order": ["status", "embedding", "timestamp"],
  "strategy": "term_first",
      "scorer": "BooleanScorer2",
      "early_termination": true
    }

Where it fits

This entire flow sits inside SearchService before the call to SearchExecutionContext.toQuery(). We rewrite builders, build a logical plan, attach costs, sort clauses, decide strategy, and only then convert to Lucene Query objects. That way, Lucene sees exactly the structure we want it to execute.

No changes are needed in Lucene itself — we’re just controlling our side of the API. All changes are internal to query execution. Existing queries continue to work identically with potential performance improvements.


Rollout plan

  • Phase 1: Implement QueryRewriteService and a few basic rewriters (flatten bools, merge terms, drop match_all).
  • Phase 2: Build the LogicalPlanBuilder and plug in cost estimation.
  • Phase 3: Implement the physical planner with clause ordering and basic strategy selection.
  • Later: shard-aware planning (take shard-level stats into account), join planning, ML-driven strategies, adaptive replanning mid-query.

Why this is worth it

Even just Phase 3 with a simple heuristic cost model will improve hybrid search and filter-heavy queries noticeably. We’ll also finally be able to debug clause order and scorer selection, which is currently a black box. This sets the stage for more advanced features — streaming aggregations, join optimizations, vector reranking — without bolting them onto an uncontrolled execution path.

This is a standard pattern: statistics-based cost optimizers exist in systems like PostgreSQL and Spark.

The QueryRewriter and cost estimation interfaces allow plugins to contribute domain-specific optimizations.


If folks agree this is a good direction, I can get a branch up with the rewrite service and logical plan scaffolding so we can try it against real workloads before worrying about cost model tuning.

Related component

No response

Describe alternatives you've considered

No response

Additional context

No response
