
[RFC]: Query Rewriting, Logical Planning, and Cost-Based Execution in OpenSearch Core #18906

@atris

Description


Is your feature request related to a problem? Please describe

The current execution path is very straightforward: we take the DSL, run it through SearchModule’s registered QueryParsers, and end up with a QueryBuilder tree. We then call toQuery() on that tree to build Lucene Query objects, hand them to Lucene, and Lucene decides execution order at the scorer level.

Lucene’s scorer ordering comes from Weight.scorerSupplier() and ScorerSupplier.cost(), but that cost is local to each scorer and doesn’t give us cross-clause control. There’s no point in the pipeline where we normalize the query tree, estimate cost across clauses, or choose an execution strategy before Lucene starts pulling scorers.

That means:

  • We can’t ensure that the most selective filters run first.
  • We can’t prefer vector-first or term-first for hybrid queries.
  • We have no central place to collapse redundant clauses or apply systematic rewrites. Current rewrites happen in
    QueryPhase.preProcess(), and we should ideally centralise them.
  • We can’t explain to ourselves or to users why a query was executed in a certain order.

Why this is a real problem

The simplest example:

{
  "bool": {
    "filter": [
      {"knn": {"embedding": {"vector": [...], "k": 10}}},
      {"term": {"status": "error"}},
      {"range": {"timestamp": {"gte": "2023"}}}
    ]
  }
}

If Lucene’s scorer chooses to lead with the range, then term, then KNN, the KNN work is done on a huge candidate set. If we flipped it to KNN → term → range, we’d do far less work for the same result.
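To make the intuition concrete, here is a back-of-the-envelope sketch of the work difference. All numbers are illustrative assumptions (a 10M-doc index, the clause selectivities made up), and the work metric is deliberately crude: each clause is charged for scanning the candidate set left by the clauses before it.

```java
// Illustrative only: hypothetical match counts for the bool.filter above.
class OrderingCost {
    // Assumed docs matched by each clause, out of ~10M total.
    static final long RANGE_MATCHES = 5_000_000; // wide timestamp range
    static final long TERM_MATCHES  = 50_000;    // status:error is selective
    static final long KNN_MATCHES   = 10;        // k = 10

    // Crude work metric: each clause pays for the candidate set that
    // survived the previous clauses (running intersection).
    static long work(long... clauseSizes) {
        long work = 0;
        long candidates = Long.MAX_VALUE;
        for (long size : clauseSizes) {
            candidates = Math.min(candidates, size);
            work += candidates;
        }
        return work;
    }
}
```

Under these assumptions, range → term → KNN costs roughly 5M units of work while KNN → term → range costs tens, which is the gap the planner is meant to capture.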

In practice, this problem shows up in:

  • Monitoring/log queries: big time ranges + filters + vector matches for anomaly search.
  • Security queries: IP/network ranges with expensive scripted scoring + selective terms.
  • Multi-field disjunctions where some fields are sparse and others are dense.

Describe the solution you'd like

What we add: a planning layer

The idea is to insert one clear stage between “DSL parsed” and “Lucene query built”:

  1. Rewriting – One place to apply all query simplifications and normalizations. Instead of relying on each QueryBuilder’s ad hoc rewrite() method, we have a QueryRewriteService that walks the tree and applies a sequence of pluggable rewriters.

    Examples:

    • Flatten nested bools:
      {"bool": {"filter": [{"bool": {"filter": [ ... ]}}]}}
      → single bool with all filters at one level.
    • Merge term clauses on the same field into a terms query.
    • Drop no-op match_all in must or filter contexts.
    • Add fallback terms for KNN queries (e.g., centroid tokens) if configured.

    Each rewriter is a small class:

    interface QueryRewriter {
        QueryBuilder rewrite(QueryBuilder qb, RewriteContext ctx);
    }

    Rewriters are chained and order matters. They operate only on the builder tree, no Lucene objects yet.
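A minimal, self-contained sketch of the chaining contract. QueryBuilder and RewriteContext here are stand-in stubs, not the real OpenSearch classes; the point is only that the service applies rewriters in registration order over the builder tree.

```java
import java.util.List;

// Stand-in stubs: in OpenSearch these would be the real QueryBuilder
// hierarchy and a context carrying mappings/settings.
interface QueryBuilder {}
class RewriteContext {}

interface QueryRewriter {
    QueryBuilder rewrite(QueryBuilder qb, RewriteContext ctx);
}

// Applies rewriters in order; order matters, e.g. flattening nested
// bools should run before merging sibling term clauses.
class QueryRewriteService {
    private final List<QueryRewriter> rewriters;

    QueryRewriteService(List<QueryRewriter> rewriters) {
        this.rewriters = rewriters;
    }

    QueryBuilder rewrite(QueryBuilder qb, RewriteContext ctx) {
        for (QueryRewriter r : rewriters) {
            qb = r.rewrite(qb, ctx);
        }
        return qb;
    }
}
```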


  2. Logical plan – After rewrite, we walk the builder tree into our own lightweight plan structure. This is where we identify clause type, field, nesting, and assign hints.

    Example node:

    class LogicalQueryNode {
        QueryBuilder clause;
        ClauseType type; // TERM, RANGE, KNN, SCRIPT, EXISTS, etc.
        String field;
        boolean isNested;
        ExecutionHint hint; // FILTER, VECTOR, SCRIPT
        double estimatedCost;
        List<LogicalQueryNode> children;
    }

    This is not meant for execution — it’s for reasoning about the query before we commit to Lucene’s representation.

    A bool.filter with 5 clauses becomes 5 sibling LogicalQueryNodes. A nested query becomes a parent with its own children. For should clauses we record their grouping but can still estimate each branch’s cost.


  3. Cost estimation – We now attach estimated costs to each logical node. The goal isn’t perfect precision; it’s “cheap vs expensive” in a way that lets us order intelligently.

    Inputs for cost estimation:

    • FieldStats from index mappings/statistics (doc count, cardinality, min/max for numerics).
    • Lucene IndexReader.docFreq() for term queries if stats aren’t available. An alternative is IndexSearcher.collectionStatistics(), since IndexReader.docFreq() can be expensive.
    • Fixed costs for types like KNN or script where we can’t derive from stats.

    Example heuristic table:

    | Type                    | Cost       |
    | ----------------------- | ---------: |
    | term (low-card keyword) | 10_000     |
    | term (high-card text)   | 500_000    |
    | range on timestamp      | 1_000_000  |
    | exists                  | 100_000    |
    | knn                     | 500_000    |
    | script                  | 50_000_000 |

    These numbers are just units for comparison — the scale doesn’t matter as long as it’s consistent.

    We can improve this later with:

    • Per-field dynamic baselines (from real query logs).
    • Cost models that factor in size, from, and aggregation load.
    • ML-predicted costs from query features.
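The heuristic table can start life as a simple lookup. A sketch, with one assumption called out: the low/high-cardinality split for term clauses (here a made-up threshold of 10k distinct values) would come from FieldStats in a real implementation.

```java
import java.util.Map;

// Heuristic cost lookup mirroring the table above. Values are unitless;
// only relative ordering matters.
class HeuristicCostModel {
    private static final Map<String, Double> BASE = Map.of(
        "range",   1_000_000d,
        "exists",    100_000d,
        "knn",       500_000d,
        "script", 50_000_000d
    );

    static double estimate(String type, long fieldCardinality) {
        if (type.equals("term")) {
            // Assumed threshold: <= 10k distinct values counts as a
            // low-cardinality keyword field.
            return fieldCardinality <= 10_000 ? 10_000d : 500_000d;
        }
        Double cost = BASE.get(type);
        return cost != null ? cost : 250_000d; // fallback for unknown types
    }
}
```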

  4. Physical plan – Now we decide:

    • Clause execution order (e.g., sort filters by ascending cost).
    • Strategy:
      • TERM_FIRST — run selective term/range filters, then vector.
      • VECTOR_FIRST — run KNN first, post-filter results.
      • PARALLEL — evaluate branches independently (disjunction-heavy queries).
      • STAGED — KNN → lexical filter → script rerank.
    • Scorer choice: BooleanScorer2 (good for disjunctions), ConjunctionDISI (for conjunction-heavy queries), ImpactScorer (if impact-based skipping helps).
    • Collector choice: TopDocsCollector, TotalHitCountCollector, or multi-collector for staged execution.
    • Early termination: enable if topK and plan predicts low result volume.
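The ordering and strategy decisions above can be sketched as follows. Clause is a stub, and the strategy rule (vector-first only when the KNN clause is the cheapest) is an illustrative heuristic, not a settled policy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the physical planning step: sort clauses by ascending
// estimated cost, then pick a strategy from the resulting order.
class PhysicalPlanner {
    enum Strategy { TERM_FIRST, VECTOR_FIRST }

    record Clause(String field, String type, double cost) {}

    static List<Clause> order(List<Clause> clauses) {
        return clauses.stream()
            .sorted(Comparator.comparingDouble(Clause::cost))
            .collect(Collectors.toList());
    }

    // Illustrative rule: lead with the vector clause only when it is
    // the cheapest thing we could run.
    static Strategy strategy(List<Clause> ordered) {
        return ordered.get(0).type().equals("knn")
            ? Strategy.VECTOR_FIRST
            : Strategy.TERM_FIRST;
    }
}
```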

    Plan example:

    {
      "clauses": [
        {"field": "status", "type": "term", "cost": 10000},
        {"field": "embedding", "type": "knn", "cost": 500000},
        {"field": "timestamp", "type": "range", "cost": 1000000}
      ],
      "execution_order": ["status", "embedding", "timestamp"],
  "strategy": "term_first",
      "scorer": "BooleanScorer2",
      "early_termination": true
    }

Where it fits

This entire flow sits inside SearchService before the call to SearchExecutionContext.toQuery(). We rewrite builders, build a logical plan, attach costs, sort clauses, decide strategy, and only then convert to Lucene Query objects. That way, Lucene sees exactly the structure we want it to execute.

No changes are needed in Lucene itself — we’re just controlling our side of the API. All changes are internal to query execution. Existing queries continue to work identically with potential performance improvements.


Rollout plan

  • Phase 1: Implement QueryRewriteService and a few basic rewriters (flatten bools, merge terms, drop match_all).
  • Phase 2: Build the LogicalPlanBuilder and plug in cost estimation.
  • Phase 3: Implement the physical planner with clause ordering and basic strategy selection.
  • Later: shard-aware planning (take shard-level stats into account), join planning, ML-driven strategies, adaptive replanning mid-query.

Why this is worth it

Even just Phase 3 with a simple heuristic cost model will improve hybrid search and filter-heavy queries noticeably. We’ll also finally be able to debug clause order and scorer selection, which is currently a black box. This sets the stage for more advanced features — streaming aggregations, join optimizations, vector reranking — without bolting them onto an uncontrolled execution path.

This is a standard pattern: statistics-based cost optimizers exist in systems like PostgreSQL and Spark.

The QueryRewriter and cost estimation interfaces allow plugins to contribute domain-specific optimizations.


If folks agree this is a good direction, I can get a branch up with the rewrite service and logical plan scaffolding so we can try it against real workloads before worrying about cost model tuning.

Related component

No response

Describe alternatives you've considered

No response

Additional context

No response
