
perf(native): Avoid LIKE rewrites for prefix/suffix patterns in native execution#27363

Merged
pramodsatya merged 1 commit intoprestodb:masterfrom
pramodsatya:like_cpp
Mar 26, 2026


@pramodsatya
Contributor

@pramodsatya pramodsatya commented Mar 18, 2026

Description

This change conditionally disables Presto's coordinator-side LIKE pattern rewrites when native-execution-enabled=true. Velox's OptimizedLike implementation provides superior performance for simple LIKE patterns compared to Presto's rewrites that decompose LIKE into SUBSTR/STRPOS calls.

Motivation and Context

Presto currently optimizes SQL LIKE patterns at the coordinator level:

  • 'foo%' → SUBSTR(x, 1, len) = 'foo'
  • '%foo' → SUBSTR(x, -len) = 'foo'
  • '%foo%' → STRPOS(x, 'foo') != 0
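For reference, the three rewrites above can be sketched as follows. This is a hypothetical Java sketch, not Presto's translator: `LikeRewriteSketch.rewrite` is an illustrative name, it renders SQL text where the real code builds RowExpression trees, and it assumes a simplified pattern grammar (no ESCAPE clause, any `_` disables the rewrite). It shows that `len` is a codepoint count, which is why the rewritten SUBSTR has to index characters rather than bytes.

```java
import java.util.Optional;

// Hypothetical sketch of the coordinator-side LIKE rewrites. It renders SQL
// text for readability; the real translator builds RowExpression trees.
// Assumes no ESCAPE clause, and treats any '_' as disqualifying.
final class LikeRewriteSketch {
    static Optional<String> rewrite(String column, String pattern) {
        if (pattern.length() < 2 || pattern.indexOf('_') >= 0) {
            return Optional.empty();
        }
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.endsWith("%");
        String needle = pattern.substring(leading ? 1 : 0, pattern.length() - (trailing ? 1 : 0));
        // Only a wildcard-free needle with '%' at one or both ends qualifies.
        if (needle.isEmpty() || needle.indexOf('%') >= 0 || !(leading || trailing)) {
            return Optional.empty();
        }
        // SUBSTR lengths are in codepoints, not bytes or UTF-16 units.
        int len = needle.codePointCount(0, needle.length());
        String literal = "'" + needle.replace("'", "''") + "'";
        if (leading && trailing) {
            return Optional.of("STRPOS(" + column + ", " + literal + ") != 0");
        }
        if (trailing) {
            return Optional.of("SUBSTR(" + column + ", 1, " + len + ") = " + literal);
        }
        return Optional.of("SUBSTR(" + column + ", -" + len + ") = " + literal);
    }

    public static void main(String[] args) {
        for (String p : new String[] {"foo%", "%foo", "%foo%", "f_o%"}) {
            System.out.println(p + " -> " + rewrite("x", p).orElse("(kept as LIKE)"));
        }
    }
}
```

Patterns that do not fit one of the three shapes are left as LIKE.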

When native execution is enabled, Velox's engine natively evaluates LIKE expressions using the OptimizedLike template class, which provides fast-path implementations:

  • Prefix/suffix matching: Direct memcmp on byte ranges (no character indexing overhead)
  • Substring matching: std::string_view::find for a byte-level scan (no codepoint indexing)
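The byte-range checks described above can be illustrated with a small Java analog. This is not Velox's C++ `OptimizedLike` code; `ByteLikeSketch` and its methods are hypothetical names, and `contains` is a naive scan standing in for `std::string_view::find`. Comparing raw bytes is safe for valid UTF-8 because a needle that begins with a lead byte cannot match starting in the middle of a multi-byte character. The ranged `Arrays.equals` overload requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Java analog of byte-level LIKE fast paths; not Velox's OptimizedLike code.
// Requires Java 9+ for the ranged Arrays.equals overload.
final class ByteLikeSketch {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // 'needle%': compare the first needle.length bytes, no codepoint counting.
    static boolean startsWith(byte[] input, byte[] needle) {
        return input.length >= needle.length
                && Arrays.equals(input, 0, needle.length, needle, 0, needle.length);
    }

    // '%needle': compare the last needle.length bytes; cost depends only on the needle.
    static boolean endsWith(byte[] input, byte[] needle) {
        int from = input.length - needle.length;
        return from >= 0
                && Arrays.equals(input, from, input.length, needle, 0, needle.length);
    }

    // '%needle%': naive byte scan (worst case O(n*m)); a stand-in for string_view::find.
    static boolean contains(byte[] input, byte[] needle) {
        for (int i = 0; i + needle.length <= input.length; i++) {
            if (Arrays.equals(input, i, i + needle.length, needle, 0, needle.length)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] row = utf8("say hello_world");
        System.out.println(startsWith(row, utf8("say")));       // true
        System.out.println(endsWith(row, utf8("hello_world"))); // true
        System.out.println(contains(row, utf8("lo_wo")));       // true
    }
}
```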

In contrast, Presto's SUBSTR/STRPOS rewrites require:

  • Character-position counting via UTF-8 codepoint iteration
  • Byte-range lookups with explicit index conversion
  • Generalized string function evaluation overhead
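To show why a codepoint-indexed rewrite can cost time proportional to the whole input, here is a hypothetical model of evaluating `SUBSTR(x, -len) = needle` directly on UTF-8 bytes: it counts every codepoint in `x` before it can locate the suffix. This is an illustrative sketch, not Presto's or Velox's `substr` implementation; the class and method names are made up for this description.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of evaluating SUBSTR(x, -len) = needle by codepoint
// position on UTF-8 bytes. Not Presto's or Velox's substr implementation.
final class CodepointSuffixSketch {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Every non-continuation byte (not 10xxxxxx) starts a codepoint: O(input length).
    static int countCodePoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // Byte offset where the 0-based codepoint `index` starts, walking from the front.
    static int byteOffsetOf(byte[] utf8, int index) {
        int seen = 0;
        for (int i = 0; i < utf8.length; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                if (seen == index) {
                    return i;
                }
                seen++;
            }
        }
        return utf8.length;
    }

    static boolean substrSuffixEquals(byte[] input, byte[] needle, int needleCodePoints) {
        int total = countCodePoints(input); // full scan, regardless of needle size
        if (needleCodePoints > total) {
            return false; // the suffix would start before the string does
        }
        int start = byteOffsetOf(input, total - needleCodePoints);
        return Arrays.equals(input, start, input.length, needle, 0, needle.length);
    }

    public static void main(String[] args) {
        byte[] row = utf8("say hello_world");
        System.out.println(countCodePoints(row));                             // 15
        System.out.println(substrSuffixEquals(row, utf8("hello_world"), 11)); // true
    }
}
```

A byte-level suffix check, by contrast, touches only the last `needle.length` bytes.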

For constant patterns, Velox's optimized paths perform better. Letting Velox evaluate LIKE natively, instead of rewriting it on the coordinator, removes an unnecessary rewrite and allows those optimizations to apply.

Impact

  • Coordinators without native execution enabled (default): LIKE rewrites continue as before
  • No semantic changes: Native LIKE evaluation is equivalent to the coordinator-side rewrites

Test Plan

  • Validated with existing TestRowExpressionTranslator test suite

Release Notes

== NO RELEASE NOTE ==

Summary by Sourcery

Gate coordinator-side LIKE prefix/suffix pattern rewrites on the native execution flag so that native execution uses Velox’s built-in LIKE evaluation.

Enhancements:

  • Extend SQL-to-row-expression translation APIs and visitor to propagate whether native execution is enabled.
  • Skip LIKE prefix/suffix optimization rewrites in the LIKE predicate visitor when native execution is enabled, retaining existing behavior otherwise.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Mar 18, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Mar 18, 2026

Reviewer's Guide

Makes SQL-to-row-expression translation aware of native execution mode and disables LIKE prefix/suffix rewrites when native execution is enabled so that Velox can apply its own optimized LIKE evaluation.

Sequence diagram for LIKE translation with nativeExecutionEnabled flag

sequenceDiagram
    actor Client
    participant Coordinator
    participant SqlToRowExpressionTranslator
    participant Visitor
    participant VeloxEngine

    Client->>Coordinator: Submit query with LIKE predicate
    Coordinator->>Coordinator: Determine nativeExecutionEnabled
    Coordinator->>SqlToRowExpressionTranslator: translate(expression, types, layout, functionAndTypeManager, session, context)
    SqlToRowExpressionTranslator->>SqlToRowExpressionTranslator: translate(..., user, transactionId, sqlFunctionProperties, sessionFunctions, context, nativeExecutionEnabled)
    SqlToRowExpressionTranslator->>Visitor: new Visitor(..., nativeExecutionEnabled)
    SqlToRowExpressionTranslator->>Visitor: process(expression, context)
    Visitor->>Visitor: visitLikePredicate(node, context)
    alt nativeExecutionEnabled == false
        Visitor->>Visitor: generateLikePrefixOrSuffixMatch(value, pattern)
        Visitor-->>SqlToRowExpressionTranslator: rewritten SUBSTR/STRPOS RowExpression
    else nativeExecutionEnabled == true
        Visitor->>Visitor: skip generateLikePrefixOrSuffixMatch
        Visitor-->>SqlToRowExpressionTranslator: LIKE RowExpression without rewrite
        SqlToRowExpressionTranslator-->>VeloxEngine: pass LIKE RowExpression
        VeloxEngine-->>Coordinator: evaluate LIKE via OptimizedLike
    end
    Coordinator-->>Client: Return query results

Class diagram for SqlToRowExpressionTranslator nativeExecutionEnabled wiring

classDiagram
    class SqlToRowExpressionTranslator {
        +RowExpression translate(Expression expression, Map~NodeRef_Expression_, Type~ types, Map~VariableReferenceExpression, Integer~ layout, FunctionAndTypeManager functionAndTypeManager, Session session, Context context)
        +RowExpression translate(Expression expression, Map~NodeRef_Expression_, Type~ types, Map~VariableReferenceExpression, Integer~ layout, FunctionAndTypeManager functionAndTypeManager, Optional~String~ user, Optional~TransactionId~ transactionId, SqlFunctionProperties sqlFunctionProperties, Map~SqlFunctionId, SqlInvokedFunction~ sessionFunctions, Context context)
        -RowExpression translate(Expression expression, Map~NodeRef_Expression_, Type~ types, Map~VariableReferenceExpression, Integer~ layout, FunctionAndTypeManager functionAndTypeManager, Optional~String~ user, Optional~TransactionId~ transactionId, SqlFunctionProperties sqlFunctionProperties, Map~SqlFunctionId, SqlInvokedFunction~ sessionFunctions, Context context, boolean nativeExecutionEnabled)
    }

    class Visitor {
        -Map~NodeRef_Expression_, Type~ types
        -Map~VariableReferenceExpression, Integer~ layout
        -FunctionAndTypeResolver functionAndTypeResolver
        -Optional~String~ user
        -Optional~TransactionId~ transactionId
        -SqlFunctionProperties sqlFunctionProperties
        -Map~SqlFunctionId, SqlInvokedFunction~ sessionFunctions
        -FunctionResolution functionResolution
        -boolean nativeExecutionEnabled

        +Visitor(Map~NodeRef_Expression_, Type~ types, Map~VariableReferenceExpression, Integer~ layout, FunctionAndTypeManager functionAndTypeManager, Optional~String~ user, Optional~TransactionId~ transactionId, SqlFunctionProperties sqlFunctionProperties, Map~SqlFunctionId, SqlInvokedFunction~ sessionFunctions, boolean nativeExecutionEnabled)
        +RowExpression visitLikePredicate(LikePredicate node, Context context)
        -RowExpression generateLikePrefixOrSuffixMatch(RowExpression value, RowExpression pattern)
    }

    SqlToRowExpressionTranslator ..> Visitor : creates

File-Level Changes

Change Details Files
Thread native-execution enablement through SqlToRowExpressionTranslator and its Visitor so translation can behave differently under native execution.
  • Extend the public translate(...) overload that takes a Session to pass a native-execution-enabled flag derived from the session.
  • Introduce a new private translate(...) overload that accepts a nativeExecutionEnabled boolean and delegate the existing public non-Session translate(...) method to it with a default of false.
  • Update the Visitor constructor to accept and store a nativeExecutionEnabled flag for use during expression translation.
presto-main-base/src/main/java/com/facebook/presto/sql/relational/SqlToRowExpressionTranslator.java
Guard LIKE prefix/suffix rewrite logic so it is only applied when native execution is disabled.
  • Extend Visitor state with a nativeExecutionEnabled field used during LIKE predicate translation.
  • Wrap the generateLikePrefixOrSuffixMatch(...) call in visitLikePredicate with a check that only executes the rewrite when native execution is not enabled.
  • Preserve existing behavior for non-native execution by leaving the LIKE rewrite path unchanged when nativeExecutionEnabled is false.
presto-main-base/src/main/java/com/facebook/presto/sql/relational/SqlToRowExpressionTranslator.java
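The wiring described in the two rows above can be modeled in a few lines. This is a hypothetical, heavily simplified sketch: strings stand in for RowExpressions, only the prefix rewrite is modeled, and although `Visitor`, `visitLikePredicate`, `generateLikePrefixOrSuffixMatch`, and `nativeExecutionEnabled` mirror names from the reviewer's guide, their signatures here are invented for illustration.

```java
import java.util.Optional;

// Hypothetical, heavily simplified model of the wiring above: strings stand in
// for RowExpressions, and only the prefix rewrite is modeled.
final class GatedLikeTranslatorSketch {
    // Entry point without native-execution context keeps the old behavior.
    static String translateLike(String column, String pattern) {
        return translateLike(column, pattern, false);
    }

    // Overload that threads the flag into the visitor.
    static String translateLike(String column, String pattern, boolean nativeExecutionEnabled) {
        return new Visitor(nativeExecutionEnabled).visitLikePredicate(column, pattern);
    }

    static final class Visitor {
        private final boolean nativeExecutionEnabled;

        Visitor(boolean nativeExecutionEnabled) {
            this.nativeExecutionEnabled = nativeExecutionEnabled;
        }

        String visitLikePredicate(String column, String pattern) {
            // Under native execution, keep LIKE intact so Velox's OptimizedLike can evaluate it.
            if (!nativeExecutionEnabled) {
                Optional<String> rewritten = generateLikePrefixOrSuffixMatch(column, pattern);
                if (rewritten.isPresent()) {
                    return rewritten.get();
                }
            }
            return "LIKE(" + column + ", '" + pattern + "')";
        }

        // Prefix case only ('foo%' with no other wildcards), for brevity.
        private Optional<String> generateLikePrefixOrSuffixMatch(String column, String pattern) {
            int last = pattern.length() - 1;
            if (last > 0 && pattern.indexOf('%') == last && pattern.indexOf('_') < 0) {
                String needle = pattern.substring(0, last);
                int len = needle.codePointCount(0, needle.length());
                return Optional.of("SUBSTR(" + column + ", 1, " + len + ") = '" + needle + "'");
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(translateLike("x", "foo%"));       // SUBSTR(x, 1, 3) = 'foo'
        System.out.println(translateLike("x", "foo%", true)); // LIKE(x, 'foo%')
    }
}
```

Defaulting the flag to `false` in the shorter overload is what keeps existing callers on the old rewrite path.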


@pramodsatya pramodsatya marked this pull request as ready for review March 18, 2026 20:16
@pramodsatya pramodsatya requested review from a team, feilong-liu and jaystarshot as code owners March 18, 2026 20:16
@prestodb-ci prestodb-ci requested review from a team, Dilli-Babu-Godari and Joe-Abraham and removed request for a team March 18, 2026 20:16
@pramodsatya pramodsatya requested review from aditi-pandit, majetideepak and tdcmeehan and removed request for Dilli-Babu-Godari and Joe-Abraham March 18, 2026 20:16
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • Consider avoiding the extra boolean parameter on the private translate/Visitor constructor by threading native-execution awareness through Context or a small options object, which will scale better if additional execution-mode flags are added later.
  • It would be helpful to document in the visitLikePredicate logic (e.g., a short comment) that disabling the prefix/suffix rewrite under native execution intentionally defers to Velox's OptimizedLike implementation, so future maintainers understand why this early-return is gated.
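A minimal sketch of the first suggestion, assuming an immutable options object passed in place of the bare boolean. `TranslationOptions`, `withNativeExecutionEnabled`, and `shouldRewriteLike` are hypothetical names; the change as described in the reviewer's guide passes a boolean instead.

```java
// Hypothetical options object along the lines the review suggests; the change
// as described in the reviewer's guide passes a boolean instead.
final class TranslationOptionsSketch {
    static final class TranslationOptions {
        static final TranslationOptions DEFAULT = new TranslationOptions(false);

        private final boolean nativeExecutionEnabled;

        private TranslationOptions(boolean nativeExecutionEnabled) {
            this.nativeExecutionEnabled = nativeExecutionEnabled;
        }

        boolean isNativeExecutionEnabled() {
            return nativeExecutionEnabled;
        }

        // Immutable "wither": a future execution-mode flag would get its own field and method.
        TranslationOptions withNativeExecutionEnabled(boolean enabled) {
            return new TranslationOptions(enabled);
        }
    }

    // The LIKE visitor would consult the options instead of a bare boolean.
    static boolean shouldRewriteLike(TranslationOptions options) {
        return !options.isNativeExecutionEnabled();
    }

    public static void main(String[] args) {
        TranslationOptions nativeOptions = TranslationOptions.DEFAULT.withNativeExecutionEnabled(true);
        System.out.println(shouldRewriteLike(TranslationOptions.DEFAULT)); // true
        System.out.println(shouldRewriteLike(nativeOptions));              // false
    }
}
```

Adding a flag then changes one class rather than every constructor and overload signature along the call chain.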

@pramodsatya
Contributor Author

@majetideepak, sharing results from a standalone microbenchmark comparing Velox's native LIKE evaluation (OptimizedLike in Re2Functions.cpp) against Presto's coordinator-side LIKE rewrites (SUBSTR/STRPOS decompositions). Built and run in Release mode on an Apple M1 with 10K rows × 50 iterations.

Results

LikeVsRewrite Microbenchmark
Vector size: 10000, Iterations: 50

======================================================================
  PREFIX ASCII:  like(x, 'hello_world%') vs substr(x,1,11)='hello_world'
======================================================================
  Expression                                    ns/row      relative
----------------------------------------------------------------------
  like_native (memcmp)                          9.75        1.00x
  substr_eq (Presto rewrite)                   28.95        0.34x

======================================================================
  PREFIX UTF-8:  like(x, 'élève%') vs substr(x,1,5)='élève'
======================================================================
  Expression                                    ns/row      relative
----------------------------------------------------------------------
  like_native (memcmp)                         42.85        1.00x
  substr_eq (Presto rewrite)                   87.11        0.49x

======================================================================
  SUFFIX ASCII:  like(x, '%hello_world') vs substr(x,-11)='hello_world'
======================================================================
  Expression                                    ns/row      relative
----------------------------------------------------------------------
  like_native (memcmp)                          7.18        1.00x
  substr_eq (Presto rewrite)                56449.70        0.00x

======================================================================
  SUFFIX UTF-8:  like(x, '%élève') vs substr(x,-5)='élève'
======================================================================
  Expression                                    ns/row      relative
----------------------------------------------------------------------
  like_native (memcmp)                          6.40        1.00x
  substr_eq (Presto rewrite)                35907.14        0.00x

======================================================================
  SUBSTRING ASCII:  like(x, '%hello_world%') vs strpos(x,'hello_world')>0
======================================================================
  Expression                                    ns/row      relative
----------------------------------------------------------------------
  like_native (sv::find)                     2999.87        1.00x
  strpos_gt0 (Presto rewrite)                1354.90        2.21x

======================================================================
  SUBSTRING UTF-8:  like(x, '%élève%') vs strpos(x,'élève')>0
======================================================================
  Expression                                    ns/row      relative
----------------------------------------------------------------------
  like_native (sv::find)                     1732.29        1.00x
  strpos_gt0 (Presto rewrite)                1494.59        1.16x

======================================================================
  EXACT ASCII:  like(x, 'hello_world') vs x='hello_world'
======================================================================
  Expression                                    ns/row      relative
----------------------------------------------------------------------
  like_native (memcmp)                          2.99        1.00x
  equality (std::string==)                      6.93        0.43x

======================================================================
  relative > 1.0  →  Presto rewrite is faster
  relative < 1.0  →  like_native is faster
======================================================================

Analysis

| Pattern | Velox approach | Presto rewrite | Velox (ns/row) | Presto (ns/row) | Speedup (Presto ÷ Velox) | Why |
|---|---|---|---|---|---|---|
| 'prefix%' ASCII | memcmp first N bytes | SUBSTR(x,1,len)=needle | 9.75 | 28.95 | 3.0x | Velox does a fixed memcmp; Presto walks N UTF-8 codepoints to find the byte offset |
| 'prefix%' UTF-8 | memcmp first N bytes | SUBSTR(x,1,len)=needle | 42.85 | 87.11 | 2.0x | Same as above; larger strings reduce the relative gap |
| '%suffix' ASCII | memcmp last N bytes | SUBSTR(x,-len)=needle | 7.18 | 56,449.70 | 7,862x | Velox is O(needle_len); Presto walks the entire string counting codepoints to find the total length |
| '%suffix' UTF-8 | memcmp last N bytes | SUBSTR(x,-len)=needle | 6.40 | 35,907.14 | 5,611x | Same as above; Presto's full-string UTF-8 length scan dominates |
| '%substr%' ASCII | string_view::find | STRPOS(x,needle)>0 | 2,999.87 | 1,354.90 | 0.45x | Both do a byte-level substring search; the rewrite measured faster in this run |
| '%substr%' UTF-8 | string_view::find | STRPOS(x,needle)>0 | 1,732.29 | 1,494.59 | 0.86x | Same byte-level search; run-to-run noise |
| 'exact' | size() + memcmp | std::string== | 2.99 | 6.93 | 2.3x | Velox avoids string construction overhead |

Suffix matching is the dominant win: Velox's O(needle_len) memcmp versus Presto's O(string_len) UTF-8 codepoint walk yields an improvement of 3–4 orders of magnitude. Prefix and exact patterns show solid 2–3x wins. Substring matching runs the same underlying byte search in both paths; in these runs the STRPOS rewrite measured faster (0.45x for ASCII, 0.86x for UTF-8). Prefix, suffix, and exact patterns show no regressions.
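The derived columns above are plain ratios of the reported ns/row figures. Reading the formula off the numbers, `relative` appears to be the native LIKE time divided by the rewrite time (9.75 / 28.95 ≈ 0.34), so a value below 1.0 means the rewrite row is slower, and the Speedup column is its reciprocal. The sketch below only recomputes those ratios from the published numbers; it takes no new measurements, and the class and method names are illustrative.

```java
// Recomputes the derived columns from the ns/row figures reported above;
// no new measurements. Class and method names are illustrative.
final class SpeedupSketch {
    // "Speedup" in the Analysis table: rewrite time divided by native LIKE time.
    static double speedup(double veloxNsPerRow, double rewriteNsPerRow) {
        return rewriteNsPerRow / veloxNsPerRow;
    }

    // "relative" in the benchmark output: native LIKE time divided by rewrite time.
    static double relative(double veloxNsPerRow, double rewriteNsPerRow) {
        return veloxNsPerRow / rewriteNsPerRow;
    }

    public static void main(String[] args) {
        String[] names = {"prefix ASCII", "suffix ASCII", "substring ASCII"};
        double[][] ns = {{9.75, 28.95}, {7.18, 56449.70}, {2999.87, 1354.90}};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-16s speedup=%.2fx relative=%.2fx%n",
                    names[i], speedup(ns[i][0], ns[i][1]), relative(ns[i][0], ns[i][1]));
        }
    }
}
```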

Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya. Looks good.

@pramodsatya pramodsatya merged commit e2d8bc9 into prestodb:master Mar 26, 2026
155 of 160 checks passed
@pramodsatya pramodsatya deleted the like_cpp branch March 26, 2026 00:18

Labels

from:IBM PR from IBM


4 participants