Contributor

@pramodsatya pramodsatya commented Jun 5, 2024

Description

Constant folding must produce the same results on the Presto coordinator (Java) and the Presto C++ worker, which requires both to evaluate expressions the same way. To that end, this PR adds a native expression evaluation endpoint, v1/expressions, to the Presto C++ sidecar, along with a plugin that exposes Velox expression evaluation behind a standard ExpressionOptimizer.

This PR depends on the Velox changes in facebookincubator/velox#13424, which add an ExpressionOptimizer for optimizing and constant folding TypedExprs. In the Presto native sidecar, the helper class VeloxToPrestoExprConverter converts the optimized Velox core::TypedExpr to a Presto protocol::RowExpression. The end-to-end flow between the coordinator and the sidecar is:

RowExpression == (PrestoToVeloxExpr) ==> TypedExpr == (Velox API optimizeExpression()) ==> optimized TypedExpr == (VeloxToPrestoExprConverter) ==> optimized RowExpression
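
The middle step of this flow, constant folding, can be illustrated with a toy expression tree. This is only a sketch: the `Expr` struct and the `constant`, `variable`, `call`, `fold`, and `toString` helpers are hypothetical stand-ins, not the Velox `core::TypedExpr` or Presto `protocol::RowExpression` types, and only two integer functions are folded.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical; not the Velox or Presto API).
struct Expr {
  enum class Kind { kConstant, kVariable, kCall };
  Kind kind = Kind::kConstant;
  long value = 0;                                  // used by kConstant
  std::string name;                                // variable or function name
  std::vector<std::shared_ptr<const Expr>> args;   // used by kCall
};
using ExprPtr = std::shared_ptr<const Expr>;

ExprPtr constant(long v) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::kConstant;
  e->value = v;
  return e;
}

ExprPtr variable(std::string name) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::kVariable;
  e->name = std::move(name);
  return e;
}

ExprPtr call(std::string name, std::vector<ExprPtr> args) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::kCall;
  e->name = std::move(name);
  e->args = std::move(args);
  return e;
}

// Folds bottom-up: a call whose arguments are all constants is replaced by
// its value; anything else is rebuilt with its (possibly folded) arguments.
ExprPtr fold(const ExprPtr& e) {
  if (e->kind != Expr::Kind::kCall) {
    return e;
  }
  std::vector<ExprPtr> folded;
  bool allConstant = true;
  for (const auto& arg : e->args) {
    folded.push_back(fold(arg));
    allConstant = allConstant && folded.back()->kind == Expr::Kind::kConstant;
  }
  if (allConstant && folded.size() == 2) {
    const long l = folded[0]->value;
    const long r = folded[1]->value;
    if (e->name == "plus") return constant(l + r);
    if (e->name == "multiply") return constant(l * r);
  }
  return call(e->name, std::move(folded));
}

// Renders an expression as text so the result of folding can be inspected.
std::string toString(const ExprPtr& e) {
  if (e->kind == Expr::Kind::kConstant) return std::to_string(e->value);
  if (e->kind == Expr::Kind::kVariable) return e->name;
  std::string out = e->name + "(";
  for (std::size_t i = 0; i < e->args.size(); ++i) {
    if (i > 0) out += ", ";
    out += toString(e->args[i]);
  }
  return out + ")";
}
```

In this sketch, folding `plus(x, multiply(2, 3))` leaves the call in place but replaces the constant subtree, while `plus(1, multiply(2, 3))` collapses to a single constant.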

Please refer to this document for sidecar implementation details.

Motivation and Context

Consistency between C++ and Java semantics. Support for using C++ functions during constant folding of expressions in the planner. Please refer to RFC-0006.

Test Plan

TestNativeExpressionOptimizer.java extends the TestRowExpressionInterpreter class so that the same tests also exercise native expression evaluation. This feature is still in beta; a fuzzer will be created later to surface any remaining bugs in the integration before it is relied on for production workloads.

Unit tests for simple expression conversions are added in VeloxToPrestoExprConverter.cpp.

Release Notes

== NO RELEASE NOTE ==

@pramodsatya pramodsatya changed the title from "[WIP] Add proxygen endpoint for expression evaluation" to "[native] Add proxygen endpoint for expression evaluation" Aug 5, 2024
@tdcmeehan tdcmeehan self-assigned this Aug 5, 2024
@pramodsatya pramodsatya force-pushed the expr_endpt branch 2 times, most recently from 822f79f to 2352cd4 Compare September 11, 2024 14:52
Contributor

@aditi-pandit aditi-pandit left a comment

@pramodsatya : Have done a first round of comments. Will read your tests more closely once you address the comments here.

Contributor Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @aditi-pandit, addressed the comments. Could you please take another look?

facebook-github-bot pushed a commit to facebookincubator/velox that referenced this pull request Nov 16, 2024
Summary:
prestodb/presto#23331 adds a native expression optimizer that delegates expression evaluation to the native sidecar. It is used to constant fold expressions on the Presto native sidecar instead of on the Presto Java coordinator (the current behavior). prestodb/presto#22927 implements a proxygen endpoint that accepts `RowExpression`s from `NativeSidecarExpressionInterpreter`, optimizes them where possible (rewriting special form expressions), and compiles each `RowExpression` to a Velox expression with constant folding enabled. The Velox expression is then converted back to a `RowExpression`, which the sidecar returns to the coordinator.

When the constant folded Velox expression is of type `velox::exec::ConstantExpr`, we need to return a `RowExpression` of type `ConstantExpression`. This requires serializing the constant value from `velox::exec::ConstantExpr` into `protocol::ConstantExpression::valueBlock`, which can be done by serializing the constant value vector to the Presto `SerializedPage::column` format and then base64-encoding the result (reverse engineering the logic from `Base64Util.cpp::readBlock`).
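
The final base64 step can be sketched on its own. The example below is a minimal standalone encoder, assuming the standard alphabet with `=` padding; `base64Encode` is a hypothetical helper name, and the column serialization that would produce the input bytes is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes raw bytes with the standard base64 alphabet and '=' padding.
// Hypothetical helper for illustration; not the Velox or Presto function.
std::string base64Encode(const std::vector<std::uint8_t>& bytes) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve(((bytes.size() + 2) / 3) * 4);

  std::size_t i = 0;
  // Full 3-byte groups become four 6-bit symbols.
  for (; i + 2 < bytes.size(); i += 3) {
    const std::uint32_t n = (static_cast<std::uint32_t>(bytes[i]) << 16) |
        (static_cast<std::uint32_t>(bytes[i + 1]) << 8) |
        static_cast<std::uint32_t>(bytes[i + 2]);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }

  if (i + 1 == bytes.size()) {
    // One trailing byte: two symbols plus two padding characters.
    const std::uint32_t n = static_cast<std::uint32_t>(bytes[i]) << 16;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += "==";
  } else if (i + 2 == bytes.size()) {
    // Two trailing bytes: three symbols plus one padding character.
    const std::uint32_t n = (static_cast<std::uint32_t>(bytes[i]) << 16) |
        (static_cast<std::uint32_t>(bytes[i + 1]) << 8);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += '=';
  }
  return out;
}
```

The input would be the bytes of the serialized single column; the encoded string is what would be placed in the value block field.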

This PR adds a new function, `serializeSingleColumn`, to `PrestoVectorSerde`. It serializes input data from vectors containing a single element into the single-column PrestoPage format (without the PrestoPage header).
The function is not added to `PrestoBatchVectorSerializer` alongside the existing `serialize` function, because that would require declaring it as a virtual function in `BatchVectorSerializer` as well, which is undesirable since the `PrestoPage` format is not relevant to that base class. `PrestoVectorSerde` already has a function, `deserializeSingleColumn`, that deserializes data from a single column. Because `serializeSingleColumn` performs the inverse operation, it is added alongside it in `PrestoVectorSerde`.

Pull Request resolved: #10657

Reviewed By: amitkdutta

Differential Revision: D66044754

Pulled By: pedroerp

fbshipit-source-id: e509605067920f8207e5a3fa67552badc2ce0eba
@pramodsatya pramodsatya changed the title from "[native] Add proxygen endpoint for expression evaluation" to "[native] Add row expression optimizer" Dec 10, 2024
Contributor Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @aditi-pandit, addressed the review comments. Could you please take another look?

@pramodsatya pramodsatya marked this pull request as ready for review December 10, 2024 17:49
@pramodsatya pramodsatya requested a review from a team as a code owner December 10, 2024 17:49
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 10, 2024
@prestodb-ci prestodb-ci requested review from a team and imjalpreet and removed request for a team December 10, 2024 17:49
@pramodsatya pramodsatya requested a review from presto-oss April 14, 2025 06:03
@pramodsatya pramodsatya changed the title from "[native] Add row expression optimizer" to "[native] Add expression optimizer" Apr 14, 2025
Contributor Author

pramodsatya commented Apr 22, 2025

@aditi-pandit, @czentgr, could you please help review this PR? Added an API to constant fold TypedExpr, as discussed in the native worker group meeting (documented here).

Contributor

@steveburnett steveburnett left a comment

Thanks for the doc! Some minor nits of formatting.

steveburnett previously approved these changes Apr 28, 2025
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

Contributor

@aditi-pandit aditi-pandit left a comment

@pramodsatya: Haven't read all the code in detail, but I have one high-level comment about the API in VeloxToPrestoExpr.

public:
explicit VeloxToPrestoExprConverter(memory::MemoryPool* pool) : pool_(pool) {}

/// Converts a velox expression `expr` to a Presto protocol RowExpression.
Contributor

Didn't follow this... What is the output JSON?

return getCallExpression(call, input);
}

LOG(ERROR) << "Unable to convert Velox expression: {}" << expr->toString();
Contributor

This function signature could be changed to return an optional json. If there is an error here, you can return nullopt. The input expression wouldn't be needed then.

The caller of this function can then decide to return the input expression if the conversion was not successful.
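
A minimal sketch of that suggestion, using plain strings in place of the JSON type: `tryConvert` and `convertOrKeepInput` are hypothetical names, and the `"converted:"` strings are placeholders for the real converter output.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the converter under review. Returns the
// converted expression, or std::nullopt when the kind is unsupported.
// It no longer needs the input expression as a fallback argument.
std::optional<std::string> tryConvert(const std::string& exprKind) {
  if (exprKind == "constant" || exprKind == "call") {
    return "converted:" + exprKind;  // placeholder for the real JSON output
  }
  return std::nullopt;
}

// The caller owns the fallback decision: if conversion failed, it returns
// the original input expression unchanged.
std::string convertOrKeepInput(
    const std::string& exprKind,
    const std::string& inputJson) {
  return tryConvert(exprKind).value_or(inputJson);
}
```

With this shape, the converter only reports success or failure, and the policy of falling back to the unoptimized input lives in one place at the call site.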

return getSpecialFormExpression(callTypedExpr, input);
}
return getCallExpression(callTypedExpr, input);
} else if (
Contributor

Maybe move Cast logic before Call.
