[native] Add longVariableConstraints to function metadata #24759

Merged
pdabre12 merged 1 commit into prestodb:master from pdabre12:add-longVariableConstraints
Apr 10, 2025
@pdabre12 pdabre12 commented Mar 19, 2025

Description

Add longVariableConstraints to function metadata.
longVariableConstraints are optional constraints present in the signature which are used to identify the applicable functions in the coordinator.

For example, consider a query like SELECT mod(orderkey, linenumber) FROM lineitem.
Among the candidate signatures the coordinator considers for mod is one with decimal arguments and a decimal return type:

{
    "docString": "presto.default.mod",
    "functionKind": "SCALAR",
    "longVariableConstraints": [
        {
            "expression": "max(i5, i6)",
            "name": "i7"
        },
        {
            "expression": "min(i2 - i6, i1 - i5) + max(i5, i6)",
            "name": "i3"
        }
    ],
    "outputType": "decimal(i3,i7)",
    "paramTypes": ["decimal(i1,i5)", "decimal(i2,i6)"],
    "routineCharacteristics": {
        "determinism": "DETERMINISTIC",
        "language": "CPP",
        "nullCallClause": "RETURNS_NULL_ON_NULL_INPUT"
    },
    "schema": "default",
    "typeVariableConstraints": [],
    "variableArity": false
}
To resolve the variables `i3` and `i7` to valid integers, the coordinator evaluates the formulas specified in longVariableConstraints.

For example, for mod(a, b) where a is decimal(18,2) (i1 = 18, i5 = 2) and b is decimal(38,2) (i2 = 38, i6 = 2):
i7 resolves to max(i5, i6) = max(2, 2) = 2
i3 resolves to min(i2 - i6, i1 - i5) + max(i5, i6) = min(38 - 2, 18 - 2) + max(2, 2) = 16 + 2 = 18

The final output type is decimal(i3, i7) = decimal(18,2).
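
The two formulas from the signature can also be checked mechanically. The following is only a minimal sketch: the class `ModDecimalResultType` and its `resolve` helper are hypothetical names used for illustration and are not part of Presto or Velox. It hard-codes the two `longVariableConstraints` expressions for mod and applies them to a different pair of inputs, decimal(12,4) and decimal(9,2).

```java
public class ModDecimalResultType {
    // Hypothetical helper (not a Presto API): applies the two constraint
    // expressions from the mod signature to concrete precisions and scales.
    // Arguments follow the signature: decimal(i1,i5) and decimal(i2,i6).
    public static int[] resolve(int i1, int i5, int i2, int i6) {
        int i7 = Math.max(i5, i6);                              // result scale
        int i3 = Math.min(i2 - i6, i1 - i5) + Math.max(i5, i6); // result precision
        return new int[] {i3, i7};
    }

    public static void main(String[] args) {
        // mod(decimal(12,4), decimal(9,2))
        int[] result = resolve(12, 4, 9, 2);
        // prints "decimal(11,4)": min(9 - 2, 12 - 4) + max(4, 2) = 7 + 4 = 11
        System.out.println("decimal(" + result[0] + "," + result[1] + ")");
    }
}
```

In the real system the expressions arrive as strings in the metadata and are evaluated by the coordinator; hard-coding them here just keeps the arithmetic visible.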

Motivation and Context

When the sidecar is enabled, the query
SELECT mod(orderkey, linenumber) FROM lineitem
fails with the error: Variable is not bound: i3
The failure occurs while identifying applicable functions for mod: without longVariableConstraints in the function metadata, the coordinator cannot determine how to bind the variables in the function signatures.

This change addresses the issue by adding the necessary longVariableConstraints metadata to the functions, enabling the coordinator to bind the variables correctly using the formulae defined in longVariableConstraints.
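
To illustrate the failure mode, here is a rough sketch of variable binding with and without constraint metadata. Everything in it is an assumption made for illustration: `LongVariableBindingSketch`, `modConstraints`, and `bindOutput` are hypothetical names, the constraints are modeled as Java lambdas rather than parsed expression strings, and the coordinator's actual binding logic is more involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

public class LongVariableBindingSketch {
    // Hypothetical stand-in for the constraints carried in function metadata:
    // variable name -> formula over the variables bound from argument types.
    public static Map<String, ToLongFunction<Map<String, Long>>> modConstraints() {
        Map<String, ToLongFunction<Map<String, Long>>> constraints = new HashMap<>();
        constraints.put("i7", v -> Math.max(v.get("i5"), v.get("i6")));
        constraints.put("i3", v -> Math.min(v.get("i2") - v.get("i6"), v.get("i1") - v.get("i5"))
                + Math.max(v.get("i5"), v.get("i6")));
        return constraints;
    }

    // Binds each output variable, either directly from the arguments or by
    // evaluating its constraint; fails if neither source is available.
    public static Map<String, Long> bindOutput(
            Map<String, Long> argumentBindings,
            Map<String, ToLongFunction<Map<String, Long>>> constraints,
            String... outputVariables) {
        Map<String, Long> bound = new HashMap<>(argumentBindings);
        for (String name : outputVariables) {
            if (bound.containsKey(name)) {
                continue; // already bound from an argument type
            }
            ToLongFunction<Map<String, Long>> formula = constraints.get(name);
            if (formula == null) {
                throw new IllegalStateException("Variable is not bound: " + name);
            }
            bound.put(name, formula.applyAsLong(argumentBindings));
        }
        return bound;
    }

    public static void main(String[] args) {
        // Bindings taken from the argument types decimal(12,4) and decimal(9,2).
        Map<String, Long> arguments = new HashMap<>();
        arguments.put("i1", 12L);
        arguments.put("i5", 4L);
        arguments.put("i2", 9L);
        arguments.put("i6", 2L);

        // With constraints in the metadata, the output type resolves.
        Map<String, Long> bound = bindOutput(arguments, modConstraints(), "i3", "i7");
        System.out.println("decimal(" + bound.get("i3") + "," + bound.get("i7") + ")");

        // Without them, the output variables cannot be bound.
        Map<String, ToLongFunction<Map<String, Long>>> none = new HashMap<>();
        try {
            bindOutput(arguments, none, "i3", "i7");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call mirrors the situation before this change: the argument variables are known, but nothing tells the binder how to derive `i3` and `i7` from them.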

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
== RELEASE NOTES ==

General Changes

* Fix failures for functions with long variable constraints when the sidecar is enabled.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Mar 19, 2025
@pdabre12 pdabre12 force-pushed the add-longVariableConstraints branch 2 times, most recently from 64bd2eb to f8aad49 Compare March 25, 2025 19:52
@pdabre12 pdabre12 marked this pull request as ready for review March 25, 2025 19:59
@pdabre12 pdabre12 requested review from a team as code owners March 25, 2025 19:59
@pdabre12 pdabre12 requested a review from presto-oss March 25, 2025 19:59
@prestodb-ci prestodb-ci requested review from a team, ScrapCodes and imsayari404 and removed request for a team March 25, 2025 19:59
@pdabre12 pdabre12 changed the title [WIP] [native] Add longVariableConstraints to function metadata [native] Add longVariableConstraints to function metadata Mar 25, 2025
@pdabre12 pdabre12 force-pushed the add-longVariableConstraints branch from f8aad49 to ee2e5a7 Compare March 26, 2025 17:56
@pdabre12
Contributor Author

@ScrapCodes Addressed your comments, PTAL.

@pdabre12 pdabre12 force-pushed the add-longVariableConstraints branch from ee2e5a7 to de146c6 Compare March 26, 2025 20:57
Contributor

@aditi-pandit aditi-pandit left a comment

@pdabre12: I have an overall design question.

@ScrapCodes
Contributor

Hi @aditi-pandit and @pdabre12, I have a basic question: how are these long variable constraints expressed by an end user or developer?

@aditi-pandit
Contributor

@ScrapCodes: Velox developers add these constraints during function registration. Most current usage is for computing the precision and scale of the results of decimal arithmetic.

example:
https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/DecimalArithmetic.cpp#L608.

@pdabre12 pdabre12 force-pushed the add-longVariableConstraints branch 2 times, most recently from 1d4f27f to 5747600 Compare April 7, 2025 21:49
ScrapCodes
ScrapCodes previously approved these changes Apr 8, 2025
Contributor

@aditi-pandit aditi-pandit left a comment

@pdabre12: This looks good overall. I just want you to confirm how the protocol changes were generated.

Contributor

Can you please confirm that this was built with the make presto_protocol command and that these are not manual changes?

Contributor Author

Can confirm that this is built with the make presto_protocol command.

@pdabre12 pdabre12 force-pushed the add-longVariableConstraints branch from 2b893ae to 3f46353 Compare April 9, 2025 18:47
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @pdabre12

@aditi-pandit
Contributor

@jaystarshot : PTAL.

jaystarshot
jaystarshot previously approved these changes Apr 10, 2025
@jaystarshot jaystarshot dismissed their stale review April 10, 2025 01:05

checking tests

Member

@jaystarshot jaystarshot left a comment

I may not have the right context, but can you help me understand what the behavior change is (i.e., what was happening before and what this fixes) and point me to the test case that shows it?

"date_trunc('month', from_unixtime(orderkey, '+03:00')), date_trunc('day', from_unixtime(orderkey, '-07:00')), " +
"date_trunc('hour', from_unixtime(orderkey, '-09:30')), date_trunc('minute', from_unixtime(orderkey, '+05:30')), " +
"date_trunc('second', from_unixtime(orderkey, '+00:00')) FROM orders");
assertQuery("SELECT mod(orderkey, linenumber) FROM lineitem");
Member

What is this testing? Was the query failing before?

Contributor Author

@pdabre12 pdabre12 Apr 10, 2025

I was running the function e2e test cases with the sidecar enabled and ran into this issue there. Draft PR
Before this change, the query
SELECT mod(orderkey, linenumber) FROM lineitem
failed with the error: Variable is not bound: i3.
When identifying the applicable functions among all the available mod functions, the coordinator did not know how to bind these variables because the longVariableConstraints were missing.

This PR adds the necessary longVariableConstraints metadata to functions so that the coordinator knows how to bind these variables using the formulae defined in longVariableConstraints.

Member

@jaystarshot jaystarshot Apr 10, 2025

I don't see a @Constraint annotation on any mod function in the Java implementation?

Contributor Author

@pdabre12 pdabre12 Apr 10, 2025

These functions are pulled in from the C++ worker.
The constraints are already defined in the C++ function signatures; this change just adds them to the metadata returned via the /v1/functions endpoint.

Member

But then why are constraints added to the C++ function signature for mod when there is no corresponding definition in Java?

Contributor Author

@pdabre12 pdabre12 Apr 10, 2025

Member

@jaystarshot jaystarshot Apr 10, 2025

I see, so this just passes those constraints through to v1/functions? So this query should still succeed in a native coordinator-worker setup, but I guess it fails only with the sidecar?

Unrelated, but I am still a bit confused about why mod has those constraints in C++ but not in Java.

Contributor Author

@pdabre12 pdabre12 Apr 10, 2025

Yes, it fails only for native execution with the sidecar enabled.
For native execution without the sidecar, we still depend on the Java built-in functions for resolution, and the Java mod function signatures already have the constraints defined, so we do not run into this issue.

Contributor Author

@pdabre12 pdabre12 Apr 10, 2025

The Java mod functions have these constraints too; they are added here:

    TypeSignature decimalRightSignature = parseTypeSignature("decimal(b_precision, b_scale)", ImmutableSet.of("b_precision", "b_scale"));
    TypeSignature decimalResultSignature = parseTypeSignature("decimal(r_precision, r_scale)", ImmutableSet.of("r_precision", "r_scale"));
    return SignatureBuilder.builder()
            .longVariableConstraints(
                    longVariableExpression("r_precision", "min(b_precision - b_scale, a_precision - a_scale) + max(a_scale, b_scale)"),
                    longVariableExpression("r_scale", "max(a_scale, b_scale)"))
            .argumentTypes(decimalLeftSignature, decimalRightSignature)
            .returnType(decimalResultSignature);
}

@pdabre12 pdabre12 requested a review from jaystarshot April 10, 2025 01:28
Member

@jaystarshot jaystarshot left a comment

Please add a release note (functions with long constraints) fail in sidecar

@pdabre12
Contributor Author

The failing presto-cpp-macos-build-engine job seems unrelated to the changes.

@pdabre12 pdabre12 merged commit 6ff6375 into prestodb:master Apr 10, 2025
104 of 105 checks passed
@pdabre12 pdabre12 deleted the add-longVariableConstraints branch April 10, 2025 03:56
@ZacBlanco ZacBlanco mentioned this pull request May 29, 2025
21 tasks