-
Notifications
You must be signed in to change notification settings - Fork 5.5k
fix Update json_extract to Produce Canonicalized Output #24614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
LGTM. Maybe also add a verifier run? |
d7fc7ce to
01e3b68
Compare
presto-main/src/test/java/com/facebook/presto/operator/scalar/TestJsonExtractFunctions.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/operator/scalar/JsonExtract.java
Outdated
Show resolved
Hide resolved
|
Also let's gate this change behind a configuration/session property. You can look at usages of other properties in SqlFunctionProperties for examples |
presto-main/src/main/java/com/facebook/presto/operator/scalar/JsonExtract.java
Outdated
Show resolved
Hide resolved
rschlussel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general approach looks good to me!
presto-main/src/main/java/com/facebook/presto/operator/scalar/JsonExtract.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/com/facebook/presto/operator/scalar/TestJsonExtract.java
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/function/SqlFunctionProperties.java
Outdated
Show resolved
Hide resolved
9038dd7 to
a97427a
Compare
rschlussel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run a verifier comparing this with Prestissimo and check if they now match?
presto-main/src/test/java/com/facebook/presto/type/TestRowOperators.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/com/facebook/presto/sql/gen/TestExpressionCompiler.java
Outdated
Show resolved
Hide resolved
| private boolean warnOnPossibleNans; | ||
| private boolean legacyCharToVarcharCoercion; | ||
| private boolean legacyJsonCast = true; | ||
| private boolean canonicalizedJsonExtract = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rschlussel, shall we enable canonicalizedJsonExtract by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should because it brings parity with presto c++. do you have any performance numbers about the extra cost? You can create a benchmark and get numbers for both on and off (see https://github.com/prestodb/presto/blob/master/presto-main/src/test/java/com/facebook/presto/operator/scalar/BenchmarkJsonToArrayCast.java for an example) and if the regression isn't too significant we can enable by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rschlussel, @amitkdutta, please check the benchmark presto-main/src/test/java/com/facebook/presto/operator/scalar/BenchmarkJsonExtract.java and the test result in the summary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the 11% regression, let's disable by default and we can enable it ourselves in our configuration. Otherwise, this looks good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @rschlussel, I've updated the default to false. Thanks for your thoughtful review! Can I get a stamp on this?
presto-main/src/test/java/com/facebook/presto/sql/gen/TestExpressionCompiler.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/com/facebook/presto/type/TestRowOperators.java
Outdated
Show resolved
Hide resolved
138e49a to
a80d9af
Compare
|
|
|
@duxiao1212 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@duxiao1212 Yes json_extract is available in Prestissimo . Feel free to verify against it. |
## Description The original pull request [#24614](#24614) incorrectly compares canonicalizedJsonExtract and legacyJsonCast in the equals function of an object. This issue can be seen in the code [here](https://github.com/prestodb/presto/pull/24614/files#diff-e921c5d186f9d5daa836bc7330f52caf8c1b84d19cf42288d5a8a7c9a6d2a5d5R156). As a result, whenever a SQL function requires caching, the cache is never hit, leading to the creation of new SQL function objects repeatedly. This behavior eventually causes an OOM error in the JVM metaspace. and eventually this error led to UER SEV. After the problematic comparison was updated and tested through shadow cluster by @rschlussel , we are confident that the issue has been resolved in this PR. Therefore, we plan to bring back the json canonicalized extract ## Motivation and Context Reintroduced json_extract to generate canonicalized output ## Impact <!---Describe any public API or user-facing feature change or any performance impact--> low impact ## Test Plan <!---Please fill in how you tested your change--> N/A ## Contributor checklist - [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [x] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [x] Adequate tests were added if applicable. - [x] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ```
# This is the 1st commit message: [native] Add a test group for async data cache e2e tests. # This is the commit message prestodb#2: [native] Advance velox. # This is the commit message prestodb#3: Fix error when describing a nonexistent table # This is the commit message prestodb#4: [native] test modification. # This is the commit message prestodb#5: Changes to enable ssl/tls in hms Co-authored-by: Arin Mathew <[email protected]> Changes to move ssl related properties to seperate class # This is the commit message prestodb#6: [native] Add tests for UUID type # This is the commit message prestodb#7: Reintroduced json_extract to generate canonicalized output (prestodb#24879) ## Description The original pull request [prestodb#24614](prestodb#24614) incorrectly compares canonicalizedJsonExtract and legacyJsonCast in the equals function of an object. This issue can be seen in the code [here](https://github.com/prestodb/presto/pull/24614/files#diff-e921c5d186f9d5daa836bc7330f52caf8c1b84d19cf42288d5a8a7c9a6d2a5d5R156). As a result, whenever a SQL function requires caching, the cache is never hit, leading to the creation of new SQL function objects repeatedly. This behavior eventually causes an OOM error in the JVM metaspace. and eventually this error led to UER SEV. After the problematic comparison was updated and tested through shadow cluster by @rschlussel , we are confident that the issue has been resolved in this PR. Therefore, we plan to bring back the json canonicalized extract ## Motivation and Context Reintroduced json_extract to generate canonicalized output ## Impact <!---Describe any public API or user-facing feature change or any performance impact--> low impact ## Test Plan <!---Please fill in how you tested your change--> N/A ## Contributor checklist - [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [x] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [x] Adequate tests were added if applicable. - [x] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ``` # This is the commit message prestodb#8: Reuse JoinType in IndexJoinNode Reuse JoinType instead of creating IndexJoinNode's own. JoinType is already part of Prestissimo protocol. Adding IndexJoinNode with another JoinType would cause naming conflict. # This is the commit message prestodb#9: Add jaro-winkler implementation, documentation and tests # This is the commit message prestodb#10: Add documentation for Iceberg support in PrestoCPP # This is the commit message prestodb#11: Add memory pool debug regex # This is the commit message prestodb#12: doc on hive csv limitations # This is the commit message prestodb#13: Add support for S3 WebIdentity authentication # This is the commit message prestodb#14: use com.facebook.airlift:security in presto-hive-metastore # This is the commit message prestodb#15: [native] Advance Velox # This is the commit message prestodb#16: [native] Add protocol for index lookup join plan # This is the commit message prestodb#17: [native] Add sidecar and sidecar plugin documentation
…24879) ## Description The original pull request [prestodb#24614](prestodb#24614) incorrectly compares canonicalizedJsonExtract and legacyJsonCast in the equals function of an object. This issue can be seen in the code [here](https://github.com/prestodb/presto/pull/24614/files#diff-e921c5d186f9d5daa836bc7330f52caf8c1b84d19cf42288d5a8a7c9a6d2a5d5R156). As a result, whenever a SQL function requires caching, the cache is never hit, leading to the creation of new SQL function objects repeatedly. This behavior eventually causes an OOM error in the JVM metaspace. and eventually this error led to UER SEV. After the problematic comparison was updated and tested through shadow cluster by @rschlussel , we are confident that the issue has been resolved in this PR. Therefore, we plan to bring back the json canonicalized extract ## Motivation and Context Reintroduced json_extract to generate canonicalized output ## Impact <!---Describe any public API or user-facing feature change or any performance impact--> low impact ## Test Plan <!---Please fill in how you tested your change--> N/A ## Contributor checklist - [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [x] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [x] Adequate tests were added if applicable. - [x] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ```
## Description Previously, in the PR #24614, we accidentally compared different fields in the SqlFunctionProperties, which caused an OOM error in the JVM metaspace and eventually led to a SEV. To prevent this issue from occurring again, we need to ensure that the SqlFunctionProperties equality function is implemented correctly. Additionally, we should create a unit test to reproduce the issue and prevent it from happening in the future. ## Motivation and Context Prevent incorrect implementation of the equals method in SqlFunctionProperties ## Impact low impact ## Test Plan N/A ## Contributor checklist - [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [x] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [x] Adequate tests were added if applicable. - [x] CI passed. ## Release Notes ``` == NO RELEASE NOTE == ```
Description
Canonicalize json_extract output by replacing the copy operation with writing the JSON to output stream with sorted keys
Motivation and Context
This relates to issue #24563.
Ensuring JSON output canonicalization guarantees the accuracy of JSON comparisons and aligns the behavior with json_parse
Impact
low impact
Test Plan
SET SESSION canonicalized_json_extract = true
Enabling canonicalized JSON extract (isCanonicalizedJsonExtract = true) causes a 11.7% performance drop compared to disabling it (isCanonicalizedJsonExtract = false), specifically in complex test cases involving nested JSON objects in the JSON path.
Contributor checklist
Release Notes
== RELEASE NOTES ==
Presto Changes
canonicalized_json_extract. :pr:24614