docs: Add to Presto C++ limitations doc#27120
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds a new Presto C++ documentation page documenting known query differences and workarounds between legacy Presto and Presto C++, and wires it into the existing Presto C++ docs hierarchy. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 5 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-docs/src/main/sphinx/presto_cpp/queries.rst:61` </location>
<code_context>
+ For example, in the above case ``event_time`` is used for comparison throughout the lambda.
+ If we rewrote the expression as following, where ``x`` and ``y`` have different fields, it will fail:
+ ``(x, y) -> if (x.event_time < y.event_start_time, -1, if (x.event_time > y.event_start_time, 1, 0))``
+* Any additional nesting other from the two ``if`` uses shown above will fail.
+
+``Array_sort`` can support any transformation lambda that returns a comparable type. This example is not supported:
</code_context>
<issue_to_address>
**issue (typo):** Fix wording in "other from" phrase.
"Other from" is ungrammatical here. Consider rephrasing to: `Any additional nesting other than the two ``if`` uses shown above will fail.`
```suggestion
* Any additional nesting other than the two ``if`` uses shown above will fail.
```
</issue_to_address>
### Comment 2
<location> `presto-docs/src/main/sphinx/presto_cpp/queries.rst:137-140` </location>
<code_context>
+
+Canonicalization
+----------------
+* **Issue**: Presto ``json_extract`` `can return json that is not canonicalized <https://github.com/prestodb/presto/issues/24563#issue-2852506643>`_.
+* **Workaround**: ``json_extract`` has been rewritten in Prestissimo to always return canonical JSON.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Capitalize JSON consistently.
In the bullet point, change "json" to "JSON" in the phrase "can return json that is not canonicalized".
```suggestion
Canonicalization
----------------
* **Issue**: Presto ``json_extract`` `can return JSON that is not canonicalized <https://github.com/prestodb/presto/issues/24563#issue-2852506643>`_.
* **Workaround**: ``json_extract`` has been rewritten in Prestissimo to always return canonical JSON.
```
</issue_to_address>
### Comment 3
<location> `presto-docs/src/main/sphinx/presto_cpp/queries.rst:150` </location>
<code_context>
+* Presto C++ uses `RE2 <https://github.com/google/re2>`_, a widely adopted modern regular expression parsing library.
+ RE2 provides most of the functionality of PCRE using a C++ interface very close to
+ that of PCRE, which guarantees `linear time execution and a fixed stack footprint <https://swtch.com/~rsc/regexp/regexp3.html#caveats>`_.
+* Presto Java uses `JONI <https://github.com/jruby/joni>`_ which is a deprecated port of ONI.
+
+While both frameworks support almost all regular expression syntaxes, RE2 differs from JONI and PCRE in
</code_context>
<issue_to_address>
**issue (typo):** Correct the name of the regex engine JONI is based on.
"Deprecated port of ONI" appears to be a typo. JONI is a port of Oniguruma (often abbreviated ONIG). Please update the text to refer to Oniguruma/ONIG instead of ONI.
```suggestion
* Presto Java uses `JONI <https://github.com/jruby/joni>`_ which is a deprecated port of Oniguruma (ONIG).
```
</issue_to_address>
### Comment 4
<location> `presto-docs/src/main/sphinx/presto_cpp/queries.rst:221` </location>
<code_context>
+URL Functions
+=============
+
+Presto and Prestissimo implement different URL functions specs which can lead to
+some URL function mismatches. Prestissimo implements `RFC-3986 <https://datatracker.ietf.org/doc/html/rfc3986>`_ whereas Presto
+implements `RFC-2396 <https://datatracker.ietf.org/doc/html/rfc2396>`_. This can lead to subtle differences as presented in
</code_context>
<issue_to_address>
**suggestion (typo):** Tighten phrasing of "URL functions specs".
"URL functions specs" reads a bit awkwardly. Consider "URL function specs" or "URL function specifications" instead.
```suggestion
Presto and Prestissimo implement different URL function specifications which can lead to
```
</issue_to_address>
### Comment 5
<location> `presto-docs/src/main/sphinx/presto_cpp/queries.rst:289-291` </location>
<code_context>
+
+In Presto, the result of CAST(TIMESTAMP AS TIME) or CAST(TIMESTAMP AS TIME WITH TIME ZONE) would change based on the
+session property ``legacy_timestamp`` (true by default) when applied to the user's time zone.
+In Presto C++ for TIME/TIME WITH TIMEZONE the behavior will be equivalent to the property being false.
+
+Note: ``TIMESTAMP`` behavior in Presto and Presto C++ is unchanged.
</code_context>
<issue_to_address>
**issue (typo):** Align TIME WITH TIME ZONE spelling with the rest of the document.
Here this is written as "TIME WITH TIMEZONE"; elsewhere you use "TIME WITH TIME ZONE". Please update this occurrence to "TIME/TIME WITH TIME ZONE" for consistency.
```suggestion
In Presto, the result of CAST(TIMESTAMP AS TIME) or CAST(TIMESTAMP AS TIME WITH TIME ZONE) would change based on the
session property ``legacy_timestamp`` (true by default) when applied to the user's time zone.
In Presto C++ for TIME/TIME WITH TIME ZONE the behavior will be equivalent to the property being false.
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Okay, this is interesting. The release note entry appears to be correct, but the CI check fails with the following error: It appears that the Sourcery edit of the PR description is being parsed by the release-note CI test? This doesn't block merge but I wanted to mention this as I hadn't seen this before in other PRs. If there's anything I can do to fix this - other than simply delete the Sourcery summary manually - let me know. Update: CI check now passes after I pushed an update. Don't understand why it failed the first time. |
c225b0f to
967bdc8
Compare
|
Addressed Sourcery suggested edits and did another beginning-to-end edit for consistency, phrasing, and structure. |
amitkdutta
left a comment
There was a problem hiding this comment.
Looks good. Thanks @steveburnett
tdcmeehan
left a comment
There was a problem hiding this comment.
A lot of these sound more like limitations. What do you think about merging this doc with the limitations doc?
@tdcmeehan Okay, I'll work on rewriting this. Do you want me to merge the entire queries page into limitations.rst? |
|
@steveburnett yes. I think we can just add these details to that page, since it's all about the differences between C++ and Java clusters which we won't fix (at least any time soon). |
Will do! |
967bdc8 to
e8db284
Compare
e8db284 to
84bd688
Compare
Co-authored-by: Amit Dutta <amit.kolorob@gmail.com> Co-authored-by: Krishna Pai <kgpai@meta.com>
84bd688 to
c0c1d1a
Compare
tdcmeehan
left a comment
There was a problem hiding this comment.
This is a great addition. Thank you!
Description
I edited a draft document at the request of @amitkdutta and @kgpai to help prepare that document for publication as the blog post Presto vs Prestissimo – Known differences and workarounds.
I thought the content in the blog post was valuable and should be added to the Presto documentation. In this PR I have revised the blog post to follow the format and style of Presto documentation to add to the Presto docs.
Following feedback I incorporated the content of the blog post into presto_cpp/limitations.rst.
Motivation and Context
Improves Presto documentation of Presto C++, helping readers to be aware of limitations when running Presto queries in C++, and advise how to rewrite Presto queries to run successfully in Presto C++.
Impact
Documentation.
Test Plan
Local doc builds.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Summary by Sourcery
Documentation: