diff --git a/presto-docs/src/main/sphinx/presto_cpp/limitations.rst b/presto-docs/src/main/sphinx/presto_cpp/limitations.rst index ea957b5e60cea..d6f5e7a360b5f 100644 --- a/presto-docs/src/main/sphinx/presto_cpp/limitations.rst +++ b/presto-docs/src/main/sphinx/presto_cpp/limitations.rst @@ -49,25 +49,39 @@ Aggregate Functions reduce_agg ^^^^^^^^^^ -In C++ based Presto, ``reduce_agg`` is not permitted to return ``null`` in either the -``inputFunction`` or the ``combineFunction``. In Presto (Java), this is permitted -but undefined behavior. For more information about ``reduce_agg`` in Presto, -see `reduce_agg <../functions/aggregate.html#reduce_agg>`_. +In C++ based Presto, ``reduce_agg`` is not permitted to return ``null`` in either the +``inputFunction`` or the ``combineFunction``. In Presto (Java), this is permitted +but undefined behavior. For more information about ``reduce_agg`` in Presto, +see :doc:`/functions/aggregate`. reduce lambda ^^^^^^^^^^^^^ -For the reduce lambda function, the array size is controlled by the session property -``native_expression_max_array_size_in_reduce``, as it is inefficient to support such -cases for arbitrarily large arrays. This property is set at ``100K``. Queries that +For the reduce lambda function, the array size is controlled by the session property +``native_expression_max_array_size_in_reduce``, as it is inefficient to support such +cases for arbitrarily large arrays. This property is set at ``100K``. Queries that fail due to this limit must be revised to meet this limit. +spatial_partitioning +^^^^^^^^^^^^^^^^^^^^ +The ``spatial_partitioning`` aggregate function is not supported in Presto C++. + +This function was an internal utility for spatial join optimization and was never +fully documented or publicly supported. It generated a KD-tree representation in +JSON format that could be used with the ``spatial_partitions`` scalar function +(see `Geospatial Differences`_) to optimize spatial join queries. 
+ +Due to its internal nature, lack of documentation, limited usage, and incompatibility +with Presto's streaming architecture, this function was not implemented in Presto C++. +Standard spatial join operations using :doc:`/functions/geospatial` functions such as +:func:`!ST_Contains` or :func:`!ST_Intersects` should be used instead. + Array Functions --------------- Array sort with lambda comparator ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -``Case`` is not supported for the lambda comparator. Use ``If`` Instead. The following +``Case`` is not supported for the lambda comparator. Use ``If`` Instead. The following example is not supported in Presto C++: .. code-block:: sql @@ -81,7 +95,7 @@ example is not supported in Presto C++: ELSE 0 END -To work with Presto C++, the best option is to use transform lambda whenever possible. +To work with Presto C++, the best option is to use transform lambda whenever possible. For example: .. code-block:: sql @@ -92,28 +106,28 @@ Or, rewrite using ``if`` as in the following example: .. code-block:: sql - (x, y) -> IF (x.event_time < y.event_time, -1, + (x, y) -> IF (x.event_time < y.event_time, -1, IF (x.event_time > y.event_time, 1, 0)) When using ``If``, follow these rules when using a lambda in array sort: * The lambda should use ``if else``. Case is not supported. * The lambda should return ``1``, ``0``, ``-1``. Cover all the cases. -* The lambda should use the same expression when doing the comparison. - For example, in the above case ``event_time`` is used for comparison throughout the lambda. - If we rewrote the expression as following, where ``x`` and ``y`` have different fields, it will fail: +* The lambda should use the same expression when doing the comparison. + For example, in the above case ``event_time`` is used for comparison throughout the lambda. 
+ If we rewrote the expression as following, where ``x`` and ``y`` have different fields, it will fail: ``(x, y) -> if (x.event_time < y.event_start_time, -1, if (x.event_time > y.event_start_time, 1, 0))`` * Any additional nesting other than the two ``if`` uses shown above will fail. -``Array_sort`` can support any transformation lambda that returns a comparable type. +``Array_sort`` can support any transformation lambda that returns a comparable type. This example is not supported in Presto C++: .. code-block:: sql "array_sort"("map_values"(m), (a, b) -> ( - CASE WHEN (a[1] [2] > b[1] [2]) THEN 1 - WHEN (a[1] [2] < b[1] [2]) THEN -1 - WHEN (a[1] [2] = b[1] [2]) THEN + CASE WHEN (a[1] [2] > b[1] [2]) THEN 1 + WHEN (a[1] [2] < b[1] [2]) THEN -1 + WHEN (a[1] [2] = b[1] [2]) THEN IF((a[3] > b[3]), 1, -1) END) To run in Presto C++, rewrite the query as shown in this example: @@ -121,7 +135,7 @@ To run in Presto C++, rewrite the query as shown in this example: .. code-block:: sql "array_sort"("map_values"(m), (a) -> ROW(a[1][2], a[3])) - + Casting ------- @@ -135,30 +149,45 @@ Casting of Unicode strings to digits is not supported. The following example is Date and Time Functions ----------------------- -The maximum date range supported by ``from_unixtime`` is between (292 Million BCE, 292 Million CE). -The exact values corresponding to this are [292,275,055-05-16 08:54:06.192 BC, +292,278,994-08-17 00:12:55.807 CE], -corresponding to a UNIX time between [-9223372036854775, 9223372036854775]. +The maximum date range supported by ``from_unixtime`` is between (292 Million BCE, 292 Million CE). +The exact values corresponding to this are [292,275,055-05-16 08:54:06.192 BC, +292,278,994-08-17 00:12:55.807 CE], +corresponding to a UNIX time between [-9223372036854775, 9223372036854775]. -Presto and Presto C++ both support the same range but Presto queries succeed because Presto silently -truncates. Presto C++ throws an error if the values exceed this range. 
+Presto and Presto C++ both support the same range but Presto queries succeed because Presto silently +truncates. Presto C++ throws an error if the values exceed this range. Geospatial Differences ---------------------- -There are cosmetic representation changes as well as numerical precision differences. -Some of these differences result in different output for spatial predicates such +There are cosmetic representation changes as well as numerical precision differences. +Some of these differences result in different output for spatial predicates such as ST_Intersects. Differences include: -* Equivalent but different representations for geometries. Polygons may have their rings - rotated, EMPTY geometries may be of a different type, MULTI-types and - GEOMETRYCOLLECTIONs may have their elements in a different order. In general, +* Equivalent but different representations for geometries. Polygons may have their rings + rotated, EMPTY geometries may be of a different type, MULTI-types and + GEOMETRYCOLLECTIONs may have their elements in a different order. In general, WKTs/WKBs may be different. -* Numerical precision: Differences in numerical techniques may result in different - coordinate values, and also different results for predicates (ST_Relates and children, - including ST_Contains, ST_Crosses, ST_Disjoint, ST_Equals, ST_Intersects, +* Numerical precision: Differences in numerical techniques may result in different + coordinate values, and also different results for predicates (ST_Relates and children, + including ST_Contains, ST_Crosses, ST_Disjoint, ST_Equals, ST_Intersects, ST_Overlaps, ST_Relate, ST_Touches, ST_Within). * ST_IsSimple, ST_IsValid, simplify_geometry and geometry_invalid_reason may give different results. +spatial_partitions +^^^^^^^^^^^^^^^^^^ +The ``spatial_partitions`` scalar function is not supported in Presto C++. + +This function was an internal utility for spatial join optimization and was never +fully documented or publicly supported. 
It worked with the ``spatial_partitioning`` +aggregate function (see `Aggregate Functions`_) by consuming a KD-tree representation +loaded via a session property. The query optimizer would automatically use this +function to partition data for spatial joins. + +Due to its internal nature, lack of documentation, limited usage, and incompatibility +with Presto's streaming architecture, this function was not implemented in Presto C++. +Standard spatial join operations using :doc:`/functions/geospatial` functions such as +:func:`!ST_Contains`, :func:`!ST_Intersects` should be used instead. + JSON Functions -------------- @@ -168,9 +197,9 @@ Use of functions in JSON path ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Using functions inside a JSON path is not supported. -To run queries with functions inside a JSON path in Presto C++, rewrite paths to -use equivalent and often faster UDFs (User-Defined Functions) outside the JSON -path, improving job portability and efficiency. Aggregates might be necessary. +To run queries with functions inside a JSON path in Presto C++, rewrite paths to +use equivalent and often faster UDFs (User-Defined Functions) outside the JSON +path, improving job portability and efficiency. Aggregates might be necessary. Generally, functions should be extracted from the JSON path for better portability. @@ -181,7 +210,7 @@ For example, this Presto query: CAST(JSON_EXTRACT(config, '$.table_name_to_properties.keys()' ) AS ARRAY(ARRAY(VARCHAR))) -can be revised to work in both Presto and Presto C++ as the following: +can be revised to work in both Presto and Presto C++ as the following: .. code-block:: sql @@ -191,16 +220,16 @@ Use of expressions in JSON path ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Paths containing filter expressions are not supported. -To run such queries in Presto C++, revise the query to do the filtering as a +To run such queries in Presto C++, revise the query to do the filtering as a part of the SQL expression query, rather than in the JSON path. 
-For example, consider this Presto query: +For example, consider this Presto query: .. code-block:: sql JSON_EXTRACT(config, '$.store.book[?(@.price > 10)]') -The same query rewritten to run in Presto C++: +The same query rewritten to run in Presto C++: .. code-block:: sql @@ -211,61 +240,61 @@ The same query rewritten to run in Presto C++: Erroring on Invalid JSON ^^^^^^^^^^^^^^^^^^^^^^^^ -Presto can successfully run ``json_extract`` on certain invalid JSON, but Presto C++ -always fails. Extracting data from invalid JSON is indeterminate and relying on -that behavior can have unintended consequences. +Presto can successfully run ``json_extract`` on certain invalid JSON, but Presto C++ +always fails. Extracting data from invalid JSON is indeterminate and relying on +that behavior can have unintended consequences. -Because Presto C++ takes the safe approach to always throw an error on invalid -JSON, wrap calls in a try to ensure the query succeeds and validate that the -results correspond to your expectations. +Because Presto C++ takes the safe approach to always throw an error on invalid +JSON, wrap calls in a try to ensure the query succeeds and validate that the +results correspond to your expectations. Canonicalization ^^^^^^^^^^^^^^^^ -Presto ``json_extract`` can return `JSON that is not canonicalized `_. +Presto ``json_extract`` can return `JSON that is not canonicalized `_. ``json_extract`` has been rewritten in Presto C++ to always return canonical JSON. Regex Functions --------------- -Unsupported Cases +Unsupported Cases ^^^^^^^^^^^^^^^^^ -Presto C++ uses `RE2 `_, a widely adopted modern regular -expression parsing library. +Presto C++ uses `RE2 `_, a widely adopted modern regular +expression parsing library. -Presto uses `JONI `_, a deprecated port of Oniguruma (ONIG). +Presto uses `JONI `_, a deprecated port of Oniguruma (ONIG). 
-While both frameworks support almost all regular expression syntaxes, RE2 differs from +While both frameworks support almost all regular expression syntaxes, RE2 differs from JONI and PCRE in certain cases. The following are not supported in Presto C++ but are supported in Presto: * before text matching (?=re) * before text not matching (?!re) -* after text matching (?<=re) +* after text matching (?<=re) * after text not matching (?`_, -must be rewritten to run in Presto C++. See `Syntax `_ -for a full list of unsupported regular expressions in RE2 and -`Caveats `_ for an explanation of -why RE2 skips certain syntax in Perl. +Presto queries using these, and +unsupported regular expressions listed in `Syntax `_, +must be rewritten to run in Presto C++. See `Syntax `_ +for a full list of unsupported regular expressions in RE2 and +`Caveats `_ for an explanation of +why RE2 skips certain syntax in Perl. Regex Compilation Limit in Velox ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Because Regex compilation is CPU intensive, unbounded compilation can cause problems. -The number of regular expressions that can be dynamically compiled for a query is limited -to 250 to keep the overall shared cluster environment healthy. +Because Regex compilation is CPU intensive, unbounded compilation can cause problems. +The number of regular expressions that can be dynamically compiled for a query is limited +to 250 to keep the overall shared cluster environment healthy. -If this limit is reached, rewrite the query to use fewer compiled regular expressions. +If this limit is reached, rewrite the query to use fewer compiled regular expressions. In this example the regex can change based on the ``test_name`` column value, which could exceed the 250 limit: .. code-block:: sql - code_location_path LIKE '%' || test_name || '%' + code_location_path LIKE '%' || test_name || '%' -Revise the query as follows to avoid this limit: +Revise the query as follows to avoid this limit: .. 
code-block:: sql @@ -277,14 +306,14 @@ Time and Time with Time Zone IANA Named Timezones Support ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Support for IANA named time zones - for example, `Europe/London`, `UTC`, `America/New_York`, -`Asia/Kolkata` - in ``TIME`` and ``TIME WITH TIME ZONE`` was removed from Presto C++ -to align with the SQL standard. Only fixed-offset time zones such as `+02:00` are +Support for IANA named time zones - for example, `Europe/London`, `UTC`, `America/New_York`, +`Asia/Kolkata` - in ``TIME`` and ``TIME WITH TIME ZONE`` was removed from Presto C++ +to align with the SQL standard. Only fixed-offset time zones such as `+02:00` are now supported for these types. Named time zones may still work when the Presto coordinator handles the query. -To run queries involving ``TIME`` and ``TIME WITH TIME ZONE``, migrate to fixed-offset +To run queries involving ``TIME`` and ``TIME WITH TIME ZONE``, migrate to fixed-offset time zones as soon as possible. These queries will fail in Presto C++, but may still work in Presto: @@ -305,21 +334,21 @@ These queries using fixed offsets will run successfully in Presto C++: Casting from TIMESTAMP to TIME ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In Presto, the result of CAST(TIMESTAMP AS TIME) or CAST(TIMESTAMP AS TIME WITH TIME ZONE) -would change based on the session property ``legacy_timestamp`` (true by default) when -applied to the user's time zone. In Presto C++ for ``TIME`` and ``TIME WITH TIME ZONE``, +In Presto, the result of CAST(TIMESTAMP AS TIME) or CAST(TIMESTAMP AS TIME WITH TIME ZONE) +would change based on the session property ``legacy_timestamp`` (true by default) when +applied to the user's time zone. In Presto C++ for ``TIME`` and ``TIME WITH TIME ZONE``, the behavior is equivalent to the property being `false`. Note: ``TIMESTAMP`` behavior in Presto and Presto C++ is unchanged. 
-For examples, consider the following queries and their responses when run in Presto: +For example, consider the following queries and their responses when run in Presto: .. code-block:: sql -- Default behavior with legacy_timestamp=true: -- Session Timezone - America/Los_Angeles - -- DST Active Dates + -- DST Active Dates select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME); -- Returns: 09:15:00.000 select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME WITH TIME ZONE); @@ -339,7 +368,7 @@ For examples, consider the following queries and their responses when run in Pre select cast(TIMESTAMP '2023-12-05 10:15:00.000 America/Los_Angeles' as TIME WITH TIME ZONE); -- 10:15:00.000 America/Los_Angeles -Consider the following queries and their responses when run in Presto C++ (Velox): +Consider the following queries and their responses when run in Presto C++ (Velox): .. code-block:: sql @@ -347,7 +376,7 @@ Consider the following queries and their responses when run in Presto C++ (Velox -- Session Timezone - America/Los_Angeles - -- DST Active Dates + -- DST Active Dates select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME); -- Returns: 10:15:00.000 select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME WITH TIME ZONE); @@ -371,18 +400,18 @@ Note: ``TIMESTAMP`` supports named time zones, unlike ``TIME`` and ``TIME WITH T DST Implications ^^^^^^^^^^^^^^^^ -Because IANA zones are not supported for ``TIME``, Presto C++ does not manage DST transitions. +Because IANA zones are not supported for ``TIME``, Presto C++ does not manage DST transitions. All time interpretation is strictly in the provided offset, not local civil time. -For example, ``14:00:00 +02:00`` always means 14:00 at a +02:00 fixed offset, regardless +For example, ``14:00:00 +02:00`` always means 14:00 at a +02:00 fixed offset, regardless of DST changes that might apply under an IANA zone. 
Recommendations ^^^^^^^^^^^^^^^ * Use fixed-offset time zones like +02:00 with ``TIME`` and ``TIME WITH TIME ZONE``. -* Do not use IANA time zone names for ``TIME`` and ``TIME WITH TIME ZONE``. -* Confirm that your Presto C++ usage does not depend on legacy timestamp behavior. If your workload - depends on legacy ``TIME`` behavior, including support of IANA timezones, handle this outside +* Do not use IANA time zone names for ``TIME`` and ``TIME WITH TIME ZONE``. +* Confirm that your Presto C++ usage does not depend on legacy timestamp behavior. If your workload + depends on legacy ``TIME`` behavior, including support of IANA timezones, handle this outside Presto or reach out so that we can discuss alternative solutions. * Test: Try your most critical workflows with these settings. @@ -390,9 +419,9 @@ Recommendations URL Functions ------------- -Presto and Presto C++ implement different URL function specifications which can lead to -some URL function mismatches. Presto C++ implements `RFC-3986 `_ whereas Presto -implements `RFC-2396 `_. This can lead to subtle differences as presented in +Presto and Presto C++ implement different URL function specifications which can lead to +some URL function mismatches. Presto C++ implements `RFC-3986 `_ whereas Presto +implements `RFC-2396 `_. This can lead to subtle differences as presented in `this issue `_. Window Functions @@ -402,6 +431,6 @@ Aggregate window functions do not support ``IGNORE NULLS``, returning the follow ``!ignoreNulls Aggregate window functions do not support IGNORE NULLS.`` -For Presto C++, remove the ``IGNORE NULLS`` clause. This clause is only defined for value functions -and does not apply to aggregate window functions. In Presto the results obtained with and without +For Presto C++, remove the ``IGNORE NULLS`` clause. This clause is only defined for value functions +and does not apply to aggregate window functions. 
In Presto the results obtained with and without the clause are similar; Presto C++ rejects this clause with an error, whereas Presto just warns. \ No newline at end of file
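
The ``IGNORE NULLS`` rewrite described in the Window Functions section can be sketched as follows. This is a minimal illustration, not text from the patch: the inline ``VALUES`` relation and the column names ``ts`` and ``x`` are hypothetical, and the first query is only expected to fail in Presto C++ with the error quoted in that section.

.. code-block:: sql

    -- Expected to fail in Presto C++: IGNORE NULLS on an aggregate window function.
    -- (Hypothetical data; ts and x are illustrative column names.)
    SELECT ts, sum(x) IGNORE NULLS OVER (ORDER BY ts) AS running_total
    FROM (VALUES (1, 10), (2, NULL), (3, 5)) AS t(ts, x);

    -- Portable rewrite: drop the clause. sum() already skips NULL inputs,
    -- so the running total is unaffected by the NULL row.
    SELECT ts, sum(x) OVER (ORDER BY ts) AS running_total
    FROM (VALUES (1, 10), (2, NULL), (3, 5)) AS t(ts, x);

Because aggregate functions such as ``sum`` ignore ``NULL`` inputs on their own, removing the clause should not change the result for queries of this shape.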