Add release notes for 0.271 #17352
Closed
cocozianu wants to merge 24 commits into prestodb:release-0.271 from
Conversation
Cherry-pick of trinodb/trino#10722. The ParquetWriter::flush method calls OutputStreamSliceOutput::size(), which returns an int, to get the data page offset, which is a long. As a result, flushing fails with an integer overflow exception when writing files larger than ~2 GB. Co-authored-by: Saulius Valatka <saulius.vl@gmail.com>
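A minimal sketch of the overflow described above, with hypothetical method names standing in for the real writer internals: a stream position tracked as a long is silently truncated when exposed through an int-returning size() accessor.

```java
// Hypothetical sketch, not the actual Presto/Trino classes: shows why
// reporting a long stream position through an int size() breaks past ~2 GB.
public class OffsetOverflowSketch {
    // Pre-fix behavior: the position is a long internally, but the
    // accessor narrows it to int, silently truncating large values.
    static int intSize(long bytesWritten) {
        return (int) bytesWritten; // overflows for bytesWritten > Integer.MAX_VALUE
    }

    // Post-fix behavior: expose the full long position.
    static long longSize(long bytesWritten) {
        return bytesWritten;
    }
}
```

For a ~3 GB offset, intSize returns a negative number while longSize preserves the value, which is the overflow the cherry-pick eliminates.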
The goal of this task is to add Thrift annotations so that in the future we can use Thrift serde.
Port some code from the parquet-mr repo https://github.com/apache/parquet-mr Co-authored-by: Gabor Szadovszky <gabor.szadovszky@cloudera.com> More details about the Parquet Column Indexes feature: Column Indexes, also known as page-level indexes, store min/max values for each page in a given column chunk. When reading pages, a reader does not need to process each page header to determine whether the page can be skipped based on these statistics. More information about this feature can be found at https://github.com/apache/parquet-format/blob/master/PageIndex.md
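The page-skipping idea can be sketched as follows. This is an illustrative model, not the parquet-mr API: PageStats and the equality predicate are assumptions, but the logic mirrors how per-page min/max statistics let a reader decide which pages to fetch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of column-index-based page skipping (names are
// hypothetical, not the real parquet-mr types).
public class PageIndexSketch {
    static final class PageStats {
        final long min;
        final long max;
        PageStats(long min, long max) { this.min = min; this.max = max; }
    }

    // Returns the indexes of pages whose [min, max] range could contain
    // a value equal to target; all other pages can be skipped without
    // reading their page headers or data.
    static List<Integer> pagesToRead(List<PageStats> columnIndex, long target) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < columnIndex.size(); i++) {
            PageStats stats = columnIndex.get(i);
            if (target >= stats.min && target <= stats.max) {
                result.add(i);
            }
        }
        return result;
    }
}
```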
- In the Disaggregated Coordinator setup, we want tests to wait for the cluster to be up and ready before allowing queries to run. Sometimes one of the servers takes a while to come up, and queries time out because they never start executing before the test timeout is reached. Add the waitForClusterToGetReady method to some of the missing places to prevent this in tests.
- Also change the order of resource manager (RM) creation, since the coordinator's refreshAndStartQueries loop ends up consuming a lot of CPU and delaying RM startup.
- Fix TestMemoryManager#testClusterPoolsMultiCoordinator, which started failing due to the change in RM creation order because we were trying to reserve/free memory on the RM as well.
- Fix TestServerInfoResource#createQueryRunnerWithNoClusterReadyCheck and testGetServerStateWithoutRequiredCoordinators, which got stuck waiting for the cluster to be ready due to the /v1/info/state check.
…perty pinot.topn_large
As of now, we only support streaming aggregation for cases where the group-by keys are the same as the order-by keys; cases where the group-by keys are a subset of the order-by keys are not yet supported. Co-Authored-By: Zhan Yuan <yuanzhanhku@gmail.com>
We can always enable streaming aggregation for partial aggregations without affecting correctness. However, if the data is not clustered (for example, ordered) by the group-by keys, it may cause regressions in latency and resource usage. This session property is a way to force-enable streaming aggregation when we know the execution would benefit from partial streaming aggregation. We can later work on determining this automatically from input table properties.
Storage-based broadcast join uses distributed storage to store the broadcast table. Before broadcasting the data, the driver performs a size check to ensure that the table size is under a threshold. If the table size exceeds the threshold, the driver fails the query with an exceeded-broadcast-memory-limit error. This threshold can be overridden using the `spark_broadcast_join_max_memory_override` property, and users can set it to an arbitrarily high value. That prevents the query from failing the driver memory check, but it will eventually fail on an executor with a JVM OOM. This PR adds two checks to prevent this kind of OOM:
1. The first check is in the driver, where the threshold bytes are computed as threshold = min(spark_broadcast_join_max_memory_override, query_max_memory_per_node). This caps the threshold at the maximum available memory and fails the query on the driver if the size goes beyond `query_max_memory_per_node`.
2. When the hash table is created on the executors, all the data is read from storage and stored in a list of pages. This data is then deserialized, which further increases memory usage. While loading this data, memory usage was not updated, which means that if the table is huge, we keep loading data until the JVM OOMs. A memory callback is added to this logic: after loading data from each file, a callback updates the memory usage so far. With this, an EXCEEDED_MEMORY_LIMIT error is thrown as soon as memory usage on the executor exceeds `query_max_memory_per_node` while loading the hash table into memory.
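The two checks above can be sketched as follows. Class and method names are hypothetical stand-ins for the Presto-on-Spark internals; only the min() capping and the fail-fast callback reflect the described behavior.

```java
// Hypothetical sketch of the two broadcast-memory checks (not the
// actual Presto-on-Spark classes).
public class BroadcastMemoryChecks {
    // Check 1 (driver side): cap the broadcast threshold at
    // query_max_memory_per_node, regardless of the user override.
    static long effectiveThreshold(long broadcastMaxMemoryOverride, long queryMaxMemoryPerNode) {
        return Math.min(broadcastMaxMemoryOverride, queryMaxMemoryPerNode);
    }

    // Check 2 (executor side): memory callback invoked after loading
    // each file; fails fast once cumulative usage exceeds the limit,
    // instead of loading until the JVM OOMs.
    static long updateMemoryUsage(long usedSoFar, long fileBytes, long queryMaxMemoryPerNode) {
        long updated = usedSoFar + fileBytes;
        if (updated > queryMaxMemoryPerNode) {
            throw new IllegalStateException("EXCEEDED_MEMORY_LIMIT");
        }
        return updated;
    }
}
```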
Switch join-distribution-type default value to AUTOMATIC. Switch optimizer.join-reordering-strategy default value to AUTOMATIC.
Code coverage adds synthetic members to classes, which has been causing issues when running tests with code coverage enabled. Skipping or ignoring these members is the fix.
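One common way to skip such members, sketched here under the assumption that the coverage agent injects them as synthetic (as JaCoCo does with its `$jacocoData` field): reflection-based code can filter them out with `Member.isSynthetic()`.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: filter out synthetic fields (e.g. injected by a
// coverage agent) when reflecting over a class. Not the PR's actual fix.
public class SyntheticMemberFilter {
    static class Demo {
        int visible; // a regular, non-synthetic field
    }

    static List<String> declaredNonSyntheticFields(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            if (!field.isSynthetic()) { // drop compiler/agent-injected members
                names.add(field.getName());
            }
        }
        return names;
    }
}
```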
If the appendRowNumber flag is set to true, every page returned by the reader will contain an additional block at the end that stores the row numbers. If a file has n rows, the row numbers range from 0 to n-1.
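A simplified sketch of the row-number block, with a long[] standing in for the reader's Block type (an assumption for illustration): each page's appended block holds consecutive file-level row numbers starting at that page's first row.

```java
// Illustrative sketch (long[] stands in for a Block; not the reader's API):
// the row-number block appended as the last block of each page.
public class RowNumberAppender {
    // Produces the row-number block for a page of pageRowCount rows that
    // begins at firstRowOfPage within the file; across all pages of an
    // n-row file, the values cover 0..n-1.
    static long[] rowNumberBlock(long firstRowOfPage, int pageRowCount) {
        long[] rowNumbers = new long[pageRowCount];
        for (int i = 0; i < pageRowCount; i++) {
            rowNumbers[i] = firstRowOfPage + i;
        }
        return rowNumbers;
    }
}
```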
Cherry-pick of trinodb/trino#80 Presto does not maintain the quoting of an identifier when using SqlQueryFormatter, which results in a parsing error when running a prepared query of the syntax in [prestodb#10739]. This PR fixes that issue. Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
Cherry-pick of trinodb/trino#6380: Identifiers with characters such as @ or : are not treated as delimited identifiers, which causes issues when the planner creates synthetic identifiers as part of the plan IR expressions. When the IR expressions are serialized, they are not properly quoted, which causes failures when parsing the plan on the worker side. For example, for a query such as: SELECT try("a@b") FROM t The planner creates an expression of the form: "$internal$try"("$INTERNAL$BIND"("a@b", (a@b_0) -> "a@b_0")) The argument to the lambda expression is not properly quoted. Co-authored-by: Martin Traverso <mtraverso@gmail.com>
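The quoting rule at the heart of both identifier fixes can be sketched as follows. This is a simplified model, not the actual SqlQueryFormatter code: an identifier that is not a plain unquoted name must be emitted in double quotes, with embedded double quotes doubled.

```java
// Simplified sketch of delimited-identifier formatting (not the actual
// Presto formatter): quote anything that is not a plain identifier.
public class IdentifierQuoting {
    static String formatIdentifier(String name) {
        // Plain identifiers (letters, digits, underscore; not starting
        // with a digit) may be emitted bare.
        if (name.matches("[a-zA-Z_][a-zA-Z0-9_]*")) {
            return name;
        }
        // Anything else (e.g. containing @ or :) must be delimited,
        // with embedded double quotes doubled.
        return '"' + name.replace("\"", "\"\"") + '"';
    }
}
```

Under this rule, the synthetic lambda argument a@b_0 from the example above would be emitted as "a@b_0" and round-trip through the parser.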
Cherry-pick of trinodb/trino#7252 Solves prestodb#16680 Co-authored-by: Shashikant Jaiswal <shashi@okera.com>
Missing Release Notes
Amit Dutta
Masha Basmanova
Rongrong Zhong
singcha
Extracted Release Notes
- `experimental.distinct-aggregation-large-block-spill-enabled` to enable spilling of blocks that are larger than `experimental.distinct-aggregation-large-block-size-threshold` bytes into a separate spill file. This can be overridden by the `distinct_aggregation_large_block_spill_enabled` session property.
- `experimental.distinct-aggregation-large-block-size-threshold` to define the threshold size beyond which a block will be spilled into a separate spill file. This can be overridden by the `distinct_aggregation_large_block_size_threshold` session property.
- `join-max-broadcast-table-size`.

All Commits