
Add release notes for 0.271#17352

Closed
cocozianu wants to merge 24 commits into prestodb:release-0.271 from cocozianu:release-notes-0.271

Conversation

@cocozianu
Contributor

Missing Release Notes

Amit Dutta

Masha Basmanova

Rongrong Zhong

singcha

Extracted Release Notes

All Commits

  • 0b50dcc Mark tdigest_agg function as public (Masha Basmanova)
  • 73516ea Reduce memory used by primitive flat map value blocks (Sergii Druzkin)
  • 246dff7 Avoid volatile writes for empty responses in PageBufferClient (James Petty)
  • f4ce316 Fix documentation for tdigest_agg (Masha Basmanova)
  • 2da621d Reduce memory consumed by flat map value block (Sergii Druzkin)
  • 33a3f4d Add Prepared statement sql to query lifecycle events (Shashwat Arghode)
  • 492aded Disabling flaky test (Swapnil Tailor)
  • 877ae0e Optimize creation of OrcDecompressor.OutputBuffer in OrcInputStream (Sergii Druzkin)
  • d53b41d Split ci workflow into multiple workflows (Swapnil Tailor)
  • 2ff205f Fix CBO broadcast join reordering (Rongrong Zhong)
  • 7330a04 Improve SimpleTtlNodeSelector performance (Neerad Somanchi)
  • 2be3158 Fix long dictionary raw bytes estimate (Arunachalam Thirupathi)
  • 211390b add warning when map with double or real as key is created (singcha)
  • 60d644f Option to disable string dictionary encoding in OrcWriter (Arunachalam Thirupathi)
  • 61ea2fa Use Factory method in OrcWriterOptions (Arunachalam Thirupathi)
  • a207136 Reuse compression buffer in OrcWriter (Arunachalam Thirupathi)
  • ae1dec7 Support enums in partition columns (Pranjal Shankhdhar)
  • 6ea0fd1 Add thrift serde support for ClusterStats (Abhishek Aggarwal)
  • aae9d9f Support ORC format caching for iceberg connector (JySongWithZhangCe)
  • a8fa57f Fix flaky TestClusterStatsResource (abhiseksaikia)
  • dfb6fd3 Use minified JS for production UI (Mayank Garg)
  • 1b06b01 Make coordinator index page use /v1/queryState (Mayank Garg)
  • 3e8e662 Fix query ordering comparator in QueryResource (Mayank Garg)
  • a7af002 Extend QueryStateInfo to handle Coordinator UI (Mayank Garg)
  • a25592e Only collect stats for primitive types in ThriftHiveMetastore (Otakar Trunecek)
  • 621d42d Support basic timestamp in the iceberg connector (Chunxu Tang)
  • ba19b45 Increasing timeout for a test to avoid its flakiness (Swapnil Tailor)
  • 8987909 Update iceberg connector doc with hadoop catalog configs (Chunxu Tang)
  • 4a96813 Improve MetadataQueryOptimizer with a threshold to limit calls to Metastore (ericyuliu)
  • dd0c621 Make sure ConsistentHashingNodeProvider returns unique candidates (Rongrong Zhong)
  • d37ac07 Spill large blocks to separate spill files in distinct aggregates (Arjun Gupta)
  • 003ecf9 Test and fix cast from bigint to varchar (v-jizhang)
  • 66e4f63 Remove iceberg.catalog.uri from iceberg connector configs (Chunxu Tang)
  • afdd49e Add Thrift support for ResourceGroupInfo (Zitong Wei)
  • c147721 Plumb enum type through create table/view and insert into DDL (mengdilin)
  • 28abe81 Adding checkArgument function to properly throw Presto Exception when argument is invalid. (Amit Dutta)

Presto Release Bot and others added 24 commits February 17, 2022 16:12
Cherry-pick of trinodb/trino#10722

The ParquetWriter::flush method calls OutputStreamSliceOutput::size(), which returns an int, to get the data page offset, which is a long. Flushing therefore fails with an integer overflow exception when trying to write files larger than ~2 GB.

Co-authored-by: Saulius Valatka <saulius.vl@gmail.com>
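The overflow described above can be reproduced in isolation. This is an illustrative sketch of the failure mode, not the actual ParquetWriter code: a byte offset that is really a long gets truncated through an int-returning method and goes negative past Integer.MAX_VALUE (~2 GB).

```java
public class OffsetOverflowDemo {
    // Simulates an int-returning size() method on a stream that has
    // already written more than 2 GB of data: the long offset is
    // narrowed to int, which silently truncates the high bits.
    static int truncatedSize(long bytesWritten) {
        return (int) bytesWritten;
    }

    public static void main(String[] args) {
        long bytesWritten = 3L * 1024 * 1024 * 1024; // ~3 GB written so far
        System.out.println(bytesWritten);             // 3221225472
        System.out.println(truncatedSize(bytesWritten)); // negative: overflow
    }
}
```

The fix is to carry the offset as a long end to end instead of routing it through an int-returning accessor.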
The goal of this task is to add Thrift annotations so that in the future we can use Thrift serde.
Port some code from the parquet-mr repo: https://github.com/apache/parquet-mr
Co-authored-by: Gabor Szadovszky <gabor.szadovszky@cloudera.com>

More details about Parquet Column Indexes feature:

Column Indexes, also known as page-level indexes, store min/max values for each page in a given column chunk. When reading, a reader can determine from these statistics whether a page can be skipped, without processing the page itself. More information about this feature can be found at https://github.com/apache/parquet-format/blob/master/PageIndex.md
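The pruning decision behind column indexes can be sketched in a few lines. This is an illustrative example of the min/max skip test; `PageStats` and `canSkipPage` are hypothetical names, not the parquet-mr API.

```java
public class ColumnIndexDemo {
    // Per-page statistics as recorded in the column index.
    static final class PageStats {
        final long min;
        final long max;

        PageStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // A page can be skipped when its [min, max] value range cannot
    // contain any row matching the predicate range [lo, hi].
    static boolean canSkipPage(PageStats stats, long lo, long hi) {
        return stats.max < lo || stats.min > hi;
    }
}
```

Because the index lives outside the pages, this test runs before any page bytes are read or decompressed.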
- In the disaggregated coordinator setup, we want tests to wait for the cluster to be up and ready before allowing queries to run. Sometimes it takes a while for one of the servers to come up, so queries time out because they never start executing before the test timeout is reached. Adding the waitForClusterToGetReady method to some of the missing places avoids this in tests.
- Also changing the order of RM creation, since the coordinator's refreshAndStartQueries loop ends up eating a lot of CPU and delaying RM startup.
- Fixing TestMemoryManager#testClusterPoolsMultiCoordinator, which started failing due to the change in RM order because we were trying to reserve/free memory on the RM as well.
- Fixing TestServerInfoResource#createQueryRunnerWithNoClusterReadyCheck and testGetServerStateWithoutRequiredCoordinators, which got stuck waiting for the cluster to be ready due to the /v1/info/state check.
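A readiness gate of this kind is typically a bounded polling loop. The following is a minimal sketch of what a method like waitForClusterToGetReady does; the signature, supplier, and polling interval are assumptions for illustration, not Presto's code.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

public class ClusterReadiness {
    // Polls until the expected number of servers report ready, or the
    // timeout elapses. readyServers is queried on each iteration.
    static void waitForClusterToGetReady(IntSupplier readyServers, int expected, Duration timeout)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (readyServers.getAsInt() < expected) {
            if (System.nanoTime() > deadline) {
                throw new TimeoutException("cluster not ready within " + timeout);
            }
            Thread.sleep(50); // back off between polls
        }
    }
}
```

Running the query-issuing test code only after this returns removes the race where a test times out because one server was still starting.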
As of now, we only support streaming aggregation for cases where the group-by keys
are the same as the order-by keys; cases where the group-by keys are a subset of the
order-by keys are not supported yet.

Co-Authored-By: Zhan Yuan <yuanzhanhku@gmail.com>
We can always enable streaming aggregation for partial aggregations without affecting correctness.
But if the data isn't clustered (for example, ordered) by the group-by keys, it may cause regressions
in latency and resource usage. This session property is a stopgap to force-enable streaming aggregation
when we know the execution would benefit from partial streaming aggregation.
We can later work on determining this from the input table properties.
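The reason clustered input matters can be seen in a minimal sketch of streaming aggregation (this is illustrative, not Presto's operator): because each group's rows arrive contiguously, the aggregator holds only one group's state and emits a group as soon as the key changes.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingAggDemo {
    // rows are [key, value] pairs, assumed clustered by key; returns one
    // [key, sum(value)] pair per group, in input order, using O(1) state.
    static List<long[]> sumByKey(List<long[]> clusteredRows) {
        List<long[]> out = new ArrayList<>();
        Long currentKey = null;
        long sum = 0;
        for (long[] row : clusteredRows) {
            if (currentKey != null && row[0] != currentKey) {
                out.add(new long[]{currentKey, sum}); // key changed: emit group
                sum = 0;
            }
            currentKey = row[0];
            sum += row[1];
        }
        if (currentKey != null) {
            out.add(new long[]{currentKey, sum}); // flush the last group
        }
        return out;
    }
}
```

If the input is not clustered by key, the same key appears in several runs and this produces multiple partial rows per key, which is safe for a partial aggregation but can inflate the intermediate data, hence the latency and resource-usage caveat above.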
Storage-based broadcast join uses distributed storage to store the
broadcast table. Before broadcasting the data, the driver performs
a size check to ensure that the table size is under a threshold. If
the table size is beyond the threshold, the driver fails the query with
an exceeded-broadcast-memory-limit error. This threshold can be overridden
using the `spark_broadcast_join_max_memory_override` property, and users
can override it to an insanely high value. That prevents the query
from failing the driver memory check, but it will eventually fail on
an executor with a JVM OOM.

This PR adds two checks to prevent this kind of OOM:
1. The first check is added in the driver, where the threshold bytes are computed
as threshold = min(spark_broadcast_join_max_memory_override,
query_max_memory_per_node). This caps the threshold at the maximum
available memory and fails the query on the driver if the size goes beyond
`query_max_memory_per_node`.

2. When the hash table is created on the executors, the entire data is read from
storage and stored in a list<page>. This data is then deserialized,
which further increases memory usage. While loading this data, memory
usage was not updated, which means that for a huge table we would
keep loading data until the JVM OOMed. A memory callback is added to
this logic: after loading data from each file, a callback updates
the memory usage so far. With this, an EXCEEDED_MEMORY_LIMIT
error is thrown as soon as memory usage in the executor goes beyond
`query_max_memory_per_node` while loading the hash table into memory.
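The driver-side check in point 1 reduces to a one-line cap. This sketch uses assumed method and parameter names purely to illustrate the min() described above:

```java
public class BroadcastThresholdDemo {
    // Caps the user-settable override at query_max_memory_per_node so an
    // arbitrarily high override can no longer bypass the size check.
    static long effectiveBroadcastThreshold(long overrideBytes, long queryMaxMemoryPerNodeBytes) {
        return Math.min(overrideBytes, queryMaxMemoryPerNodeBytes);
    }
}
```

Even with `spark_broadcast_join_max_memory_override` set to Long.MAX_VALUE, the effective threshold never exceeds the per-node query memory limit, so the failure happens early on the driver instead of as an executor OOM.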
Switch join-distribution-type default value to AUTOMATIC.
Switch optimizer.join-reordering-strategy default value to AUTOMATIC.
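For a deployment that wants to pin the old behavior explicitly, or to spell out the new defaults, the two properties named above would appear in the coordinator's config.properties roughly as follows (values other than AUTOMATIC shown only as the likely alternatives):

```properties
join-distribution-type=AUTOMATIC
optimizer.join-reordering-strategy=AUTOMATIC
```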
Code coverage adds synthetic members to classes, which has been
causing issues when running tests with code coverage
enabled. Skipping or ignoring these members is the fix.
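Coverage agents such as JaCoCo inject synthetic members (e.g. a `$jacocoData` field) into instrumented classes. Code that reflects over class members can filter these out with the standard `Member#isSynthetic()` check; the sketch below is illustrative, not the actual fix in this PR.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class SyntheticFilterDemo {
    // Returns the names of declared fields, skipping compiler- or
    // agent-injected synthetic members.
    static List<String> declaredNonSyntheticFields(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        return names;
    }

    // Sample class to reflect over in a usage example.
    static class Example {
        int value;
    }
}
```

Without such a filter, tests that assert on a class's exact member list pass in a plain build but fail under coverage instrumentation.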
If the appendRowNumber flag is set to true, every page returned by the
reader will contain an additional block at the end that stores the
row numbers. If a file has n rows, the row numbers range from 0 to n-1.
Cherry-pick of trinodb/trino#80

Presto doesn't maintain the quotedness of an identifier when
using SqlQueryFormatter, which results in a parsing error
when running a PREPARE query of the syntax described in
prestodb#10739.
This PR solves the above issue.

Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
Cherry-pick of trinodb/trino#6380:

Identifiers with characters such as @ or : are not being treated as
delimited identifiers, which causes issues when the planner creates
synthetic identifiers as part of the plan IR expressions. When the
IR expressions are serialized, they are not being properly quoted,
which causes failures when parsing the plan on the worker side.

For example, for a query such as:

    SELECT try("a@b") FROM t

The planner creates an expression of the form:

    "$internal$try"("$INTERNAL$BIND"("a@b", (a@b_0) -> "a@b_0"))

The argument to the lambda expression is not properly quoted.

Co-authored-by: Martin Traverso <mtraverso@gmail.com>
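The core of the fix's idea can be sketched as a quoting rule: when formatting plan expressions back to SQL text, an identifier that does not match the unquoted-identifier grammar must be emitted in delimited form, with embedded quotes doubled. This is a simplified stand-in for the formatter's behavior, not Presto's actual code.

```java
public class IdentifierQuotingDemo {
    // Emits an identifier unquoted only when it matches the unquoted
    // grammar (letters, digits, underscores, not starting with a digit);
    // otherwise wraps it in double quotes, doubling embedded quotes.
    static String formatIdentifier(String name) {
        if (name.matches("[a-zA-Z_][a-zA-Z0-9_]*")) {
            return name;
        }
        return '"' + name.replace("\"", "\"\"") + '"';
    }
}
```

Under this rule the synthetic lambda argument `a@b_0` from the example above serializes as `"a@b_0"`, which round-trips through the parser on the worker side.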
Cherry-pick of trinodb/trino#7252
Solves prestodb#16680

Co-authored-by: Shashikant Jaiswal <shashi@okera.com>
@cocozianu cocozianu changed the base branch from master to release-0.271 February 24, 2022 21:41
@cocozianu cocozianu closed this Feb 24, 2022
@cocozianu cocozianu deleted the release-notes-0.271 branch February 24, 2022 21:47