Add release notes for 0.271 #17352
Closed
cocozianu wants to merge 24 commits into prestodb:release-0.271 from
Conversation
Cherry-pick of trinodb/trino#10722. The ParquetWriter::flush method calls OutputStreamSliceOutput::size(), which returns an int, to get the data page offset, which is a long. As a result, flushing fails with an integer overflow exception when writing files larger than ~2 GB. Co-authored-by: Saulius Valatka <saulius.vl@gmail.com>
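A minimal sketch of the overflow described above, with hypothetical method names standing in for the real writer internals: a stream position tracked as a long is silently truncated when exposed through an int-returning size() accessor.

```java
// Hypothetical sketch, not the actual Presto/Trino classes: shows why
// reporting a long stream position through an int size() breaks past ~2 GB.
public class OffsetOverflowSketch {
    // Pre-fix behavior: the position is a long internally, but the
    // accessor narrows it to int, silently truncating large values.
    static int intSize(long bytesWritten) {
        return (int) bytesWritten; // overflows for bytesWritten > Integer.MAX_VALUE
    }

    // Post-fix behavior: expose the full long position.
    static long longSize(long bytesWritten) {
        return bytesWritten;
    }
}
```

For a ~3 GB offset, intSize returns a negative number while longSize preserves the value, which is the overflow the cherry-pick eliminates.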
The goal of this task is to add Thrift annotations so that in the future we can use Thrift serde.
Port some code from the parquet-mr repo https://github.com/apache/parquet-mr Co-authored-by: Gabor Szadovszky <gabor.szadovszky@cloudera.com> More details about the Parquet Column Indexes feature: Column Indexes, also known as page-level indexes, store min/max values for each page in a given column chunk. When reading pages, a reader does not need to process each page header to determine whether the page can be skipped based on these statistics. More information about this feature can be found at https://github.com/apache/parquet-format/blob/master/PageIndex.md
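The page-skipping idea can be sketched as follows. This is an illustrative model, not the parquet-mr API: PageStats and the equality predicate are assumptions, but the logic mirrors how per-page min/max statistics let a reader decide which pages to fetch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of column-index-based page skipping (names are
// hypothetical, not the real parquet-mr types).
public class PageIndexSketch {
    static final class PageStats {
        final long min;
        final long max;
        PageStats(long min, long max) { this.min = min; this.max = max; }
    }

    // Returns the indexes of pages whose [min, max] range could contain
    // a value equal to target; all other pages can be skipped without
    // reading their page headers or data.
    static List<Integer> pagesToRead(List<PageStats> columnIndex, long target) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < columnIndex.size(); i++) {
            PageStats stats = columnIndex.get(i);
            if (target >= stats.min && target <= stats.max) {
                result.add(i);
            }
        }
        return result;
    }
}
```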
- In the Disaggregated Coordinator setup, we want tests to wait for the cluster to be up and ready before allowing queries to run. Sometimes one of the servers takes a while to come up, and queries time out because they never start executing before the test timeout is reached. Add the waitForClusterToGetReady method to some of the missing places to prevent this in tests.
- Also change the order of resource manager (RM) creation, since the coordinator's refreshAndStartQueries loop ends up consuming a lot of CPU and delaying RM startup.
- Fix TestMemoryManager#testClusterPoolsMultiCoordinator, which started failing due to the change in RM creation order because we were trying to reserve/free memory on the RM as well.
- Fix TestServerInfoResource#createQueryRunnerWithNoClusterReadyCheck and testGetServerStateWithoutRequiredCoordinators, which got stuck waiting for the cluster to be ready due to the /v1/info/state check.
…perty pinot.topn_large
As of now, we only support streaming aggregation for cases where the group-by keys are the same as the order-by keys; cases where the group-by keys are a subset of the order-by keys are not yet supported. Co-Authored-By: Zhan Yuan <yuanzhanhku@gmail.com>
We can always enable streaming aggregation for partial aggregations without affecting correctness. However, if the data is not clustered (for example, ordered) by the group-by keys, it may cause regressions in latency and resource usage. This session property is a way to force-enable streaming aggregation when we know the execution would benefit from partial streaming aggregation. We can later work on determining this automatically from input table properties.
Storage-based broadcast join uses distributed storage to store the broadcast table. Before broadcasting the data, the driver performs a size check to ensure that the table size is under a threshold. If the table size exceeds the threshold, the driver fails the query with an exceeded-broadcast-memory-limit error. This threshold can be overridden using the `spark_broadcast_join_max_memory_override` property, and users can set it to an arbitrarily high value. That prevents the query from failing the driver memory check, but it will eventually fail on an executor with a JVM OOM. This PR adds two checks to prevent this kind of OOM:
1. The first check is in the driver, where the threshold bytes are computed as threshold = min(spark_broadcast_join_max_memory_override, query_max_memory_per_node). This caps the threshold at the maximum available memory and fails the query on the driver if the size goes beyond `query_max_memory_per_node`.
2. When the hash table is created on the executors, all the data is read from storage and stored in a list of pages. This data is then deserialized, which further increases memory usage. While loading this data, memory usage was not updated, which means that if the table is huge, we keep loading data until the JVM OOMs. A memory callback is added to this logic: after loading data from each file, a callback updates the memory usage so far. With this, an EXCEEDED_MEMORY_LIMIT error is thrown as soon as memory usage on the executor exceeds `query_max_memory_per_node` while loading the hash table into memory.
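The two checks above can be sketched as follows. Class and method names are hypothetical stand-ins for the Presto-on-Spark internals; only the min() capping and the fail-fast callback reflect the described behavior.

```java
// Hypothetical sketch of the two broadcast-memory checks (not the
// actual Presto-on-Spark classes).
public class BroadcastMemoryChecks {
    // Check 1 (driver side): cap the broadcast threshold at
    // query_max_memory_per_node, regardless of the user override.
    static long effectiveThreshold(long broadcastMaxMemoryOverride, long queryMaxMemoryPerNode) {
        return Math.min(broadcastMaxMemoryOverride, queryMaxMemoryPerNode);
    }

    // Check 2 (executor side): memory callback invoked after loading
    // each file; fails fast once cumulative usage exceeds the limit,
    // instead of loading until the JVM OOMs.
    static long updateMemoryUsage(long usedSoFar, long fileBytes, long queryMaxMemoryPerNode) {
        long updated = usedSoFar + fileBytes;
        if (updated > queryMaxMemoryPerNode) {
            throw new IllegalStateException("EXCEEDED_MEMORY_LIMIT");
        }
        return updated;
    }
}
```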
Switch join-distribution-type default value to AUTOMATIC. Switch optimizer.join-reordering-strategy default value to AUTOMATIC.
Code coverage adds synthetic members to classes, which has been causing issues when running tests with code coverage enabled. Skipping or ignoring these members is the fix.
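One common way to skip such members, sketched here under the assumption that the coverage agent injects them as synthetic (as JaCoCo does with its `$jacocoData` field): reflection-based code can filter them out with `Member.isSynthetic()`.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: filter out synthetic fields (e.g. injected by a
// coverage agent) when reflecting over a class. Not the PR's actual fix.
public class SyntheticMemberFilter {
    static class Demo {
        int visible; // a regular, non-synthetic field
    }

    static List<String> declaredNonSyntheticFields(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            if (!field.isSynthetic()) { // drop compiler/agent-injected members
                names.add(field.getName());
            }
        }
        return names;
    }
}
```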
If the appendRowNumber flag is set to true, every page returned by the reader will contain an additional block at the end that stores the row numbers. If a file has n rows, the row numbers range from 0 to n-1.
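A simplified sketch of the row-number block, with a long[] standing in for the reader's Block type (an assumption for illustration): each page's appended block holds consecutive file-level row numbers starting at that page's first row.

```java
// Illustrative sketch (long[] stands in for a Block; not the reader's API):
// the row-number block appended as the last block of each page.
public class RowNumberAppender {
    // Produces the row-number block for a page of pageRowCount rows that
    // begins at firstRowOfPage within the file; across all pages of an
    // n-row file, the values cover 0..n-1.
    static long[] rowNumberBlock(long firstRowOfPage, int pageRowCount) {
        long[] rowNumbers = new long[pageRowCount];
        for (int i = 0; i < pageRowCount; i++) {
            rowNumbers[i] = firstRowOfPage + i;
        }
        return rowNumbers;
    }
}
```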
Cherry-pick of trinodb/trino#80 Presto does not maintain the quoting of an identifier when using SqlQueryFormatter, which results in a parsing error when running a prepared query of the syntax in [prestodb#10739]. This PR fixes that issue. Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
Cherry-pick of trinodb/trino#6380: Identifiers with characters such as @ or : are not treated as delimited identifiers, which causes issues when the planner creates synthetic identifiers as part of the plan IR expressions. When the IR expressions are serialized, they are not properly quoted, which causes failures when parsing the plan on the worker side. For example, for a query such as: SELECT try("a@b") FROM t The planner creates an expression of the form: "$internal$try"("$INTERNAL$BIND"("a@b", (a@b_0) -> "a@b_0")) The argument to the lambda expression is not properly quoted. Co-authored-by: Martin Traverso <mtraverso@gmail.com>
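The quoting rule at the heart of both identifier fixes can be sketched as follows. This is a simplified model, not the actual SqlQueryFormatter code: an identifier that is not a plain unquoted name must be emitted in double quotes, with embedded double quotes doubled.

```java
// Simplified sketch of delimited-identifier formatting (not the actual
// Presto formatter): quote anything that is not a plain identifier.
public class IdentifierQuoting {
    static String formatIdentifier(String name) {
        // Plain identifiers (letters, digits, underscore; not starting
        // with a digit) may be emitted bare.
        if (name.matches("[a-zA-Z_][a-zA-Z0-9_]*")) {
            return name;
        }
        // Anything else (e.g. containing @ or :) must be delimited,
        // with embedded double quotes doubled.
        return '"' + name.replace("\"", "\"\"") + '"';
    }
}
```

Under this rule, the synthetic lambda argument a@b_0 from the example above would be emitted as "a@b_0" and round-trip through the parser.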
Cherry-pick of trinodb/trino#7252 Solves prestodb#16680 Co-authored-by: Shashikant Jaiswal <shashi@okera.com>
Missing Release Notes
Amit Dutta
Masha Basmanova
Rongrong Zhong
singcha
Extracted Release Notes
- `experimental.distinct-aggregation-large-block-spill-enabled` to enable spilling of blocks that are larger than `experimental.distinct-aggregation-large-block-size-threshold` bytes into a separate spill file. This can be overridden by the `distinct_aggregation_large_block_spill_enabled` session property.
- `experimental.distinct-aggregation-large-block-size-threshold` to define the threshold size beyond which a block will be spilled into a separate spill file. This can be overridden by the `distinct_aggregation_large_block_size_threshold` session property.
- `join-max-broadcast-table-size`.

All Commits