Add release notes for 0.254#16141
Conversation
Functions need to be written as :func:`poisson_cdf`
check e2e4751#diff-9652f5c5a99ec7ab2293ce1dfbee5491f95d8b02ed72259cbac24f0dfbec565fR13
I don't think we need a release note for new documentation. But if we wanted to keep it, it would go in the Hive section.
This should all be part of the Hive Changes section
I would maybe add a new section called "Iceberg Changes", similar to other connectors, and then have the release note "Add new Iceberg Connector" with a link to the documentation.
removed this since there's no documentation added
How about "Add fragment result caching support for UNNEST queries"?
Is there a configuration property? What is MaxResults, and why is it one word?
@v-jizhang can you help add more details around MaxResults here. I don't see a configuration property.
@sujay-jain @rschlussel There is no configuration property for MaxResults. Set it according to the page https://docs.aws.amazon.com/glue/latest/webapi/API_GetPartitions.html
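For context, the batching being discussed can be sketched as a pagination loop. This is an illustrative Python sketch, not Presto's code: ``fetch_page`` is a hypothetical stand-in for Glue's ``GetPartitions`` API call, and the default of 1000 reflects the maximum value documented on the linked AWS page.

```python
def fetch_all_partitions(fetch_page, max_results=1000):
    """Page through partitions, requesting up to max_results per call.

    fetch_page stands in for Glue's GetPartitions API, which returns a
    dict with "Partitions" and, while more pages remain, a "NextToken".
    """
    partitions = []
    token = None
    while True:
        kwargs = {"MaxResults": max_results}
        if token is not None:
            kwargs["NextToken"] = token
        page = fetch_page(**kwargs)
        partitions.extend(page["Partitions"])
        token = page.get("NextToken")
        if token is None:
            return partitions
```

Because there is no configuration property, the caller picks ``max_results`` directly; a larger value means fewer round trips to Glue for the same partition set.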
What is "bucket sort order" and how does the user use this?
@v-jizhang @highker could you provide more details here.
This is a back port from Trino to fix a bug, trinodb/trino#2450
@rschlussel the release notes match those of Trino, and it's a backport. Do you think we should keep this as is?
Thanks @v-jizhang. I looked at that PR. We should make our release note clearer. Something like:
Fix a bug where the files would not be sorted when inserting into bucketed sorted tables with Glue.
Also, since it's a bug fix, it should go first in the section.
Can we add more detail in the release note here about what we mean by schema evolution? Does it mean you can query tables that have had columns added/deleted? column types changed?
@imjalpreet @zhenxiao - could you help provide more details
support add/delete/replace columns in partition schema
@imjalpreet could you please add more details?
We can write something like:
Add support for allowing to match columns between table and partition schemas by names when the configuration property ``hive.parquet.use-column-names`` or the hive catalog session property ``parquet_use_column_names`` is set to true. By default they are mapped by index.
If we want to add more:
This allows schema evolution between table and partition in addition to schema evolution between the table/partition and file. Columns can be re-ordered, added or dropped between partition and table schemas.
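A hypothetical sketch of the by-name vs. by-index matching described above (illustrative only, not Presto's actual implementation), showing why name-based matching survives a column reorder or drop while index-based matching does not:

```python
def map_partition_columns(table_columns, partition_columns, use_column_names):
    """Return, for each table column, the matching partition column (or None).

    With use_column_names=True, columns are matched by name, so reordered,
    added, or dropped partition columns still resolve correctly. Otherwise
    the i-th table column is paired with the i-th partition column.
    """
    if use_column_names:
        available = set(partition_columns)
        return [c if c in available else None for c in table_columns]
    return [
        partition_columns[i] if i < len(partition_columns) else None
        for i in range(len(table_columns))
    ]
```

For a table schema ``id, name, ts`` and an older partition written as ``name, id``, name matching pairs each column correctly (and reports ``ts`` as missing), while index matching silently pairs ``id`` with ``name``.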
What's partition cache validation? Also, what's the default value? Is it 0, which would I guess mean no validation or is it something else?
@NikhilCollooru can you provide more details here.
Partition cache validation means we validate the value returned from the partition cache against the actual value from the Metastore. Yes, the default value is 0.0, meaning no validation. If we set it to 50.0, then 50% of the get partitions calls will be validated.
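As an illustrative sketch of such sampled validation (the names and structure here are hypothetical, not the actual Presto code), the percentage controls what fraction of cache hits get re-checked against the metastore:

```python
import random

def get_partitions(key, cache, metastore_lookup, validation_percentage=0.0):
    """Read from the partition cache, validating a sampled fraction of
    hits against the metastore (0.0 = never validate, 100.0 = always)."""
    value = cache.get(key)
    if value is None:
        # Cache miss: fetch from the metastore and populate the cache.
        value = metastore_lookup(key)
        cache[key] = value
        return value
    if random.uniform(0, 100) < validation_percentage:
        # Sampled validation: compare the cached value with the source.
        actual = metastore_lookup(key)
        if actual != value:  # stale cache entry: repair and return fresh
            cache[key] = actual
            return actual
    return value
```

Validation trades extra metastore calls for confidence that the cache is not serving stale partitions, which is why it defaults to off (0.0).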
Nit: should be a period and not a comma before "spilling is triggered"
@v-jizhang I'm still very confused about this release note. It seems like what the PR (#16012) actually does is set the batch size for how many partitions we fetch at once in each API call. But @pettyjamesm was skeptical about whether it even works? My instinct is this doesn't need a release note, but if it does it should be something like Improve efficiency of getting partitions from the Glue Metastore by batching requests. Would that be accurate? @aweisberg @pettyjamesm what do you think?
Yeah, I agree. I don't think this is turning on batching the requests; it's just limiting the batch size.
Improve efficiency of partition fetching from Glue by setting GetPartitions MaxResults
Going with removing the release note.
Add support for configuring the Glue endpoint URL. See :doc:`/connector/hive`.
Maybe start this "Add support for" instead of "Allow", so it matches the other release notes.
Looks good, but we'll need to add a release note for the regression fix for maps that's going to be added to the 0.254 branch, so holding off on merging for now.
mayankgarg1990
left a comment
Can you copy the release note from #16073 - this was added after the cut and hence not picked up by the script that created the initial PR
``IndexOutOfBoundsException`` should be in double backquotes.
This is too big for a release note.
@rschlussel - can we try to get this within 200-250 chars at most? It is fine to provide people a link to the PR for further clarification.
How's this? It's 390 characters, but mostly because the property names are long. I don't think it can be shorter and still convey the relevant information.
Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When set to ``true`` and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory.
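The spill decision that note describes can be sketched as follows. This is a simplified, hypothetical model for readers, not the actual Presto logic:

```python
def should_spill(strategy, query_limit_spill_enabled,
                 query_total_memory, per_node_memory_limit,
                 pool_memory, spill_threshold):
    """Spill when the memory pool crosses its threshold, or additionally
    when query_limit_spill_enabled is set, the strategy is not
    PER_TASK_MEMORY_THRESHOLD, and a query's combined revocable and
    non-revocable memory exceeds the per-node total memory limit."""
    if pool_memory > spill_threshold:
        return True
    return (query_limit_spill_enabled
            and strategy != "PER_TASK_MEMORY_THRESHOLD"
            and query_total_memory > per_node_memory_limit)
```

The key point of the note is the second branch: the per-query trigger is now an additive condition controlled by its own property, rather than a standalone spilling strategy.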
Add support to validate the values returned from the partition cache with the actual value from Metastore. This can be enabled by setting the configuration property ``hive......``
Should we keep the default value note?
Period after URL
Add support for configuring the Glue endpoint URL. See :doc:`/connector/hive`.
I don't think we need to mention the default value since that would already be there.
Add support for the S3 Intelligent-Tiering storage class when writing data. This can be enabled by setting the configuration property ``hive.s3.storage-class`` to ``INTELLIGENT_TIERING``.
Optimize Driver commit memory footprint.
This is too low level. What's the user implication?
nit: can help with investigating GC issues
nit: no comma between "coordinator" and "by"
nit: Add support to validate -> Add support for validating
rschlussel
left a comment
Looks good. I'll merge once @mayankgarg1990 approves.
mayankgarg1990
left a comment
Looks fine overall. Some changes before this can be merged.
nit: shorten the second sentence to
Introduced by :pr:`16039`.
Finding which release it was introduced in is trivial.
Let's follow a similar syntax for reporting regressions:
Fix a CPU regression for queries using ``element_at`` for ``MAP``. Introduced by :pr:`16027`.
Re: the second line - the default value is 0.0, and your previous sentence says "This can be enabled by", which already indicates that this is disabled by default. Had it been enabled by default, the sentence would have read "This can be disabled by". Hence the default is redundant and can be removed.
Missing Release Notes
Abhisek Gautam Saikia
Akhil Umesh Mehendale
Tal Galili
Vic Zhang
guhanjie
Extracted Release Notes
See :doc:`/connector/hive`.
Remove spilling strategy ``PER_QUERY_MEMORY_LIMIT`` and instead add configuration property ``experimental.query-limit-spill-enabled`` and session property ``query_limit_spill_enabled``. When this property is set to ``true``, and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, then we will spill whenever a query uses more than the per-node total memory limit in combined revocable and non-revocable memory, in addition to whenever the memory pool exceeds the spill threshold. This fixes an issue where using the ``PER_QUERY_MEMORY_LIMIT`` spilling strategy could prevent the oom killer from running when the memory pool was full. The issue is still present for the ``PER_TASK_MEMORY_THRESHOLD`` spilling strategy.
``table-finish-operator-memory-tracking-enabled`` configuration property set to ``true``.
``hive.partition-cache-validation-percentage`` configuration parameter.
:pr:`16039`.
All Commits