Added support for sorted_by while creating iceberg table #12872
osscm wants to merge 1 commit into trinodb:master
cc @findepi @alexjo2144 thanks!
Rather than copying the code from @mdesmet's PR, can you cherry-pick the commit from that branch and then add work in commits on top of it?
Thanks, @alexjo2144. The borrowed code is in
In this PR, we are adding new test methods and classes ( Not sure about the process, but I thought of adding @mdesmet as the co-author if that makes sense.
I had also raised this in the PR. This would also apply to orc_bloom_filter definitions etc. I'll move this code to
Sure @mdesmet, I can pull only that commit which has the change only to the
    }

    @SuppressWarnings("unchecked")
    public static List<String> getSortOrder(Map<String, Object> tableProperties)
The sort order is stored in the metadata, but it isn't used anywhere, if my understanding is correct. What's the benefit of adding this property in its current shape? Don't we need to respect the property during writes? It would be nice if you could add tests showing the benefit.
@ebyhr this PR is the first step in supporting `sorted_by`.
This PR is only intended to add support for sorted_by in the CREATE TABLE DDL syntax.
Subsequently, we can add it for ALTER TABLE and then support it while writing.
The DDL changes will help when Spark is used for ingestion (which is the case we see almost all the time). Spark uses Iceberg's sorting spec when writing as well.
I disagree with adding the sorted_by property without proper support. It's confusing to users.
@mdesmet https://github.com/trinodb/trino/pull/12227/commits seems to the commit only |
@osscm: Those commits were squashed based on the feedback in the PR. I still suggest you cherry-pick the relevant commit and reuse the provided methods in
This PR has been inactive for 4 weeks. What's the status here - do we need it changed, or is it waiting on that other PR for direction? cc @ebyhr
@colebow I have to resume work on this PR if there are any pending commits. PR #12227 was using quoted identifiers in partitions; I tried to use the commit from that PR, but it was not mergeable.
it seems the PR adds ability to declare |
|
Sure @findepi
I can work on implementing the writer logic as well.
On Mon, Aug 22, 2022 at 11:52 AM Piotr Findeisen wrote:
it seems the PR adds ability to declare sorted_by, but doesn't implement actual sorting.
It's a good step, but not something we would want to release, so not something we would want to merge.
@osscm do you also want to implement sorting writer? or I can probably find someone on my team to work on this, if you prefer this way.
Sure @homar, will change it, it should be sortedOrder, and not partitioning.
On Tue, Sep 6, 2022 at 2:53 PM Homar wrote:
In plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java:
> @@ -121,6 +131,13 @@ public static List<String> getPartitioning(Map<String, Object> tableProperties)
return partitioning == null ? ImmutableList.of() : ImmutableList.copyOf(partitioning);
}
+ @SuppressWarnings("unchecked")
+ public static List<String> getSortOrder(Map<String, Object> tableProperties)
+ {
+ List<String> partitioning = (List<String>) tableProperties.get(SORTED_BY_PROPERTY);
I am a little confused by this naming: it is a column order or something like that, not partitioning, or am I misunderstanding something?
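For reference, the rename being asked for could look roughly like the sketch below. It is only an illustration, not the code that was merged: the value "sorted_by" for SORTED_BY_PROPERTY is assumed from the PR title (the thread only shows the constant name), the local variable name sortedBy and the class name SortedByPropertySketch are hypothetical, and List.copyOf stands in for Guava's ImmutableList.copyOf so that the snippet has no dependencies.

```java
import java.util.List;
import java.util.Map;

final class SortedByPropertySketch
{
    // Assumed value: the thread only shows the constant name, not its definition.
    static final String SORTED_BY_PROPERTY = "sorted_by";

    private SortedByPropertySketch() {}

    @SuppressWarnings("unchecked")
    static List<String> getSortOrder(Map<String, Object> tableProperties)
    {
        // Named after what it holds, instead of reusing "partitioning".
        List<String> sortedBy = (List<String>) tableProperties.get(SORTED_BY_PROPERTY);
        // A missing property means "no sort order declared".
        return sortedBy == null ? List.of() : List.copyOf(sortedBy);
    }
}
```

A table created without the property yields an empty list, so callers do not need a null check.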
@alexjo2144, is it possible to start the review of the DDL part? Meanwhile, I'll add the writer part as well.
993844b to ec2ba3f
@findepi do you think we can keep this PR small so that it's easy to review and merge, and then support the sorting writer in a second PR? If a PR is a bit bigger, it becomes difficult to review and merge.
fc819a8 to bc46af5
Yes, it makes sense to keep changes separate, as in separate commits. However, we don't like to expose user-facing functionality that does nothing. If we merge this PR, we expose a Trino Iceberg connector table property for defining sort order, which is ignored by Trino. That makes Trino users confused and sad. @alexjo2144 could you please help here? Would you be able to implement a sorting writer on top of this PR?
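To make the gap concrete: declaring a sort order only records column names, while honoring it means rows are actually ordered by those columns before they are written. The sketch below is a toy illustration of that second half in plain Java. It is not Trino's writer and says nothing about how a sorting writer was eventually implemented; the class SortedWriteSketch, the method sortRows, and the row representation (a map of column name to integer, ascending order only, no nulls) are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SortedWriteSketch
{
    private SortedWriteSketch() {}

    // Returns a copy of the rows ordered by the declared sort columns,
    // ascending, with the first column being the most significant.
    static List<Map<String, Integer>> sortRows(List<Map<String, Integer>> rows, List<String> sortedBy)
    {
        // Start from a comparator that treats all rows as equal,
        // then refine it once per declared column.
        Comparator<Map<String, Integer>> comparator = (left, right) -> 0;
        for (String column : sortedBy) {
            comparator = comparator.thenComparing(row -> row.get(column));
        }
        List<Map<String, Integer>> sorted = new ArrayList<>(rows);
        // List.sort is stable, so an empty sortedBy leaves the input order unchanged.
        sorted.sort(comparator);
        return sorted;
    }
}
```

With an empty sortedBy list the rows come back in their original order, which is effectively what a writer that ignores the property does.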
Sure @findepi
Superseded by #14891
Description
This PR allows providing sorted_by as a table property, like partitioning. The sorted field definition will follow these rules:
improvement, new feature
the Iceberg connector
Add sorted_by properties while creating an Iceberg table like the Hive table.
Related issues, pull requests, and links
Fixes #12447
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:
Will add changes to the doc as well.