Clarify docs for ALTER TABLE EXECUTE #20890
Change `hive.schema` to `example.test` and explain it in the opening sentence.
Suggested change:
- Refer to individual [connector documentation](../connector.md) for more
+ Refer to individual [connector documentation](connector) for more
Also, this sentence may be redundant anyway, since you are already linking above.
Suggested change:
- Executable commands are contributed by connectors, for example, to collapse
- files in a table that are over 128 megabytes in size with the `optimize` command
+ Executable commands are contributed by connectors, for example, to merge
+ files in a table that are over 128 megabytes in size with the `optimize` command
(It does more than just merging files in Iceberg though)
Force-pushed from 1492a6d to ed12de3.
`optimize` is used to "merge" files that are under the specified size, not over it.
See https://trino.io/docs/current/connector/iceberg.html#optimize
The following statement merges files in a table that are under 128 megabytes in size:
ALTER TABLE test_table EXECUTE optimize(file_size_threshold => '128MB')
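To make the "under, not over" point concrete, here is a toy Python model of how a size threshold selects files. It is only an illustration of the semantics described above, not Trino's implementation: the function name `split_by_threshold` is hypothetical, and the strict less-than comparison is an assumption.

```python
from typing import List, Tuple


def split_by_threshold(
    file_sizes_mb: List[int], threshold_mb: int = 128
) -> Tuple[List[int], List[int]]:
    """Split file sizes into (merge candidates, untouched files).

    Toy model only: files strictly smaller than the threshold are
    treated as merge candidates; everything else is left alone.
    The strict comparison is an assumption for illustration.
    """
    candidates = [size for size in file_sizes_mb if size < threshold_mb]
    untouched = [size for size in file_sizes_mb if size >= threshold_mb]
    return candidates, untouched


# Hypothetical table with a few small files and two large ones.
candidates, untouched = split_by_threshold([16, 8, 200, 128, 64])
print("merge candidates:", candidates)
print("left untouched:", untouched)
```

In this model the 16, 8, and 64 MB files are picked up, while the 128 and 200 MB files are not, which is the opposite of what "files that are over 128 megabytes" suggests.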
Suggested change:
- Executable commands are contributed by connectors, for example, to merge files
+ Executable commands are contributed by connectors. For example, to merge files
This has to be reworded.
It's not the table that is over 128 MB. It's about merging the many data files in that table that are smaller than 128 MB into fewer files that are larger than 128 MB.
So the initial suggestion from @findinpath is probably better, but it is also a bit terse and assumes a lot of knowledge.
Lastly, do you think a 128 MB file size is a reasonable example, @findinpath?
> Lastly, do you think a 128 MB file size is a reasonable example, @findinpath?

I'm missing field experience here.
@raunaqmorarka, can you help with this one?
We should use an example where files are obviously small, like 16 MB.
128 MB was chosen because people copy the examples, and 128 MB is the ideal size for most Parquet-based tables.
Great, so we should explain that this command is useful when a user observes many small files in the 16 MB range or smaller, and that it gets them to the ideal size of 128 MB. Ping me to work on the wording together if you like, @sheajamba.
Force-pushed from ed12de3 to 589fb49.
Force-pushed from 589fb49 to eaf4a52.
mosabua left a comment
Looks good now.
Description
This PR adds some clarifications to what procedures are available when running
`ALTER TABLE EXECUTE`.
Additional context and related issues
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: