Skip to content

Conversation

@sheajamba
Copy link
Member

Description

This PR adds some clarifications to what procedures are available when running ALTER TABLE EXECUTE.

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Feb 29, 2024
@sheajamba sheajamba requested review from hashhar and mosabua February 29, 2024 18:37
@github-actions github-actions bot added the docs label Feb 29, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change hive.schema to example.test and explain in the opening sentence

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Refer to individual [connector documentation](../connector.md) for more
Refer to individual [connector documentation](connector) for more

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe this sentence is redundant anyway .. since you are already linking above

Comment on lines 57 to 58
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Executable commands are contributed by connectors, for example, to collapse
files in a table that are over 128 megabytes in size with the `optimize` command
Executable commands are contributed by connectors, for example, to merge
files in a table that are over 128 megabytes in size with the `optimize` command

(It does more than just merging files in Iceberg though)

@sheajamba sheajamba force-pushed the alter-table-execute-docs branch from 1492a6d to ed12de3 Compare March 5, 2024 17:19
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimize is used to "merge" files which are under the specified size, not over the specified size

See https://trino.io/docs/current/connector/iceberg.html#optimize

The following statement merges files in a table that are under 128 megabytes in size:

ALTER TABLE test_table EXECUTE optimize(file_size_threshold => '128MB')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Executable commands are contributed by connectors, for example, to merge files
Executable commands are contributed by connectors. For example, to merge files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be reworded ..

its not the table that is over 128mb .. its about merging all the many data files in that table that are smaller than 128mb into fewer files that are larger than 128mb

So the initial suggestion from @findinpath is probably better .. but also a bit terse and assuming a lot of knowledge.

Lastly .. do you think 128mb file size is a reasonable example @findinpath ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly .. do you think 128mb file size is a reasonable example @findinpath ?

I'm missing field experience here.

@raunaqmorarka can you help on this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use an example where files are obviously small, like 16mb

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

128 was chosen because people copy the examples and 128MB is ideal size for most parquet based tables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great .. so we should explain that this command is useful when a user observes many small files in the 16mb range or smaller .. and it gets it to the ideal size of 128MB .. ping me to work on wording together if you like @sheajamba

@sheajamba sheajamba force-pushed the alter-table-execute-docs branch from ed12de3 to 589fb49 Compare March 28, 2024 21:01
@sheajamba sheajamba requested review from hashhar and mosabua March 28, 2024 21:01
@sheajamba sheajamba force-pushed the alter-table-execute-docs branch from 589fb49 to eaf4a52 Compare March 29, 2024 14:11
Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now.

@mosabua mosabua merged commit 89232e5 into trinodb:master Mar 31, 2024
@github-actions github-actions bot added this to the 444 milestone Mar 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants