Docs: Document compute_partition_stats procedure #13532
docs/docs/spark-procedures.md
Outdated
| This procedure computes the stats incrementally from the last snapshot that has partition stats file until the given | ||
| snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile` | ||
| after merging the partition stats. Does a full compute if previous statistics file does not exist. Also registers the |
Nit: It also registers the 'PartitionStatisticsFile' to the table metadata. (complete sentence)
docs/docs/spark-procedures.md
Outdated
| ### `compute_partition_stats` | ||
| This procedure computes the stats incrementally from the last snapshot that has partition stats file until the given |
Why is `PartitionStatisticsFile` used in some places but not at the first mention?
docs/docs/spark-procedures.md
Outdated
| #### Examples | ||
| Collect statistics of the latest snapshot of table `my_table` |
Should it be 'partition stats', to clarify?
docs/docs/spark-procedures.md
Outdated
| | Argument Name | Required? | Type | Description | | ||
| |---------------|-----------|---------------|-------------------------------------| | ||
| | `table` | ✔️ | string | Name of the table | | ||
| | `snapshot_id` | | string | Id of the snapshot to collect stats | |
Please describe the default behavior when this optional argument is omitted.
@szehon-ho: Thanks for the review. I have updated the document.
szehon-ho left a comment
Thanks, looks better. Some more nits/comments:
docs/docs/spark-procedures.md
Outdated
| ### `compute_partition_stats` | ||
| This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given |
nit: a 'PartitionStatisticsFile' (need article for complete sentence)
docs/docs/spark-procedures.md
Outdated
| This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given | ||
| snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile` | ||
| after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the |
Should specify the type of stats file and also add an article: 'if the previous partition statistics file'
docs/docs/spark-procedures.md
Outdated
| This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given | ||
| snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile` | ||
| after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the | ||
| `PartitionStatisticsFile` to table metadata. |
nit: the table metadata (need article)
docs/docs/spark-procedures.md
Outdated
| | Argument Name | Required? | Type | Description | | ||
| |---------------|-----------|---------------|---------------------------------------------------------------------------------------| | ||
| | `table` | ✔️ | string | Name of the table | | ||
| | `snapshot_id` | | string | Id of the snapshot to collect partition stats. By default current snapshot id is used | |
'is used' sounds unnecessary. Hm how about just: defaults to current snapshot id?
docs/docs/spark-procedures.md
Outdated
| | Output Name | Type | Description | | ||
| |-------------------|--------|-----------------------------------------------------------| | ||
| | `partition_statistics_file` | string | Path to partition stats file created from by this command | |
'to the partition stats file' (add article).
Also, 'from by': only one is needed. Use 'by'?
docs/docs/spark-procedures.md
Outdated
| | Argument Name | Required? | Type | Description | | ||
| |---------------|-----------|---------------|---------------------------------------------------------------------------------------| | ||
| | `table` | ✔️ | string | Name of the table | | ||
| | `snapshot_id` | | string | Id of the snapshot to collect partition stats. By default current snapshot id is used | |
collect => compute (follows the command name)
docs/docs/spark-procedures.md
Outdated
| ### `compute_partition_stats` | ||
| This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given |
the stats => partition statistics.
By the way, can we link this text to the spec about partition stats? (like the above command)
docs/docs/spark-procedures.md
Outdated
| This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given | ||
| snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile` | ||
| after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the |
Let's get rid of 'merging the partition stats'? I feel it's implied by 'incrementally' and 'combined'.
docs/docs/spark-procedures.md
Outdated
| This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given | ||
| snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile` | ||
| after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the |
nit (needs article and specify stats file): 'the previous partition statistics file'
Thanks again for the review @szehon-ho. I think it is better if I use an AI grammar-check tool before publishing docs, as English is not my first language 😅.
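For readers following the wording discussion, the "incremental, else full compute" behavior described in the quoted paragraph can be pictured with a toy model. This is only an illustrative sketch, not Iceberg's implementation: the function name `combine_partition_stats`, the use of a plain dict keyed by partition, and record counts as the only statistic are all assumptions made for the example.

```python
from typing import Dict, Optional


def combine_partition_stats(
    previous: Optional[Dict[str, int]],
    new_counts: Dict[str, int],
) -> Dict[str, int]:
    """Toy model (hypothetical, not Iceberg code).

    previous:   per-partition record counts from an earlier stats file,
                or None if no such file exists.
    new_counts: per-partition record counts contributed by the snapshots
                after that file (or by the whole table when previous is None).
    """
    # With no previous stats there is nothing to build on, so the result
    # is just the counts computed from scratch (the "full compute" case).
    combined = dict(previous or {})
    # Otherwise fold the newer counts into the earlier ones (the
    # "incremental" case), adding partitions that were not seen before.
    for partition, count in new_counts.items():
        combined[partition] = combined.get(partition, 0) + count
    return combined
```

The point of the sketch is only that the combined result already implies a merge step, which is why the reviewer suggested dropping 'merging the partition stats' from the text.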
docs/docs/spark-procedures.md
Outdated
| Collect partition statistics of the snapshot with id `snap1` of table `my_table` | ||
| ```sql | ||
| CALL catalog_name.system.compute_partition_stats(table => 'my_table', snapshot_id => 'snap1' ); |
small nit: extra space at end.
done.
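As a small aid to the two call forms discussed in this thread (with and without `snapshot_id`), here is a minimal sketch that assembles the `CALL` statement text. The helper `build_compute_partition_stats_call` is hypothetical and exists only for this example; it follows the argument names and the quoted-string form of `snapshot_id` shown in the diff under review, and it performs no escaping or validation.

```python
from typing import Optional


def build_compute_partition_stats_call(
    catalog: str,
    table: str,
    snapshot_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: build the CALL statement as plain text."""
    # `table` is the only required argument in the documented table.
    args = [f"table => '{table}'"]
    # `snapshot_id` is optional; when omitted, the docs under review say
    # the procedure uses the current snapshot.
    if snapshot_id is not None:
        args.append(f"snapshot_id => '{snapshot_id}'")
    return f"CALL {catalog}.system.compute_partition_stats({', '.join(args)})"


print(build_compute_partition_stats_call("catalog_name", "my_table"))
print(build_compute_partition_stats_call("catalog_name", "my_table", "snap1"))
```

Joining the arguments with `', '` also avoids the stray trailing space flagged in the nit above.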
Merged, thanks @ajantha-bhat!
Follow up from #13523