Docs: Document compute_partition_stats procedure #13532

ajantha-bhat · 2025-07-11T16:46:31Z

Follow up from #13523

szehon-ho · 2025-07-11T19:10:07Z

docs/docs/spark-procedures.md

+
+This procedure computes the stats incrementally from the last snapshot that has partition stats file until the given 
+snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
+after merging the partition stats. Does a full compute if previous statistics file does not exist. Also registers the 


Nit: It also registers the 'PartitionStatisticsFile' to the table metadata. (complete sentence)

szehon-ho · 2025-07-11T19:13:14Z

docs/docs/spark-procedures.md

+
+### `compute_partition_stats`
+
+This procedure computes the stats incrementally from the last snapshot that has partition stats file until the given 


Why do we use 'PartitionStatisticsFile' in some places but not at the first mention?

szehon-ho · 2025-07-11T19:13:48Z

docs/docs/spark-procedures.md

+
+#### Examples
+
+Collect statistics of the latest snapshot of table `my_table`


Should it be 'partition stats', to clarify?

szehon-ho · 2025-07-11T19:14:19Z

docs/docs/spark-procedures.md

+| Argument Name | Required? | Type          | Description                         |
+|---------------|-----------|---------------|-------------------------------------|
+| `table`       | ✔️        | string        | Name of the table                   |
+| `snapshot_id` |           | string        | Id of the snapshot to collect stats |


Describe the behavior when the optional argument is omitted.

ajantha-bhat · 2025-07-12T03:41:41Z

@szehon-ho: Thanks for the review. I have updated the document.

szehon-ho

Thanks, looks better. Some more nits/comments:

szehon-ho · 2025-07-14T18:29:20Z

docs/docs/spark-procedures.md

+
+### `compute_partition_stats`
+
+This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given 


nit: a 'PartitionStatisticsFile' (need article for complete sentence)

szehon-ho · 2025-07-14T18:29:50Z

docs/docs/spark-procedures.md

+
+This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given 
+snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
+after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the 


should specify type of stats file, and also add article:

if the previous partition statistics file

szehon-ho · 2025-07-14T18:30:13Z

docs/docs/spark-procedures.md

+This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given 
+snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
+after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the 
+`PartitionStatisticsFile` to table metadata.


nit: the table metadata (need article)

szehon-ho · 2025-07-14T18:31:40Z

docs/docs/spark-procedures.md

+| Argument Name | Required? | Type          | Description                                                                           |
+|---------------|-----------|---------------|---------------------------------------------------------------------------------------|
+| `table`       | ✔️        | string        | Name of the table                                                                     |
+| `snapshot_id` |           | string        | Id of the snapshot to collect partition stats. By default current snapshot id is used |


'is used' sounds unnecessary. Hm how about just: defaults to current snapshot id?
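
For illustration, the default under discussion could be exercised as in the sketch below. It assumes the `catalog_name` and `my_table` placeholders used in the PR's own example, and that `snapshot_id` can simply be left out because the argument table marks it as optional; in that case the procedure would target the current snapshot.

```sql
-- Placeholder catalog and table names, borrowed from the PR's example.
-- snapshot_id is omitted, so per the argument table the current snapshot id applies.
CALL catalog_name.system.compute_partition_stats(table => 'my_table');
```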

szehon-ho · 2025-07-14T18:32:20Z

docs/docs/spark-procedures.md

+
+| Output Name       | Type   | Description                                               |
+|-------------------|--------|-----------------------------------------------------------|
+| `partition_statistics_file` | string | Path to partition stats file created from by this command |


'to the partition stats file'. (add article)

Also, 'from by': only one is needed. Use 'by'?

szehon-ho · 2025-07-14T18:34:03Z

docs/docs/spark-procedures.md

+| Argument Name | Required? | Type          | Description                                                                           |
+|---------------|-----------|---------------|---------------------------------------------------------------------------------------|
+| `table`       | ✔️        | string        | Name of the table                                                                     |
+| `snapshot_id` |           | string        | Id of the snapshot to collect partition stats. By default current snapshot id is used |


collect => compute (follows the command name)

szehon-ho · 2025-07-14T18:35:51Z

docs/docs/spark-procedures.md

+
+### `compute_partition_stats`
+
+This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given 


the stats => partition statistics.

By the way, can we link this text to the spec about partition stats (like the command above)?

szehon-ho · 2025-07-14T18:43:12Z

docs/docs/spark-procedures.md

+
+This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given 
+snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
+after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the 


Let's get rid of 'merging the partition stats'? I feel it's implied by 'incrementally' and 'combined'.

szehon-ho · 2025-07-14T18:43:39Z

docs/docs/spark-procedures.md

+
+This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given 
+snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
+after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the 


nit (needs article and specify stats file): 'the previous partition statistics file'

ajantha-bhat · 2025-07-15T13:55:03Z

Thanks again for the review, @szehon-ho. I think it would be better if I used an AI grammar-check tool before publishing docs, as English is not my first language 😅.

szehon-ho · 2025-07-17T01:13:07Z

docs/docs/spark-procedures.md

+
+Collect partition statistics of the snapshot with id `snap1` of table `my_table`
+```sql
+CALL catalog_name.system.compute_partition_stats(table => 'my_table', snapshot_id => 'snap1' );


small nit: extra space at end.

szehon-ho · 2025-07-17T18:16:32Z

Merged, thanks @ajantha-bhat!

Docs: Document compute_partition_stats procedure

809e8a5

ajantha-bhat requested a review from nastra July 11, 2025 16:46

github-actions bot added the docs label Jul 11, 2025

szehon-ho reviewed Jul 11, 2025

Address comments

55baa4c

szehon-ho reviewed Jul 14, 2025

Address new comment

80cc56e

szehon-ho approved these changes Jul 17, 2025

remove space

9ab182f

szehon-ho merged commit c9154bd into apache:main Jul 17, 2025
2 checks passed

