Skip to content

Conversation

@ajantha-bhat
Copy link
Member

Follow up from #13523

@ajantha-bhat ajantha-bhat requested a review from nastra July 11, 2025 16:46
@github-actions github-actions bot added the docs label Jul 11, 2025
This procedure computes the stats incrementally from the last snapshot that has partition stats file until the given
snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
after merging the partition stats. Does a full compute if previous statistics file does not exist. Also registers the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It also registers the 'PartitionStatisticsFile' to the table metadata. (complete sentence)

### `compute_partition_stats`
This procedure computes the stats incrementally from the last snapshot that has partition stats file until the given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have 'partitionStatisticsFile' in some place and not in the first place?

#### Examples
Collect statistics of the latest snapshot of table `my_table`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be 'partition stats' to clarify

| Argument Name | Required? | Type | Description |
|---------------|-----------|---------------|-------------------------------------|
| `table` | ✔️ | string | Name of the table |
| `snapshot_id` | | string | Id of the snapshot to collect stats |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

describe optional behavior

@ajantha-bhat
Copy link
Member Author

@szehon-ho: Thanks for the review. I have updated the document.

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks looks better, some more nits/comments

### `compute_partition_stats`
This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a 'PartitionStatisticsFile' (need article for complete sentence)

This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given
snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should specify type of stats file, and also add article:

if the previous partition statistics file

This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given
snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the
`PartitionStatisticsFile` to table metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the table metadata (need article)

| Argument Name | Required? | Type | Description |
|---------------|-----------|---------------|---------------------------------------------------------------------------------------|
| `table` | ✔️ | string | Name of the table |
| `snapshot_id` | | string | Id of the snapshot to collect partition stats. By default current snapshot id is used |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'is used' sounds unnecessary. Hm how about just: defaults to current snapshot id?

| Output Name | Type | Description |
|-------------------|--------|-----------------------------------------------------------|
| `partition_statistics_file` | string | Path to partition stats file created from by this command |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'to the partition stats file'. (add article)

Also , 'from by', only need one. Use by?

| Argument Name | Required? | Type | Description |
|---------------|-----------|---------------|---------------------------------------------------------------------------------------|
| `table` | ✔️ | string | Name of the table |
| `snapshot_id` | | string | Id of the snapshot to collect partition stats. By default current snapshot id is used |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect => compute (follows the command name)

### `compute_partition_stats`
This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given
Copy link
Member

@szehon-ho szehon-ho Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the stats => partitions statistics.

By the way, can we link this text the spec about partition stats? (like the above command)

This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given
snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the
Copy link
Member

@szehon-ho szehon-ho Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets get rid of 'merging the partition stats'? I feel its implied from 'incrementally' and 'combined'

This procedure computes the stats incrementally from the last snapshot that has `PartitionStatisticsFile` until the given
snapshot (uses current snapshot if not specified) and writes the combined result into a `PartitionStatisticsFile`
after merging the partition stats. It performs a full compute if previous statistics file does not exist. It also registers the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (needs article and specify stats file): 'the previous partition statistics file'

@ajantha-bhat
Copy link
Member Author

Thanks again for the review @szehon-ho. I think it is better if I use some AI grammar check tool before publishing docs as english is not my first language 😅 .

Collect partition statistics of the snapshot with id `snap1` of table `my_table`
```sql
CALL catalog_name.system.compute_partition_stats(table => 'my_table', snapshot_id => 'snap1' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit: extra space at end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@szehon-ho szehon-ho merged commit c9154bd into apache:main Jul 17, 2025
2 checks passed
@szehon-ho
Copy link
Member

Merged, thanks @ajantha-bhat !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants