-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Procedure to compute table stats #10986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@aokolnychyi @szehon-ho Can you help review this please |
| sql("CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg", tableName); | ||
| List<Object[]> result = | ||
| sql("CALL %s.system.compute_table_stats('%s')", catalogName, tableIdent); | ||
| Assertions.assertTrue(result.isEmpty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Assertions.assertTrue(result.isEmpty()); | |
| assertThat(result).isEmpty(); |
please use the AssertJ assertions and we generally try to avoid ussing assertTrue/assertFalse as it doesn't provide enough context about the actual/expected when an assertion fails
| sql( | ||
| "CALL %s.system.compute_table_stats(table => '%s', columns => array('id'))", | ||
| catalogName, tableIdent); | ||
| Assertions.assertNotNull(output.get(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Assertions.assertNotNull(output.get(0)); | |
| assertThat(output.get(0)).isNotNull(); |
should this maybe also assert some more details here?
...nsions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
| IllegalArgumentException.class, | ||
| () -> | ||
| sql( | ||
| "CALL %s.system.compute_table_stats(table => '%s', snapshot_id => %dL)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe also add a test with an invalid/non-existing table name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karuppayya can you please add a test with an invalid/non-existing table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this test.
| sql( | ||
| "CALL %s.system.compute_table_stats('%s', %dL)", | ||
| catalogName, tableIdent, snapshot.snapshotId()); | ||
| Assertions.assertNotNull(output.get(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
|
I will take a look at the partition stats PR first by @ajantha-bhat. I want to understand if we want a single analyze procedure or different procedures for table and partition stats. |
de6f331 to
2b6e107
Compare
| sql( | ||
| "CALL %s.system.compute_table_stats('%s', %dL)", | ||
| catalogName, tableIdent, snapshot.snapshotId()); | ||
| assertThat(output.get(0)).isNotNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to check whether the output has some valid results here and in the other test
nastra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I left a few comments around testing that would be good to address
2b6e107 to
c2790a8
Compare
|
@aokolnychyi Can this merged now? |
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...nsions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java
Show resolved
Hide resolved
c2790a8 to
1cb1ad0
Compare
(cherry picked from commit 49e668e687e0a5d77652405ba14bdc3ee592f261)
5def103 to
62c826f
Compare
...nsions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
|
This looks good to me, will merge tomorrow if no additional comments |
I think single analyze procedure is fine. We can merge this PR. I will do a follow up to accept list of ENUMs to compute the required type of stats (default compute everything). We need to also modify spark action to compute partition stats. |
| mapBuilder.put("create_changelog_view", CreateChangelogViewProcedure::builder); | ||
| mapBuilder.put("rewrite_position_delete_files", RewritePositionDeleteFilesProcedure::builder); | ||
| mapBuilder.put("fast_forward", FastForwardBranchProcedure::builder); | ||
| mapBuilder.put("compute_table_stats", ComputeTableStatsProcedure::builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also document it under
https://github.com/apache/iceberg/blob/main/docs/docs/spark-procedures.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ajantha-bhat ,
I will create a follow up PR for the doc changes, just to keep this PR focussed on the procedure.
|
I'd love to review today as well. |
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Show resolved
Hide resolved
aokolnychyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor styling suggestions. LGTM.
|
Looks like all comment addressed, merged, can do a follow up if more. Thanks @karuppayya , and also @aokolnychyi @ajantha-bhat @nastra for addition reviews! |
No description provided.