@karuppayya
Contributor

No description provided.

@github-actions bot added the spark label on Aug 22, 2024
@karuppayya
Contributor Author

@aokolnychyi @szehon-ho Can you help review this, please?

sql("CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg", tableName);
List<Object[]> result =
    sql("CALL %s.system.compute_table_stats('%s')", catalogName, tableIdent);
Assertions.assertTrue(result.isEmpty());
Contributor

Suggested change
- Assertions.assertTrue(result.isEmpty());
+ assertThat(result).isEmpty();

Please use the AssertJ assertions. We generally try to avoid using assertTrue/assertFalse, as they don't provide enough context about the actual/expected values when an assertion fails.
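
To illustrate the point about failure context, here is a dependency-free sketch. It deliberately uses neither JUnit nor AssertJ: the two helper methods and their message formats are hypothetical stand-ins, written only to show what information each style of check has available when it fails.

```java
import java.util.List;

final class AssertionContextDemo {

  // Stand-in for assertTrue(result.isEmpty()): the check only receives a boolean,
  // so a failure message cannot say anything about the list itself.
  static String booleanCheck(boolean condition) {
    return condition ? "ok" : "expected: <true> but was: <false>";
  }

  // Stand-in for assertThat(result).isEmpty(): the check receives the list,
  // so a failure message can include the actual contents.
  static String emptinessCheck(List<?> actual) {
    return actual.isEmpty() ? "ok" : "expected an empty list but was: " + actual;
  }

  public static void main(String[] args) {
    List<String> result = List.of("unexpected-row");
    System.out.println(booleanCheck(result.isEmpty()));
    System.out.println(emptinessCheck(result));
  }
}
```

The second message names the offending element, which is the kind of detail a reviewer wants to see in a failed CI run without re-running the test locally.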

sql(
    "CALL %s.system.compute_table_stats(table => '%s', columns => array('id'))",
    catalogName, tableIdent);
Assertions.assertNotNull(output.get(0));
Contributor

Suggested change
- Assertions.assertNotNull(output.get(0));
+ assertThat(output.get(0)).isNotNull();

Should this maybe also assert some more details here?

IllegalArgumentException.class,
() ->
sql(
"CALL %s.system.compute_table_stats(table => '%s', snapshot_id => %dL)",
Contributor

Maybe also add a test with an invalid/non-existing table name.

Contributor

@karuppayya can you please add a test with an invalid/non-existing table?

Contributor Author

Added this test.

sql(
    "CALL %s.system.compute_table_stats('%s', %dL)",
    catalogName, tableIdent, snapshot.snapshotId());
Assertions.assertNotNull(output.get(0));
Contributor

same as above

@aokolnychyi
Contributor

I will take a look at the partition stats PR first by @ajantha-bhat. I want to understand if we want a single analyze procedure or different procedures for table and partition stats.

@karuppayya force-pushed the compute_stats_procedure branch from de6f331 to 2b6e107 on August 22, 2024 22:35
sql(
    "CALL %s.system.compute_table_stats('%s', %dL)",
    catalogName, tableIdent, snapshot.snapshotId());
assertThat(output.get(0)).isNotNull();
Contributor

I think it would be good to check whether the output has some valid results here and in the other test

Contributor

@nastra left a comment

LGTM, but I left a few comments around testing that would be good to address.

@karuppayya
Contributor Author

@aokolnychyi Can this be merged now?

@karuppayya force-pushed the compute_stats_procedure branch from c2790a8 to 1cb1ad0 on November 11, 2024 22:23
@karuppayya force-pushed the compute_stats_procedure branch from 5def103 to 62c826f on November 18, 2024 21:59
@szehon-ho
Member

This looks good to me. I will merge tomorrow if there are no additional comments.

@ajantha-bhat
Member

> I will take a look at the partition stats PR first by @ajantha-bhat. I want to understand if we want a single analyze procedure or different procedures for table and partition stats.

I think a single analyze procedure is fine. We can merge this PR. I will do a follow-up to accept a list of enums specifying which types of stats to compute (default: compute everything). We also need to modify the Spark action to compute partition stats.
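
The follow-up idea of accepting a list of stats types, with "compute everything" as the default, could look roughly like the sketch below. The `StatsType` enum, its constants, and the `parse` helper are hypothetical names chosen for illustration; the actual follow-up may define a different argument shape.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StatsTypeSelection {

  // Hypothetical enum: the real follow-up may use different names or more values.
  enum StatsType { TABLE, PARTITION }

  // Resolve a user-supplied list of type names; null or empty means "all types".
  static Set<StatsType> parse(List<String> requested) {
    if (requested == null || requested.isEmpty()) {
      return EnumSet.allOf(StatsType.class);
    }
    EnumSet<StatsType> selected = EnumSet.noneOf(StatsType.class);
    for (String name : requested) {
      // valueOf throws IllegalArgumentException for names that match no constant.
      selected.add(StatsType.valueOf(name.trim().toUpperCase(Locale.ROOT)));
    }
    return selected;
  }

  public static void main(String[] args) {
    System.out.println(parse(List.of()));
    System.out.println(parse(List.of("partition")));
  }
}
```

Defaulting to all types keeps existing callers, who pass no list, on the "compute everything" behavior described above.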

mapBuilder.put("create_changelog_view", CreateChangelogViewProcedure::builder);
mapBuilder.put("rewrite_position_delete_files", RewritePositionDeleteFilesProcedure::builder);
mapBuilder.put("fast_forward", FastForwardBranchProcedure::builder);
mapBuilder.put("compute_table_stats", ComputeTableStatsProcedure::builder);
Member

Contributor Author

Thanks @ajantha-bhat,
I will create a follow-up PR for the doc changes, to keep this PR focused on the procedure.

@aokolnychyi
Contributor

I'd love to review today as well.

Contributor

@aokolnychyi left a comment

Minor styling suggestions. LGTM.

@szehon-ho merged commit 3b5c9f7 into apache:main on Nov 20, 2024
31 checks passed
@szehon-ho
Member

szehon-ho commented Nov 20, 2024

Looks like all comments are addressed. Merged; we can do a follow-up if there are more. Thanks @karuppayya, and also @aokolnychyi, @ajantha-bhat, and @nastra for the additional reviews!

5 participants