Exclude column schema when we fetch Glue partitions based on filter#14206

Merged: Praveen2112 merged 1 commit into master from praveen/minor_glue_improvement on Sep 20, 2022

@Praveen2112 Praveen2112 commented Sep 20, 2022

Description

`getPartitionNamesByFilter` needs only the partition values, so returning the column schema with every partition is unnecessary overhead. This change excludes the column schema from the result and also avoids the additional call to fetch the table information. This could improve planning time for queries on tables with a very large number of columns (1000+).
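
To illustrate why partition values alone are enough here, the following is a minimal sketch of building Hive-style partition names from partition column names and per-partition values. It is not the code from this PR: the class `PartitionNameSketch`, its methods, and the column names `part_column_1` and `part_column_3` are hypothetical, the column names are assumed to be supplied by the caller, and the escaping of special characters that real partition names need is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the implementation from this PR.
final class PartitionNameSketch
{
    private PartitionNameSketch() {}

    // Builds a Hive-style partition name such as "a=1/b=2" from the partition
    // column names and one partition's values. Simplified: escaping of special
    // characters is intentionally skipped in this sketch.
    static String toPartitionName(List<String> columnNames, List<String> values)
    {
        if (columnNames.size() != values.size()) {
            throw new IllegalArgumentException("column names and values must have the same size");
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < columnNames.size(); i++) {
            parts.add(columnNames.get(i) + "=" + values.get(i));
        }
        return String.join("/", parts);
    }

    // Converts the values of many partitions into partition names.
    // No data column schema is needed at any point.
    static List<String> toPartitionNames(List<String> columnNames, List<List<String>> partitionValues)
    {
        List<String> names = new ArrayList<>();
        for (List<String> values : partitionValues) {
            names.add(toPartitionName(columnNames, values));
        }
        return names;
    }

    public static void main(String[] args)
    {
        // Assumed partition columns and values, for demonstration only.
        List<String> columns = List.of("part_column_1", "part_column_2", "part_column_3");
        List<List<String>> values = List.of(
                List.of("2022", "09", "20"),
                List.of("2022", "09", "21"));
        toPartitionNames(columns, values).forEach(System.out::println);
    }
}
```

The only inputs are the partition column names and the values of each matching partition, which is why the schema of the 1000+ data columns can be left out of the response.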

We tested this locally with a Glue table having 1000 data columns, 3 partition columns, and 1000 partitions, using the following query with table_statistics disabled:

`EXPLAIN SELECT count(*) FROM GLUE_TABLE GROUP BY part_column_2 LIMIT 1`

Overall execution time before this change: 7-8s (multiple runs)

Overall execution time after this change: 1.1-1.7s (multiple runs)

Non-technical explanation

Improves planning time for Glue tables.

Release notes

( ) This is not user-visible and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Improve planning time for wide Glue tables

findepi commented Sep 20, 2022

@Praveen2112 see TestIcebergGlueCatalogAccessOperations failure.

findepi commented Sep 20, 2022

cc @alexjo2144 @findinpath @homar

Praveen2112 (author) commented

I think TestIcebergGlueCatalogAccessOperations will be fixed by this PR - #14207

@skrzypo987 skrzypo987 left a comment

lgtm

`getPartitionNamesByFilter` requires only partition values, including column schema as a part of result will be an overhead.
Additional call to get the table information is also avoided.
@Praveen2112 Praveen2112 force-pushed the praveen/minor_glue_improvement branch from 20d8dea to e669529 Compare September 20, 2022 10:56
@Praveen2112 Praveen2112 merged commit 5e066e2 into master Sep 20, 2022
@Praveen2112 Praveen2112 deleted the praveen/minor_glue_improvement branch September 20, 2022 15:35
@github-actions github-actions bot added this to the 397 milestone Sep 20, 2022