Add Delta dynamic filtering tests #11549

Merged
findepi merged 1 commit into trinodb:master from findinpath:oss-delta-dynamic-filtering-tests
Mar 25, 2022

Contributor

@findinpath findinpath commented Mar 17, 2022

Description

Test showcasing the dynamic filtering functionality in the Delta Lake connector

Is this change a fix, improvement, new feature, refactoring, or other?

Test

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Delta Lake connector test

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Mar 17, 2022
@findinpath findinpath requested review from findepi and homar March 17, 2022 18:17
@findepi findepi changed the title from "Open source delta dynamic filtering tests" to "Add Delta dynamic filtering tests" Mar 18, 2022
Member

findepi commented Mar 18, 2022

I renamed the PR; please rename the commit accordingly.

@findepi findepi added the test and no-release-notes labels Mar 18, 2022
@findinpath findinpath force-pushed the oss-delta-dynamic-filtering-tests branch from e03a9a1 to 6e95c64 Compare March 18, 2022 12:01
@raunaqmorarka
Member

Can we add a test for Delta Lake that extends BaseDynamicPartitionPruningTest, similar to the way it's done for Hive and Iceberg?

Member

We end up needing this and the large table on the build side because there is no way to make the Delta Lake split manager wait on dynamic filters (DF), similar to dynamic-filtering.wait-timeout in Hive and Iceberg. It would be great to have that, as it would simplify the testing and make it less prone to flakiness.

Contributor Author

Compared to Iceberg, DeltaLakeSplitSource produces the splits asynchronously. Does the dynamicFilteringWaitTimeoutMillis concept still apply in this case?

Member

Yes, HiveSplitSource produces splits asynchronously as well.

Contributor Author

Created follow-up issue #11600.
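
To illustrate the wait-timeout idea discussed in this thread, here is a minimal sketch in plain Java. It is not the Hive, Iceberg, or Delta Lake implementation: the class `DynamicFilterWaitSketch`, the method `partitionsToScan`, and the use of a blocking `CompletableFuture.get` are all assumptions made for illustration, and a real split source would more likely wait without blocking a thread. The point it shows is that, without a wait, a split source may enumerate every partition before the dynamic filter arrives, which is what makes such tests timing-sensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

// Hypothetical sketch; names and structure are not taken from Trino.
class DynamicFilterWaitSketch
{
    // Waits up to waitTimeoutMillis for the dynamic filter (modelled here as a set of
    // allowed partition names). If the filter does not arrive in time, all partitions
    // are scanned, which is correct but does no pruning.
    static List<String> partitionsToScan(
            List<String> allPartitions,
            CompletableFuture<Set<String>> dynamicFilter,
            long waitTimeoutMillis)
    {
        try {
            Set<String> allowed = dynamicFilter.get(waitTimeoutMillis, TimeUnit.MILLISECONDS);
            return allPartitions.stream()
                    .filter(allowed::contains)
                    .collect(Collectors.toList());
        }
        catch (TimeoutException | ExecutionException e) {
            // Filter unavailable: fall back to an unpruned scan
            return allPartitions;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return allPartitions;
        }
    }

    public static void main(String[] args)
    {
        List<String> partitions = Arrays.asList("p1", "p2", "p3");

        // Filter already collected from the build side: only p2 is scanned
        CompletableFuture<Set<String>> ready =
                CompletableFuture.completedFuture(Collections.singleton("p2"));
        System.out.println(partitionsToScan(partitions, ready, 1_000)); // [p2]

        // Filter never completes within the timeout: everything is scanned
        CompletableFuture<Set<String>> pending = new CompletableFuture<>();
        System.out.println(partitionsToScan(partitions, pending, 10)); // [p1, p2, p3]
    }
}
```

With a long enough timeout, a test can assert on pruning deterministically instead of relying on a large build-side table to delay split generation.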

Member

Can we use searchScanFilterAndProjectOperatorStats from AbstractTestQueryFramework instead?

Contributor Author

I have now tried using searchScanFilterAndProjectOperatorStats. However, for the unfilteredStats case it seems that there is no corresponding TableScanNode for the table lineitem.

Member

It's not working for the unfilteredStats case because searchScanFilterAndProjectOperatorStats always looks for a filter on top of the scan. We can keep the existing code here.
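
The behaviour described above can be sketched with a toy plan model. This is only an illustration under the assumption that the helper matches a scan solely when a filter sits directly on top of it; the `Node` class and both search methods below are hypothetical and far simpler than Trino's plan nodes and searcher.

```java
import java.util.Optional;

// Hypothetical sketch; not Trino's plan model or search helper.
class PlanSearchSketch
{
    static class Node
    {
        final String kind;    // "scan", "filter", or "join"
        final String table;   // set only for scans
        final Node[] sources;

        Node(String kind, String table, Node... sources)
        {
            this.kind = kind;
            this.table = table;
            this.sources = sources;
        }
    }

    static Node scan(String table) { return new Node("scan", table); }
    static Node filter(Node source) { return new Node("filter", null, source); }
    static Node join(Node left, Node right) { return new Node("join", null, left, right); }

    // Matches a scan of the table only when a filter is its direct parent
    static Optional<Node> findFilteredScan(Node root, String table)
    {
        if (root.kind.equals("filter")
                && root.sources.length == 1
                && root.sources[0].kind.equals("scan")
                && table.equals(root.sources[0].table)) {
            return Optional.of(root.sources[0]);
        }
        for (Node source : root.sources) {
            Optional<Node> found = findFilteredScan(source, table);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    // Matches any scan of the table, with or without a filter above it
    static Optional<Node> findScan(Node root, String table)
    {
        if (root.kind.equals("scan") && table.equals(root.table)) {
            return Optional.of(root);
        }
        for (Node source : root.sources) {
            Optional<Node> found = findScan(source, table);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        Node filtered = join(filter(scan("lineitem")), scan("supplier"));
        Node unfiltered = join(scan("lineitem"), scan("supplier"));

        System.out.println(findFilteredScan(filtered, "lineitem").isPresent());   // true
        System.out.println(findFilteredScan(unfiltered, "lineitem").isPresent()); // false
        System.out.println(findScan(unfiltered, "lineitem").isPresent());         // true
    }
}
```

In this model, the unfiltered plan has no filter above the lineitem scan, so a filter-anchored search finds nothing even though the scan is present.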

@findinpath findinpath force-pushed the oss-delta-dynamic-filtering-tests branch from 6e95c64 to 2ee1539 Compare March 18, 2022 21:22
@findinpath
Contributor Author

Can we add a test for Delta Lake that extends BaseDynamicPartitionPruningTest, similar to the way it's done for Hive and Iceberg?

I will try adding such a test within this PR.

Member

This can be parametrised as:

@Test(timeOut = 60_000, dataProvider = "joinDistributionTypes")

@DataProvider
public Object[][] joinDistributionTypes()
{
    return Stream.of(JoinDistributionType.values())
            .collect(toDataProvider());
}
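
For readers unfamiliar with the shape a data provider returns, the following self-contained sketch shows what such a collector might produce: one single-element `Object[]` row per enum value, so the test method is invoked once per join distribution type. The local `JoinDistributionType` enum and the `toDataProvider()` collector here are stand-ins written for this example, not the project's own classes.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; the enum and collector are stand-ins for illustration.
class JoinDistributionTypesSketch
{
    // Stand-in enum; the values are assumed for this example
    enum JoinDistributionType { BROADCAST, PARTITIONED, AUTOMATIC }

    // Wraps every stream element in its own one-element row
    static <T> Collector<T, ?, Object[][]> toDataProvider()
    {
        return Collectors.collectingAndThen(
                Collectors.toList(),
                list -> list.stream()
                        .map(value -> new Object[] {value})
                        .toArray(Object[][]::new));
    }

    static Object[][] joinDistributionTypes()
    {
        return Stream.of(JoinDistributionType.values())
                .collect(toDataProvider());
    }

    public static void main(String[] args)
    {
        // Each row holds the single argument for one test invocation
        for (Object[] row : joinDistributionTypes()) {
            System.out.println(row[0]);
        }
    }
}
```

A test method bound to this provider would then take a single `JoinDistributionType` parameter.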

@findinpath findinpath force-pushed the oss-delta-dynamic-filtering-tests branch from 2ee1539 to 30d4ef4 Compare March 21, 2022 15:44
@findepi findepi merged commit 15a422d into trinodb:master Mar 25, 2022
@github-actions github-actions bot added this to the 375 milestone Mar 25, 2022
Labels

cla-signed, no-release-notes, test
