Attach weights to DeltaLakeSplits #12755

Merged
ebyhr merged 1 commit into trinodb:master from ebyhr:ebi/delta-weights
Jun 14, 2022

ebyhr commented Jun 9, 2022

Description

Attach weights to DeltaLakeSplits.

Documentation

(x) Sufficient documentation is included in this PR.

Release notes

(x) Release notes entries required with the following suggested text:

# Delta Lake
* Use split weights to improve scheduling and query performance on tables with many small files. ({issue}`12755`)

@cla-bot cla-bot bot added the cla-signed label Jun 9, 2022
@github-actions github-actions bot added the docs label Jun 9, 2022
@ebyhr ebyhr marked this pull request as ready for review June 9, 2022 06:32
@findepi findepi requested a review from alexjo2144 June 9, 2022 09:41
The weight is equal to the split size divided by the target split size.
@ebyhr ebyhr force-pushed the ebi/delta-weights branch from 4426bca to 59cda7c Compare June 10, 2022 01:20
@ebyhr ebyhr requested a review from findepi June 14, 2022 02:39
fileSize,
addFileEntry.getModificationTime(),
ImmutableList.of(),
SplitWeight.fromProportion(Math.min(Math.max((double) splitSize / maxSplitSize, minimumAssignedSplitWeight), 1.0)),

Math.min( ..., 1) is redundant per my understanding.

nit: static import min & max

> Math.min( ..., 1) is redundant per my understanding.

That is, if we somehow end up creating an oversized split, it's fine to let the engine know about that.
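
For illustration only, the clamping under discussion can be sketched in plain Java. This is not the code from the PR: the class name, the two helper methods, the 64 MiB target split size, and the 0.05 minimum weight are all assumptions chosen for the example, and the sketch returns the raw proportion as a `double` instead of wrapping it in `SplitWeight.fromProportion(...)`. It compares the expression from the diff (lower and upper bound) with the lower-bound-only variant the review comment suggests.

```java
final class SplitWeightSketch {
    private SplitWeightSketch() {}

    // Mirrors the expression in the diff: split size divided by the target
    // split size, raised to the minimum weight and capped at 1.0.
    public static double clampedProportion(long splitSize, long maxSplitSize, double minimumAssignedSplitWeight) {
        double proportion = (double) splitSize / maxSplitSize;
        return Math.min(Math.max(proportion, minimumAssignedSplitWeight), 1.0);
    }

    // Variant suggested in review: keep the lower bound only, so an
    // oversized split reports a proportion above 1.0.
    public static double lowerBoundedProportion(long splitSize, long maxSplitSize, double minimumAssignedSplitWeight) {
        double proportion = (double) splitSize / maxSplitSize;
        return Math.max(proportion, minimumAssignedSplitWeight);
    }

    public static void main(String[] args) {
        long maxSplitSize = 64L << 20;   // assumed target split size: 64 MiB
        double minimumWeight = 0.05;     // assumed minimum assigned split weight

        // A 1 MiB split: 1/64 is below the minimum, so it is raised to 0.05.
        System.out.println(clampedProportion(1L << 20, maxSplitSize, minimumWeight));       // 0.05
        // A 32 MiB split: half the target size.
        System.out.println(clampedProportion(32L << 20, maxSplitSize, minimumWeight));      // 0.5
        // A 96 MiB split: capped at 1.0 by the outer min ...
        System.out.println(clampedProportion(96L << 20, maxSplitSize, minimumWeight));      // 1.0
        // ... but reported as 1.5 when only the lower bound is applied.
        System.out.println(lowerBoundedProportion(96L << 20, maxSplitSize, minimumWeight)); // 1.5
    }
}
```

The two variants differ only when a split is larger than the target split size. If splits are never created above that size, the outer `min` has no effect, which appears to be the reviewer's point.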

findepi commented Jun 14, 2022

@ebyhr you can merge this as is and i will update #12832 to cover Delta too.

ebyhr commented Jun 14, 2022

Let me merge this as-is. Thanks for your follow-up PR.

@ebyhr ebyhr merged commit 8de0cb6 into trinodb:master Jun 14, 2022
@ebyhr ebyhr deleted the ebi/delta-weights branch June 14, 2022 08:02
@ebyhr ebyhr mentioned this pull request Jun 14, 2022
@github-actions github-actions bot added this to the 386 milestone Jun 14, 2022