@wypoon wypoon commented Sep 17, 2021

Follow-up to #3038. Fixes #3108.

Estimate the scan size as (estimated row size) * (number of rows) instead of adding up file sizes.
When columns are pruned, the row size is estimated from the pruned schema, so the estimate reflects only the columns actually read.
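
For illustration only, here is a minimal sketch of the idea, not the code in this PR. The class `SizeEstimateSketch`, its methods, and the per-column byte widths are hypothetical; the real change works against the table schema and scan metadata.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: estimate scan size as rowSize * rowCount,
// where rowSize comes from the projected (pruned) columns only.
final class SizeEstimateSketch {

    // Sum the assumed byte width of each projected column.
    static long estimateRowSize(Map<String, Integer> columnWidths, List<String> projected) {
        long size = 0;
        for (String name : projected) {
            Integer width = columnWidths.get(name);
            if (width == null) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            size += width;
        }
        return size;
    }

    // Multiply row size by row count, saturating at Long.MAX_VALUE on overflow.
    static long estimateSize(long rowSize, long rowCount) {
        try {
            return Math.multiplyExact(rowSize, rowCount);
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;
        }
    }

    // Example widths (assumed values, in bytes) for a three-column table.
    static Map<String, Integer> exampleWidths() {
        Map<String, Integer> widths = new LinkedHashMap<>();
        widths.put("id", 8);
        widths.put("name", 20);
        widths.put("payload", 1000);
        return widths;
    }
}
```

With these assumed widths and 1,000 rows, reading all columns gives an estimate of 1,028,000 bytes, while a scan pruned to `id` and `name` gives 28,000 bytes. A sum of file sizes would be the same in both cases, because it does not depend on the projection.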

@wypoon wypoon force-pushed the estimate_statistics2 branch from b1090df to 7d503a6 Compare September 17, 2021 17:02
@rdblue rdblue merged commit ec2716e into apache:master Sep 17, 2021
rdblue commented Sep 17, 2021

Thanks, @wypoon!

Development

Successfully merging this pull request may close these issues.

Optimize stats estimation in Spark 2
