Spark: Better statistics estimation for spark2 Reader. #3134

wypoon · 2021-09-17T04:48:25Z

Follow-up to #3038. Fixes #3108.

Estimate the scan size as (estimated row size) * (number of rows) instead of adding up file sizes.
When columns are pruned, the row size is estimated from the pruned schema.
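
For readers skimming the thread, here is a minimal sketch of the idea, not the code in this PR. The class name, the method names, and the per-type byte widths are all hypothetical and chosen only for illustration. The schema is modeled as a plain map of column name to type name rather than an Iceberg or Spark schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names and widths below are assumptions, not the Iceberg API.
final class SizeEstimateSketch {

  // Hypothetical default width in bytes for a column of the given type.
  static int defaultWidth(String type) {
    switch (type) {
      case "boolean":
        return 1;
      case "int":
      case "float":
      case "date":
        return 4;
      case "long":
      case "double":
      case "timestamp":
        return 8;
      case "string":
        return 20; // assumed average width for variable-length values
      default:
        return 16; // assumed fallback for any other type
    }
  }

  // Estimated row size: the sum of the default widths of the projected columns.
  static long estimateRowSize(Map<String, String> schema) {
    long size = 0;
    for (String type : schema.values()) {
      size += defaultWidth(type);
    }
    return size;
  }

  // Estimated scan size: row size * number of rows, saturating on overflow.
  static long estimateSizeInBytes(Map<String, String> prunedSchema, long numRows) {
    long rowSize = estimateRowSize(prunedSchema);
    try {
      return Math.multiplyExact(rowSize, numRows);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  public static void main(String[] args) {
    Map<String, String> full = new LinkedHashMap<>();
    full.put("id", "long");
    full.put("name", "string");
    full.put("ts", "timestamp");

    // Column pruning keeps only "id", so the per-row estimate shrinks.
    Map<String, String> pruned = new LinkedHashMap<>();
    pruned.put("id", "long");

    System.out.println(estimateSizeInBytes(full, 1000L));
    System.out.println(estimateSizeInBytes(pruned, 1000L));
  }
}
```

The point of the sketch is that the estimate follows the projected columns: summing file sizes gives the same number whether one column or all columns are read, while a row-size-based estimate drops when columns are pruned.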

rdblue · 2021-09-17T18:37:54Z

Thanks, @wypoon!

github-actions bot added the spark label Sep 17, 2021

wypoon mentioned this pull request Sep 17, 2021

Optimize stats estimation in Spark 2 #3108

Closed

Spark: Better statistics estimation for spark2 Reader.

7d503a6

wypoon force-pushed the estimate_statistics2 branch from b1090df to 7d503a6 Compare September 17, 2021 17:02

rdblue merged commit ec2716e into apache:master Sep 17, 2021
