feat: improve read performance by 7x with prebuffer by ion-elgreco · Pull Request #1709 · delta-io/delta-rs

ion-elgreco · 2023-10-08T21:26:06Z

Description

Enable pre_buffer in pyarrow.dataset.ParquetFragmentScanOptions. A related PR in the Arrow repo made this the default behavior. Older versions of PyArrow do not have that default, so we need to set it to True explicitly.

It improves read speed by 6-7x on Azure for one dataset that I have.

Before:
1min 4s ± 3.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:
8.99 s ± 786 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Related Issue(s)

Closes #1569

rtyler

There is a memory tradeoff, but I think upstream making it the default indicates it's well worth it as default behavior.

Going to approve, thanks for another solid improvement @ion-elgreco

Enable prebuffer

94b41b7

ion-elgreco requested review from fvaleye, roeap and wjones127 as code owners October 8, 2023 21:26

github-actions bot added the binding/python label (Issues for the Python package) Oct 8, 2023

rtyler enabled auto-merge October 9, 2023 15:16

rtyler approved these changes Oct 9, 2023

rtyler merged commit ab6b0cf into delta-io:main Oct 9, 2023

rtyler merged 1 commit into delta-io:main from ion-elgreco:feat/enable_prebuffer_pyarrow
