
Fix incorrect OFFSET during LIMIT pushdown. #12399

Merged

Conversation


@wiedld wiedld commented Sep 9, 2024

Which issue does this PR close?

Fixes #12423

First commit demonstrates the bug.

Rationale for this change

First commit demonstrates the current, incorrect behavior where the offset is not applied correctly during limit pushdown.

Followup commits add the fix, as well as a few doc comments.
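
To make the failure mode concrete, here is a minimal reproduction sketch of the kind of query affected (the actual regression tests live in datafusion/sqllogictest/test_files/limit.slt; the table name and data here are hypothetical):

use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Hypothetical table; any ordered source works.
    ctx.register_csv("t", "data.csv", CsvReadOptions::new()).await?;

    // With the bug, the OFFSET could be lost when the limit was pushed down,
    // so this behaved like `... LIMIT 3` instead of skipping 4 rows first.
    let df = ctx.sql("SELECT a FROM t ORDER BY a LIMIT 3 OFFSET 4").await?;
    df.show().await?;
    Ok(())
}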

What changes are included in this PR?

A slight change to offset handling in one of the helper functions used during limit pushdown.
Also added some doc comments to help explain the existing code.
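
Not the PR's actual diff, but a sketch of the invariant the offset handling has to preserve when a limit with both a skip (OFFSET) and a fetch (LIMIT) is pushed down: a child operator must be asked for skip + fetch rows, because the skip itself can only be applied above it.

// Illustrative only (hypothetical helper, not DataFusion's API): the fetch
// that may be pushed to a child when the enclosing limit has both a skip
// (OFFSET) and a fetch (LIMIT).
fn fetch_to_push_down(skip: usize, fetch: Option<usize>) -> Option<usize> {
    // With no fetch there is no limit to push down.
    fetch.map(|f| f + skip)
}

fn main() {
    // LIMIT 3 OFFSET 4: the child must yield 7 rows so that rows 5..=7 survive.
    assert_eq!(fetch_to_push_down(4, Some(3)), Some(7));
    // LIMIT 10 with no offset pushes down unchanged.
    assert_eq!(fetch_to_push_down(0, Some(10)), Some(10));
}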

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 9, 2024
@wiedld wiedld force-pushed the iox-12102/fix-limit-pushdown-with-offsey branch from fd0f609 to b6fd751 on September 9, 2024 18:23

alamb commented Sep 9, 2024

cc @itsjunetime

@github-actions github-actions bot added the optimizer Optimizer rules label Sep 9, 2024

wiedld commented Sep 9, 2024

@mertak-synnada this is a fix to what we believe is a bug from this (very excellent) change. We would appreciate your review 🙏🏼 .

@wiedld wiedld marked this pull request as ready for review September 9, 2024 21:34

@itsjunetime itsjunetime left a comment


I think this may negate some of the performance improvements gained by the initial PR that introduced these bugs, but I think once this is merged, we can refactor the pushdown_limit_helper function slightly to keep the correct behavior while pulling in the improvements again. I think it's just more important to get a fix merged first since this did break existing behavior.

@wiedld wiedld force-pushed the iox-12102/fix-limit-pushdown-with-offsey branch from d125bc2 to 34b94f0 on September 9, 2024 22:03

alamb commented Sep 10, 2024

I have filed #12423 to track this issue and updated this PR description


@alamb alamb left a comment


Thank you @wiedld and @itsjunetime and @mertak-synnada

I think this code is looking good to me. I had a few suggestions about comments, but the code and testing seem 👍 to me

@@ -256,21 +265,24 @@ pub(crate) fn pushdown_limits(
    pushdown_plan: Arc<dyn ExecutionPlan>,
    global_state: GlobalRequirements,
) -> Result<Arc<dyn ExecutionPlan>> {
    // Call pushdown_limit_helper.
    // This will either extract the limit node (returning the child), or apply the limit pushdown.

this might be a good comment to add to the pushdown_limit_helper function as well


alamb commented Sep 11, 2024

I am going to apply the comment suggestions to this PR so we can merge it in.


alamb commented Sep 11, 2024

> I think this may negate some of the performance improvements gained by the initial PR that introduced these bugs, but I think once this is merged, we can refactor the pushdown_limit_helper function slightly to keep the correct behavior while pulling in the improvements again

@itsjunetime I wonder if you can elaborate on this or file a ticket. In order to avoid the final GlobalLimitExec I believe we would have to add offset support into SortPreservingMerge and Sort(TopK) -- which we could do, but I think the benefit might be relatively low

The implementation of Limit is pretty straightforward:

pub struct LimitStream {
    /// The remaining number of rows to skip
    skip: usize,
    /// The remaining number of rows to produce
    fetch: usize,
    /// The input to read from. This is set to None once the limit is
    /// reached to enable early termination
    input: Option<SendableRecordBatchStream>,
    /// Copy of the input schema
    schema: SchemaRef,
    /// Execution time metrics
    baseline_metrics: BaselineMetrics,
}

impl LimitStream {
    pub fn new(
        input: SendableRecordBatchStream,
        skip: usize,
        fetch: Option<usize>,
        baseline_metrics: BaselineMetrics,
    ) -> Self {
        let schema = input.schema();
        Self {
            skip,
            fetch: fetch.unwrap_or(usize::MAX),
            input: Some(input),
            schema,
            baseline_metrics,
        }
    }

    fn poll_and_skip(
        &mut self,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<RecordBatch>>> {
        let input = self.input.as_mut().unwrap();
        loop {
            let poll = input.poll_next_unpin(cx);
            let poll = poll.map_ok(|batch| {
                if batch.num_rows() <= self.skip {
                    self.skip -= batch.num_rows();
                    RecordBatch::new_empty(input.schema())
                } else {
                    let new_batch = batch.slice(self.skip, batch.num_rows() - self.skip);
                    self.skip = 0;
                    new_batch
                }
            });

            match &poll {
                Poll::Ready(Some(Ok(batch))) => {
                    if batch.num_rows() > 0 {
                        break poll;
                    } else {
                        // continue to poll input stream
                    }
                }
                Poll::Ready(Some(Err(_e))) => break poll,
                Poll::Ready(None) => break poll,
                Poll::Pending => break poll,
            }
        }
    }
}
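
As a worked example of the skip arithmetic above (a standalone model, not DataFusion code): with skip = 5 and input batches of 3, 3, and 3 rows, the first batch is skipped entirely, the second is sliced to its last row, and the third passes through (subject to fetch).

// Standalone model of how `poll_and_skip` consumes `skip` across batches;
// batch sizes stand in for RecordBatch::num_rows().
fn apply_skip(batch_sizes: &[usize], mut skip: usize) -> Vec<(usize, usize)> {
    // Returns (offset_into_batch, rows_emitted) per batch, mirroring
    // `batch.slice(self.skip, batch.num_rows() - self.skip)`.
    batch_sizes
        .iter()
        .map(|&n| {
            if n <= skip {
                skip -= n;
                (n, 0) // whole batch skipped; an empty batch is emitted
            } else {
                let offset = skip;
                skip = 0;
                (offset, n - offset)
            }
        })
        .collect()
}

fn main() {
    // skip = 5 over batches of 3, 3, 3 rows -> [(3, 0), (2, 1), (0, 3)]
    println!("{:?}", apply_skip(&[3, 3, 3], 5));
}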

@mertak-synnada

> > I think this may negate some of the performance improvements gained by the initial PR that introduced these bugs, but I think once this is merged, we can refactor the pushdown_limit_helper function slightly to keep the correct behavior while pulling in the improvements again
>
> @itsjunetime I wonder if you can elaborate on this or file a ticket. In order to avoid the final GlobalLimitExec I believe we would have to add offset support into SortPreservingMerge and Sort(TopK) -- which we could do, but I think the benefit might be relatively low

I believe @itsjunetime was referring to the first commit, but after my suggested change, the gains should be preserved, imo.

@alamb alamb merged commit 9025c1c into apache:main Sep 11, 2024
24 checks passed

alamb commented Sep 11, 2024

Thanks everyone for your help getting this done!

@alamb alamb deleted the iox-12102/fix-limit-pushdown-with-offsey branch September 11, 2024 19:56