Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract CoalesceBatchesStream to a struct #11610

Merged
merged 1 commit into from
Aug 2, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 22, 2024

Note most of this PR is additional documentation and tests. There is no new functionality

Which issue does this PR close?

Related to #9370
Related to #7957

Rationale for this change

  1. I want to be able to write tests more easily for PRs like (GC StringViewArray in CoalesceBatchesStream #11587)
  2. I also harbor plans to improve repartitioning and filtering (to avoid a copy -- see Avoid extra copies in CoalesceBatchesExec to improve performance #7957) this is the first step

As an example of what I hope to do with this, today we have a FilterExec which copies all rows that pass the filter into a new RecordBatch followed by CoalesceBatches which will likely copy the batch again into a larger set
With a struct like this, I think we have the possibility of the filter producing the output batch directly, thus skipping
the intermediate copy. See #7957 and #7957 (comment) for details

What changes are included in this PR?

  1. Move the logic for coalescing batches into a struct CoalesceBatchesExec
  2. Add tests for the new struct
  3. Improve documentation

I also happen to think this change makes the code easier to read as it separates
the coalescing logic from the stream handling logic.

Are these changes tested?

Yes, by existing CI and new unit tests

Are there any user-facing changes?

No changes, this is just an internal code reorganization

Benchmark results show now significant changes

Details

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_coalsece_batches_in_struct ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  336.31ms │                         334.12ms │     no change │
│ QQuery 2     │  132.50ms │                         130.44ms │     no change │
│ QQuery 3     │  171.55ms │                         163.97ms │     no change │
│ QQuery 4     │  103.14ms │                         101.11ms │     no change │
│ QQuery 5     │  206.06ms │                         206.75ms │     no change │
│ QQuery 6     │   90.73ms │                          86.85ms │     no change │
│ QQuery 7     │  280.56ms │                         283.90ms │     no change │
│ QQuery 8     │  209.03ms │                         209.75ms │     no change │
│ QQuery 9     │  322.83ms │                         314.44ms │     no change │
│ QQuery 10    │  285.55ms │                         273.64ms │     no change │
│ QQuery 11    │  102.55ms │                         106.14ms │     no change │
│ QQuery 12    │  156.14ms │                         154.16ms │     no change │
│ QQuery 13    │  328.09ms │                         322.20ms │     no change │
│ QQuery 14    │  122.36ms │                         113.86ms │ +1.07x faster │
│ QQuery 15    │  174.65ms │                         171.91ms │     no change │
│ QQuery 16    │   98.16ms │                          98.39ms │     no change │
│ QQuery 17    │  328.06ms │                         322.44ms │     no change │
│ QQuery 18    │  451.03ms │                         438.80ms │     no change │
│ QQuery 19    │  199.06ms │                         200.66ms │     no change │
│ QQuery 20    │  193.02ms │                         198.21ms │     no change │
│ QQuery 21    │  341.11ms │                         343.05ms │     no change │
│ QQuery 22    │   75.44ms │                          71.36ms │ +1.06x faster │
└──────────────┴───────────┴──────────────────────────────────┴───────────────┘

--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_coalsece_batches_in_struct ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │ 3921.00ms │                        4013.39ms │     no change │
│ QQuery 1     │ 1620.96ms │                        1561.27ms │     no change │
│ QQuery 2     │ 3379.19ms │                        3177.36ms │ +1.06x faster │
└──────────────┴───────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)                          │ 8921.15ms │
│ Total Time (alamb_coalsece_batches_in_struct)   │ 8752.02ms │
│ Average Time (main_base)                        │ 2973.72ms │
│ Average Time (alamb_coalsece_batches_in_struct) │ 2917.34ms │
│ Queries Faster                                  │         1 │
│ Queries Slower                                  │         0 │
│ Queries with No Change                          │         2 │
└─────────────────────────────────────────────────┴───────────┘

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_coalsece_batches_in_struct ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.82ms │                           0.86ms │ 1.05x slower │
│ QQuery 1     │    98.96ms │                         100.49ms │    no change │
│ QQuery 2     │   206.13ms │                         204.16ms │    no change │
│ QQuery 3     │   201.66ms │                         207.14ms │    no change │
│ QQuery 4     │  2279.50ms │                        2242.69ms │    no change │
│ QQuery 5     │  2032.10ms │                        2095.51ms │    no change │
│ QQuery 6     │    89.81ms │                          87.66ms │    no change │
│ QQuery 7     │   105.83ms │                         106.21ms │    no change │
│ QQuery 8     │  3306.67ms │                        3271.35ms │    no change │
│ QQuery 9     │  2486.23ms │                        2463.97ms │    no change │
│ QQuery 10    │   846.34ms │                         868.00ms │    no change │
│ QQuery 11    │   945.26ms │                         942.57ms │    no change │
│ QQuery 12    │  2201.69ms │                        2183.83ms │    no change │
│ QQuery 13    │  4682.18ms │                        4687.81ms │    no change │
│ QQuery 14    │  2937.50ms │                        2932.22ms │    no change │
│ QQuery 15    │  2494.34ms │                        2476.97ms │    no change │
│ QQuery 16    │  6000.61ms │                        6074.24ms │    no change │
│ QQuery 17    │  5915.76ms │                        5978.01ms │    no change │
│ QQuery 18    │ 12147.41ms │                       12245.43ms │    no change │
│ QQuery 19    │   173.83ms │                         176.01ms │    no change │
│ QQuery 20    │  2772.44ms │                        2732.58ms │    no change │
│ QQuery 21    │  3573.59ms │                        3536.36ms │    no change │
│ QQuery 22    │  9672.75ms │                        9617.31ms │    no change │
│ QQuery 23    │ 22739.93ms │                       22735.77ms │    no change │
│ QQuery 24    │  1426.12ms │                        1417.39ms │    no change │
│ QQuery 25    │  1189.16ms │                        1190.12ms │    no change │
│ QQuery 26    │  1524.24ms │                        1531.46ms │    no change │
│ QQuery 27    │  4069.20ms │                        4061.53ms │    no change │
│ QQuery 28    │ 28517.26ms │                       28507.06ms │    no change │
│ QQuery 29    │  1035.81ms │                        1039.16ms │    no change │
│ QQuery 30    │  2612.55ms │                        2563.26ms │    no change │
│ QQuery 31    │  3328.59ms │                        3301.66ms │    no change │
│ QQuery 32    │ 17443.34ms │                       17993.00ms │    no change │
│ QQuery 33    │  9777.24ms │                        9713.21ms │    no change │
│ QQuery 34    │  9708.11ms │                        9670.55ms │    no change │
│ QQuery 35    │  3859.54ms │                        3874.81ms │    no change │
│ QQuery 36    │   373.39ms │                         360.72ms │    no change │
│ QQuery 37    │   232.22ms │                         234.41ms │    no change │
│ QQuery 38    │   203.17ms │                         202.32ms │    no change │
│ QQuery 39    │  1183.33ms │                        1183.65ms │    no change │
│ QQuery 40    │    96.68ms │                          97.02ms │    no change │
│ QQuery 41    │    86.76ms │                          85.72ms │    no change │
│ QQuery 42    │   105.67ms │                         102.31ms │    no change │
└──────────────┴────────────┴──────────────────────────────────┴──────────────┘

/// Buffered row count
buffered_rows: usize,
/// Buffer for combining batches
coalescer: BatchCoalescer,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of the PR is to extract these fields and their management into a struct.

@ozankabak
Copy link
Contributor

We plan to submit fetching support to CoalesceBatchesExec in a couple days. It'd be good to do the reorg after that to minimize total effort

@alamb alamb changed the title Extract CoalesceBatchesStream to a struct Extract CoalesceBatchesStream to a struct Jul 23, 2024
@alamb
Copy link
Contributor Author

alamb commented Jul 23, 2024

We plan to submit fetching support to CoalesceBatchesExec in a couple days. It'd be good to do the reorg after that to minimize total effort

@ozankabak Sounds good to me -- I don't think there is any particular rush for this PR. I'll get this PR ready and then I can rebase it after limit is added.

Though now that I think about it, perhaps adding limit would be more straightforward if it was encapsulated in a struct like this 🤔

@alamb alamb force-pushed the alamb/coalsece_batches_in_struct branch 2 times, most recently from 117901b to 8ef0d83 Compare July 23, 2024 10:52
@ozankabak
Copy link
Contributor

ozankabak commented Jul 23, 2024

For reference, here is the PR we are preparing (and will submit to the upstream repo soon).

@alamb
Copy link
Contributor Author

alamb commented Jul 23, 2024

For reference, here is the PR we are preparing (and will submit to the upstream repo soon).

I reviewed that PR and I believe it will be straightforward to resolve the conflicts once it is merged

assert_eq!(24, batches[1].num_rows());
assert_eq!(24, batches[2].num_rows());
assert_eq!(8, batches[3].num_rows());
// Handle fetch limit:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved this code out of the main CoalesceBatchesExec poll loop and into a function.

Other than the is_closed() handling this is literally a copy/paste

#[test]
fn test_coalesce() {
let batch = uint32_batch(0..8);
Test::new()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the tests to actually check the contents of the coalesced batches, in addition to checking the sizes.

I also factored out much of the test repetition and and made it easier to understand I think

@alamb
Copy link
Contributor Author

alamb commented Jul 29, 2024

This will likely conflict with #11667 so I would like to merge that one first

@alamb alamb force-pushed the alamb/coalsece_batches_in_struct branch from 352d521 to 40ac302 Compare July 29, 2024 21:26
@alamb alamb marked this pull request as ready for review July 30, 2024 09:26
/// 2. The output is a sequence of batches, with all but the last being at least
/// `target_batch_size` rows.
///
/// 3. Eventually this may also be able to handle other optimizations such as a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is my long term ambition -- to apply filter + coalesce in a single operation (and thus save a copy)

@alamb
Copy link
Contributor Author

alamb commented Jul 30, 2024

This PR is now ready for review when someone has a chance

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @alamb

@andygrove
Copy link
Member

PR

@ozankabak Now that synnada-ai#27 is merged, are we good to merge this PR?

Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep -- good to go

@alamb
Copy link
Contributor Author

alamb commented Aug 2, 2024

Thank you @ozankabak and @andygrove for the review.

@alamb alamb merged commit 5ca4ec3 into apache:main Aug 2, 2024
26 of 27 checks passed
@alamb alamb deleted the alamb/coalsece_batches_in_struct branch August 2, 2024 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants