
Proposed edits to section 2.2 #4

Merged
hhhizzz merged 1 commit into hhhizzz:lm-pipeline-blog from alamb:alamb/sec2.2 on Dec 7, 2025

@alamb commented Dec 5, 2025

Here are some proposed "wordsmithing" changes for

I'll comment inline with the rationale.

@github-actions bot commented Dec 5, 2025

Preview URL: https://alamb.github.io/arrow-site

If the preview URL doesn't work, you may have forgotten to configure your fork repository for previews.
See https://github.com/apache/arrow-site/blob/main/README.md#forks for how to configure it.

### 2.2 Combining row selectors (`RowSelection::and_then`)

`RowSelection`—defined in `selection.rs`—is the token that every stage passes around. It mostly uses RLE (`RowSelector::select/skip(len)`) to describe sparse ranges. `and_then` is the core operator for "apply one selection to another": left-hand side is "rows already allowed," right-hand side further filters those rows, and the output is their boolean AND.
[`RowSelection`] represents the set of rows that will eventually be produced. It currently uses RLE (`RowSelector::select/skip(len)`) to describe sparse ranges. [`RowSelection::and_then`] is the core operator for "apply one selection to another": the left-hand argument is "rows already passed" and the right-hand argument is "which of the passed rows also pass the second filter." The output is their boolean AND.

I tried to make it clearer what this code refers to.
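To make the "apply one selection to another" semantics concrete, here is a minimal sketch that expands both selections into one `bool` per row instead of using run-length encoding. `and_then_mask` is a hypothetical helper for illustration, not an arrow-rs API. The right-hand mask has one entry per row that the left-hand mask passed, so the result is an AND over the rows that survive the first filter rather than a position-by-position AND of two equal-length masks.

```rust
/// Hypothetical per-row model of `and_then` (not the arrow-rs API):
/// `left` has one entry per row, `right` has one entry per row that `left` passed.
fn and_then_mask(left: &[bool], right: &[bool]) -> Vec<bool> {
    let mut right_iter = right.iter().copied();
    let out: Vec<bool> = left
        .iter()
        .map(|&passed_first| {
            if passed_first {
                // The second filter decides only for rows that passed the first one.
                right_iter
                    .next()
                    .expect("right has fewer entries than rows passed by left")
            } else {
                // Rows rejected by the first filter stay rejected.
                false
            }
        })
        .collect();
    assert!(
        right_iter.next().is_none(),
        "right has more entries than rows passed by left"
    );
    out
}

fn main() {
    // 8 rows; the first filter passes rows 2, 3, 4 and 6.
    let left = [false, false, true, true, true, false, true, false];
    // The second filter sees only those 4 rows and keeps the 1st and 3rd of them.
    let right = [true, false, true, false];
    println!("{:?}", and_then_mask(&left, &right));
}
```

Rows 2 and 4 survive both filters, so this prints `[false, false, true, false, true, false, false, false]`.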

</figure>


This keeps narrowing the filter while touching only lightweight metadata, with no data copies. The current implementation of `and_then` is a two-pointer linear scan, so its cost is linear in the number of selector segments. The sooner predicates shrink the selection, the cheaper later scans become.

I propose moving this paragraph below the diagram, so that the text describing the diagram sits immediately above it.
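The two-pointer linear scan over selector segments can be sketched directly on run-length-encoded runs. This is an illustrative sketch under stated assumptions, not the arrow-rs source: `Run` is a hypothetical stand-in shaped like `RowSelector::select/skip(len)`, and `and_then` and `push` are hypothetical free functions. Each step of the inner loop either finishes the current left-hand run or consumes the current right-hand run, so the work is linear in the number of runs and no row data is touched.

```rust
/// Hypothetical stand-in for an RLE selector: `row_count` consecutive rows
/// that are either skipped or selected (not the arrow-rs `RowSelector`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    skip: bool,
    row_count: usize,
}

impl Run {
    fn select(row_count: usize) -> Self {
        Run { skip: false, row_count }
    }
    fn skip(row_count: usize) -> Self {
        Run { skip: true, row_count }
    }
}

/// Appends a run, merging it into the previous run when both have the same kind.
fn push(out: &mut Vec<Run>, run: Run) {
    if run.row_count == 0 {
        return;
    }
    if let Some(last) = out.last_mut() {
        if last.skip == run.skip {
            last.row_count += run.row_count;
            return;
        }
    }
    out.push(run);
}

/// Two-pointer scan: `right` describes only the rows that `left` selects.
/// Runs in O(left.len() + right.len()) and never touches row data.
fn and_then(left: &[Run], right: &[Run]) -> Vec<Run> {
    let mut out = Vec::new();
    // Second pointer: iterate the right-hand runs, ignoring empty ones.
    let mut r = right.iter().copied().filter(|run| run.row_count > 0);
    // Unconsumed remainder of the current right-hand run.
    let mut cur: Option<Run> = r.next();
    for &l in left {
        if l.skip {
            // Rows skipped by the first selection stay skipped.
            push(&mut out, l);
            continue;
        }
        // Split this selected left-hand run according to the right-hand runs.
        let mut remaining = l.row_count;
        while remaining > 0 {
            let mut run = cur.expect("right covers fewer rows than left selects");
            let take = remaining.min(run.row_count);
            push(&mut out, Run { skip: run.skip, row_count: take });
            remaining -= take;
            run.row_count -= take;
            cur = if run.row_count == 0 { r.next() } else { Some(run) };
        }
    }
    assert!(cur.is_none(), "right covers more rows than left selects");
    out
}

fn main() {
    // 8 rows; the first selection passes rows 2, 3, 4 and 6.
    let left = vec![Run::skip(2), Run::select(3), Run::skip(1), Run::select(1), Run::skip(1)];
    // Over those 4 rows, the second selection keeps the 1st and 3rd.
    let right = vec![Run::select(1), Run::skip(1), Run::select(1), Run::skip(1)];
    println!("{:?}", and_then(&left, &right));
}
```

This prints `[Run { skip: true, row_count: 2 }, Run { skip: false, row_count: 1 }, Run { skip: true, row_count: 1 }, Run { skip: false, row_count: 1 }, Run { skip: true, row_count: 3 }]`, i.e. only rows 2 and 4 remain selected. Merging adjacent runs of the same kind in `push` keeps the output as short as possible, so later operations on the selection stay cheap.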

@hhhizzz merged commit dbc10db into hhhizzz:lm-pipeline-blog on Dec 7, 2025
1 check passed