Skip to content

Proposed Updates to section 2.1#3

Merged
hhhizzz merged 2 commits intohhhizzz:lm-pipeline-blogfrom
alamb:alamb/sec2
Dec 7, 2025
Merged

Proposed Updates to section 2.1#3
hhhizzz merged 2 commits intohhhizzz:lm-pipeline-blogfrom
alamb:alamb/sec2

Conversation

@alamb
Copy link

@alamb alamb commented Dec 5, 2025

Here is some proposed "wordsmithing" changes for

I'll comment inline with the rationale

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

Preview URL: https://alamb.github.io/arrow-site

If the preview URL doesn't work, you may forget to configure your fork repository for preview.
See https://github.com/apache/arrow-site/blob/main/README.md#forks how to configure.

- **[ReadPlan](https://github.com/apache/arrow-rs/blob/bab30ae3d61509aa8c73db33010844d440226af2/parquet/src/arrow/arrow_reader/read_plan.rs#L302) / [ReadPlanBuilder](https://github.com/apache/arrow-rs/blob/bab30ae3d61509aa8c73db33010844d440226af2/parquet/src/arrow/arrow_reader/read_plan.rs#L34)**: Encodes "which columns to read and with what row subset" into a plan. It does not pre-read all predicate columns. It reads one, tightens the selection, and then moves on.
- **[RowSelection](https://github.com/apache/arrow-rs/blob/bab30ae3d61509aa8c73db33010844d440226af2/parquet/src/arrow/arrow_reader/selection.rs#L139)**: Describes "skip/select N rows" via RLE (`RowSelector::select/skip`) or a bitmask. This is the core mechanism that carries sparsity through the pipeline.
- **[ArrayReader](https://github.com/apache/arrow-rs/blob/bab30ae3d61509aa8c73db33010844d440226af2/parquet/src/arrow/array_reader/mod.rs#L85)**: The component responsible for I/O and decoding. It receives a `RowSelection` and decides which pages to read and which values to decode.
- **[RowSelection](https://github.com/apache/arrow-rs/blob/bab30ae3d61509aa8c73db33010844d440226af2/parquet/src/arrow/arrow_reader/selection.rs#L139)**: Describes "skip/select N rows" using either [Run-length encoding] (RLE) (called a [`RowSelector`]) or a bitmask. This is the core mechanism that carries sparsity through the pipeline.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few more links

@hhhizzz hhhizzz merged commit d3f1a7b into hhhizzz:lm-pipeline-blog Dec 7, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants