

@alanprot (Collaborator) commented Apr 16, 2025

This PR introduces a tsdbRowReader inspired by the Cloudflare PoC, but with a key design change.

Instead of using a fixed number of encoded data columns, this implementation configures only the duration of each data column. This makes the format more flexible and lets it adapt to blocks of varying time ranges.

Ex:

  • Configured column duration: 8h

  • Block duration: 24h
    → Result: 3 data columns

  • Block duration: 48h
    → Result: 6 data columns
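For concreteness, here is a minimal Go sketch of the column-count calculation above; the function and parameter names are illustrative, not identifiers from this PR:

```go
package main

import "fmt"

// numDataCols returns how many data columns are needed to cover the block
// range [minT, maxT], rounding up so a partial trailing window still gets
// its own column. All values are millisecond timestamps, as in TSDB.
func numDataCols(minT, maxT, colDurationMs int64) int64 {
	return (maxT - minT + colDurationMs - 1) / colDurationMs
}

func main() {
	const hour = int64(60 * 60 * 1000)
	fmt.Println(numDataCols(0, 24*hour, 8*hour)) // 3
	fmt.Println(numDataCols(0, 48*hour, 8*hour)) // 6
}
```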

Timestamp Layout

Each data column covers a fixed-duration window at a calculated offset from the block's minimum timestamp (min_ts). Ex:

  • min_ts = x, duration = 8h
    data_col_1 = (x, x + 8h]
    data_col_2 = (x + 8h, x + 16h]
    data_col_3 = (x + 16h, x + 24h]
    

The minTs, maxTs and duration can be stored in the parquet metadata, so we can use this info to know which data columns to open when running a query.
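As a sketch of how that could work at query time (a hypothetical helper, not this PR's code; blockMinT, colDurMs and numCols stand in for values read back from the parquet key/value metadata):

```go
package main

import "fmt"

// dataColsForQuery returns the inclusive range of data-column indexes a
// query [qMinT, qMaxT] needs to open, clamped to the columns that exist.
// blockMinT, colDurMs and numCols would come from the file's metadata.
func dataColsForQuery(blockMinT, colDurMs, numCols, qMinT, qMaxT int64) (first, last int64) {
	first = (qMinT - blockMinT) / colDurMs
	last = (qMaxT - blockMinT) / colDurMs
	if first < 0 {
		first = 0
	}
	if last > numCols-1 {
		last = numCols - 1
	}
	return first, last
}

func main() {
	const hour = int64(60 * 60 * 1000)
	// 24h block with 8h columns: a query over [10h, 20h] touches cols 1 and 2.
	fmt.Println(dataColsForQuery(0, 8*hour, 3, 10*hour, 20*hour))
}
```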

Another change is that we re-encode the chunks to make sure they fit exactly within the data column boundaries.
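For illustration, a sketch of what that re-encoding could look like using the prometheus/tsdb/chunkenc iterator/appender API; splitChunk is a hypothetical helper, not this PR's code, and it only handles float samples:

```go
package main

import (
	"fmt"

	"github.com/prometheus/prometheus/tsdb/chunkenc"
)

// splitChunk re-encodes the float samples of chk into one XOR chunk per
// data-column window (x, x+colDurMs], keyed by column index.
func splitChunk(chk chunkenc.Chunk, blockMinT, colDurMs int64) (map[int64]chunkenc.Chunk, error) {
	out := map[int64]chunkenc.Chunk{}
	apps := map[int64]chunkenc.Appender{}

	it := chk.Iterator(nil)
	for it.Next() == chunkenc.ValFloat {
		t, v := it.At()
		// Windows are left-open: shift by 1ms so a sample exactly on a
		// boundary lands in the earlier column, matching the layout above.
		col := (t - blockMinT - 1) / colDurMs
		app, ok := apps[col]
		if !ok {
			c := chunkenc.NewXORChunk()
			a, err := c.Appender()
			if err != nil {
				return nil, err
			}
			out[col], apps[col], app = c, a, a
		}
		app.Append(t, v)
	}
	return out, it.Err()
}

func main() {
	c := chunkenc.NewXORChunk()
	app, _ := c.Appender()
	for ts := int64(1); ts <= 10; ts++ {
		app.Append(ts, float64(ts))
	}
	out, _ := splitChunk(c, 0, 4)
	fmt.Println(len(out)) // samples 1..10 split over 4ms windows -> 3 chunks
}
```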

PS: I want to add more tests to this PR, but I'm opening it now just to start the discussion.

Signed-off-by: alanprot <[email protected]>
@MichaHoffmann (Collaborator) commented:

Great stuff! I have one proposal - right now we cannot define how long a parquet file should be - it's always as long as the range that all the blocks we use to create it cover. We could add parameters to the tsdb row reader that define this range ("minT, maxT uint64") - that way we can break down 14d blocks (which Thanos or Prometheus could compact to) into many parquet files of length 1d if we want - or anything in between!

@alanprot (Collaborator, Author) commented:

> Great stuff! I have one proposal - right now we cannot define how long a parquet file should be - it's always as long as the range that all the blocks we use to create it cover. We could add parameters to the tsdb row reader that define this range ("minT, maxT uint64") - that way we can break down 14d blocks (which Thanos or Prometheus could compact to) into many parquet files of length 1d if we want - or anything in between!

Makes sense! I'll change the PR to add this filter.
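For illustration, one possible shape for that filter; none of these identifiers exist in the repository, they just make the proposal concrete:

```go
package main

import "fmt"

// rowReaderOpts is a hypothetical options struct: the window of data, in
// millisecond timestamps, that one output parquet file should cover.
type rowReaderOpts struct {
	minT, maxT int64
}

// fileWindows breaks a block's [minT, maxT) range into fixed-length output
// windows, e.g. a 14d block into fourteen 1d parquet files.
func fileWindows(blockMinT, blockMaxT, fileDurMs int64) []rowReaderOpts {
	var out []rowReaderOpts
	for start := blockMinT; start < blockMaxT; start += fileDurMs {
		end := start + fileDurMs
		if end > blockMaxT {
			end = blockMaxT
		}
		out = append(out, rowReaderOpts{minT: start, maxT: end})
	}
	return out
}

func main() {
	const day = int64(24 * 60 * 60 * 1000)
	fmt.Println(len(fileWindows(0, 14*day, day))) // 14
}
```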

@jesusvazquez jesusvazquez changed the base branch from readme to main April 16, 2025 07:58
@alanprot alanprot force-pushed the converter branch 2 times, most recently from a8498ff to 1983967 on April 16, 2025 18:05
@alanprot alanprot merged commit 4b6b605 into prometheus-community:main Apr 16, 2025
5 checks passed