Implement TSDB->Parquet RowReader #2
Merged
This PR introduces a tsdbRowReader inspired by the Cloudflare PoC, but with a key design change. Instead of using a fixed number of encoded data columns, this implementation proposes a more flexible approach: configure only the duration of each data column, and derive the number of columns from the block's time range. This lets the format adapt to blocks of varying time ranges.
Ex (configured column duration: 8h):
- Block duration: 24h → Result: 3 data columns
- Block duration: 48h → Result: 6 data columns
Timestamp Layout
Each data column starts at a fixed offset from the block's minimum timestamp (min_ts): data column i starts at min_ts + i * duration. Ex: with min_ts = x and duration = 8h, the data columns start at x, x + 8h, x + 16h, and so on.
The minTs, maxTs and duration can be stored in the Parquet metadata, so we can use this info to know which data columns to open when running a query.
Another change is that we re-encode the chunks so that they fit exactly within the data column boundaries.
PS: I want to add more tests to this PR, but I'm opening it now just to start the discussion.