feat: Add support for searching and marshalling the new Timestamp column type. #54

Merged
gibber9809 merged 15 commits into y-scope:presto-0.297-edge-10-clp-connector from gibber9809:timestamp-integration-0.297
Mar 2, 2026

@gibber9809 gibber9809 commented Mar 2, 2026

Description

This PR adds support for searching against and marshalling the new Timestamp column type. It also updates the kv-ir and archive search paths so that timestamp() literals in KQL correctly search older archives and IR streams.

For backwards compatibility, timestamp() literals are treated as millisecond precision when searching kv-ir as well as archives older than v0.5.0.
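
The backwards-compatibility rule above can be sketched as follows. This is a minimal illustration, not the PR's code: `LiteralPrecision`, `ArchiveVersion`, `defaultPrecisionForArchive`, `defaultPrecisionForKvIr`, and `scaleLiteral` are hypothetical names, and the sketch assumes that archives at v0.5.0 or newer use nanosecond precision (the description only states the millisecond fallback for older archives and kv-ir).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; the real pass
// (SetTimestampLiteralPrecision) lives in CLP and its interface is not shown here.
enum class LiteralPrecision { Milliseconds, Nanoseconds };

struct ArchiveVersion {
  int majorVer;
  int minorVer;
  int patchVer;
};

// Lexicographic comparison on (major, minor, patch).
constexpr bool isOlderThan(ArchiveVersion v, ArchiveVersion ref) {
  if (v.majorVer != ref.majorVer) return v.majorVer < ref.majorVer;
  if (v.minorVer != ref.minorVer) return v.minorVer < ref.minorVer;
  return v.patchVer < ref.patchVer;
}

// Archives older than v0.5.0 fall back to milliseconds. Nanoseconds for
// newer archives is an assumption of this sketch.
constexpr LiteralPrecision defaultPrecisionForArchive(ArchiveVersion v) {
  return isOlderThan(v, ArchiveVersion{0, 5, 0}) ? LiteralPrecision::Milliseconds
                                                 : LiteralPrecision::Nanoseconds;
}

// kv-ir streams are always treated as millisecond precision.
constexpr LiteralPrecision defaultPrecisionForKvIr() {
  return LiteralPrecision::Milliseconds;
}

// Rescale a literal given in nanoseconds since the epoch to the chosen
// precision, flooring so that pre-epoch values round toward negative infinity.
constexpr int64_t scaleLiteral(int64_t epochNanos, LiteralPrecision precision) {
  if (precision == LiteralPrecision::Nanoseconds) return epochNanos;
  int64_t millis = epochNanos / 1'000'000;
  if (epochNanos % 1'000'000 < 0) --millis;
  return millis;
}

// Usage example with runtime checks.
inline void runPrecisionExamples() {
  assert(defaultPrecisionForArchive(ArchiveVersion{0, 4, 9}) ==
         LiteralPrecision::Milliseconds);
  assert(defaultPrecisionForArchive(ArchiveVersion{0, 5, 0}) ==
         LiteralPrecision::Nanoseconds);
  assert(defaultPrecisionForKvIr() == LiteralPrecision::Milliseconds);
  // 2023-03-27 00:41:39.863, assuming UTC, in nanoseconds -> milliseconds.
  assert(scaleLiteral(1'679'877'699'863'000'000, LiteralPrecision::Milliseconds) ==
         1'679'877'699'863);
}
```

Choosing the precision once per split, before evaluation, keeps the comparison logic itself unaware of archive versions.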

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

SELECT timestamp, CLP_GET_JSON_STRING() FROM clp.default.default
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880'
LIMIT 100

Summary by CodeRabbit

  • New Features

    • Nanosecond-precision epoch → Timestamp conversion and expanded timestamp column support.
    • New timestamp-precision pass to normalize timestamp literals before evaluation.
  • Bug Fixes

    • Stricter handling when no timestamp ranges or schemas match; now returns explicit errors.
  • Tests

    • Added tests validating the new timestamp literal format and pushdown behavior.
  • Chores

    • Updated a dependency version.

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Updated CLP dependency and added nanosecond‑precision timestamp support across utils, cursors, vector loader, and tests; cursors now apply a timestamp-literal precision pass before evaluation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependency Update: CMake/resolve_dependency_modules/clp.cmake | Updated CLP FetchContent GIT_TAG to a new commit revision. |
| Timestamp Utilities: velox/connectors/clp/search_lib/ClpTimestampsUtils.h | Added convertNanosecondEpochToVeloxTimestamp(clp_s::epochtime_t) to convert nanosecond-precision epoch times to Velox Timestamp, normalizing negative nanoseconds. |
| Cursor Precision Passes: velox/connectors/clp/search_lib/archive/ClpArchiveCursor.cpp, velox/connectors/clp/search_lib/ir/ClpIrCursor.cpp | Apply SetTimestampLiteralPrecision to expressions before evaluation; archive cursor chooses default precision from archive format and tightens timestamp-range/schema error handling. |
| Vector Loader Timestamp Support: velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp | Add handling for NodeType::Timestamp via TimestampColumnReader and route nanosecond conversion; adjust load fallbacks and add explicit template instantiations for new node-type variants (Timestamp, FormattedFloat, DictionaryFloat, etc.). |
| Tests, New/Updated Timestamp Formats: velox/connectors/clp/tests/ClpConnectorTest.cpp | Replace inline timestamp constants with timestamp("...", "\L") literal form, expand test data for finer fractions, and add tests validating the new timestamp literal format. |
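
The "normalizing negative nanoseconds" step in the summary above can be illustrated with a small sketch. It does not reproduce convertNanosecondEpochToVeloxTimestamp; `SecondsAndNanos` and `splitNanosecondEpoch` are hypothetical stand-ins that mirror a (seconds, nanos) pair with a non-negative nanosecond field, which is the shape a Velox Timestamp expects.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a (seconds, nanos) timestamp pair; nanos is
// always kept in the range [0, 999'999'999].
struct SecondsAndNanos {
  int64_t seconds;
  uint64_t nanos;
};

// Split a signed nanosecond epoch value into seconds and a non-negative
// nanosecond remainder. C++ integer division truncates toward zero, so a
// negative input can leave a negative remainder that must be borrowed from
// the seconds field.
constexpr SecondsAndNanos splitNanosecondEpoch(int64_t epochNanos) {
  constexpr int64_t kNanosPerSecond = 1'000'000'000;
  int64_t seconds = epochNanos / kNanosPerSecond;
  int64_t nanos = epochNanos % kNanosPerSecond;
  if (nanos < 0) {
    seconds -= 1;
    nanos += kNanosPerSecond;
  }
  return SecondsAndNanos{seconds, static_cast<uint64_t>(nanos)};
}

// Usage example with runtime checks.
inline void runSplitExamples() {
  // One nanosecond before the epoch is second -1 plus 999'999'999 ns.
  assert(splitNanosecondEpoch(-1).seconds == -1);
  assert(splitNanosecondEpoch(-1).nanos == 999'999'999u);
  // Positive values split without any adjustment.
  assert(splitNanosecondEpoch(1'500'000'000).seconds == 1);
  assert(splitNanosecondEpoch(1'500'000'000).nanos == 500'000'000u);
}
```

Without the borrow, a pre-epoch value such as -1 would produce a remainder of -1, which cannot be stored in an unsigned nanosecond field.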

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Cursor as Archive/IR Cursor
    participant PrecisionPass as SetTimestampLiteralPrecision
    participant ArchiveReader
    participant VectorLoader
    participant Converter as convertNanosecondEpochToVeloxTimestamp
    participant Velox as Velox Timestamp Vector

    Client->>Cursor: submit query with timestamp literals
    Cursor->>PrecisionPass: run on expr_ (choose Millis/Nanos)
    PrecisionPass-->>Cursor: return updated expr_
    Cursor->>ArchiveReader: load split / execute query
    ArchiveReader->>VectorLoader: provide column reader & node types
    VectorLoader->>Converter: read nanoseconds via TimestampColumnReader
    Converter-->>Velox: populate Timestamp vector values
    Velox-->>Client: return result rows

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • kirkrodrigues
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding support for a new Timestamp column type, which is reflected across all modified files including helper functions, vector loaders, and tests. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp`:
- Around line 92-98: The Timestamp branch in ClpArchiveVectorLoader uses an
undeclared identifier `message_index`; replace it with the correct loop variable
`messageIndex` so the call to reader->get_encoded_time(...) uses messageIndex;
edit the branch handling clp_s::NodeType::Timestamp (which casts to
clp_s::TimestampColumnReader* and calls convertNanosecondEpochToVeloxTimestamp)
to pass messageIndex to reader->get_encoded_time and then vector->set as before.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b288479 and 8bb62b3.

📒 Files selected for processing (7)
  • CMake/resolve_dependency_modules/clp.cmake
  • velox/connectors/clp/search_lib/ClpTimestampsUtils.h
  • velox/connectors/clp/search_lib/archive/ClpArchiveCursor.cpp
  • velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp
  • velox/connectors/clp/search_lib/ir/ClpIrCursor.cpp
  • velox/connectors/clp/tests/ClpConnectorTest.cpp
  • velox/connectors/clp/tests/examples/test_5.v0.5.0.clps

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
velox/connectors/clp/tests/ClpConnectorTest.cpp (1)

662-758: 🧹 Nitpick | 🔵 Trivial

Consider parameterizing the two float timestamp pushdown tests.

test5FloatTimestampPushdown and test5NewTimestampFormatFloatTimestampPushdown are mostly identical except input file/version. A parameterized helper would reduce maintenance drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velox/connectors/clp/tests/ClpConnectorTest.cpp` around lines 662 - 758, Both
tests duplicate logic; extract a single helper or parameterized test to run the
same plan against different input files. Create a helper method (e.g.,
runFloatTimestampPushdownTest(const std::string& fileName)) that builds the
PlanBuilder plan (reuse the same assignments, outputType, orderBy, etc.), calls
getResults with makeClpSplit(getExampleFilePath(fileName),
ClpConnectorSplit::SplitType::kArchive, kqlQuery), and asserts against the
identical expected vector; then replace TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) and TEST_F(ClpConnectorTest,
test5NewTimestampFormatFloatTimestampPushdown) with two thin calls to that
helper (or convert to a parameterized TEST_P that passes "test_5.clps" and
"test_5.v0.5.0.clps") ensuring you keep kqlQuery, expected vector, and
references to makeClpSplit/getExampleFilePath/ClpConnectorSplit consistent.
ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 8bb62b3 and 9215375.

📒 Files selected for processing (2)
  • velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp
  • velox/connectors/clp/tests/ClpConnectorTest.cpp

@20001020ycx 20001020ycx left a comment

LGTM as all of the comments in #51 are well addressed!

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 583-586: The IR currently treats integer timestamps as
microseconds while timestamp() literals are assumed milliseconds, causing
incorrect predicate pushdown; update the precision normalization code path so
integer IR timestamps (microsecond precision) are converted to millisecond
precision before any comparison with timestamp() literals, and change the test
in ClpConnectorTest.cpp to use timestamp() literals for assertions (verifying
timestamp() behavior directly rather than a raw integer literal) so the test
validates the corrected normalization and prevents silent mis-filtering. Ensure
you modify the normalization logic where IR timestamp literals are handled and
update the test expectations to use timestamp(...) calls.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 9215375 and 8f673c6.

📒 Files selected for processing (1)
  • velox/connectors/clp/tests/ClpConnectorTest.cpp

Comment on lines +583 to +586
// This test can not use the `timestamp()` literal, since the integer
// timestamps are in microsecond precision, and we currently assume all IR
// timestamps are millisecond precision when comparing against timestamp
// literals.
@coderabbitai coderabbitai bot Mar 2, 2026

⚠️ Potential issue | 🟠 Major

IR timestamp literal precision mismatch still implies incorrect filtering risk

Lines 583-586 document that IR integer timestamps are in microseconds, while timestamp() literal comparisons assume milliseconds. That is a correctness gap for IR pushdown predicates on microsecond data and can silently mis-filter rows. Please fix the precision normalization path and convert this test to assert timestamp() behaviour directly instead of relying on a raw integer literal.
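
To make the mis-filtering risk concrete, here is a small worked sketch. It is only an illustration of the unit mismatch, not the connector's pushdown code: `inRange` and `microsToMillis` are hypothetical helpers, the literal bounds are the validation query's timestamps converted to epoch milliseconds under an assumed UTC interpretation, and the event time is an invented value chosen to fall inside that range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inclusive range predicate evaluated on raw integer timestamps.
constexpr bool inRange(int64_t stored, int64_t lo, int64_t hi) {
  return lo <= stored && stored <= hi;
}

// Floor-divide microseconds to milliseconds so both sides share one unit.
constexpr int64_t microsToMillis(int64_t micros) {
  int64_t millis = micros / 1000;
  if (micros % 1000 < 0) --millis;
  return millis;
}

// Literal bounds in epoch milliseconds (2023-03-27 00:41:39.863 and .880,
// assuming UTC).
constexpr int64_t kLiteralLoMs = 1'679'877'699'863;
constexpr int64_t kLiteralHiMs = 1'679'877'699'880;

// Invented event at 00:41:39.870, stored once in each unit.
constexpr int64_t kEventMs = 1'679'877'699'870;
constexpr int64_t kEventUs = kEventMs * 1000;

// Usage example with runtime checks.
inline void runMismatchExamples() {
  // Same unit on both sides: the event matches.
  assert(inRange(kEventMs, kLiteralLoMs, kLiteralHiMs));
  // Microsecond value compared against millisecond bounds: the same event is
  // about 1000x too large and is silently filtered out.
  assert(!inRange(kEventUs, kLiteralLoMs, kLiteralHiMs));
  // After rescaling to a common unit, the event matches again.
  assert(inRange(microsToMillis(kEventUs), kLiteralLoMs, kLiteralHiMs));
}
```

The comparison never fails loudly; rows simply drop out of the result, which is why the mismatch is easy to miss.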

@gibber9809 (Author)

Arguably the issue is more with the existing test than the pushdown logic. Nominally, timestamps are in millisecond precision for existing kv-ir, though technically this isn't guaranteed.

Medium-term, we'll fix this for the kv-ir format as we did for the archive format: by updating the format to support a timestamp encoding with well-defined precision.

@gibber9809, thank you for the clarification! That makes sense—documenting the current limitation in the test is reasonable, and having a clear plan to update the kv-ir format with well-defined timestamp precision (similar to the archive format approach) is the right long-term solution. I appreciate you explaining the nominal millisecond precision assumption for existing kv-ir.

Looking forward to seeing that format update when you tackle it! 👍

🐰✨
[Taking notes for future reviews...]

@gibber9809 gibber9809 requested a review from 20001020ycx March 2, 2026 21:04
@gibber9809 gibber9809 merged commit 739a3aa into y-scope:presto-0.297-edge-10-clp-connector Mar 2, 2026
6 checks passed