feat: Add support for searching and marshalling the new Timestamp column type.#54
Conversation
📝 WalkthroughWalkthroughUpdated CLP dependency and added nanosecond‑precision timestamp support across utils, cursors, vector loader, and tests; cursors now apply a timestamp-literal precision pass before evaluation. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Cursor as Archive/IR Cursor
participant PrecisionPass as SetTimestampLiteralPrecision
participant ArchiveReader
participant VectorLoader
participant Converter as convertNanosecondEpochToVeloxTimestamp
participant Velox as Velox Timestamp Vector
Client->>Cursor: submit query with timestamp literals
Cursor->>PrecisionPass: run on expr_ (choose Millis/Nanos)
PrecisionPass-->>Cursor: return updated expr_
Cursor->>ArchiveReader: load split / execute query
ArchiveReader->>VectorLoader: provide column reader & node types
VectorLoader->>Converter: read nanoseconds via TimestampColumnReader
Converter-->>Velox: populate Timestamp vector values
Velox-->>Client: return result rows
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp`:
- Around line 92-98: The Timestamp branch in ClpArchiveVectorLoader uses an
undeclared identifier `message_index`; replace it with the correct loop variable
`messageIndex` so the call to reader->get_encoded_time(...) uses messageIndex;
edit the branch handling clp_s::NodeType::Timestamp (which casts to
clp_s::TimestampColumnReader* and calls convertNanosecondEpochToVeloxTimestamp)
to pass messageIndex to reader->get_encoded_time and then vector->set as before.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
CMake/resolve_dependency_modules/clp.cmakevelox/connectors/clp/search_lib/ClpTimestampsUtils.hvelox/connectors/clp/search_lib/archive/ClpArchiveCursor.cppvelox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cppvelox/connectors/clp/search_lib/ir/ClpIrCursor.cppvelox/connectors/clp/tests/ClpConnectorTest.cppvelox/connectors/clp/tests/examples/test_5.v0.5.0.clps
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
velox/connectors/clp/tests/ClpConnectorTest.cpp (1)
662-758: 🧹 Nitpick | 🔵 TrivialConsider parameterizing the two float timestamp pushdown tests.
test5FloatTimestampPushdownandtest5NewTimestampFormatFloatTimestampPushdownare mostly identical except input file/version. A parameterized helper would reduce maintenance drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@velox/connectors/clp/tests/ClpConnectorTest.cpp` around lines 662 - 758, Both tests duplicate logic; extract a single helper or parameterized test to run the same plan against different input files. Create a helper method (e.g., runFloatTimestampPushdownTest(const std::string& fileName)) that builds the PlanBuilder plan (reuse the same assignments, outputType, orderBy, etc.), calls getResults with makeClpSplit(getExampleFilePath(fileName), ClpConnectorSplit::SplitType::kArchive, kqlQuery), and asserts against the identical expected vector; then replace TEST_F(ClpConnectorTest, test5FloatTimestampPushdown) and TEST_F(ClpConnectorTest, test5NewTimestampFormatFloatTimestampPushdown) with two thin calls to that helper (or convert to a parameterized TEST_P that passes "test_5.clps" and "test_5.v0.5.0.clps") ensuring you keep kqlQuery, expected vector, and references to makeClpSplit/getExampleFilePath/ClpConnectorSplit consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 662-758: Both tests duplicate logic; extract a single helper or
parameterized test to run the same plan against different input files. Create a
helper method (e.g., runFloatTimestampPushdownTest(const std::string& fileName))
that builds the PlanBuilder plan (reuse the same assignments, outputType,
orderBy, etc.), calls getResults with makeClpSplit(getExampleFilePath(fileName),
ClpConnectorSplit::SplitType::kArchive, kqlQuery), and asserts against the
identical expected vector; then replace TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) and TEST_F(ClpConnectorTest,
test5NewTimestampFormatFloatTimestampPushdown) with two thin calls to that
helper (or convert to a parameterized TEST_P that passes "test_5.clps" and
"test_5.v0.5.0.clps") ensuring you keep kqlQuery, expected vector, and
references to makeClpSplit/getExampleFilePath/ClpConnectorSplit consistent.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cppvelox/connectors/clp/tests/ClpConnectorTest.cpp
20001020ycx
left a comment
There was a problem hiding this comment.
LGTM as all of the comments in #51 are well addressed!
…mestamps are in microsecond precision.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 583-586: The IR currently treats integer timestamps as
microseconds while timestamp() literals are assumed milliseconds, causing
incorrect predicate pushdown; update the precision normalization code path so
integer IR timestamps (microsecond precision) are converted to millisecond
precision before any comparison with timestamp() literals, and change the test
in ClpConnectorTest.cpp to use timestamp() literals for assertions (verifying
timestamp() behavior directly rather than a raw integer literal) so the test
validates the corrected normalization and prevents silent mis-filtering. Ensure
you modify the normalization logic where IR timestamp literals are handled and
update the test expectations to use timestamp(...) calls.
| // This test can not use the `timestamp()` literal, since the integer | ||
| // timestamps are in microsecond precision, and we currently assume all IR | ||
| // timestamps are millisecond precision when comparing against timestamp | ||
| // literals. |
There was a problem hiding this comment.
IR timestamp literal precision mismatch still implies incorrect filtering risk
Line 583-Line 586 documents that IR integer timestamps are microseconds, while timestamp() literal comparisons assume milliseconds. That is a correctness gap for IR pushdown predicates on microsecond data and can silently mis-filter rows. Please fix the precision normalization path and convert this test to assert timestamp() behaviour directly instead of relying on a raw integer literal.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp` around lines 583 - 586, The
IR currently treats integer timestamps as microseconds while timestamp()
literals are assumed milliseconds, causing incorrect predicate pushdown; update
the precision normalization code path so integer IR timestamps (microsecond
precision) are converted to millisecond precision before any comparison with
timestamp() literals, and change the test in ClpConnectorTest.cpp to use
timestamp() literals for assertions (verifying timestamp() behavior directly
rather than a raw integer literal) so the test validates the corrected
normalization and prevents silent mis-filtering. Ensure you modify the
normalization logic where IR timestamp literals are handled and update the test
expectations to use timestamp(...) calls.
There was a problem hiding this comment.
Arguably the issue is more with the existing test than the pushdown logic. Nominally, timestamps are in millisecond precision for existing kv-ir, though technically this isn't guaranteed.
Medium-term we'll fix for the kv-ir format like we did for the archive format by updating the format to support a timestamp encoding with well-defined precision.
There was a problem hiding this comment.
@gibber9809, thank you for the clarification! That makes sense—documenting the current limitation in the test is reasonable, and having a clear plan to update the kv-ir format with well-defined timestamp precision (similar to the archive format approach) is the right long-term solution. I appreciate you explaining the nominal millisecond precision assumption for existing kv-ir.
Looking forward to seeing that format update when you tackle it! 👍
🐰✨
[Taking notes for future reviews...]
739a3aa
into
y-scope:presto-0.297-edge-10-clp-connector
Description
This PR adds support for searching against and marshalling the new Timestamp column type, as well as updating the kv-ir and archive search path such that timestamp() literals in KQL can be used to correctly search older archives and IR streams.
For backwards compatibility, timestamp() literals are treated as millisecond precision when searching kv-ir as well as archives older than v0.5.0.
Checklist
breaking change.
Validation performed
Timestampcolumn type to replaceDateStringcolumn type; Bump the archive version to0.5.0. clp#1788Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores