feat: Add support for searching and marshalling the new `Timestamp` column type. by gibber9809 · Pull Request #51 · y-scope/velox

gibber9809 · 2026-02-18T20:39:33Z

Description

This PR adds support for searching against and marshalling the new Timestamp column type, as well as updating the kv-ir and archive search path such that timestamp() literals in KQL can be used to correctly search older archives and IR streams.

For backwards compatibility, timestamp() literals are treated as millisecond precision when searching kv-ir as well as archives older than v0.5.0.

This PR will require follow-up work in presto to handle pushing down timestamps as timestamp() literals, as well as treating the new timestamp column type (node ID 14) as a database TIMESTAMP type.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Updated existing tests to use timestamp() literals for KQL filtering
- Note: this allows us to remove some behaviour differences in the pushdown/non-pushdown cases for test 5
Added test for filtering timestamps in a v0.5.0 archive
E2E test set-up:
- Presto with 0.297 upgrade + the changes in this PR
- Velox with 0.297 upgrade + the changes in feat: Add support for searching and marshalling the new Timestamp column type. #51
- Clp Package: compiled from scratch at commit hash (13b47f069e16e1c70e2575080edbd970c82f9c8f), this includes the Timestamp type PR feat(clp-s)!: Add Timestamp column type to replaceDateString column type; Bump the archive version to 0.5.0. clp#1788
- Below Presto SQL were made and the correct data are returned

SELECT CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100

 SELECT CLP_GET_JSON_STRING()
  FROM clp.default.default
  WHERE "timestamp" >= TIMESTAMP '2023-03-27 00:41:39.863'
  LIMIT 100

Summary by CodeRabbit

New Features
- KQL timestamp(...) expressions are now supported for pushdown, improving explicit-timestamp query handling.
Bug Fixes
- Normalized timestamp literal precision across archive formats for consistent interpretation.
- Added nanosecond-precision timestamp conversion with correct negative-nanosecond handling.
- Improved error logging to include archive path when opening fails.
Tests
- Added and updated tests to cover timestamp pushdown and precision behavior.
Chores
- Updated several dependency pins and bumped the formatting library version.

…tamps.

…earch.

…amp type

coderabbitai · 2026-02-18T20:40:01Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Pins CLP, log_surgeon, fmt, and spdlog dependencies to specific versions/commits; removes TimestampPattern init; adds nanosecond→Velox Timestamp conversion; normalizes timestamp-literal precision in IR and archive load paths; switches DateString handling to DeprecatedDateString; updates tests to use explicit KQL timestamp(...) literals.

Changes

Cohort / File(s)	Summary
Dependency Management `CMake/resolve_dependency_modules/clp.cmake`, `CMake/resolve_dependency_modules/log_surgeon.cmake`, `CMake/resolve_dependency_modules/fmt.cmake`, `CMake/resolve_dependency_modules/spdlog.cmake`, `CMakeLists.txt`	Pinned `clp` and `log_surgeon` to specific commit hashes; bumped `fmt` to 11.2.0 (bundled) and updated checksum; bumped `spdlog` build version and checksum; moved `log_surgeon` resolution order in CMakeLists.
Connector Initialization `velox/connectors/clp/ClpConnector.cpp`	Removed inclusion of `clp_s/TimestampPattern.hpp` and removed calls to `clp_s::TimestampPattern::init()` from constructors.
Timestamp Utilities `velox/connectors/clp/search_lib/ClpTimestampsUtils.h`	Added `clp_s/Defs.hpp` include and helper `convertNanosecondEpochToVeloxTimestamp(clp_s::epochtime_t)` to convert nanosecond epoch values (handling negative nanos) to Velox `Timestamp`.
Archive Cursor & Normalization `velox/connectors/clp/search_lib/archive/ClpArchiveCursor.cpp`	Improve archive-open error logging; determine default timestamp precision from archive header (ms for older format, ns for newer) and apply `SetTimestampLiteralPrecision` to `expr_` before evaluation.
Vector Loading `velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp`	Replace `DateString` handling with `DeprecatedDateString`: update switch-case, reader casts, template instantiations; add explicit instantiations for `FormattedFloat` and `DictionaryFloat`.
IR Loading `velox/connectors/clp/search_lib/ir/ClpIrCursor.cpp`	Inject `SetTimestampLiteralPrecision` (Milliseconds) pass into IR load path so timestamp literals are normalized before creating QueryHandler.
Tests `velox/connectors/clp/tests/ClpConnectorTest.cpp`	Updated tests to use `timestamp("...", "\L")` KQL literal forms, adjusted expected outputs, and added `test5v050FloatTimestampPushdown`.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ArchiveCursor
  participant ArchiveReader
  participant Expr
  participant PrecisionPass
  participant QueryHandler
  participant VectorLoader
  participant Util

  Client->>ArchiveCursor: request query with parsed Expr
  ArchiveCursor->>ArchiveReader: open archive & read header (version)
  ArchiveCursor->>ArchiveCursor: determine defaultTimestampPrecision (ms or ns)
  ArchiveCursor->>PrecisionPass: construct with chosen precision
  PrecisionPass->>Expr: apply precision normalization to timestamp literals
  ArchiveCursor->>QueryHandler: create with normalized Expr
  QueryHandler->>VectorLoader: evaluate timestamp index / load vectors
  VectorLoader->>Util: convert nano-epoch -> Velox Timestamp (convertNanosecondEpochToVeloxTimestamp)
  VectorLoader->>QueryHandler: return populated timestamp vectors
  QueryHandler->>Client: return results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Support Timestamp Type in IR clp#1960: Implements IR-side timestamp-literal precision handling, archive normalization, and timestamp conversion utilities consistent with the issue's requested support for timestamp(...) expression handling.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main objective of the pull request: adding support for searching and marshalling a new Timestamp column type. It is specific, clear, and reflects the primary changes across all modified files.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

velox/connectors/clp/search_lib/ClpTimestampsUtils.h (1)
1-137: ⚠️ Potential issue | 🟡 Minor

clang-format failure reported by CI.

The pipeline reports that clang-format modified this file. Please run clang-format and commit the result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h` around lines 1 - 137,
Clang-format changed ClpTimestampsUtils.h; run clang-format on the header and
commit the formatted result to fix the CI failure—apply formatting to the file
containing estimatePrecision, both convertToVeloxTimestamp overloads, and
convertNanosecondEpochToVeloxTimestamp so the definitions and includes match
project style, then add and push the updated file.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CMake/resolve_dependency_modules/clp.cmake`:
- Line 19: Add a comment next to the GIT_TAG pin for commit
f82e6114160a6addd4727259906bcf621ac9912c explaining this temporary pin and when
it can be replaced: note that the codebase relies on CLP APIs
SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs,
cNewTimestampFormatVersion, and DeprecatedDateString which are not yet in a
tagged release, and indicate the target CLP release version (or ">= vX.Y.Z"
placeholder) to switch to once those APIs are available; place the comment
immediately above or beside the GIT_TAG line so future maintainers can easily
find and update it.

In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 129-137: The implementation of
convertNanosecondEpochToVeloxTimestamp duplicates the Nanoseconds branch of
convertToVeloxTimestamp(int64_t); replace the body to delegate to
convertToVeloxTimestamp by calling
convertToVeloxTimestamp(static_cast<int64_t>(timestamp)) to remove duplication,
and if there is an intentional reason to bypass the heuristic in
convertToVeloxTimestamp, instead keep the implementation but add a brief comment
explaining why the heuristic must be avoided for clp_s::epochtime_t inputs
(reference functions: convertNanosecondEpochToVeloxTimestamp and
convertToVeloxTimestamp).

In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 704-750: The test name TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) is duplicated; rename the second occurrence (the
one that uses "test_5.v0.5.0.clps") to a unique name (e.g.,
test5FloatTimestampPushdown_v0_5_0 or test5FloatTimestampPushdownArchive) so
Google Test won't generate conflicting fixtures, and ensure any mentions in that
test (the TEST_F line and any comments referring to the test) are updated to the
new name.

---

Outside diff comments:
In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 1-137: Clang-format changed ClpTimestampsUtils.h; run clang-format
on the header and commit the formatted result to fix the CI failure—apply
formatting to the file containing estimatePrecision, both
convertToVeloxTimestamp overloads, and convertNanosecondEpochToVeloxTimestamp so
the definitions and includes match project style, then add and push the updated
file.

coderabbitai · 2026-02-18T20:47:39Z

CMake/resolve_dependency_modules/clp.cmake

  clp
  GIT_REPOSITORY https://github.com/y-scope/clp.git
-  GIT_TAG v0.8.0)
+  GIT_TAG f82e6114160a6addd4727259906bcf621ac9912c)


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

# Verify the commit hash exists in the clp repository git ls-remote https://github.com/y-scope/clp.git | grep f82e6114160a6addd4727259906bcf621ac9912c

Repository: y-scope/velox

Length of output: 159

🏁 Script executed:

# Read the clp.cmake file to see full context cat -n CMake/resolve_dependency_modules/clp.cmake

Repository: y-scope/velox

Length of output: 1729

🏁 Script executed:

# Search for the APIs mentioned in the review comment in the codebase rg "DeprecatedDateString|SetTimestampLiteralPrecision|TimestampLiteral|SingleFileArchiveDefs|cNewTimestampFormatVersion" --type cpp --type h -l

Repository: y-scope/velox

Length of output: 235

🏁 Script executed:

# Search for the specific APIs mentioned in the review comment rg "DeprecatedDateString|SetTimestampLiteralPrecision|TimestampLiteral|SingleFileArchiveDefs|cNewTimestampFormatVersion" -A 2 -B 2

Repository: y-scope/velox

Length of output: 5337

🏁 Script executed:

# Also check if there's any documentation or comments about why this specific commit is needed rg -i "commit|pin|tag|clp" CMake/resolve_dependency_modules/ -A 2 -B 2

Repository: y-scope/velox

Length of output: 17902

Add a comment documenting when this CLP commit pin can be replaced with a tagged release.

The CLP dependency is correctly pinned to commit f82e6114160a6addd4727259906bcf621ac9912c because the required APIs (SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs, cNewTimestampFormatVersion, DeprecatedDateString) are used throughout the codebase but not yet available in a tagged release. To improve maintainability and ensure this pin isn't forgotten, add a comment in the CMake file noting the target CLP release version for this upgrade.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CMake/resolve_dependency_modules/clp.cmake` at line 19, Add a comment next to the GIT_TAG pin for commit f82e6114160a6addd4727259906bcf621ac9912c explaining this temporary pin and when it can be replaced: note that the codebase relies on CLP APIs SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs, cNewTimestampFormatVersion, and DeprecatedDateString which are not yet in a tagged release, and indicate the target CLP release version (or ">= vX.Y.Z" placeholder) to switch to once those APIs are available; place the comment immediately above or beside the GIT_TAG line so future maintainers can easily find and update it.

velox/connectors/clp/search_lib/ClpTimestampsUtils.h

velox/connectors/clp/tests/ClpConnectorTest.cpp

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp (1)
212-238: ⚠️ Potential issue | 🟠 Major

Verify CLP Timestamp type dispatch for new nanosecond-precision archive format.

The code does not call convertNanosecondEpochToVeloxTimestamp in any branch, and lacks dispatch for a potential TimestampColumnReader. Two issues need addressing:

Int64ColumnReader branch (line 215): Currently calls convertToVeloxTimestamp(int64_t), which uses estimatePrecision() to heuristically classify precision. For nanosecond-precision timestamps with absolute value ≤ kEpochNanoseconds1971 (~1 year from epoch), the heuristic will misclassify them as seconds/milliseconds/microseconds. If the new Timestamp node type (ID 14) uses Int64ColumnReader with nanosecond precision, this branch should call convertNanosecondEpochToVeloxTimestamp() instead.

Missing TimestampColumnReader dispatch: If CLP introduces a dedicated TimestampColumnReader for the new Timestamp type, the current dispatch has no matching check—the else clause (line 236) falls through to populateTimestampData<clp_s::NodeType::Unknown>, which silently sets all rows to null.

Confirm what reader type CLP uses for the new Timestamp node type and whether a TimestampColumnReader exists. If Int64ColumnReader is used, add a branch that calls convertNanosecondEpochToVeloxTimestamp() for nanosecond data. If a TimestampColumnReader exists, add a dispatch check for it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp` around
lines 212 - 238, The Timestamp dispatch for ColumnType::Timestamp
misclassifies/ignores nanosecond-precision archive data: update the branching in
the Timestamp case in ClpArchiveVectorLoader.cpp so that when columnReader_ is
an Int64ColumnReader and the CLP node type is the new Timestamp node
(clp_s::NodeType::Timestamp) you call convertNanosecondEpochToVeloxTimestamp via
populateTimestampData (instead of convertToVeloxTimestamp/estimatePrecision),
and if CLP exposes a dedicated TimestampColumnReader add an explicit
dynamic_cast check for clp_s::TimestampColumnReader* to dispatch to the same
nanosecond converter; leave the Unknown fallback only for truly unknown node
types. Use the existing symbols populateTimestampData,
convertNanosecondEpochToVeloxTimestamp, convertToVeloxTimestamp, columnReader_,
clp_s::Int64ColumnReader, clp_s::TimestampColumnReader,
clp_s::NodeType::Timestamp and clp_s::NodeType::Unknown to locate and implement
the fixes.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp`:
- Around line 212-238: The Timestamp dispatch for ColumnType::Timestamp
misclassifies/ignores nanosecond-precision archive data: update the branching in
the Timestamp case in ClpArchiveVectorLoader.cpp so that when columnReader_ is
an Int64ColumnReader and the CLP node type is the new Timestamp node
(clp_s::NodeType::Timestamp) you call convertNanosecondEpochToVeloxTimestamp via
populateTimestampData (instead of convertToVeloxTimestamp/estimatePrecision),
and if CLP exposes a dedicated TimestampColumnReader add an explicit
dynamic_cast check for clp_s::TimestampColumnReader* to dispatch to the same
nanosecond converter; leave the Unknown fallback only for truly unknown node
types. Use the existing symbols populateTimestampData,
convertNanosecondEpochToVeloxTimestamp, convertToVeloxTimestamp, columnReader_,
clp_s::Int64ColumnReader, clp_s::TimestampColumnReader,
clp_s::NodeType::Timestamp and clp_s::NodeType::Unknown to locate and implement
the fixes.

---

Duplicate comments:
In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 129-138: The body of convertNanosecondEpochToVeloxTimestamp
duplicates the Nanoseconds branch of convertToVeloxTimestamp(int64_t); refactor
by delegating to that function to remove duplication: have
convertNanosecondEpochToVeloxTimestamp convert the clp_s::epochtime_t to the
same int64_t nanoseconds value and call convertToVeloxTimestamp(int64_t)
(referencing Timestamp::kNanosInSecond for the conversion) or, if bypassing the
heuristic is intentional, replace the duplicate logic with a brief clarifying
comment explaining why the direct nanoseconds handling is required (e.g., to
correctly handle pre-1971 timestamps).

In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 704-750: The test name TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) is duplicated; rename the second occurrence (the
one using "test_5.v0.5.0.clps") to a unique identifier (e.g.,
test5FloatTimestampPushdown_v0_5_0 or test5FloatTimestampPushdownArchive) so
GTest doesn't generate conflicting test classes; update only the TEST_F macro's
second argument in that test block (leave the fixture ClpConnectorTest and the
test body unchanged).

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 579-580: The timestamp literal in test4's kqlQuery is using a
16-digit value ("1756003005000000") which contradicts CLP IR's millisecond
precision (SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp);
update the kqlQuery string in ClpConnectorTest.cpp (symbol: kqlQuery used in
test4) to the 13-digit millisecond value "1756003005000" so it matches test5's
format, and scan the file for any other 16-digit timestamp literals (e.g., the
similar value around line 659) and convert them to millisecond precision as
well.

coderabbitai · 2026-02-18T21:35:29Z

velox/connectors/clp/tests/ClpConnectorTest.cpp

+  const std::shared_ptr<std::string> kqlQuery = std::make_shared<std::string>(
+      R"(timestamp < timestamp("1756003005000000", "\L"))");


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

CLP KQL timestamp format specifier "\L" milliseconds precision

💡 Result:

In CrowdStrike LogScale / Humio “KQL” time formatting, the milliseconds component is the L time modifier:

L = milliseconds within the second, always zero‑padded to 3 digits (000–999)—i.e., millisecond precision only. [1]

If you need more digits, the specifier is N = nanoseconds, zero‑padded to 9 digits (000000000–999999999). [1]

Example (milliseconds shown as .SSS):

| eval ts_str = formatTime("%H:%M:%S.%L", field=@timestamp)

Note: In LogScale, the time modifiers like L are used inside formatTime() patterns as %...%L (or via format() using %tL), not as a standalone \L. [1]

Sources
[1] LogScale docs: format() / date-time modifiers (L milliseconds, N nanoseconds) (library.humio.com)

Citations:

1: https://library.humio.com/data-analysis-1.171/functions-format.html

🏁 Script executed:

# First, let's locate and examine the test file mentioned fd ClpConnectorTest.cpp

Repository: y-scope/velox

Length of output: 104

🏁 Script executed:

# Check the exact content at lines 579-580 and 659-660 if fd ClpConnectorTest.cpp; then echo "=== Lines 579-580 ===" && \ sed -n '575,585p' $(fd ClpConnectorTest.cpp) echo -e "\n=== Lines 659-660 ===" && \ sed -n '655,665p' $(fd ClpConnectorTest.cpp) fi

Repository: y-scope/velox

Length of output: 1262

🌐 Web query:

CLP KQL timestamp() function format specifiers documentation

💡 Result:

In YScope/CLP, the “timestamp formatting” you’re probably looking for is the YScope Log Viewer format-string timestamp formatter, which takes a Day.js format string as its “options”:

Syntax: {field:timestamp[:options]} where options is a Day.js format string. Default is YYYY-MM-DDTHH:mm:ssZ. [1]

Example: {ts:timestamp:YYYY-MM-DD} → 2024-11-27 [1] and a longer example appears in the format-string syntax docs. [2]

Day.js token reference (format specifiers): Day.js format() documentation lists tokens like YYYY, MM, DD, HH, mm, ss, SSS, Z, etc. [3]

If instead you meant CLP’s KQL-like search (e.g., clp-s), the docs/examples show time range filtering via numeric timestamps and a date("...") helper, not a timestamp() formatting function. [4]

Sources: [1] [2] [3] [4]

🌐 Web query:

"Compressed Log Processing" CLP KQL time format

💡 Result:

In CLP’s KQL (used by clp-s), time filtering is typically done on the ts field in either of these formats:

Unix epoch seconds (integer):

ts >= 1649923037 AND ts <= 1649923038 [1]

date("...") with an ISO-8601-like datetime string:

ts >= date("2022-04-14T07:57:17") AND ts <= date("2022-04-14T07:57:18") [1]

Note: CLP docs also state time zone information is not preserved, so avoid relying on timezone offsets in stored/queried timestamps. [1]

Source: CLP clp-s user guide (search examples + limitations). [1]

🏁 Script executed:

# Search for CLP-related documentation or comments in the codebase rg -i "kql.*timestamp|format.*specifier|\\\L" velox/connectors/clp/ -A 3 -B 3 | head -100

Repository: y-scope/velox

Length of output: 2215

🏁 Script executed:

# Look for any CLP documentation or README fd -e md -e txt -e rst velox/connectors/clp/ | xargs cat 2>/dev/null | head -200

Repository: y-scope/velox

Length of output: 446

🏁 Script executed:

# Search for references to "\L" format or timestamp parsing logic in CLP connector rg "\\\\L|format.*timestamp|timestamp.*precision" velox/connectors/clp/ -i | head -50

Repository: y-scope/velox

Length of output: 3517

🏁 Script executed:

# Search for SetTimestampLiteralPrecision and how it handles format specifiers rg "SetTimestampLiteralPrecision|TimestampLiteral.*Precision" velox/connectors/clp/ -B 5 -A 10

Repository: y-scope/velox

Length of output: 6629

🏁 Script executed:

# Look for handling of "\L" format specifier or how timestamp format is parsed rg -i "format.*\\\\L|\\\\L.*format|timestamp.*format" velox/connectors/clp/ -B 3 -A 3

Repository: y-scope/velox

Length of output: 2943

🏁 Script executed:

# Search for estimatePrecision usage and implementation rg "estimatePrecision" velox/connectors/clp/ -B 5 -A 10

Repository: y-scope/velox

Length of output: 4583

🏁 Script executed:

# Check if there's any logic that treats "\L" as a special case rg "\\\\L|format.*specifier" velox/connectors/clp/search_lib/ -B 2 -A 2

Repository: y-scope/velox

Length of output: 39

🏁 Script executed:

# Look at the test file to see if there are any assertions or comments about expected behaviour sed -n '575,610p' $(fd ClpConnectorTest.cpp)

Repository: y-scope/velox

Length of output: 1522

🏁 Script executed:

# Check the full test5 content to see the complete query and assertion sed -n '655,690p' $(fd ClpConnectorTest.cpp)

Repository: y-scope/velox

Length of output: 1707

🏁 Script executed:

# Search for any comments or documentation about format specifiers in test file rg -i "format|specifier|precision|\\\\L" $(fd ClpConnectorTest.cpp) -B 2 -A 2 | head -80

Repository: y-scope/velox

Length of output: 2202

🏁 Script executed:

# Check ClpTimestampsUtils.h in full to understand the thresholds better cat velox/connectors/clp/search_lib/ClpTimestampsUtils.h | head -100

Repository: y-scope/velox

Length of output: 3740

🏁 Script executed:

# Search for how SetTimestampLiteralPrecision works - does it force or suggest precision rg -i "SetTimestampLiteralPrecision" --type cpp -A 5 -B 2 | head -60

Repository: y-scope/velox

Length of output: 3387

🏁 Script executed:

# Check if there's a comments in code about why test4 might work despite the large value sed -n '568,620p' $(fd ClpConnectorTest.cpp)

Repository: y-scope/velox

Length of output: 2154

🏁 Script executed:

# Look for what kTestTimestampSeconds is set to rg "kTestTimestampSeconds|kTestTimestampNanoseconds" $(fd ClpConnectorTest.cpp) -B 2 -A 1

Repository: y-scope/velox

Length of output: 1726

Fix test4's timestamp precision to match test5's format and CLP IR millisecond specification.

The test4 timestamp query uses "1756003005000000" (16 digits) while test5 uses "1746003070000" (13 digits). With identical "\L" format specifiers, this inconsistency indicates a semantic error.

Given that CLP IR explicitly treats timestamp literals as millisecond precision (via SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp), the correct millisecond representation for ~1756003005 seconds is 1756003005000 (13 digits, matching test5's format), not 1756003005000000 (16 digits).

test4 currently passes only because the archive contains sparse data with a single event well below any reasonable threshold. The 16-digit value either gets misinterpreted as nanoseconds by auto-detection (exceeding the 31.5 trillion nanosecond threshold) or happens to work due to the lax filter boundary. This becomes a latent bug in production when timestamp comparisons matter with edge-case or boundary values.

🔍 Suggested fix

- R"(timestamp < timestamp("1756003005000000", "\L"))"); + R"(timestamp < timestamp("1756003005000", "\L"))");

Also apply to line 659 if using similar inconsistent 16-digit values (verify test5 uses only 13-digit format).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@velox/connectors/clp/tests/ClpConnectorTest.cpp` around lines 579 - 580, The timestamp literal in test4's kqlQuery is using a 16-digit value ("1756003005000000") which contradicts CLP IR's millisecond precision (SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp); update the kqlQuery string in ClpConnectorTest.cpp (symbol: kqlQuery used in test4) to the 13-digit millisecond value "1756003005000" so it matches test5's format, and scan the file for any other 16-digit timestamp literals (e.g., the similar value around line 659) and convert them to millisecond precision as well.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CMakeLists.txt`:
- Around line 497-498: You bumped fmt to 11.2.0 unconditionally which breaks
code that includes fmt/ostream.h (used in velox/common/base/Exceptions.h and
velox/common/base/Status.h); either revert or gate the version bump and migrate
the ostream-dependent formatting: in Exceptions.h and Status.h replace uses of
ostream-based fallbacks with fmt::streamed(...) where streaming semantics are
needed or provide explicit fmt::formatter<T> specializations (or implement
formatter<T> : fmt::ostream_formatter for types relying on operator<<), and
ensure the CMake change (set(fmt_SOURCE BUNDLED) / velox_resolve_dependency(fmt
...)) is only applied after all consumers are rebuilt/binary-compatible or the
migration is complete.

coderabbitai · 2026-02-18T22:21:36Z

CMakeLists.txt

+set(fmt_SOURCE BUNDLED)
+velox_resolve_dependency(fmt 11.2.0)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check which fmt APIs are used across the codebase to assess breakage risk from 9→11 echo "=== fmt includes ===" rg -n '#include.*fmt' --type=cpp --type=h -g '!third_party' -g '!_build' | head -60 echo "" echo "=== Usage of deprecated fmt APIs (ostream.h, printf.h, format_int, BasicFormatter, BasicStringRef) ===" rg -n 'fmt/ostream\.h|fmt/printf\.h|fmt::format_int|fmt::BasicFormatter|fmt::BasicStringRef' --type=cpp --type=h -g '!third_party' -g '!_build' echo "" echo "=== Check spdlog dependency handling in CMakeLists.txt ===" rg -B3 -A3 'spdlog' CMakeLists.txt | head -40

Repository: y-scope/velox

Length of output: 4587

🏁 Script executed:

#!/bin/bash # Check the context around lines 497-498 in CMakeLists.txt echo "=== Context around fmt dependency (lines 490-510) ===" sed -n '490,510p' CMakeLists.txt echo "" echo "=== Check if fmt_SOURCE is set conditionally elsewhere ===" rg -n 'fmt_SOURCE' CMakeLists.txt echo "" echo "=== Check CLP connector block boundaries ===" rg -B5 -A5 'VELOX_ENABLE_CLP_CONNECTOR' CMakeLists.txt | head -80

Repository: y-scope/velox

Length of output: 1985

🌐 Web query:

fmt library 11.2.0 breaking changes compatibility fmt 9.0.0

💡 Result:

ABI / binary compatibility (shared library)

Not ABI-compatible: anything linked against libfmt.so.9 (fmt 9.x) will not run with fmt 11.2.0 (it ships libfmt.so.11), so you must rebuild dependents or install a distro “compat” package. (packages.fedoraproject.org)

Source compatibility: notable breaking / behavior changes from 9.0.0 → 11.2.0

fmt 9.0.0 (big one if you relied on ostream fallback)

When fmt/ostream.h is included, automatic operator<< discovery is disabled (to avoid ODR violations). Old behavior required FMT_DEPRECATED_OSTREAM (deprecated) and you should migrate to fmt::streamed or implement formatter<T> : fmt::ostream_formatter. (github.com)

fmt 10.0.0

Removed deprecated implicit enum conversions / conversions to primitive types (for std::format compatibility). Use format_as instead. (github.com)

Removed FMT_DEPRECATED_OSTREAM (so any code still depending on that macro will break). (github.com)

fmt 11.0.0

Introduced fmt/base.h and changed some internals/perf characteristics; most user code just keeps working, but it’s a new header split (useful if you want minimal includes). (raw.githubusercontent.com)

fmt::format_to when writing into a fixed array no longer overflows; output can be truncated (behavior change). (raw.githubusercontent.com)

fmt 11.2.0

Removed fmt_detail namespace (breaks code that (incorrectly) used that internal namespace). (github.com)

fmt::localtime is deprecated in favor of std::localtime. (github.com)

Some chrono timezone formatting now produces compile-time diagnostics when timezone info isn’t available (e.g. {:Z} with std::chrono::local_seconds). (github.com)

Citations:

1: https://packages.fedoraproject.org/pkgs/fmt/fmt/fedora-rawhide.html?utm_source=openai

2: https://github.com/fmtlib/fmt/releases/tag/9.0.0

3: https://github.com/fmtlib/fmt/releases/tag/10.0.0

4: https://github.com/fmtlib/fmt/releases/tag/10.0.0

5: https://raw.githubusercontent.com/fmtlib/fmt/master/ChangeLog.md

6: https://raw.githubusercontent.com/fmtlib/fmt/master/ChangeLog.md

7: https://github.com/fmtlib/fmt/releases/tag/11.2.0

8: https://github.com/fmtlib/fmt/releases/tag/11.2.0

9: https://github.com/fmtlib/fmt/releases/tag/11.2.0

Critical: fmt version bump to 11.2.0 will break code using fmt/ostream.h — must update migration pattern before merging.

The codebase includes fmt/ostream.h in velox/common/base/Exceptions.h and velox/common/base/Status.h, which are core files. The upgrade from fmt 9.0.0 to 11.2.0 introduces breaking changes to how ostream formatting works:

fmt 9.0.0 disabled automatic operator<< discovery when fmt/ostream.h is included

fmt 10.0.0 removed the FMT_DEPRECATED_OSTREAM macro entirely

Code relying on the old ostream fallback pattern will not compile or function correctly with fmt 11.2.0

The change is unconditional (not restricted to the CLP connector block), so it affects all build configurations. Before merging, either:

Migrate the ostream-dependent code in Exceptions.h and Status.h to use fmt::streamed() or implement formatter<T> : fmt::ostream_formatter, or

Confirm this version bump is intended and all necessary code migration has been completed.

Additionally, fmt 11.2.0 requires binary rebuild of all dependents (incompatible with libfmt.so.9).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CMakeLists.txt` around lines 497 - 498, You bumped fmt to 11.2.0 unconditionally which breaks code that includes fmt/ostream.h (used in velox/common/base/Exceptions.h and velox/common/base/Status.h); either revert or gate the version bump and migrate the ostream-dependent formatting: in Exceptions.h and Status.h replace uses of ostream-based fallbacks with fmt::streamed(...) where streaming semantics are needed or provide explicit fmt::formatter<T> specializations (or implement formatter<T> : fmt::ostream_formatter for types relying on operator<<), and ensure the CMake change (set(fmt_SOURCE BUNDLED) / velox_resolve_dependency(fmt ...)) is only applied after all consumers are rebuilt/binary-compatible or the migration is complete.

20001020ycx

I have a few comments about this PR, mostly regarding to the functionality when I was doing e2e testing.

Essentially, the below SQL query has no problem and the data returns successfully:

SELECT CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100

But this query would leave the timestamp column empty, I imagine it has to do with some of my comments specified in the review, could you take a closer look if that's the case?

SELECT timestamp, CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100

The set-up of my e2e testing is listed in the validation section of this PR. Happy to help with the e2e testing if you figure out the problem, or anything to do with my set up.

20001020ycx · 2026-02-26T21:40:11Z

velox/connectors/clp/search_lib/archive/ClpArchiveCursor.cpp

  auto schemaMap = archiveReader_->get_schema_map();

+  auto const defaultTimestampPrecision{
+      archiveReader_->get_header().version < clp_s::cNewTimestampFormatVersion


Suggested change

archiveReader_->get_header().version < clp_s::cNewTimestampFormatVersion

archiveReader_->get_header().version < clp_s::cDeprecatedDateStringFormatVersionMarker

20001020ycx · 2026-02-26T21:42:08Z

velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp

 ClpArchiveVectorLoader::populateTimestampData<clp_s::NodeType::Unknown>(
    RowSet rows,
    FlatVector<facebook::velox::Timestamp>* vector);
+template void


I think the changes make sense for DeprecatedDateString. But what about the deserialization of Timestamp type, I didn't see this being added in the vector loader file.

I imagine there must be some handler for Timestamp right?

20001020ycx · 2026-02-26T21:43:22Z

velox/connectors/clp/search_lib/ClpTimestampsUtils.h

+///
+/// @param timestamp the input timestamp as an integer
+/// @return the corresponding Velox timestamp
+inline auto convertNanosecondEpochToVeloxTimestamp(clp_s::epochtime_t timestamp)


Is this called anywhere? This looks like the handler I am looking for, but probably need to call it in the VectorLoader if I understand correctly.

gibber9809 · 2026-03-02T19:27:50Z

I have a few comments about this PR, mostly regarding to the functionality when I was doing e2e testing.

Essentially, the below SQL query has no problem and the data returns successfully:
SELECT CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100
But this query would leave the timestamp column empty, I imagine it has to do with some of my comments specified in the review, could you take a closer look if that's the case?
SELECT timestamp, CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100
The set-up of my e2e testing is listed in the validation section of this PR. Happy to help with the e2e testing if you figure out the problem, or anything to do with my set up.

Yeah seems to be that I just missed cherry-picking the code that does the actual marshalling in this PR.

The new PR against the 0.297 branch is up as #54, and should fix that.

gibber9809 and others added 9 commits February 18, 2026 14:47

Update clp dependency to pull in support for new Timestamp column.

b108580

Add utility for converting nanosecond epoch timestamps to velox times…

1671230

…tamps.

Remove usage of deprecated TimestampPattern::init() method

4473c26

Handle timestamp literal precision special cases for IR and archive s…

46dedae

…earch.

Handle rename of DateString -> DeprecatedDateString.

31a5253

fix: bug fixes for e2e kql query with milisecond epoch against timest…

d81779b

…amp type

Update existing tests to use timestamp literal in pushdown.

45d166a

Update out of date query.

57ee741

Add timestamp filtering and marshalling test for v0.5.0 archives.

fd51806

gibber9809 requested a review from 20001020ycx February 18, 2026 20:39

Lint fix

aaf5d96

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

Address comment about test name.

460a88f

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

gibber9809 added 4 commits February 18, 2026 21:10

Update log-surgeon dependency to match version currently used in clp

c78f992

Fix incorrect commit hash

91ecd2f

Attempt to use fmt 11.2.0 to satisfy log-surgeon dependency.

19c7b17

Update incorrect checksum.

aa083a4

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

gibber9809 added 5 commits February 18, 2026 21:41

Attempt to get github to pick up previous commit.

59af129

Attempt to force velox to build the correct version of fmt.

845a75e

Resolve ystdlib before log surgeon.

3b7153f

Resolve ystdlib before log surgeon.

1244c47

Set spdlog version to 1.15.3

646d7b3

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

20001020ycx mentioned this pull request Feb 26, 2026

feat: Add timestamp() KQL filter pushdown; sync DeprecatedDateString rename with clp-s upstream. y-scope/presto#148

Merged

3 tasks

20001020ycx reviewed Feb 26, 2026

View reviewed changes

gibber9809 closed this Mar 2, 2026

20001020ycx mentioned this pull request Mar 2, 2026

feat: Add support for searching and marshalling the new Timestamp column type. #54

Merged

3 tasks

		const std::shared_ptr<std::string> kqlQuery = std::make_shared<std::string>(
		R"(timestamp < timestamp("1756003005000000", "\L"))");

	archiveReader_->get_header().version < clp_s::cNewTimestampFormatVersion
	archiveReader_->get_header().version < clp_s::cDeprecatedDateStringFormatVersionMarker

Conversation

gibber9809 commented Feb 18, 2026 • edited by 20001020ycx Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 18, 2026

Choose a reason for hiding this comment

ABI / binary compatibility (shared library)

Source compatibility: notable breaking / behavior changes from 9.0.0 → 11.2.0

fmt 9.0.0 (big one if you relied on ostream fallback)

fmt 10.0.0

fmt 11.0.0

fmt 11.2.0

Uh oh!

20001020ycx left a comment

Choose a reason for hiding this comment

Uh oh!

20001020ycx Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

20001020ycx Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

20001020ycx Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

gibber9809 commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gibber9809 commented Feb 18, 2026 •

edited by 20001020ycx

Loading

coderabbitai bot commented Feb 18, 2026 •

edited

Loading