Skip to content

feat: Add support for searching and marshalling the new Timestamp column type.#51

Closed
gibber9809 wants to merge 20 commits intoy-scope:presto-0.293-clp-connectorfrom
gibber9809:timestamp-integration
Closed

feat: Add support for searching and marshalling the new Timestamp column type.#51
gibber9809 wants to merge 20 commits intoy-scope:presto-0.293-clp-connectorfrom
gibber9809:timestamp-integration

Conversation

@gibber9809
Copy link

@gibber9809 gibber9809 commented Feb 18, 2026

Description

This PR adds support for searching against and marshalling the new Timestamp column type, as well as updating the kv-ir and archive search path such that timestamp() literals in KQL can be used to correctly search older archives and IR streams.

For backwards compatibility, timestamp() literals are treated as millisecond precision when searching kv-ir as well as archives older than v0.5.0.

This PR will require follow-up work in presto to handle pushing down timestamps as timestamp() literals, as well as treating the new timestamp column type (node ID 14) as a database TIMESTAMP type.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

SELECT CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100
 SELECT CLP_GET_JSON_STRING()
  FROM clp.default.default
  WHERE "timestamp" >= TIMESTAMP '2023-03-27 00:41:39.863'
  LIMIT 100

Summary by CodeRabbit

  • New Features

    • KQL timestamp(...) expressions are now supported for pushdown, improving explicit-timestamp query handling.
  • Bug Fixes

    • Normalized timestamp literal precision across archive formats for consistent interpretation.
    • Added nanosecond-precision timestamp conversion with correct negative-nanosecond handling.
    • Improved error logging to include archive path when opening fails.
  • Tests

    • Added and updated tests to cover timestamp pushdown and precision behavior.
  • Chores

    • Updated several dependency pins and bumped the formatting library version.

@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Pins CLP, log_surgeon, fmt, and spdlog dependencies to specific versions/commits; removes TimestampPattern init; adds nanosecond→Velox Timestamp conversion; normalizes timestamp-literal precision in IR and archive load paths; switches DateString handling to DeprecatedDateString; updates tests to use explicit KQL timestamp(...) literals.

Changes

Cohort / File(s) Summary
Dependency Management
CMake/resolve_dependency_modules/clp.cmake, CMake/resolve_dependency_modules/log_surgeon.cmake, CMake/resolve_dependency_modules/fmt.cmake, CMake/resolve_dependency_modules/spdlog.cmake, CMakeLists.txt
Pinned clp and log_surgeon to specific commit hashes; bumped fmt to 11.2.0 (bundled) and updated checksum; bumped spdlog build version and checksum; moved log_surgeon resolution order in CMakeLists.
Connector Initialization
velox/connectors/clp/ClpConnector.cpp
Removed inclusion of clp_s/TimestampPattern.hpp and removed calls to clp_s::TimestampPattern::init() from constructors.
Timestamp Utilities
velox/connectors/clp/search_lib/ClpTimestampsUtils.h
Added clp_s/Defs.hpp include and helper convertNanosecondEpochToVeloxTimestamp(clp_s::epochtime_t) to convert nanosecond epoch values (handling negative nanos) to Velox Timestamp.
Archive Cursor & Normalization
velox/connectors/clp/search_lib/archive/ClpArchiveCursor.cpp
Improve archive-open error logging; determine default timestamp precision from archive header (ms for older format, ns for newer) and apply SetTimestampLiteralPrecision to expr_ before evaluation.
Vector Loading
velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp
Replace DateString handling with DeprecatedDateString: update switch-case, reader casts, template instantiations; add explicit instantiations for FormattedFloat and DictionaryFloat.
IR Loading
velox/connectors/clp/search_lib/ir/ClpIrCursor.cpp
Inject SetTimestampLiteralPrecision (Milliseconds) pass into IR load path so timestamp literals are normalized before creating QueryHandler.
Tests
velox/connectors/clp/tests/ClpConnectorTest.cpp
Updated tests to use timestamp("...", "\L") KQL literal forms, adjusted expected outputs, and added test5v050FloatTimestampPushdown.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ArchiveCursor
  participant ArchiveReader
  participant Expr
  participant PrecisionPass
  participant QueryHandler
  participant VectorLoader
  participant Util

  Client->>ArchiveCursor: request query with parsed Expr
  ArchiveCursor->>ArchiveReader: open archive & read header (version)
  ArchiveCursor->>ArchiveCursor: determine defaultTimestampPrecision (ms or ns)
  ArchiveCursor->>PrecisionPass: construct with chosen precision
  PrecisionPass->>Expr: apply precision normalization to timestamp literals
  ArchiveCursor->>QueryHandler: create with normalized Expr
  QueryHandler->>VectorLoader: evaluate timestamp index / load vectors
  VectorLoader->>Util: convert nano-epoch -> Velox Timestamp (convertNanosecondEpochToVeloxTimestamp)
  VectorLoader->>QueryHandler: return populated timestamp vectors
  QueryHandler->>Client: return results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • Support Timestamp Type in IR clp#1960: Implements IR-side timestamp-literal precision handling, archive normalization, and timestamp conversion utilities consistent with the issue's requested support for timestamp(...) expression handling.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective of the pull request: adding support for searching and marshalling a new Timestamp column type. It is specific, clear, and reflects the primary changes across all modified files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
velox/connectors/clp/search_lib/ClpTimestampsUtils.h (1)

1-137: ⚠️ Potential issue | 🟡 Minor

clang-format failure reported by CI.

The pipeline reports that clang-format modified this file. Please run clang-format and commit the result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h` around lines 1 - 137,
Clang-format changed ClpTimestampsUtils.h; run clang-format on the header and
commit the formatted result to fix the CI failure—apply formatting to the file
containing estimatePrecision, both convertToVeloxTimestamp overloads, and
convertNanosecondEpochToVeloxTimestamp so the definitions and includes match
project style, then add and push the updated file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CMake/resolve_dependency_modules/clp.cmake`:
- Line 19: Add a comment next to the GIT_TAG pin for commit
f82e6114160a6addd4727259906bcf621ac9912c explaining this temporary pin and when
it can be replaced: note that the codebase relies on CLP APIs
SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs,
cNewTimestampFormatVersion, and DeprecatedDateString which are not yet in a
tagged release, and indicate the target CLP release version (or ">= vX.Y.Z"
placeholder) to switch to once those APIs are available; place the comment
immediately above or beside the GIT_TAG line so future maintainers can easily
find and update it.

In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 129-137: The implementation of
convertNanosecondEpochToVeloxTimestamp duplicates the Nanoseconds branch of
convertToVeloxTimestamp(int64_t); replace the body to delegate to
convertToVeloxTimestamp by calling
convertToVeloxTimestamp(static_cast<int64_t>(timestamp)) to remove duplication,
and if there is an intentional reason to bypass the heuristic in
convertToVeloxTimestamp, instead keep the implementation but add a brief comment
explaining why the heuristic must be avoided for clp_s::epochtime_t inputs
(reference functions: convertNanosecondEpochToVeloxTimestamp and
convertToVeloxTimestamp).

In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 704-750: The test name TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) is duplicated; rename the second occurrence (the
one that uses "test_5.v0.5.0.clps") to a unique name (e.g.,
test5FloatTimestampPushdown_v0_5_0 or test5FloatTimestampPushdownArchive) so
Google Test won't generate conflicting fixtures, and ensure any mentions in that
test (the TEST_F line and any comments referring to the test) are updated to the
new name.

---

Outside diff comments:
In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 1-137: Clang-format changed ClpTimestampsUtils.h; run clang-format
on the header and commit the formatted result to fix the CI failure—apply
formatting to the file containing estimatePrecision, both
convertToVeloxTimestamp overloads, and convertNanosecondEpochToVeloxTimestamp so
the definitions and includes match project style, then add and push the updated
file.

clp
GIT_REPOSITORY https://github.com/y-scope/clp.git
GIT_TAG v0.8.0)
GIT_TAG f82e6114160a6addd4727259906bcf621ac9912c)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

# Verify the commit hash exists in the clp repository
git ls-remote https://github.com/y-scope/clp.git | grep f82e6114160a6addd4727259906bcf621ac9912c

Repository: y-scope/velox

Length of output: 159


🏁 Script executed:

# Read the clp.cmake file to see full context
cat -n CMake/resolve_dependency_modules/clp.cmake

Repository: y-scope/velox

Length of output: 1729


🏁 Script executed:

# Search for the APIs mentioned in the review comment in the codebase
rg "DeprecatedDateString|SetTimestampLiteralPrecision|TimestampLiteral|SingleFileArchiveDefs|cNewTimestampFormatVersion" --type cpp --type h -l

Repository: y-scope/velox

Length of output: 235


🏁 Script executed:

# Search for the specific APIs mentioned in the review comment
rg "DeprecatedDateString|SetTimestampLiteralPrecision|TimestampLiteral|SingleFileArchiveDefs|cNewTimestampFormatVersion" -A 2 -B 2

Repository: y-scope/velox

Length of output: 5337


🏁 Script executed:

# Also check if there's any documentation or comments about why this specific commit is needed
rg -i "commit|pin|tag|clp" CMake/resolve_dependency_modules/ -A 2 -B 2

Repository: y-scope/velox

Length of output: 17902


Add a comment documenting when this CLP commit pin can be replaced with a tagged release.

The CLP dependency is correctly pinned to commit f82e6114160a6addd4727259906bcf621ac9912c because the required APIs (SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs, cNewTimestampFormatVersion, DeprecatedDateString) are used throughout the codebase but not yet available in a tagged release. To improve maintainability and ensure this pin isn't forgotten, add a comment in the CMake file noting the target CLP release version for this upgrade.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CMake/resolve_dependency_modules/clp.cmake` at line 19, Add a comment next to
the GIT_TAG pin for commit f82e6114160a6addd4727259906bcf621ac9912c explaining
this temporary pin and when it can be replaced: note that the codebase relies on
CLP APIs SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs,
cNewTimestampFormatVersion, and DeprecatedDateString which are not yet in a
tagged release, and indicate the target CLP release version (or ">= vX.Y.Z"
placeholder) to switch to once those APIs are available; place the comment
immediately above or beside the GIT_TAG line so future maintainers can easily
find and update it.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp (1)

212-238: ⚠️ Potential issue | 🟠 Major

Verify CLP Timestamp type dispatch for new nanosecond-precision archive format.

The code does not call convertNanosecondEpochToVeloxTimestamp in any branch, and lacks dispatch for a potential TimestampColumnReader. Two issues need addressing:

  1. Int64ColumnReader branch (line 215): Currently calls convertToVeloxTimestamp(int64_t), which uses estimatePrecision() to heuristically classify precision. For nanosecond-precision timestamps with absolute value ≤ kEpochNanoseconds1971 (~1 year from epoch), the heuristic will misclassify them as seconds/milliseconds/microseconds. If the new Timestamp node type (ID 14) uses Int64ColumnReader with nanosecond precision, this branch should call convertNanosecondEpochToVeloxTimestamp() instead.

  2. Missing TimestampColumnReader dispatch: If CLP introduces a dedicated TimestampColumnReader for the new Timestamp type, the current dispatch has no matching check—the else clause (line 236) falls through to populateTimestampData<clp_s::NodeType::Unknown>, which silently sets all rows to null.

Confirm what reader type CLP uses for the new Timestamp node type and whether a TimestampColumnReader exists. If Int64ColumnReader is used, add a branch that calls convertNanosecondEpochToVeloxTimestamp() for nanosecond data. If a TimestampColumnReader exists, add a dispatch check for it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp` around
lines 212 - 238, The Timestamp dispatch for ColumnType::Timestamp
misclassifies/ignores nanosecond-precision archive data: update the branching in
the Timestamp case in ClpArchiveVectorLoader.cpp so that when columnReader_ is
an Int64ColumnReader and the CLP node type is the new Timestamp node
(clp_s::NodeType::Timestamp) you call convertNanosecondEpochToVeloxTimestamp via
populateTimestampData (instead of convertToVeloxTimestamp/estimatePrecision),
and if CLP exposes a dedicated TimestampColumnReader add an explicit
dynamic_cast check for clp_s::TimestampColumnReader* to dispatch to the same
nanosecond converter; leave the Unknown fallback only for truly unknown node
types. Use the existing symbols populateTimestampData,
convertNanosecondEpochToVeloxTimestamp, convertToVeloxTimestamp, columnReader_,
clp_s::Int64ColumnReader, clp_s::TimestampColumnReader,
clp_s::NodeType::Timestamp and clp_s::NodeType::Unknown to locate and implement
the fixes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp`:
- Around line 212-238: The Timestamp dispatch for ColumnType::Timestamp
misclassifies/ignores nanosecond-precision archive data: update the branching in
the Timestamp case in ClpArchiveVectorLoader.cpp so that when columnReader_ is
an Int64ColumnReader and the CLP node type is the new Timestamp node
(clp_s::NodeType::Timestamp) you call convertNanosecondEpochToVeloxTimestamp via
populateTimestampData (instead of convertToVeloxTimestamp/estimatePrecision),
and if CLP exposes a dedicated TimestampColumnReader add an explicit
dynamic_cast check for clp_s::TimestampColumnReader* to dispatch to the same
nanosecond converter; leave the Unknown fallback only for truly unknown node
types. Use the existing symbols populateTimestampData,
convertNanosecondEpochToVeloxTimestamp, convertToVeloxTimestamp, columnReader_,
clp_s::Int64ColumnReader, clp_s::TimestampColumnReader,
clp_s::NodeType::Timestamp and clp_s::NodeType::Unknown to locate and implement
the fixes.

---

Duplicate comments:
In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 129-138: The body of convertNanosecondEpochToVeloxTimestamp
duplicates the Nanoseconds branch of convertToVeloxTimestamp(int64_t); refactor
by delegating to that function to remove duplication: have
convertNanosecondEpochToVeloxTimestamp convert the clp_s::epochtime_t to the
same int64_t nanoseconds value and call convertToVeloxTimestamp(int64_t)
(referencing Timestamp::kNanosInSecond for the conversion) or, if bypassing the
heuristic is intentional, replace the duplicate logic with a brief clarifying
comment explaining why the direct nanoseconds handling is required (e.g., to
correctly handle pre-1971 timestamps).

In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 704-750: The test name TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) is duplicated; rename the second occurrence (the
one using "test_5.v0.5.0.clps") to a unique identifier (e.g.,
test5FloatTimestampPushdown_v0_5_0 or test5FloatTimestampPushdownArchive) so
GTest doesn't generate conflicting test classes; update only the TEST_F macro's
second argument in that test block (leave the fixture ClpConnectorTest and the
test body unchanged).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 579-580: The timestamp literal in test4's kqlQuery is using a
16-digit value ("1756003005000000") which contradicts CLP IR's millisecond
precision (SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp);
update the kqlQuery string in ClpConnectorTest.cpp (symbol: kqlQuery used in
test4) to the 13-digit millisecond value "1756003005000" so it matches test5's
format, and scan the file for any other 16-digit timestamp literals (e.g., the
similar value around line 659) and convert them to millisecond precision as
well.

Comment on lines +579 to +580
const std::shared_ptr<std::string> kqlQuery = std::make_shared<std::string>(
R"(timestamp < timestamp("1756003005000000", "\L"))");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

CLP KQL timestamp format specifier "\L" milliseconds precision

💡 Result:

In CrowdStrike LogScale / Humio “KQL” time formatting, the milliseconds component is the L time modifier:

  • L = milliseconds within the second, always zero‑padded to 3 digits (000999)—i.e., millisecond precision only. [1]
  • If you need more digits, the specifier is N = nanoseconds, zero‑padded to 9 digits (000000000999999999). [1]

Example (milliseconds shown as .SSS):

| eval ts_str = formatTime("%H:%M:%S.%L", field=@timestamp)

Note: In LogScale, the time modifiers like L are used inside formatTime() patterns as %...%L (or via format() using %tL), not as a standalone \L. [1]

Sources
[1] LogScale docs: format() / date-time modifiers (L milliseconds, N nanoseconds) (library.humio.com)

Citations:


🏁 Script executed:

# First, let's locate and examine the test file mentioned
fd ClpConnectorTest.cpp

Repository: y-scope/velox

Length of output: 104


🏁 Script executed:

# Check the exact content at lines 579-580 and 659-660
if fd ClpConnectorTest.cpp; then
  echo "=== Lines 579-580 ===" && \
  sed -n '575,585p' $(fd ClpConnectorTest.cpp)
  echo -e "\n=== Lines 659-660 ===" && \
  sed -n '655,665p' $(fd ClpConnectorTest.cpp)
fi

Repository: y-scope/velox

Length of output: 1262


🌐 Web query:

CLP KQL timestamp() function format specifiers documentation

💡 Result:

In YScope/CLP, the “timestamp formatting” you’re probably looking for is the YScope Log Viewer format-string timestamp formatter, which takes a Day.js format string as its “options”:

  • Syntax: {field:timestamp[:options]} where options is a Day.js format string. Default is YYYY-MM-DDTHH:mm:ssZ. [1]
  • Example: {ts:timestamp:YYYY-MM-DD}2024-11-27 [1] and a longer example appears in the format-string syntax docs. [2]
  • Day.js token reference (format specifiers): Day.js format() documentation lists tokens like YYYY, MM, DD, HH, mm, ss, SSS, Z, etc. [3]

If instead you meant CLP’s KQL-like search (e.g., clp-s), the docs/examples show time range filtering via numeric timestamps and a date("...") helper, not a timestamp() formatting function. [4]

Sources: [1] [2] [3] [4]


🌐 Web query:

"Compressed Log Processing" CLP KQL time format

💡 Result:

In CLP’s KQL (used by clp-s), time filtering is typically done on the ts field in either of these formats:

  • Unix epoch seconds (integer):
    • ts >= 1649923037 AND ts <= 1649923038 [1]
  • date("...") with an ISO-8601-like datetime string:
    • ts >= date("2022-04-14T07:57:17") AND ts <= date("2022-04-14T07:57:18") [1]

Note: CLP docs also state time zone information is not preserved, so avoid relying on timezone offsets in stored/queried timestamps. [1]

Source: CLP clp-s user guide (search examples + limitations). [1]


🏁 Script executed:

# Search for CLP-related documentation or comments in the codebase
rg -i "kql.*timestamp|format.*specifier|\\\L" velox/connectors/clp/ -A 3 -B 3 | head -100

Repository: y-scope/velox

Length of output: 2215


🏁 Script executed:

# Look for any CLP documentation or README
fd -e md -e txt -e rst velox/connectors/clp/ | xargs cat 2>/dev/null | head -200

Repository: y-scope/velox

Length of output: 446


🏁 Script executed:

# Search for references to "\L" format or timestamp parsing logic in CLP connector
rg "\\\\L|format.*timestamp|timestamp.*precision" velox/connectors/clp/ -i | head -50

Repository: y-scope/velox

Length of output: 3517


🏁 Script executed:

# Search for SetTimestampLiteralPrecision and how it handles format specifiers
rg "SetTimestampLiteralPrecision|TimestampLiteral.*Precision" velox/connectors/clp/ -B 5 -A 10

Repository: y-scope/velox

Length of output: 6629


🏁 Script executed:

# Look for handling of "\L" format specifier or how timestamp format is parsed
rg -i "format.*\\\\L|\\\\L.*format|timestamp.*format" velox/connectors/clp/ -B 3 -A 3

Repository: y-scope/velox

Length of output: 2943


🏁 Script executed:

# Search for estimatePrecision usage and implementation
rg "estimatePrecision" velox/connectors/clp/ -B 5 -A 10

Repository: y-scope/velox

Length of output: 4583


🏁 Script executed:

# Check if there's any logic that treats "\L" as a special case
rg "\\\\L|format.*specifier" velox/connectors/clp/search_lib/ -B 2 -A 2

Repository: y-scope/velox

Length of output: 39


🏁 Script executed:

# Look at the test file to see if there are any assertions or comments about expected behaviour
sed -n '575,610p' $(fd ClpConnectorTest.cpp)

Repository: y-scope/velox

Length of output: 1522


🏁 Script executed:

# Check the full test5 content to see the complete query and assertion
sed -n '655,690p' $(fd ClpConnectorTest.cpp)

Repository: y-scope/velox

Length of output: 1707


🏁 Script executed:

# Search for any comments or documentation about format specifiers in test file
rg -i "format|specifier|precision|\\\\L" $(fd ClpConnectorTest.cpp) -B 2 -A 2 | head -80

Repository: y-scope/velox

Length of output: 2202


🏁 Script executed:

# Check ClpTimestampsUtils.h in full to understand the thresholds better
cat velox/connectors/clp/search_lib/ClpTimestampsUtils.h | head -100

Repository: y-scope/velox

Length of output: 3740


🏁 Script executed:

# Search for how SetTimestampLiteralPrecision works - does it force or suggest precision
rg -i "SetTimestampLiteralPrecision" --type cpp -A 5 -B 2 | head -60

Repository: y-scope/velox

Length of output: 3387


🏁 Script executed:

# Check if there's a comments in code about why test4 might work despite the large value
sed -n '568,620p' $(fd ClpConnectorTest.cpp)

Repository: y-scope/velox

Length of output: 2154


🏁 Script executed:

# Look for what kTestTimestampSeconds is set to
rg "kTestTimestampSeconds|kTestTimestampNanoseconds" $(fd ClpConnectorTest.cpp) -B 2 -A 1

Repository: y-scope/velox

Length of output: 1726


Fix test4's timestamp precision to match test5's format and CLP IR millisecond specification.

The test4 timestamp query uses "1756003005000000" (16 digits) while test5 uses "1746003070000" (13 digits). With identical "\L" format specifiers, this inconsistency indicates a semantic error.

Given that CLP IR explicitly treats timestamp literals as millisecond precision (via SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp), the correct millisecond representation for ~1756003005 seconds is 1756003005000 (13 digits, matching test5's format), not 1756003005000000 (16 digits).

test4 currently passes only because the archive contains sparse data with a single event well below any reasonable threshold. The 16-digit value either gets misinterpreted as nanoseconds by auto-detection (exceeding the 31.5 trillion nanosecond threshold) or happens to work due to the lax filter boundary. This becomes a latent bug in production when timestamp comparisons matter with edge-case or boundary values.

🔍 Suggested fix
-      R"(timestamp < timestamp("1756003005000000", "\L"))");
+      R"(timestamp < timestamp("1756003005000", "\L"))");

Also apply to line 659 if using similar inconsistent 16-digit values (verify test5 uses only 13-digit format).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@velox/connectors/clp/tests/ClpConnectorTest.cpp` around lines 579 - 580, The
timestamp literal in test4's kqlQuery is using a 16-digit value
("1756003005000000") which contradicts CLP IR's millisecond precision
(SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp); update the
kqlQuery string in ClpConnectorTest.cpp (symbol: kqlQuery used in test4) to the
13-digit millisecond value "1756003005000" so it matches test5's format, and
scan the file for any other 16-digit timestamp literals (e.g., the similar value
around line 659) and convert them to millisecond precision as well.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CMakeLists.txt`:
- Around line 497-498: You bumped fmt to 11.2.0 unconditionally which breaks
code that includes fmt/ostream.h (used in velox/common/base/Exceptions.h and
velox/common/base/Status.h); either revert or gate the version bump and migrate
the ostream-dependent formatting: in Exceptions.h and Status.h replace uses of
ostream-based fallbacks with fmt::streamed(...) where streaming semantics are
needed or provide explicit fmt::formatter<T> specializations (or implement
formatter<T> : fmt::ostream_formatter for types relying on operator<<), and
ensure the CMake change (set(fmt_SOURCE BUNDLED) / velox_resolve_dependency(fmt
...)) is only applied after all consumers are rebuilt/binary-compatible or the
migration is complete.

Comment on lines +497 to +498
set(fmt_SOURCE BUNDLED)
velox_resolve_dependency(fmt 11.2.0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check which fmt APIs are used across the codebase to assess breakage risk from 9→11
echo "=== fmt includes ==="
rg -n '#include.*fmt' --type=cpp --type=h -g '!third_party' -g '!_build' | head -60

echo ""
echo "=== Usage of deprecated fmt APIs (ostream.h, printf.h, format_int, BasicFormatter, BasicStringRef) ==="
rg -n 'fmt/ostream\.h|fmt/printf\.h|fmt::format_int|fmt::BasicFormatter|fmt::BasicStringRef' --type=cpp --type=h -g '!third_party' -g '!_build'

echo ""
echo "=== Check spdlog dependency handling in CMakeLists.txt ==="
rg -B3 -A3 'spdlog' CMakeLists.txt | head -40

Repository: y-scope/velox

Length of output: 4587


🏁 Script executed:

#!/bin/bash
# Check the context around lines 497-498 in CMakeLists.txt
echo "=== Context around fmt dependency (lines 490-510) ==="
sed -n '490,510p' CMakeLists.txt

echo ""
echo "=== Check if fmt_SOURCE is set conditionally elsewhere ==="
rg -n 'fmt_SOURCE' CMakeLists.txt

echo ""
echo "=== Check CLP connector block boundaries ==="
rg -B5 -A5 'VELOX_ENABLE_CLP_CONNECTOR' CMakeLists.txt | head -80

Repository: y-scope/velox

Length of output: 1985


🌐 Web query:

fmt library 11.2.0 breaking changes compatibility fmt 9.0.0

💡 Result:

ABI / binary compatibility (shared library)

  • Not ABI-compatible: anything linked against libfmt.so.9 (fmt 9.x) will not run with fmt 11.2.0 (it ships libfmt.so.11), so you must rebuild dependents or install a distro “compat” package. (packages.fedoraproject.org)

Source compatibility: notable breaking / behavior changes from 9.0.0 → 11.2.0

fmt 9.0.0 (big one if you relied on ostream fallback)

  • When fmt/ostream.h is included, automatic operator<< discovery is disabled (to avoid ODR violations). Old behavior required FMT_DEPRECATED_OSTREAM (deprecated) and you should migrate to fmt::streamed or implement formatter<T> : fmt::ostream_formatter. (github.com)

fmt 10.0.0

fmt 11.0.0

  • Introduced fmt/base.h and changed some internals/perf characteristics; most user code just keeps working, but it’s a new header split (useful if you want minimal includes). (raw.githubusercontent.com)
  • fmt::format_to when writing into a fixed array no longer overflows; output can be truncated (behavior change). (raw.githubusercontent.com)

fmt 11.2.0

Citations:


Critical: fmt version bump to 11.2.0 will break code using fmt/ostream.h — must update migration pattern before merging.

The codebase includes fmt/ostream.h in velox/common/base/Exceptions.h and velox/common/base/Status.h, which are core files. The upgrade from fmt 9.0.0 to 11.2.0 introduces breaking changes to how ostream formatting works:

  • fmt 9.0.0 disabled automatic operator<< discovery when fmt/ostream.h is included
  • fmt 10.0.0 removed the FMT_DEPRECATED_OSTREAM macro entirely
  • Code relying on the old ostream fallback pattern will not compile or function correctly with fmt 11.2.0

The change is unconditional (not restricted to the CLP connector block), so it affects all build configurations. Before merging, either:

  1. Migrate the ostream-dependent code in Exceptions.h and Status.h to use fmt::streamed() or implement formatter<T> : fmt::ostream_formatter, or
  2. Confirm this version bump is intended and all necessary code migration has been completed.

Additionally, fmt 11.2.0 requires binary rebuild of all dependents (incompatible with libfmt.so.9).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CMakeLists.txt` around lines 497 - 498, You bumped fmt to 11.2.0
unconditionally which breaks code that includes fmt/ostream.h (used in
velox/common/base/Exceptions.h and velox/common/base/Status.h); either revert or
gate the version bump and migrate the ostream-dependent formatting: in
Exceptions.h and Status.h replace uses of ostream-based fallbacks with
fmt::streamed(...) where streaming semantics are needed or provide explicit
fmt::formatter<T> specializations (or implement formatter<T> :
fmt::ostream_formatter for types relying on operator<<), and ensure the CMake
change (set(fmt_SOURCE BUNDLED) / velox_resolve_dependency(fmt ...)) is only
applied after all consumers are rebuilt/binary-compatible or the migration is
complete.

Copy link

@20001020ycx 20001020ycx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comments about this PR, mostly regarding to the functionality when I was doing e2e testing.

Essentially, the below SQL query has no problem and the data returns successfully:

SELECT CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100

But this query would leave the timestamp column empty, I imagine it has to do with some of my comments specified in the review, could you take a closer look if that's the case?

SELECT timestamp, CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100

The set-up of my e2e testing is listed in the validation section of this PR. Happy to help with the e2e testing if you figure out the problem, or anything to do with my set up.

auto schemaMap = archiveReader_->get_schema_map();

auto const defaultTimestampPrecision{
archiveReader_->get_header().version < clp_s::cNewTimestampFormatVersion

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
archiveReader_->get_header().version < clp_s::cNewTimestampFormatVersion
archiveReader_->get_header().version < clp_s::cDeprecatedDateStringFormatVersionMarker

ClpArchiveVectorLoader::populateTimestampData<clp_s::NodeType::Unknown>(
RowSet rows,
FlatVector<facebook::velox::Timestamp>* vector);
template void

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes make sense for DeprecatedDateString. But what about the deserialization of Timestamp type, I didn't see this being added in the vector loader file.

I imagine there must be some handler for Timestamp right?

///
/// @param timestamp the input timestamp as an integer
/// @return the corresponding Velox timestamp
inline auto convertNanosecondEpochToVeloxTimestamp(clp_s::epochtime_t timestamp)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this called anywhere? This looks like the handler I am looking for, but probably need to call it in the VectorLoader if I understand correctly.

@gibber9809
Copy link
Author

I have a few comments about this PR, mostly regarding to the functionality when I was doing e2e testing.

Essentially, the below SQL query has no problem and the data returns successfully:

SELECT CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100

But this query would leave the timestamp column empty, I imagine it has to do with some of my comments specified in the review, could you take a closer look if that's the case?

SELECT timestamp, CLP_GET_JSON_STRING() from clp.default.default  
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
                  AND TIMESTAMP '2023-03-27 00:41:39.880' 
limit 100

The set-up of my e2e testing is listed in the validation section of this PR. Happy to help with the e2e testing if you figure out the problem, or anything to do with my set up.

Yeah seems to be that I just missed cherry-picking the code that does the actual marshalling in this PR.

The new PR against the 0.297 branch is up as #54, and should fix that.

@gibber9809 gibber9809 closed this Mar 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants