feat: Add support for searching and marshalling the new Timestamp column type.#51
feat: Add support for searching and marshalling the new Timestamp column type.#51gibber9809 wants to merge 20 commits intoy-scope:presto-0.293-clp-connectorfrom
Timestamp column type.#51Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughPins CLP, log_surgeon, fmt, and spdlog dependencies to specific versions/commits; removes TimestampPattern init; adds nanosecond→Velox Timestamp conversion; normalizes timestamp-literal precision in IR and archive load paths; switches DateString handling to DeprecatedDateString; updates tests to use explicit KQL timestamp(...) literals. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ArchiveCursor
participant ArchiveReader
participant Expr
participant PrecisionPass
participant QueryHandler
participant VectorLoader
participant Util
Client->>ArchiveCursor: request query with parsed Expr
ArchiveCursor->>ArchiveReader: open archive & read header (version)
ArchiveCursor->>ArchiveCursor: determine defaultTimestampPrecision (ms or ns)
ArchiveCursor->>PrecisionPass: construct with chosen precision
PrecisionPass->>Expr: apply precision normalization to timestamp literals
ArchiveCursor->>QueryHandler: create with normalized Expr
QueryHandler->>VectorLoader: evaluate timestamp index / load vectors
VectorLoader->>Util: convert nano-epoch -> Velox Timestamp (convertNanosecondEpochToVeloxTimestamp)
VectorLoader->>QueryHandler: return populated timestamp vectors
QueryHandler->>Client: return results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
velox/connectors/clp/search_lib/ClpTimestampsUtils.h (1)
1-137:⚠️ Potential issue | 🟡 Minorclang-format failure reported by CI.
The pipeline reports that clang-format modified this file. Please run
clang-formatand commit the result.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h` around lines 1 - 137, Clang-format changed ClpTimestampsUtils.h; run clang-format on the header and commit the formatted result to fix the CI failure—apply formatting to the file containing estimatePrecision, both convertToVeloxTimestamp overloads, and convertNanosecondEpochToVeloxTimestamp so the definitions and includes match project style, then add and push the updated file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CMake/resolve_dependency_modules/clp.cmake`:
- Line 19: Add a comment next to the GIT_TAG pin for commit
f82e6114160a6addd4727259906bcf621ac9912c explaining this temporary pin and when
it can be replaced: note that the codebase relies on CLP APIs
SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs,
cNewTimestampFormatVersion, and DeprecatedDateString which are not yet in a
tagged release, and indicate the target CLP release version (or ">= vX.Y.Z"
placeholder) to switch to once those APIs are available; place the comment
immediately above or beside the GIT_TAG line so future maintainers can easily
find and update it.
In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 129-137: The implementation of
convertNanosecondEpochToVeloxTimestamp duplicates the Nanoseconds branch of
convertToVeloxTimestamp(int64_t); replace the body to delegate to
convertToVeloxTimestamp by calling
convertToVeloxTimestamp(static_cast<int64_t>(timestamp)) to remove duplication,
and if there is an intentional reason to bypass the heuristic in
convertToVeloxTimestamp, instead keep the implementation but add a brief comment
explaining why the heuristic must be avoided for clp_s::epochtime_t inputs
(reference functions: convertNanosecondEpochToVeloxTimestamp and
convertToVeloxTimestamp).
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 704-750: The test name TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) is duplicated; rename the second occurrence (the
one that uses "test_5.v0.5.0.clps") to a unique name (e.g.,
test5FloatTimestampPushdown_v0_5_0 or test5FloatTimestampPushdownArchive) so
Google Test won't generate conflicting fixtures, and ensure any mentions in that
test (the TEST_F line and any comments referring to the test) are updated to the
new name.
---
Outside diff comments:
In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 1-137: Clang-format changed ClpTimestampsUtils.h; run clang-format
on the header and commit the formatted result to fix the CI failure—apply
formatting to the file containing estimatePrecision, both
convertToVeloxTimestamp overloads, and convertNanosecondEpochToVeloxTimestamp so
the definitions and includes match project style, then add and push the updated
file.
| clp | ||
| GIT_REPOSITORY https://github.com/y-scope/clp.git | ||
| GIT_TAG v0.8.0) | ||
| GIT_TAG f82e6114160a6addd4727259906bcf621ac9912c) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
# Verify the commit hash exists in the clp repository
git ls-remote https://github.com/y-scope/clp.git | grep f82e6114160a6addd4727259906bcf621ac9912cRepository: y-scope/velox
Length of output: 159
🏁 Script executed:
# Read the clp.cmake file to see full context
cat -n CMake/resolve_dependency_modules/clp.cmakeRepository: y-scope/velox
Length of output: 1729
🏁 Script executed:
# Search for the APIs mentioned in the review comment in the codebase
rg "DeprecatedDateString|SetTimestampLiteralPrecision|TimestampLiteral|SingleFileArchiveDefs|cNewTimestampFormatVersion" --type cpp --type h -lRepository: y-scope/velox
Length of output: 235
🏁 Script executed:
# Search for the specific APIs mentioned in the review comment
rg "DeprecatedDateString|SetTimestampLiteralPrecision|TimestampLiteral|SingleFileArchiveDefs|cNewTimestampFormatVersion" -A 2 -B 2Repository: y-scope/velox
Length of output: 5337
🏁 Script executed:
# Also check if there's any documentation or comments about why this specific commit is needed
rg -i "commit|pin|tag|clp" CMake/resolve_dependency_modules/ -A 2 -B 2Repository: y-scope/velox
Length of output: 17902
Add a comment documenting when this CLP commit pin can be replaced with a tagged release.
The CLP dependency is correctly pinned to commit f82e6114160a6addd4727259906bcf621ac9912c because the required APIs (SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs, cNewTimestampFormatVersion, DeprecatedDateString) are used throughout the codebase but not yet available in a tagged release. To improve maintainability and ensure this pin isn't forgotten, add a comment in the CMake file noting the target CLP release version for this upgrade.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CMake/resolve_dependency_modules/clp.cmake` at line 19, Add a comment next to
the GIT_TAG pin for commit f82e6114160a6addd4727259906bcf621ac9912c explaining
this temporary pin and when it can be replaced: note that the codebase relies on
CLP APIs SetTimestampLiteralPrecision, TimestampLiteral, SingleFileArchiveDefs,
cNewTimestampFormatVersion, and DeprecatedDateString which are not yet in a
tagged release, and indicate the target CLP release version (or ">= vX.Y.Z"
placeholder) to switch to once those APIs are available; place the comment
immediately above or beside the GIT_TAG line so future maintainers can easily
find and update it.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp (1)
212-238:⚠️ Potential issue | 🟠 MajorVerify CLP Timestamp type dispatch for new nanosecond-precision archive format.
The code does not call
convertNanosecondEpochToVeloxTimestampin any branch, and lacks dispatch for a potentialTimestampColumnReader. Two issues need addressing:
Int64ColumnReader branch (line 215): Currently calls
convertToVeloxTimestamp(int64_t), which usesestimatePrecision()to heuristically classify precision. For nanosecond-precision timestamps with absolute value ≤kEpochNanoseconds1971(~1 year from epoch), the heuristic will misclassify them as seconds/milliseconds/microseconds. If the new Timestamp node type (ID 14) usesInt64ColumnReaderwith nanosecond precision, this branch should callconvertNanosecondEpochToVeloxTimestamp()instead.Missing TimestampColumnReader dispatch: If CLP introduces a dedicated
TimestampColumnReaderfor the new Timestamp type, the current dispatch has no matching check—the else clause (line 236) falls through topopulateTimestampData<clp_s::NodeType::Unknown>, which silently sets all rows to null.Confirm what reader type CLP uses for the new Timestamp node type and whether a
TimestampColumnReaderexists. If Int64ColumnReader is used, add a branch that callsconvertNanosecondEpochToVeloxTimestamp()for nanosecond data. If aTimestampColumnReaderexists, add a dispatch check for it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp` around lines 212 - 238, The Timestamp dispatch for ColumnType::Timestamp misclassifies/ignores nanosecond-precision archive data: update the branching in the Timestamp case in ClpArchiveVectorLoader.cpp so that when columnReader_ is an Int64ColumnReader and the CLP node type is the new Timestamp node (clp_s::NodeType::Timestamp) you call convertNanosecondEpochToVeloxTimestamp via populateTimestampData (instead of convertToVeloxTimestamp/estimatePrecision), and if CLP exposes a dedicated TimestampColumnReader add an explicit dynamic_cast check for clp_s::TimestampColumnReader* to dispatch to the same nanosecond converter; leave the Unknown fallback only for truly unknown node types. Use the existing symbols populateTimestampData, convertNanosecondEpochToVeloxTimestamp, convertToVeloxTimestamp, columnReader_, clp_s::Int64ColumnReader, clp_s::TimestampColumnReader, clp_s::NodeType::Timestamp and clp_s::NodeType::Unknown to locate and implement the fixes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@velox/connectors/clp/search_lib/archive/ClpArchiveVectorLoader.cpp`:
- Around line 212-238: The Timestamp dispatch for ColumnType::Timestamp
misclassifies/ignores nanosecond-precision archive data: update the branching in
the Timestamp case in ClpArchiveVectorLoader.cpp so that when columnReader_ is
an Int64ColumnReader and the CLP node type is the new Timestamp node
(clp_s::NodeType::Timestamp) you call convertNanosecondEpochToVeloxTimestamp via
populateTimestampData (instead of convertToVeloxTimestamp/estimatePrecision),
and if CLP exposes a dedicated TimestampColumnReader add an explicit
dynamic_cast check for clp_s::TimestampColumnReader* to dispatch to the same
nanosecond converter; leave the Unknown fallback only for truly unknown node
types. Use the existing symbols populateTimestampData,
convertNanosecondEpochToVeloxTimestamp, convertToVeloxTimestamp, columnReader_,
clp_s::Int64ColumnReader, clp_s::TimestampColumnReader,
clp_s::NodeType::Timestamp and clp_s::NodeType::Unknown to locate and implement
the fixes.
---
Duplicate comments:
In `@velox/connectors/clp/search_lib/ClpTimestampsUtils.h`:
- Around line 129-138: The body of convertNanosecondEpochToVeloxTimestamp
duplicates the Nanoseconds branch of convertToVeloxTimestamp(int64_t); refactor
by delegating to that function to remove duplication: have
convertNanosecondEpochToVeloxTimestamp convert the clp_s::epochtime_t to the
same int64_t nanoseconds value and call convertToVeloxTimestamp(int64_t)
(referencing Timestamp::kNanosInSecond for the conversion) or, if bypassing the
heuristic is intentional, replace the duplicate logic with a brief clarifying
comment explaining why the direct nanoseconds handling is required (e.g., to
correctly handle pre-1971 timestamps).
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 704-750: The test name TEST_F(ClpConnectorTest,
test5FloatTimestampPushdown) is duplicated; rename the second occurrence (the
one using "test_5.v0.5.0.clps") to a unique identifier (e.g.,
test5FloatTimestampPushdown_v0_5_0 or test5FloatTimestampPushdownArchive) so
GTest doesn't generate conflicting test classes; update only the TEST_F macro's
second argument in that test block (leave the fixture ClpConnectorTest and the
test body unchanged).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp`:
- Around line 579-580: The timestamp literal in test4's kqlQuery is using a
16-digit value ("1756003005000000") which contradicts CLP IR's millisecond
precision (SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp);
update the kqlQuery string in ClpConnectorTest.cpp (symbol: kqlQuery used in
test4) to the 13-digit millisecond value "1756003005000" so it matches test5's
format, and scan the file for any other 16-digit timestamp literals (e.g., the
similar value around line 659) and convert them to millisecond precision as
well.
| const std::shared_ptr<std::string> kqlQuery = std::make_shared<std::string>( | ||
| R"(timestamp < timestamp("1756003005000000", "\L"))"); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
CLP KQL timestamp format specifier "\L" milliseconds precision
💡 Result:
In CrowdStrike LogScale / Humio “KQL” time formatting, the milliseconds component is the L time modifier:
L= milliseconds within the second, always zero‑padded to 3 digits (000–999)—i.e., millisecond precision only. [1]- If you need more digits, the specifier is
N= nanoseconds, zero‑padded to 9 digits (000000000–999999999). [1]
Example (milliseconds shown as .SSS):
| eval ts_str = formatTime("%H:%M:%S.%L", field=@timestamp)
Note: In LogScale, the time modifiers like L are used inside formatTime() patterns as %...%L (or via format() using %tL), not as a standalone \L. [1]
Sources
[1] LogScale docs: format() / date-time modifiers (L milliseconds, N nanoseconds) (library.humio.com)
Citations:
🏁 Script executed:
# First, let's locate and examine the test file mentioned
fd ClpConnectorTest.cppRepository: y-scope/velox
Length of output: 104
🏁 Script executed:
# Check the exact content at lines 579-580 and 659-660
if fd ClpConnectorTest.cpp; then
echo "=== Lines 579-580 ===" && \
sed -n '575,585p' $(fd ClpConnectorTest.cpp)
echo -e "\n=== Lines 659-660 ===" && \
sed -n '655,665p' $(fd ClpConnectorTest.cpp)
fiRepository: y-scope/velox
Length of output: 1262
🌐 Web query:
CLP KQL timestamp() function format specifiers documentation
💡 Result:
In YScope/CLP, the “timestamp formatting” you’re probably looking for is the YScope Log Viewer format-string timestamp formatter, which takes a Day.js format string as its “options”:
- Syntax:
{field:timestamp[:options]}whereoptionsis a Day.js format string. Default isYYYY-MM-DDTHH:mm:ssZ. [1] - Example:
{ts:timestamp:YYYY-MM-DD}→2024-11-27[1] and a longer example appears in the format-string syntax docs. [2] - Day.js token reference (format specifiers): Day.js
format()documentation lists tokens likeYYYY,MM,DD,HH,mm,ss,SSS,Z, etc. [3]
If instead you meant CLP’s KQL-like search (e.g., clp-s), the docs/examples show time range filtering via numeric timestamps and a date("...") helper, not a timestamp() formatting function. [4]
Sources: [1] [2] [3] [4]
🌐 Web query:
"Compressed Log Processing" CLP KQL time format
💡 Result:
In CLP’s KQL (used by clp-s), time filtering is typically done on the ts field in either of these formats:
- Unix epoch seconds (integer):
ts >= 1649923037 AND ts <= 1649923038[1]
date("...")with an ISO-8601-like datetime string:ts >= date("2022-04-14T07:57:17") AND ts <= date("2022-04-14T07:57:18")[1]
Note: CLP docs also state time zone information is not preserved, so avoid relying on timezone offsets in stored/queried timestamps. [1]
Source: CLP clp-s user guide (search examples + limitations). [1]
🏁 Script executed:
# Search for CLP-related documentation or comments in the codebase
rg -i "kql.*timestamp|format.*specifier|\\\L" velox/connectors/clp/ -A 3 -B 3 | head -100Repository: y-scope/velox
Length of output: 2215
🏁 Script executed:
# Look for any CLP documentation or README
fd -e md -e txt -e rst velox/connectors/clp/ | xargs cat 2>/dev/null | head -200Repository: y-scope/velox
Length of output: 446
🏁 Script executed:
# Search for references to "\L" format or timestamp parsing logic in CLP connector
rg "\\\\L|format.*timestamp|timestamp.*precision" velox/connectors/clp/ -i | head -50Repository: y-scope/velox
Length of output: 3517
🏁 Script executed:
# Search for SetTimestampLiteralPrecision and how it handles format specifiers
rg "SetTimestampLiteralPrecision|TimestampLiteral.*Precision" velox/connectors/clp/ -B 5 -A 10Repository: y-scope/velox
Length of output: 6629
🏁 Script executed:
# Look for handling of "\L" format specifier or how timestamp format is parsed
rg -i "format.*\\\\L|\\\\L.*format|timestamp.*format" velox/connectors/clp/ -B 3 -A 3Repository: y-scope/velox
Length of output: 2943
🏁 Script executed:
# Search for estimatePrecision usage and implementation
rg "estimatePrecision" velox/connectors/clp/ -B 5 -A 10Repository: y-scope/velox
Length of output: 4583
🏁 Script executed:
# Check if there's any logic that treats "\L" as a special case
rg "\\\\L|format.*specifier" velox/connectors/clp/search_lib/ -B 2 -A 2Repository: y-scope/velox
Length of output: 39
🏁 Script executed:
# Look at the test file to see if there are any assertions or comments about expected behaviour
sed -n '575,610p' $(fd ClpConnectorTest.cpp)Repository: y-scope/velox
Length of output: 1522
🏁 Script executed:
# Check the full test5 content to see the complete query and assertion
sed -n '655,690p' $(fd ClpConnectorTest.cpp)Repository: y-scope/velox
Length of output: 1707
🏁 Script executed:
# Search for any comments or documentation about format specifiers in test file
rg -i "format|specifier|precision|\\\\L" $(fd ClpConnectorTest.cpp) -B 2 -A 2 | head -80Repository: y-scope/velox
Length of output: 2202
🏁 Script executed:
# Check ClpTimestampsUtils.h in full to understand the thresholds better
cat velox/connectors/clp/search_lib/ClpTimestampsUtils.h | head -100Repository: y-scope/velox
Length of output: 3740
🏁 Script executed:
# Search for how SetTimestampLiteralPrecision works - does it force or suggest precision
rg -i "SetTimestampLiteralPrecision" --type cpp -A 5 -B 2 | head -60Repository: y-scope/velox
Length of output: 3387
🏁 Script executed:
# Check if there's a comments in code about why test4 might work despite the large value
sed -n '568,620p' $(fd ClpConnectorTest.cpp)Repository: y-scope/velox
Length of output: 2154
🏁 Script executed:
# Look for what kTestTimestampSeconds is set to
rg "kTestTimestampSeconds|kTestTimestampNanoseconds" $(fd ClpConnectorTest.cpp) -B 2 -A 1Repository: y-scope/velox
Length of output: 1726
Fix test4's timestamp precision to match test5's format and CLP IR millisecond specification.
The test4 timestamp query uses "1756003005000000" (16 digits) while test5 uses "1746003070000" (13 digits). With identical "\L" format specifiers, this inconsistency indicates a semantic error.
Given that CLP IR explicitly treats timestamp literals as millisecond precision (via SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp), the correct millisecond representation for ~1756003005 seconds is 1756003005000 (13 digits, matching test5's format), not 1756003005000000 (16 digits).
test4 currently passes only because the archive contains sparse data with a single event well below any reasonable threshold. The 16-digit value either gets misinterpreted as nanoseconds by auto-detection (exceeding the 31.5 trillion nanosecond threshold) or happens to work due to the lax filter boundary. This becomes a latent bug in production when timestamp comparisons matter with edge-case or boundary values.
🔍 Suggested fix
- R"(timestamp < timestamp("1756003005000000", "\L"))");
+ R"(timestamp < timestamp("1756003005000", "\L"))");Also apply to line 659 if using similar inconsistent 16-digit values (verify test5 uses only 13-digit format).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@velox/connectors/clp/tests/ClpConnectorTest.cpp` around lines 579 - 580, The
timestamp literal in test4's kqlQuery is using a 16-digit value
("1756003005000000") which contradicts CLP IR's millisecond precision
(SetTimestampLiteralPrecision::Milliseconds in ClpIrCursor.cpp); update the
kqlQuery string in ClpConnectorTest.cpp (symbol: kqlQuery used in test4) to the
13-digit millisecond value "1756003005000" so it matches test5's format, and
scan the file for any other 16-digit timestamp literals (e.g., the similar value
around line 659) and convert them to millisecond precision as well.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CMakeLists.txt`:
- Around line 497-498: You bumped fmt to 11.2.0 unconditionally which breaks
code that includes fmt/ostream.h (used in velox/common/base/Exceptions.h and
velox/common/base/Status.h); either revert or gate the version bump and migrate
the ostream-dependent formatting: in Exceptions.h and Status.h replace uses of
ostream-based fallbacks with fmt::streamed(...) where streaming semantics are
needed or provide explicit fmt::formatter<T> specializations (or implement
formatter<T> : fmt::ostream_formatter for types relying on operator<<), and
ensure the CMake change (set(fmt_SOURCE BUNDLED) / velox_resolve_dependency(fmt
...)) is only applied after all consumers are rebuilt/binary-compatible or the
migration is complete.
| set(fmt_SOURCE BUNDLED) | ||
| velox_resolve_dependency(fmt 11.2.0) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check which fmt APIs are used across the codebase to assess breakage risk from 9→11
echo "=== fmt includes ==="
rg -n '#include.*fmt' --type=cpp --type=h -g '!third_party' -g '!_build' | head -60
echo ""
echo "=== Usage of deprecated fmt APIs (ostream.h, printf.h, format_int, BasicFormatter, BasicStringRef) ==="
rg -n 'fmt/ostream\.h|fmt/printf\.h|fmt::format_int|fmt::BasicFormatter|fmt::BasicStringRef' --type=cpp --type=h -g '!third_party' -g '!_build'
echo ""
echo "=== Check spdlog dependency handling in CMakeLists.txt ==="
rg -B3 -A3 'spdlog' CMakeLists.txt | head -40Repository: y-scope/velox
Length of output: 4587
🏁 Script executed:
#!/bin/bash
# Check the context around lines 497-498 in CMakeLists.txt
echo "=== Context around fmt dependency (lines 490-510) ==="
sed -n '490,510p' CMakeLists.txt
echo ""
echo "=== Check if fmt_SOURCE is set conditionally elsewhere ==="
rg -n 'fmt_SOURCE' CMakeLists.txt
echo ""
echo "=== Check CLP connector block boundaries ==="
rg -B5 -A5 'VELOX_ENABLE_CLP_CONNECTOR' CMakeLists.txt | head -80Repository: y-scope/velox
Length of output: 1985
🌐 Web query:
fmt library 11.2.0 breaking changes compatibility fmt 9.0.0
💡 Result:
ABI / binary compatibility (shared library)
- Not ABI-compatible: anything linked against
libfmt.so.9(fmt 9.x) will not run with fmt 11.2.0 (it shipslibfmt.so.11), so you must rebuild dependents or install a distro “compat” package. (packages.fedoraproject.org)
Source compatibility: notable breaking / behavior changes from 9.0.0 → 11.2.0
fmt 9.0.0 (big one if you relied on ostream fallback)
- When
fmt/ostream.his included, automaticoperator<<discovery is disabled (to avoid ODR violations). Old behavior requiredFMT_DEPRECATED_OSTREAM(deprecated) and you should migrate tofmt::streamedor implementformatter<T> : fmt::ostream_formatter. (github.com)
fmt 10.0.0
- Removed deprecated implicit enum conversions / conversions to primitive types (for
std::formatcompatibility). Useformat_asinstead. (github.com) - Removed
FMT_DEPRECATED_OSTREAM(so any code still depending on that macro will break). (github.com)
fmt 11.0.0
- Introduced
fmt/base.hand changed some internals/perf characteristics; most user code just keeps working, but it’s a new header split (useful if you want minimal includes). (raw.githubusercontent.com) fmt::format_towhen writing into a fixed array no longer overflows; output can be truncated (behavior change). (raw.githubusercontent.com)
fmt 11.2.0
- Removed
fmt_detailnamespace (breaks code that (incorrectly) used that internal namespace). (github.com) fmt::localtimeis deprecated in favor ofstd::localtime. (github.com)- Some chrono timezone formatting now produces compile-time diagnostics when timezone info isn’t available (e.g.
{:Z}withstd::chrono::local_seconds). (github.com)
Citations:
- 1: https://packages.fedoraproject.org/pkgs/fmt/fmt/fedora-rawhide.html?utm_source=openai
- 2: https://github.com/fmtlib/fmt/releases/tag/9.0.0
- 3: https://github.com/fmtlib/fmt/releases/tag/10.0.0
- 4: https://github.com/fmtlib/fmt/releases/tag/10.0.0
- 5: https://raw.githubusercontent.com/fmtlib/fmt/master/ChangeLog.md
- 6: https://raw.githubusercontent.com/fmtlib/fmt/master/ChangeLog.md
- 7: https://github.com/fmtlib/fmt/releases/tag/11.2.0
- 8: https://github.com/fmtlib/fmt/releases/tag/11.2.0
- 9: https://github.com/fmtlib/fmt/releases/tag/11.2.0
Critical: fmt version bump to 11.2.0 will break code using fmt/ostream.h — must update migration pattern before merging.
The codebase includes fmt/ostream.h in velox/common/base/Exceptions.h and velox/common/base/Status.h, which are core files. The upgrade from fmt 9.0.0 to 11.2.0 introduces breaking changes to how ostream formatting works:
- fmt 9.0.0 disabled automatic
operator<<discovery whenfmt/ostream.his included - fmt 10.0.0 removed the
FMT_DEPRECATED_OSTREAMmacro entirely - Code relying on the old ostream fallback pattern will not compile or function correctly with fmt 11.2.0
The change is unconditional (not restricted to the CLP connector block), so it affects all build configurations. Before merging, either:
- Migrate the ostream-dependent code in Exceptions.h and Status.h to use
fmt::streamed()or implementformatter<T> : fmt::ostream_formatter, or - Confirm this version bump is intended and all necessary code migration has been completed.
Additionally, fmt 11.2.0 requires binary rebuild of all dependents (incompatible with libfmt.so.9).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CMakeLists.txt` around lines 497 - 498, You bumped fmt to 11.2.0
unconditionally which breaks code that includes fmt/ostream.h (used in
velox/common/base/Exceptions.h and velox/common/base/Status.h); either revert or
gate the version bump and migrate the ostream-dependent formatting: in
Exceptions.h and Status.h replace uses of ostream-based fallbacks with
fmt::streamed(...) where streaming semantics are needed or provide explicit
fmt::formatter<T> specializations (or implement formatter<T> :
fmt::ostream_formatter for types relying on operator<<), and ensure the CMake
change (set(fmt_SOURCE BUNDLED) / velox_resolve_dependency(fmt ...)) is only
applied after all consumers are rebuilt/binary-compatible or the migration is
complete.
20001020ycx
left a comment
There was a problem hiding this comment.
I have a few comments about this PR, mostly regarding to the functionality when I was doing e2e testing.
Essentially, the below SQL query has no problem and the data returns successfully:
SELECT CLP_GET_JSON_STRING() from clp.default.default
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
AND TIMESTAMP '2023-03-27 00:41:39.880'
limit 100
But this query would leave the timestamp column empty, I imagine it has to do with some of my comments specified in the review, could you take a closer look if that's the case?
SELECT timestamp, CLP_GET_JSON_STRING() from clp.default.default
WHERE "timestamp" BETWEEN TIMESTAMP '2023-03-27 00:41:39.863'
AND TIMESTAMP '2023-03-27 00:41:39.880'
limit 100
The set-up of my e2e testing is listed in the validation section of this PR. Happy to help with the e2e testing if you figure out the problem, or anything to do with my set up.
| auto schemaMap = archiveReader_->get_schema_map(); | ||
|
|
||
| auto const defaultTimestampPrecision{ | ||
| archiveReader_->get_header().version < clp_s::cNewTimestampFormatVersion |
There was a problem hiding this comment.
| archiveReader_->get_header().version < clp_s::cNewTimestampFormatVersion | |
| archiveReader_->get_header().version < clp_s::cDeprecatedDateStringFormatVersionMarker |
| ClpArchiveVectorLoader::populateTimestampData<clp_s::NodeType::Unknown>( | ||
| RowSet rows, | ||
| FlatVector<facebook::velox::Timestamp>* vector); | ||
| template void |
There was a problem hiding this comment.
I think the changes make sense for DeprecatedDateString. But what about the deserialization of Timestamp type, I didn't see this being added in the vector loader file.
I imagine there must be some handler for Timestamp right?
| /// | ||
| /// @param timestamp the input timestamp as an integer | ||
| /// @return the corresponding Velox timestamp | ||
| inline auto convertNanosecondEpochToVeloxTimestamp(clp_s::epochtime_t timestamp) |
There was a problem hiding this comment.
Is this called anywhere? This looks like the handler I am looking for, but probably need to call it in the VectorLoader if I understand correctly.
Yeah seems to be that I just missed cherry-picking the code that does the actual marshalling in this PR. The new PR against the 0.297 branch is up as #54, and should fix that. |
Description
This PR adds support for searching against and marshalling the new
Timestampcolumn type, as well as updating the kv-ir and archive search path such thattimestamp()literals in KQL can be used to correctly search older archives and IR streams.For backwards compatibility,
timestamp()literals are treated as millisecond precision when searching kv-ir as well as archives older than v0.5.0.This PR will require follow-up work in presto to handle pushing down timestamps as
timestamp()literals, as well as treating the new timestamp column type (node ID 14) as a databaseTIMESTAMPtype.Checklist
breaking change.
Validation performed
timestamp()literals for KQL filteringTimestampcolumn type. #51Timestampcolumn type to replaceDateStringcolumn type; Bump the archive version to0.5.0. clp#1788Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores