docs: Remove the postgresql dataset from the resources documentation (fixes #2026). #2028
Walkthrough: This pull request removes the PostgreSQL dataset entry from the datasets reference table in the documentation, including its associated footnote citation. The change deletes three lines from a single documentation file without altering any other content or logic.
gibber9809
left a comment
LGTM. PR title seems fine as well.
Description
Remove the postgresql dataset from the resources documentation page since it
uses non-ISO8601-compliant timestamps (`"2023-03-27 00:26:35.719 EDT"`) that
CLP-S can no longer parse after #1788 introduced stricter timestamp parsing
(ISO8601 only).

The postgresql dataset's timestamps use timezone abbreviations (e.g., `EDT`)
instead of numeric UTC offsets, which the new `clp_s::timestamp_parser` does
not support. This dataset can be restored once RFC 2822 / RFC 822 timezone
parsing support is added.

All other datasets were validated and compress successfully:
- Sample timestamp `2023-03-28T04:00:00.040Z`: 141.86x compression
- Sample timestamp `1679711330.570420890`: 27.67x compression
- Sample timestamp `{"$date":"2023-03-21T23:34:54.576-04:00"}`: 231.02x compression
- Sample timestamp `1633683661085`: 7.03x compression
- Compressed with `--unstructured`: 30.33x compression
- Compressed with `--unstructured`: 61.72x compression
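The distinction between a numeric UTC offset and a timezone abbreviation can be illustrated with a small Python sketch. This is not CLP's implementation and does not reflect what `clp_s::timestamp_parser` accepts in detail; the function names `has_numeric_offset` and `normalize_abbreviated` are hypothetical, and the zone table is the set of abbreviations listed in RFC 822. It only shows why the postgresql sample differs from the ISO8601 samples above, and what a conversion to a numeric offset might look like.

```python
import re
from datetime import datetime, timedelta, timezone

# Zone abbreviations and their UTC offsets in hours, as listed in RFC 822.
RFC822_ZONES = {
    "UT": 0, "GMT": 0,
    "EST": -5, "EDT": -4,
    "CST": -6, "CDT": -5,
    "MST": -7, "MDT": -6,
    "PST": -8, "PDT": -7,
}

# Illustrative pattern: an ISO 8601 date-time ending in "Z" or a numeric offset.
ISO8601_WITH_OFFSET = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def has_numeric_offset(ts: str) -> bool:
    """Return True if ts looks like ISO 8601 with "Z" or a numeric UTC offset."""
    return ISO8601_WITH_OFFSET.match(ts) is not None


def normalize_abbreviated(ts: str) -> str:
    """Rewrite 'YYYY-MM-DD HH:MM:SS.fff ZONE' using a numeric UTC offset."""
    date_part, time_part, zone = ts.split(" ")
    tz = timezone(timedelta(hours=RFC822_ZONES[zone]))
    naive = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=tz).isoformat(timespec="milliseconds")


samples = [
    "2023-03-28T04:00:00.040Z",
    "2023-03-21T23:34:54.576-04:00",
    "2023-03-27 00:26:35.719 EDT",  # the postgresql-style timestamp
]
for ts in samples:
    print(ts, has_numeric_offset(ts))

print(normalize_abbreviated("2023-03-27 00:26:35.719 EDT"))
# prints 2023-03-27T00:26:35.719-04:00
```

Only the last sample fails the check, because it ends in `EDT` rather than `Z` or a numeric offset. Abbreviations are also ambiguous outside a fixed table like this one, which is one reason a strict parser may reject them.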
Checklist
breaking change.
Validation performed
1. Build the CLP package from a clean state
Task: Build the CLP package to test dataset compression.
Command:
Output (last lines):
2. Verify postgresql fails to compress (the removed dataset)
Task: Confirm that the postgresql dataset fails with the stricter timestamp
parser.
Commands:
Output:
Error log (`compression-job-1-task-1-stderr.log`):
Explanation: The postgresql dataset uses timezone abbreviations (`EDT`) in its
timestamps, which are not ISO8601-compliant and cannot be parsed by the new
`clp_s::timestamp_parser`.
3. Verify elasticsearch compresses successfully
Task: Confirm that elasticsearch (ISO8601 `@timestamp` field) compresses
without issues.
Commands:
Output:
4. Verify spark-event-logs compresses successfully
Task: Confirm that spark-event-logs (integer epoch `Timestamp` field)
compresses without issues.
Commands:
Output:
5. Verify cockroachdb compresses successfully
Task: Confirm that cockroachdb (epoch float `timestamp` field) compresses
without issues.
Commands:
Output:
6. Verify mongodb compresses successfully
Task: Confirm that mongodb (ISO8601 `t.$date` field in MongoDB extended
JSON) compresses without issues.
Commands:
Output:
7. Verify hive-24hr (text) compresses successfully
Task: Confirm that hive-24hr (unstructured text) compresses without issues.
Commands:
Output:
8. Verify openstack-24hr (text) compresses successfully
Task: Confirm that openstack-24hr (unstructured text) compresses without
issues.
Commands:
Output:
9. Verify docs build without warnings
Task: Ensure the updated documentation builds cleanly.
Command:
Output: