perf(logsdb): query optimization, shared code extraction, CI improvements#219
Conversation
Replace the unsound template-token pruning with a simpler, correct optimization: decompress the chunk payload and check if the needle's UTF-8 bytes exist anywhere in the raw buffer via indexOf. If the bytes aren't found, no body in the chunk can contain the needle — skip all string construction and template reconstruction. This is always correct (sound as a negative filter). Results at 500K records: - bodyContains 'error' (0 hits): 417ms → 107ms (3.9× faster) - bodyContains 'zzz_no_match': ~400ms → 87ms (4.6× faster) - bodyContains 'ssh' (133K hits): unchanged (bytes found, falls through) The raw scan uses Buffer.indexOf (SIMD-optimized in Node/V8) with a portable Uint8Array fallback for browser environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the double-decode body-only → full-decode path that caused regression for high-match-rate queries (ssh: 1521ms → 613ms). New approach: raw byte scan to eliminate non-matching chunks (65-110ms), then single full decode + inline filter for matching chunks. 500K syslog benchmarks: - Non-matching body: 65-110ms (was 417ms, 4-6× faster) - Matching body 'ssh': 613ms (was 1001ms, 1.6× faster) - body+limit=100: 1ms (was 12ms, 12× faster) - time+sev+body: 407ms (was 468ms, 1.15× faster) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 4 new test files covering: - query-correctness: body-only fast path equivalence, combined predicates, queryStream parity, limit boundaries, API signature regression - query-scale: 10K records, hit count accuracy, pruning effectiveness, stats consistency (chunksScanned + chunksPruned = total) - codec-typed-correctness: special chars, UUID/INT64/timestamp slot round-trips, structured bodies, toks metadata, mixed chunk shapes - readBodiesOnly: per-policy body-only decode correctness, structured bodies, empty/long bodies, cross-policy consistency Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New test files: - classify.test.ts (20 tests): all BodyKind branches, TemplatedClassifier - engine-extended.test.ts (14 tests): flush edge cases, iterRecords, rowsPerChunk extremes, TypedColumnar via engine - stream-extended.test.ts (17 tests): error paths, scope/resource differentiation, complex attribute types, reference caching - compact-extended.test.ts (7 tests): codec diversity (gzip↔zstd), non-string bodies, rich metadata, TypedColumnar compaction - query-edge-cases.test.ts (16 tests): non-ASCII/CJK needles, leafGet paths, numeric bodyLeafEquals, time boundary precision, combined predicates, empty store Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ests Query engine: - Extract readRecordsFromRaw() to avoid re-decompressing payloads in the body fast path (was calling readRecords which decompress again) - Remove unused readBodiesOnly import - Cap needleCache at 64 entries to prevent unbounded memory growth - Fix biome formatting Chunk NDJSON serialization: - Fix eventName='' being silently dropped (was using truthy check) - Fix droppedAttributesCount=0 being silently dropped - Use !== undefined checks instead of truthy for both encode and decode New tests (23 tests): - ndjson-roundtrip.test.ts (12): falsy field preservation, complex bodies (null, nested object, numeric, boolean, array), full rich record pipeline round-trip - query-integration.test.ts (11): readRecordsFromRaw correctness, body fast path on NDJSON stores, resourceEquals stream pruning, queryStream generator early-termination behavior Total: 239 tests across 19 files, all passing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds exported Possibly related PRs
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
| /** | ||
| * Template-token pruning for bodyContains. If the chunk header carries | ||
| * template literal tokens (TypedColumnarDrainPolicy stores these in | ||
| * codecMeta.toks), check if any token contains the needle as a | ||
| * substring. If no template token can match AND the chunk metadata | ||
| * confirms zero raw-string bodies, we can skip ZSTD decompression. | ||
| * | ||
| * SOUNDNESS: We can only prune when BOTH conditions hold: | ||
| * 1. No template literal token contains the needle | ||
| * 2. The chunk has zero raw-string bodies (rawCount === 0) | ||
| * Raw byte scan: check if the needle's UTF-8 bytes appear anywhere | ||
| * in the decompressed payload buffer. This is a sound negative filter: | ||
| * if the bytes aren't found, no body string in this chunk can contain | ||
| * the needle. False positives are possible (the needle bytes might | ||
| * appear in template dictionary metadata, slot type headers, etc.) | ||
| * but are rare and handled by the subsequent per-record check. |
There was a problem hiding this comment.
💡 Edge Case: Raw byte scan false negative for JSON-escaped characters in NDJSON
Even for NDJSON payloads, rawPayloadContains can produce false negatives when the bodyContains needle includes characters that get JSON-escaped (e.g., literal newlines ``, quotes ", backslashes ``). The raw bytes in the NDJSON buffer contain the escaped form (`\n`), not the original byte (`0x0A`), so a scan for the original character won't match.
This is a minor edge case because bodyContains searches in practice are almost always for printable ASCII/UTF-8 text that doesn't require JSON escaping, but it's worth documenting the limitation.
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/o11ylogsdb/src/query.ts`:
- Around line 149-167: The raw-byte fast path (guarded by useBodyFastPath and
using rawPayloadContains before calling readRecordsFromRaw) is unsound for
policies that reconstruct or template bodies in postDecode; add an explicit
capability check on the policy (e.g., policy.allowsRawBodyScan or
!policy.postDecodeTransforms) before using useBodyFastPath, and if the policy
does not guarantee bodies are preserved as contiguous UTF-8 sequences, skip the
raw-byte scan and instead call decodeBodiesOnly or perform the full decode path
(readRecordsFromRaw) so templated/reconstructed bodies aren't incorrectly
pruned.
In `@packages/o11ylogsdb/test/compact-extended.test.ts`:
- Around line 145-163: The test "compact + query integration" is claiming to
exercise compaction but never calls the compaction routine; update the test to
actually compact stored chunks before running query by invoking the compaction
helper (e.g., compactChunk) for the relevant chunk(s) produced by the LogStore
instance (use the store variable and its chunks/IDs or API to get chunk
identifiers), then run query(store, ...) and assert on records; specifically,
after store.flush() call the compactChunk function (or store.compactChunk /
store.compactAll as available) on the flushed chunk(s) so the query truly runs
against compacted chunks.
In `@packages/o11ylogsdb/test/engine-extended.test.ts`:
- Around line 60-65: Declare lastStats with an explicit type to satisfy
noImplicitAnyLet: change the current untyped declaration for lastStats to use
the IngestStats type returned by store.append (e.g., IngestStats | undefined),
referring to the lastStats variable and the store.append(...) call in the
LogStore test so the assignment stays the same but the variable is explicitly
typed.
In `@packages/o11ylogsdb/test/query-correctness.test.ts`:
- Around line 123-138: The combined-predicate test ("combines time + severity +
bodyContains") is vulnerable to vacuous passes because it only asserts
properties inside a loop and will succeed if result.records is empty; update the
test that calls buildMultiChunkStore() and query(...) to include an explicit
cardinality assertion (e.g., expect(result.records.length).toBeGreaterThan(0) or
assert the exact expected count) before the for-loop so an empty result fails;
apply the same change to the other combined-predicate test around the query(...)
call at the later block (lines ~196-207) to assert result.records has the
expected number of entries before iterating.
In `@packages/o11ylogsdb/test/stream-extended.test.ts`:
- Around line 2-17: The test fixture fakeChunk() uses the old ChunkHeader shape
(minNano/maxNano) and is missing required fields; update fakeChunk() so its
header matches the current ChunkHeader shape in src/chunk.ts by adding
schemaVersion, timeRange (with minNano and maxNano), severityRange, resource and
scope entries, and keep payload/payloadBytes consistent; reuse the imported
Resource and InstrumentationScope types for the resource and scope fields and
ensure header.codecName, header.nLogs and header.payloadBytes remain present to
satisfy StreamRegistry/chunk consumers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 204fa5ea-5a0d-4708-b817-9bee9dab7d2c
📒 Files selected for processing (14)
packages/o11ylogsdb/src/chunk.tspackages/o11ylogsdb/src/index.tspackages/o11ylogsdb/src/query.tspackages/o11ylogsdb/test/classify.test.tspackages/o11ylogsdb/test/codec-typed-correctness.test.tspackages/o11ylogsdb/test/compact-extended.test.tspackages/o11ylogsdb/test/engine-extended.test.tspackages/o11ylogsdb/test/ndjson-roundtrip.test.tspackages/o11ylogsdb/test/query-correctness.test.tspackages/o11ylogsdb/test/query-edge-cases.test.tspackages/o11ylogsdb/test/query-integration.test.tspackages/o11ylogsdb/test/query-scale.test.tspackages/o11ylogsdb/test/readBodiesOnly.test.tspackages/o11ylogsdb/test/stream-extended.test.ts
| describe("compact + query integration", () => { | ||
| it("query works on compacted chunks", () => { | ||
| const store = new LogStore({ | ||
| rowsPerChunk: 8, | ||
| policyFactory: () => new TypedColumnarDrainPolicy(), | ||
| }); | ||
| for (let i = 0; i < 20; i++) { | ||
| store.append(resource, scope, rec(`user ${i % 2 === 0 ? "login" : "logout"} event`, i < 10 ? 9 : 17, BigInt(i * 100))); | ||
| } | ||
| store.flush(); | ||
|
|
||
| const { records } = query(store, { bodyContains: "login", severityGte: 17 }); | ||
| expect(records.length).toBeGreaterThan(0); | ||
| for (const r of records) { | ||
| expect(r.body).toContain("login"); | ||
| expect(r.severityNumber).toBeGreaterThanOrEqual(17); | ||
| } | ||
| }); | ||
| }); |
There was a problem hiding this comment.
compact + query integration test does not exercise compaction.
Line 146 claims compacted-chunk coverage, but the test only writes to LogStore and runs query; compactChunk is never invoked. A compaction/query regression would be missed.
Minimal corrective diff (rename to match actual behavior)
-describe("compact + query integration", () => {
- it("query works on compacted chunks", () => {
+describe("query integration with typed-columnar chunks", () => {
+ it("query works on flushed typed-columnar chunks", () => {Based on learnings: Bug fixes should include a test that would fail without the fix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/o11ylogsdb/test/compact-extended.test.ts` around lines 145 - 163,
The test "compact + query integration" is claiming to exercise compaction but
never calls the compaction routine; update the test to actually compact stored
chunks before running query by invoking the compaction helper (e.g.,
compactChunk) for the relevant chunk(s) produced by the LogStore instance (use
the store variable and its chunks/IDs or API to get chunk identifiers), then run
query(store, ...) and assert on records; specifically, after store.flush() call
the compactChunk function (or store.compactChunk / store.compactAll as
available) on the flushed chunk(s) so the query truly runs against compacted
chunks.
| it("combines time + severity + bodyContains", () => { | ||
| const { store } = buildMultiChunkStore(); | ||
| const result = query(store, { | ||
| range: { from: 1000n, to: 3000n }, // records 0..19 | ||
| severityGte: 13, | ||
| bodyContains: "error", | ||
| }); | ||
| // In the first 20 records (t=1000..2900), errors at index 0,7,14 | ||
| // severity for error records is 17, which >= 13 | ||
| for (const r of result.records) { | ||
| expect(Number(r.timeUnixNano)).toBeGreaterThanOrEqual(1000); | ||
| expect(Number(r.timeUnixNano)).toBeLessThan(3000); | ||
| expect(r.severityNumber).toBeGreaterThanOrEqual(13); | ||
| expect(r.body).toContain("error"); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Prevent vacuous passes in combined-predicate tests.
Line 123 and Line 196 only validate properties inside loops; if a regression returns zero records, both tests still pass. Add explicit cardinality assertions.
Proposed fix
it("combines time + severity + bodyContains", () => {
@@
const result = query(store, {
range: { from: 1000n, to: 3000n }, // records 0..19
severityGte: 13,
bodyContains: "error",
});
+ expect(result.records.length).toBe(3);
@@
it("combined time+bodyContains fast path still correct", () => {
@@
const result = query(store, {
range: { from: 2000n, to: 4000n },
bodyContains: "gateway",
});
+ expect(result.records.length).toBe(12);Based on learnings: Bug fixes should include a test that would fail without the fix.
Also applies to: 196-207
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/o11ylogsdb/test/query-correctness.test.ts` around lines 123 - 138,
The combined-predicate test ("combines time + severity + bodyContains") is
vulnerable to vacuous passes because it only asserts properties inside a loop
and will succeed if result.records is empty; update the test that calls
buildMultiChunkStore() and query(...) to include an explicit cardinality
assertion (e.g., expect(result.records.length).toBeGreaterThan(0) or assert the
exact expected count) before the for-loop so an empty result fails; apply the
same change to the other combined-predicate test around the query(...) call at
the later block (lines ~196-207) to assert result.records has the expected
number of entries before iterating.
| import { StreamRegistry } from "../src/stream.js"; | ||
| import type { Resource, InstrumentationScope, StreamId } from "../src/types.js"; | ||
| import type { Chunk } from "../src/chunk.js"; | ||
|
|
||
| function fakeChunk(): Chunk { | ||
| return { | ||
| header: { | ||
| codecName: "ndjson+zstd-19", | ||
| nLogs: 1, | ||
| minNano: 1n, | ||
| maxNano: 2n, | ||
| codecMeta: undefined, | ||
| payloadBytes: 10, | ||
| }, | ||
| payload: new Uint8Array(10), | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and read the chunk.ts file to understand the ChunkHeader type
find . -name "chunk.ts" -o -name "chunk.js" | head -20Repository: strawgate/o11ykit
Length of output: 133
🏁 Script executed:
# Read the chunk.ts file to see the actual type definition
cat packages/o11ylogsdb/src/chunk.tsRepository: strawgate/o11ykit
Length of output: 14934
🏁 Script executed:
# Read the test file to verify current fakeChunk() implementation
cat packages/o11ylogsdb/test/stream-extended.test.tsRepository: strawgate/o11ykit
Length of output: 7128
Update fakeChunk() to match the current ChunkHeader shape.
The fixture uses the old minNano/maxNano fields and omits required fields (schemaVersion, timeRange, severityRange, resource, scope), causing type mismatches against packages/o11ylogsdb/src/chunk.ts.
Proposed fix
-import type { Chunk } from "../src/chunk.js";
+import { CHUNK_VERSION } from "../src/chunk.js";
+import type { Chunk } from "../src/chunk.js";
@@
function fakeChunk(): Chunk {
return {
header: {
+ schemaVersion: CHUNK_VERSION,
codecName: "ndjson+zstd-19",
nLogs: 1,
- minNano: 1n,
- maxNano: 2n,
+ timeRange: { minNano: "1", maxNano: "2" },
+ severityRange: { min: 1, max: 1 },
+ resource: { attributes: [] },
+ scope: { name: "test" },
codecMeta: undefined,
payloadBytes: 10,
},
payload: new Uint8Array(10),
};
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import { StreamRegistry } from "../src/stream.js"; | |
| import type { Resource, InstrumentationScope, StreamId } from "../src/types.js"; | |
| import type { Chunk } from "../src/chunk.js"; | |
| function fakeChunk(): Chunk { | |
| return { | |
| header: { | |
| codecName: "ndjson+zstd-19", | |
| nLogs: 1, | |
| minNano: 1n, | |
| maxNano: 2n, | |
| codecMeta: undefined, | |
| payloadBytes: 10, | |
| }, | |
| payload: new Uint8Array(10), | |
| }; | |
| import { CHUNK_VERSION } from "../src/chunk.js"; | |
| import type { Resource, InstrumentationScope, StreamId } from "../src/types.js"; | |
| import type { Chunk } from "../src/chunk.js"; | |
| function fakeChunk(): Chunk { | |
| return { | |
| header: { | |
| schemaVersion: CHUNK_VERSION, | |
| codecName: "ndjson+zstd-19", | |
| nLogs: 1, | |
| timeRange: { minNano: "1", maxNano: "2" }, | |
| severityRange: { min: 1, max: 1 }, | |
| resource: { attributes: [] }, | |
| scope: { name: "test" }, | |
| codecMeta: undefined, | |
| payloadBytes: 10, | |
| }, | |
| payload: new Uint8Array(10), | |
| }; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/o11ylogsdb/test/stream-extended.test.ts` around lines 2 - 17, The
test fixture fakeChunk() uses the old ChunkHeader shape (minNano/maxNano) and is
missing required fields; update fakeChunk() so its header matches the current
ChunkHeader shape in src/chunk.ts by adding schemaVersion, timeRange (with
minNano and maxNano), severityRange, resource and scope entries, and keep
payload/payloadBytes consistent; reuse the imported Resource and
InstrumentationScope types for the resource and scope fields and ensure
header.codecName, header.nLogs and header.payloadBytes remain present to satisfy
StreamRegistry/chunk consumers.
- Move StreamRegistry to stardb as generic StreamRegistry<C> class - logsdb and tracesdb now subclass with their Chunk types - Includes removeChunk() with stream cleanup (from tracesdb) - Includes stale ref detection (WeakMap fast path validation) - Extract bytesToHex/hexToBytes to stardb/utils.ts - Remove 9 duplicate implementations across 6 files - Lookup-table implementation for bytesToHex (consistent perf) - Add nowMillis() utility to stardb - Parallelize CI workflow: lint, typecheck, test, build run concurrently - Tests still matrix across Node 22 + 24 - Faster overall CI time (~3× with parallel jobs) - Add pre-commit hooks (husky + lint-staged) - Runs biome check on staged files only (fast) - Add knip for dead code detection - Configured for monorepo workspaces, excludes bench/ dirs - Expand coverage to all 3 engines (logsdb + tracesdb added) - Global thresholds: 65% branches, 75% functions, 70% lines - Add make test-fast and make knip targets - Add stardb stream/utils tests (16 tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| branches: 65, | ||
| functions: 75, | ||
| lines: 70, | ||
| statements: 70, | ||
| "packages/stardb/src/**/*.ts": { | ||
| branches: 90, | ||
| branches: 75, | ||
| functions: 100, | ||
| lines: 95, | ||
| statements: 95, | ||
| lines: 93, | ||
| statements: 93, | ||
| }, |
There was a problem hiding this comment.
💡 Quality: Coverage thresholds significantly lowered for stardb
The stardb-specific branch coverage threshold dropped from 90% → 75%, and line/statement from 95% → 93%. The old code had an explicit comment: "Hold it to a strict threshold so regressions surface here instead of whatever *db package noticed first." Since new code (StreamRegistry, utils) was added to stardb in this PR, the thresholds should ideally go up or stay flat, not drop by 15 percentage points on branches. The global thresholds also dropped substantially (branches 71→65, functions 86→75, lines 82→70). While adding more source packages to coverage tracking naturally dilutes percentages, this large a drop may mask future regressions.
Suggested fix:
Consider running `npx vitest --coverage` to see actual current coverage, then set thresholds 1-2% below the current values rather than padding with a large buffer. For stardb specifically, keeping branches ≥ 85% would preserve the original intent.
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/o11ylogsdb/src/codec-typed.ts (1)
854-859:⚠️ Potential issue | 🟠 MajorPreserve falsy sidecar fields (
""and0) during round-trip.Line 854/858 and Line 1146/1147 still gate on truthiness, so
eventName: ""anddroppedAttributesCount: 0are lost.Suggested fix
- if (r.eventName) { + if (r.eventName !== undefined) { side.e = r.eventName; sidecarHasContent = true; } - if (r.droppedAttributesCount) { + if (r.droppedAttributesCount !== undefined) { side.d = r.droppedAttributesCount; sidecarHasContent = true; } @@ - if (side.e) rec.eventName = side.e as string; - if (side.d) rec.droppedAttributesCount = side.d as number; + if (side.e !== undefined) rec.eventName = side.e as string; + if (side.d !== undefined) rec.droppedAttributesCount = side.d as number;Also applies to: 1146-1147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/o11ylogsdb/src/codec-typed.ts` around lines 854 - 859, The checks for r.eventName and r.droppedAttributesCount use truthiness and drop valid falsy values ("" and 0); change the guards in the sidecar population to test for undefined (e.g., r.eventName !== undefined and r.droppedAttributesCount !== undefined) so empty string and zero are preserved, set side.e and side.d and update sidecarHasContent accordingly; apply the same undefined-check fix to the identical guards at the other location referenced (lines 1146/1147) so both round-trip paths keep falsy values.packages/o11ylogsdb/src/codec-columnar.ts (1)
434-439:⚠️ Potential issue | 🟠 MajorFalsy
eventName/droppedAttributesCountare still dropped.Line 434/438 and Line 581/582 should use
!== undefined; current truthy checks lose valid values ("",0).Suggested fix
- if (r.eventName) { + if (r.eventName !== undefined) { side.e = r.eventName; sidecarHasContent = true; } - if (r.droppedAttributesCount) { + if (r.droppedAttributesCount !== undefined) { side.d = r.droppedAttributesCount; sidecarHasContent = true; } @@ - if (side.e) rec.eventName = side.e; - if (side.d) rec.droppedAttributesCount = side.d; + if (side.e !== undefined) rec.eventName = side.e; + if (side.d !== undefined) rec.droppedAttributesCount = side.d;Also applies to: 581-582
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/o11ylogsdb/src/codec-columnar.ts` around lines 434 - 439, Change the truthy checks that drop valid falsy values: replace checks like "if (r.eventName)" and "if (r.droppedAttributesCount)" with explicit undefined checks ("if (r.eventName !== undefined)" and "if (r.droppedAttributesCount !== undefined)") so empty-string eventName and zero droppedAttributesCount are preserved when assigning to side.e and side.d and when updating sidecarHasContent; apply the same change to the other occurrence later in the file where r.eventName / r.droppedAttributesCount are conditionally assigned (the second block around the other checks).packages/stardb/src/index.ts (1)
16-25:⚠️ Potential issue | 🟠 MajorEnsure
@types/nodeis installed before building.The
stardbpackage manifest is correctly configured with proper exports. However, the TypeScript build fails because it cannot resolve thenodetype definitions (referenced bycodec-baseline.tsfor thenode:zlibimport). The build needs@types/nodeavailable during compilation.Once
@types/nodeis installed and the build completes, thedist/folder will be generated and all consumer imports from"stardb"will resolve correctly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stardb/src/index.ts` around lines 16 - 25, The TypeScript build for the stardb package fails due to missing Node type definitions referenced by codec-baseline.ts (it imports "node:zlib"); add `@types/node` as a devDependency in the package.json for the stardb package and/or ensure tsconfig.json includes "node" in the "types" array so the compiler can resolve Node ambient types during build; after adding the dependency run an install and rebuild to produce the dist/ output so exports from index.ts (e.g., StreamRegistry, bytesToHex, nowMillis) resolve for consumers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 17-24: Replace mutable action tags with immutable commit SHAs:
locate usages like actions/checkout@v6 and actions/setup-node@v6 in
.github/workflows/ci.yml and the other workflow files (publish-octo11y.yml,
release.yml, pages.yml, benchmarks.yml, actions-dist.yml) and update each
reference to the corresponding 40-character commit SHA (e.g.,
actions/checkout@<sha>, actions/setup-node@<sha>) so all third-party actions are
pinned to a specific commit; ensure you verify the correct SHA from the action's
GitHub releases/tags page and update any other `@v`<number> occurrences
consistently.
In `@knip.json`:
- Line 2: The "$schema" entry in knip.json currently points to
"https://unpkg.com/knip@latest/schema.json" which can cause schema drift; change
the URL to pin the Knip major or exact version that matches your dependency
(e.g., "https://unpkg.com/knip@6/schema.json" or
"https://unpkg.com/knip@6.7.0/schema.json") so the "$schema" value aligns with
the installed ^6.7.0 dependency.
In `@packages/stardb/src/utils.ts`:
- Around line 16-23: The hexToBytes function currently silently coerces
malformed input; update hexToBytes to validate the input string first and throw
a descriptive Error when invalid: ensure hex is a non-null string, has even
length (hex.length % 2 === 0) and contains only hex characters (0-9, a-f, A-F) —
e.g., test with a regex like /^[0-9a-fA-F]+$/ — and only then proceed to parse
each byte (preserve existing logic using Number.parseInt with radix 16); include
a clear error message mentioning "hexToBytes" and the reason (odd length or
invalid characters).
In `@packages/stardb/test/stream-utils.test.ts`:
- Around line 81-87: The test fails because StreamRegistry methods throw errors
prefixed with "StreamRegistry: ..."; update the error messages in StreamRegistry
(the throw sites used by resourceOf, scopeOf, appendChunk, chunksOf) to produce
"unknown id {id}" (i.e., remove the "StreamRegistry: " prefix) so the thrown
message exactly matches the test expectation; locate the throw statements in the
StreamRegistry class and change the message formatting to `unknown id ${id}` for
those methods.
In `@vitest.config.ts`:
- Around line 34-43: The coverage config added new package patterns
"packages/o11ylogsdb/src/**/*.ts" and "packages/o11ytracesdb/src/**/*.ts" but no
package-scoped thresholds were set; update the coverage thresholds object (the
same structure used for "packages/stardb/src/**/*.ts") to include entries for
both "packages/o11ylogsdb/src/**/*.ts" and "packages/o11ytracesdb/src/**/*.ts"
with explicit numeric values for branches, functions, lines, and statements
(e.g., mirror the style used for the stardb entry), so each package has its own
minimums and coverage drift in one package cannot be masked by another.
---
Outside diff comments:
In `@packages/o11ylogsdb/src/codec-columnar.ts`:
- Around line 434-439: Change the truthy checks that drop valid falsy values:
replace checks like "if (r.eventName)" and "if (r.droppedAttributesCount)" with
explicit undefined checks ("if (r.eventName !== undefined)" and "if
(r.droppedAttributesCount !== undefined)") so empty-string eventName and zero
droppedAttributesCount are preserved when assigning to side.e and side.d and
when updating sidecarHasContent; apply the same change to the other occurrence
later in the file where r.eventName / r.droppedAttributesCount are conditionally
assigned (the second block around the other checks).
In `@packages/o11ylogsdb/src/codec-typed.ts`:
- Around line 854-859: The checks for r.eventName and r.droppedAttributesCount
use truthiness and drop valid falsy values ("" and 0); change the guards in the
sidecar population to test for undefined (e.g., r.eventName !== undefined and
r.droppedAttributesCount !== undefined) so empty string and zero are preserved,
set side.e and side.d and update sidecarHasContent accordingly; apply the same
undefined-check fix to the identical guards at the other location referenced
(lines 1146/1147) so both round-trip paths keep falsy values.
In `@packages/stardb/src/index.ts`:
- Around line 16-25: The TypeScript build for the stardb package fails due to
missing Node type definitions referenced by codec-baseline.ts (it imports
"node:zlib"); add `@types/node` as a devDependency in the package.json for the
stardb package and/or ensure tsconfig.json includes "node" in the "types" array
so the compiler can resolve Node ambient types during build; after adding the
dependency run an install and rebuild to produce the dist/ output so exports
from index.ts (e.g., StreamRegistry, bytesToHex, nowMillis) resolve for
consumers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 5b6fea2a-1610-49aa-8895-bb75b7dd32d5
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (19)
.github/workflows/ci.yml.husky/pre-commitMakefileknip.jsonpackage.jsonpackages/o11ylogsdb/src/chunk.tspackages/o11ylogsdb/src/codec-columnar.tspackages/o11ylogsdb/src/codec-typed.tspackages/o11ylogsdb/src/stream.tspackages/o11ytracesdb/src/chunk.tspackages/o11ytracesdb/src/correlate.tspackages/o11ytracesdb/src/query.tspackages/o11ytracesdb/src/stream.tspackages/stardb/src/index.tspackages/stardb/src/stream.tspackages/stardb/src/utils.tspackages/stardb/test/public-api.test.tspackages/stardb/test/stream-utils.test.tsvitest.config.ts
| uses: actions/checkout@v6 | ||
|
|
||
| - name: Setup Node | ||
| uses: actions/setup-node@v6 | ||
| with: | ||
| node-version: 24 | ||
| cache: npm | ||
| cache-dependency-path: package-lock.json |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether workflow actions are SHA-pinned (40-hex) or tag-pinned.
# Expected: third-party `uses:` lines should use @<40-hex-sha> for hardening.
echo "All action references:"
rg -nP '^\s*uses:\s*[^@]+@[^ \t]+' .github/workflows/*.yml
echo
echo "SHA-pinned action references:"
rg -nP '^\s*uses:\s*[^@]+@[a-f0-9]{40}\b' .github/workflows/*.ymlRepository: strawgate/o11ykit
Length of output: 1913
Pin all third-party GitHub Actions to commit SHAs.
All action usages throughout the repository are pinned to mutable tags (@v6, @v5, etc.) instead of immutable commit SHAs. This weakens CI supply-chain guarantees and affects not only .github/workflows/ci.yml but also publish-octo11y.yml, release.yml, pages.yml, benchmarks.yml, and actions-dist.yml. Replace all @v<number> references with @<40-character-sha>.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/ci.yml around lines 17 - 24, Replace mutable action tags
with immutable commit SHAs: locate usages like actions/checkout@v6 and
actions/setup-node@v6 in .github/workflows/ci.yml and the other workflow files
(publish-octo11y.yml, release.yml, pages.yml, benchmarks.yml, actions-dist.yml)
and update each reference to the corresponding 40-character commit SHA (e.g.,
actions/checkout@<sha>, actions/setup-node@<sha>) so all third-party actions are
pinned to a specific commit; ensure you verify the correct SHA from the action's
GitHub releases/tags page and update any other `@v`<number> occurrences
consistently.
| @@ -0,0 +1,46 @@ | |||
| { | |||
| "$schema": "https://unpkg.com/knip@latest/schema.json", | |||
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify current schema pinning and installed knip version.
# Expected: schema URL should not use `@latest` and should align with installed knip major.
echo "knip devDependency:"
jq -r '.devDependencies.knip' package.json
echo "knip schema URL:"
jq -r '.["$schema"]' knip.jsonRepository: strawgate/o11ykit
Length of output: 147
Pin Knip schema to a specific version instead of @latest.
The schema URL currently uses @latest, which can introduce silent schema drift between local development and CI/CD environments. Align the schema version with the installed Knip dependency (^6.7.0). Use a pinned major or exact version—for example, https://unpkg.com/knip@6/schema.json or https://unpkg.com/knip@6.7.0/schema.json.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knip.json` at line 2, The "$schema" entry in knip.json currently points to
"https://unpkg.com/knip@latest/schema.json" which can cause schema drift; change
the URL to pin the Knip major or exact version that matches your dependency
(e.g., "https://unpkg.com/knip@6/schema.json" or
"https://unpkg.com/knip@6.7.0/schema.json") so the "$schema" value aligns with
the installed ^6.7.0 dependency.
| export function hexToBytes(hex: string): Uint8Array { | ||
| const len = hex.length >>> 1; | ||
| const out = new Uint8Array(len); | ||
| for (let i = 0; i < len; i++) { | ||
| out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16); | ||
| } | ||
| return out; | ||
| } |
There was a problem hiding this comment.
hexToBytes should reject malformed hex instead of silently coercing.
Current behavior accepts odd-length or non-hex content and can produce corrupted output bytes without throwing.
Suggested fix
export function hexToBytes(hex: string): Uint8Array {
+ if ((hex.length & 1) !== 0) {
+ throw new Error("hexToBytes: hex string must have even length");
+ }
const len = hex.length >>> 1;
const out = new Uint8Array(len);
for (let i = 0; i < len; i++) {
- out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16);
+ const hi = hexNibble(hex.charCodeAt(i * 2));
+ const lo = hexNibble(hex.charCodeAt(i * 2 + 1));
+ if (hi < 0 || lo < 0) throw new Error("hexToBytes: invalid hex character");
+ out[i] = (hi << 4) | lo;
}
return out;
}
+
+function hexNibble(ch: number): number {
+ if (ch >= 0x30 && ch <= 0x39) return ch - 0x30;
+ if (ch >= 0x61 && ch <= 0x66) return ch - 0x61 + 10;
+ if (ch >= 0x41 && ch <= 0x46) return ch - 0x41 + 10;
+ return -1;
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export function hexToBytes(hex: string): Uint8Array { | |
| const len = hex.length >>> 1; | |
| const out = new Uint8Array(len); | |
| for (let i = 0; i < len; i++) { | |
| out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16); | |
| } | |
| return out; | |
| } | |
| export function hexToBytes(hex: string): Uint8Array { | |
| if ((hex.length & 1) !== 0) { | |
| throw new Error("hexToBytes: hex string must have even length"); | |
| } | |
| const len = hex.length >>> 1; | |
| const out = new Uint8Array(len); | |
| for (let i = 0; i < len; i++) { | |
| const hi = hexNibble(hex.charCodeAt(i * 2)); | |
| const lo = hexNibble(hex.charCodeAt(i * 2 + 1)); | |
| if (hi < 0 || lo < 0) throw new Error("hexToBytes: invalid hex character"); | |
| out[i] = (hi << 4) | lo; | |
| } | |
| return out; | |
| } | |
| function hexNibble(ch: number): number { | |
| if (ch >= 0x30 && ch <= 0x39) return ch - 0x30; | |
| if (ch >= 0x61 && ch <= 0x66) return ch - 0x61 + 10; | |
| if (ch >= 0x41 && ch <= 0x46) return ch - 0x41 + 10; | |
| return -1; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/src/utils.ts` around lines 16 - 23, The hexToBytes function
currently silently coerces malformed input; update hexToBytes to validate the
input string first and throw a descriptive Error when invalid: ensure hex is a
non-null string, has even length (hex.length % 2 === 0) and contains only hex
characters (0-9, a-f, A-F) — e.g., test with a regex like /^[0-9a-fA-F]+$/ — and
only then proceed to parse each byte (preserve existing logic using
Number.parseInt with radix 16); include a clear error message mentioning
"hexToBytes" and the reason (odd length or invalid characters).
| it("throws on unknown id", () => { | ||
| const reg = new StreamRegistry(); | ||
| expect(() => reg.resourceOf(999)).toThrow("unknown id 999"); | ||
| expect(() => reg.scopeOf(999)).toThrow("unknown id 999"); | ||
| expect(() => reg.appendChunk(999, {})).toThrow("unknown id 999"); | ||
| expect(() => reg.chunksOf(999)).toThrow("unknown id 999"); | ||
| }); |
There was a problem hiding this comment.
Error message format mismatch.
Tests expect "unknown id 999" but StreamRegistry throws "StreamRegistry: unknown id ${id}". These assertions will fail.
Proposed fix
it("throws on unknown id", () => {
const reg = new StreamRegistry();
- expect(() => reg.resourceOf(999)).toThrow("unknown id 999");
- expect(() => reg.scopeOf(999)).toThrow("unknown id 999");
- expect(() => reg.appendChunk(999, {})).toThrow("unknown id 999");
- expect(() => reg.chunksOf(999)).toThrow("unknown id 999");
+ expect(() => reg.resourceOf(999)).toThrow("StreamRegistry: unknown id 999");
+ expect(() => reg.scopeOf(999)).toThrow("StreamRegistry: unknown id 999");
+ expect(() => reg.appendChunk(999, {})).toThrow("StreamRegistry: unknown id 999");
+ expect(() => reg.chunksOf(999)).toThrow("StreamRegistry: unknown id 999");
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it("throws on unknown id", () => { | |
| const reg = new StreamRegistry(); | |
| expect(() => reg.resourceOf(999)).toThrow("unknown id 999"); | |
| expect(() => reg.scopeOf(999)).toThrow("unknown id 999"); | |
| expect(() => reg.appendChunk(999, {})).toThrow("unknown id 999"); | |
| expect(() => reg.chunksOf(999)).toThrow("unknown id 999"); | |
| }); | |
| it("throws on unknown id", () => { | |
| const reg = new StreamRegistry(); | |
| expect(() => reg.resourceOf(999)).toThrow("StreamRegistry: unknown id 999"); | |
| expect(() => reg.scopeOf(999)).toThrow("StreamRegistry: unknown id 999"); | |
| expect(() => reg.appendChunk(999, {})).toThrow("StreamRegistry: unknown id 999"); | |
| expect(() => reg.chunksOf(999)).toThrow("StreamRegistry: unknown id 999"); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/test/stream-utils.test.ts` around lines 81 - 87, The test
fails because StreamRegistry methods throw errors prefixed with "StreamRegistry:
..."; update the error messages in StreamRegistry (the throw sites used by
resourceOf, scopeOf, appendChunk, chunksOf) to produce "unknown id {id}" (i.e.,
remove the "StreamRegistry: " prefix) so the thrown message exactly matches the
test expectation; locate the throw statements in the StreamRegistry class and
change the message formatting to `unknown id ${id}` for those methods.
| branches: 65, | ||
| functions: 75, | ||
| lines: 70, | ||
| statements: 70, | ||
| "packages/stardb/src/**/*.ts": { | ||
| branches: 90, | ||
| branches: 75, | ||
| functions: 100, | ||
| lines: 95, | ||
| statements: 95, | ||
| lines: 93, | ||
| statements: 93, | ||
| }, |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify which package-scoped coverage thresholds are currently configured.
rg -n -C2 '"packages/.*/src/\*\*/\*\.ts": \{' vitest.config.tsRepository: strawgate/o11ykit
Length of output: 210
🏁 Script executed:
cat -n vitest.config.ts | head -50Repository: strawgate/o11ykit
Length of output: 2004
Add package-scoped thresholds for newly included packages.
Lines 24-25 add packages/o11ylogsdb/src/**/*.ts and packages/o11ytracesdb/src/**/*.ts to coverage without scoped thresholds. Add explicit thresholds for both so coverage drift in one package cannot be masked by another.
Suggested config
thresholds: {
branches: 65,
functions: 75,
lines: 70,
statements: 70,
"packages/stardb/src/**/*.ts": {
branches: 75,
functions: 100,
lines: 93,
statements: 93,
},
+ "packages/o11ylogsdb/src/**/*.ts": {
+ branches: 65,
+ functions: 75,
+ lines: 70,
+ statements: 70,
+ },
+ "packages/o11ytracesdb/src/**/*.ts": {
+ branches: 65,
+ functions: 75,
+ lines: 70,
+ statements: 70,
+ },
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| branches: 65, | |
| functions: 75, | |
| lines: 70, | |
| statements: 70, | |
| "packages/stardb/src/**/*.ts": { | |
| branches: 90, | |
| branches: 75, | |
| functions: 100, | |
| lines: 95, | |
| statements: 95, | |
| lines: 93, | |
| statements: 93, | |
| }, | |
| branches: 65, | |
| functions: 75, | |
| lines: 70, | |
| statements: 70, | |
| "packages/stardb/src/**/*.ts": { | |
| branches: 75, | |
| functions: 100, | |
| lines: 93, | |
| statements: 93, | |
| }, | |
| "packages/o11ylogsdb/src/**/*.ts": { | |
| branches: 65, | |
| functions: 75, | |
| lines: 70, | |
| statements: 70, | |
| }, | |
| "packages/o11ytracesdb/src/**/*.ts": { | |
| branches: 65, | |
| functions: 75, | |
| lines: 70, | |
| statements: 70, | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vitest.config.ts` around lines 34 - 43, The coverage config added new package
patterns "packages/o11ylogsdb/src/**/*.ts" and
"packages/o11ytracesdb/src/**/*.ts" but no package-scoped thresholds were set;
update the coverage thresholds object (the same structure used for
"packages/stardb/src/**/*.ts") to include entries for both
"packages/o11ylogsdb/src/**/*.ts" and "packages/o11ytracesdb/src/**/*.ts" with
explicit numeric values for branches, functions, lines, and statements (e.g.,
mirror the style used for the stardb entry), so each package has its own
minimums and coverage drift in one package cannot be masked by another.
…stardb Deep shared code extraction — net -620 LOC across engines: **ByteBuf + ByteReader (stardb/binary.ts)** - Generic growable write buffer and sequential reader - Supports u8/u16/u32/u64, float64, unsigned varint, signed zigzag varint, length-prefixed strings, section bookkeeping - Replaces 3 copies (2 in logsdb codecs, 1 in tracesdb codec) - ~400 LOC of duplication eliminated **Chunk wire format (stardb/chunk-wire.ts)** - Generic serialize/deserialize parameterized by magic bytes - Both logsdb (OLDB) and tracesdb (OTDB) now delegate to it - Consistent error handling, buffer-offset-safe DataView **FNV-1a byte hash (stardb/utils.ts)** - fnv1aBytes() replaces copy in tracesdb bloom.ts - bytesEqual() utility also added **Codec utils (o11ylogsdb/codec-utils.ts)** - anyValueToJson/jsonToAnyValue — removed 3 identical copies - extractVarsAgainstTemplate — removed 3 identical copies - Internal dedup within logsdb (shared by chunk + 3 codecs) **Additional hex cleanup** - Removed hexFromBytes duplicate in tracesdb/query.ts - Removed bufToHex duplicate in tracesdb/bloom.ts - Removed nowMillis duplicate in logsdb/query.ts Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| readU16(): number { | ||
| const v = this.view.getUint16(this.pos, true); | ||
| this.pos += 2; | ||
| return v; | ||
| } | ||
|
|
||
| readU32(): number { | ||
| const v = this.view.getUint32(this.pos, true); | ||
| this.pos += 4; | ||
| return v; |
There was a problem hiding this comment.
💡 Edge Case: ByteReader.readU16/readU32/readFloat64 missing bounds checks
readU8() and readU64() have explicit bounds checks that throw a descriptive RangeError("ByteReader: unexpected end of buffer"), but readU16() (line 146), readU32() (line 152), and readFloat64() (line 166) do not. On truncated input, the DataView will throw a generic RangeError from the engine (e.g., "Offset is outside the bounds of the DataView") with no context about the reader position or buffer size.
This is inconsistent with the other read methods and makes debugging truncated-chunk issues harder. readU32 in particular is used in the hot readSection() path and in chunk-wire deserialization.
The old ByteCursor code had explicit bounds checks on every read method.
Suggested fix:
readU16(): number {
if (this.pos + 2 > this.buf.length)
throw new RangeError("ByteReader: unexpected end of buffer");
const v = this.view.getUint16(this.pos, true);
this.pos += 2;
return v;
}
// Same pattern for readU32 (+4) and readFloat64 (+8)
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
|
Note Your trial team has used its Gitar budget, so automatic reviews are paused. Upgrade now to unlock full capacity. Comment "Gitar review" to trigger a review manually. Code Review 👍 Approved with suggestions 0 resolved / 3 findingsQuery optimization and CI improvements effectively centralize shared logic. Please address missing bounds checks in ByteReader, potential false negatives for JSON-escaped NDJSON characters, and the reduced stardb coverage thresholds. 💡 Edge Case: Raw byte scan false negative for JSON-escaped characters in NDJSON📄 packages/o11ylogsdb/src/query.ts:266-272 📄 packages/o11ylogsdb/src/query.ts:149-163 📄 packages/o11ylogsdb/src/query.ts:266-280 Even for NDJSON payloads, This is a minor edge case because 💡 Quality: Coverage thresholds significantly lowered for stardbThe stardb-specific branch coverage threshold dropped from 90% → 75%, and line/statement from 95% → 93%. The old code had an explicit comment: "Hold it to a strict threshold so regressions surface here instead of whatever *db package noticed first." Since new code (StreamRegistry, utils) was added to stardb in this PR, the thresholds should ideally go up or stay flat, not drop by 15 percentage points on branches. The global thresholds also dropped substantially (branches 71→65, functions 86→75, lines 82→70). While adding more source packages to coverage tracking naturally dilutes percentages, this large a drop may mask future regressions. Suggested fix💡 Edge Case: ByteReader.readU16/readU32/readFloat64 missing bounds checks📄 packages/stardb/src/binary.ts:146-155 📄 packages/stardb/src/binary.ts:166-170 readU8() and readU64() have explicit bounds checks that throw a descriptive This is inconsistent with the other read methods and makes debugging truncated-chunk issues harder. readU32 in particular is used in the hot The old Suggested fix🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Important Your trial ends in 5 days — upgrade now to keep code review, CI analysis, auto-apply, custom automations, and more. Was this helpful? React with 👍 / 👎 | Gitar |
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
packages/stardb/src/utils.ts (1)
16-22:⚠️ Potential issue | 🟠 MajorReject malformed hex instead of silently coercing it.
Odd-length input is truncated here, and non-hex pairs end up as
0bytes viaUint8Arraycoercion. Since this helper is now shared by chunk decoders, malformed IDs get silently corrupted instead of failing fast.Proposed fix
export function hexToBytes(hex: string): Uint8Array { + if ((hex.length & 1) !== 0) { + throw new Error("hexToBytes: hex string must have even length"); + } const len = hex.length >>> 1; const out = new Uint8Array(len); for (let i = 0; i < len; i++) { - out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16); + const byte = hex.slice(i * 2, i * 2 + 2); + if (!/^[0-9a-fA-F]{2}$/.test(byte)) { + throw new Error("hexToBytes: invalid hex character"); + } + out[i] = Number.parseInt(byte, 16); } return out; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stardb/src/utils.ts` around lines 16 - 22, The hexToBytes function currently silently truncates odd-length strings and produces 0 bytes for non-hex input; update hexToBytes to validate input first by (1) rejecting odd-length hex strings and (2) ensuring the input contains only 0-9a-fA-F characters (or validate each parsed pair yields a valid number), and throw a clear error when validation fails; modify the hexToBytes implementation (the exported function hexToBytes) to perform these checks before creating the Uint8Array and parsing pairs so malformed IDs fail fast.packages/o11ylogsdb/src/query.ts (1)
150-168:⚠️ Potential issue | 🔴 CriticalGate the raw-byte prune behind a policy capability.
This miss-path is still unsound for policies that reconstruct bodies from columns/templates.
ColumnarDrainPolicy/TypedColumnarDrainPolicycan match after decode even whenneedlenever appears contiguously inraw, and NDJSON bodies can also miss escaped substrings. A false negative here drops the whole chunk. Restrict this optimization to payloads that preserve body bytes verbatim, otherwise fall back to the normal decode path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/o11ylogsdb/src/query.ts` around lines 150 - 168, The raw-byte fast-path (useBodyFastPath branch) can produce false negatives for policies that reconstruct bodies (e.g., ColumnarDrainPolicy, TypedColumnarDrainPolicy) or for formats that escape/transform body bytes; before running rawPayloadContains(raw, needle) or calling codec.decode + readRecordsFromRaw, gate this optimization on a clear policy/codec capability (e.g., policy.preservesBodyBytes() or a similar flag on the policy/codec) and skip to the full decode path when that capability is absent; update the check that sets useBodyFastPath (and the conditional around rawPayloadContains) to consult that capability so only policies/codecs that guarantee verbatim body bytes use the raw-byte prune.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/o11ylogsdb/src/codec-utils.ts`:
- Around line 32-41: jsonToAnyValue currently treats any object with a $bi or $b
key as a sentinel and eagerly converts it, which can corrupt normal objects and
throw on invalid inputs; change the logic in jsonToAnyValue to only treat an
object as a sentinel when the object has exactly one own property named "$bi" or
"$b", the value is a string, and the string passes simple validation (e.g. $bi
matches /^-?\d+$/ before BigInt() and $b matches hex /^[0-9a-fA-F]+$/ with even
length before hexToBytes()); if validation fails, fall back to decoding the
object as a normal map; also wrap BigInt conversion in try/catch to avoid
crashes and return the object map on error; refer to the jsonToAnyValue function
and the sentinel keys "$bi" and "$b" for where to update the checks.
In `@packages/stardb/src/binary.ts`:
- Around line 21-30: The ensure method can infinite-loop when initialCapacity is
0 because newCap starts at 0; update the constructor or ensure to enforce a
minimum capacity (e.g., set initialCapacity to at least 1 or ensure newCap =
Math.max(1, this.buf.length) before doubling). Specifically modify the
constructor (constructor(initialCapacity = 4096) / this.buf) or the ensure
method (ensure(needed: number) / newCap calculation) so that newCap is
initialized to a positive value before the while loop, then proceed with
doubling until newCap >= this.pos + needed.
- Around line 83-93: writeZigzagVarint accepts unbounded bigint but assumes
signed 64-bit inputs and pre-allocates 10 bytes, which can overflow for values
outside int64; add an explicit range check at the start of writeZigzagVarint to
verify value is within [-2n**63n, 2n**63n - 1n] and throw a RangeError if not,
then proceed with ensure(10) and the existing zigzag/varint encoding logic
(refer to the writeZigzagVarint function name and the constants used for
shifting).
- Around line 175-187: readUvarint currently uses 32-bit bitwise ops which
silently truncate high bits; change it to accumulate using arithmetic (e.g., add
(byte & 0x7f) * 2**shift or multiply by Math.pow(2, shift)) instead of |= and <<
so all bits are preserved during the loop in readUvarint, and then enforce
strict validation: reject overlong/overflowing encodings by checking when you
reach the 5th byte (shift >= 28) that any high bits beyond 32 bits are zero and
throw a RangeError for varint overflow or malformed overlong encodings, and also
ensure you reject any continuation bytes beyond the maximum length; reference
readUvarint (and ensure compatibility with writeUvarint’s uint32 contract).
---
Duplicate comments:
In `@packages/o11ylogsdb/src/query.ts`:
- Around line 150-168: The raw-byte fast-path (useBodyFastPath branch) can
produce false negatives for policies that reconstruct bodies (e.g.,
ColumnarDrainPolicy, TypedColumnarDrainPolicy) or for formats that
escape/transform body bytes; before running rawPayloadContains(raw, needle) or
calling codec.decode + readRecordsFromRaw, gate this optimization on a clear
policy/codec capability (e.g., policy.preservesBodyBytes() or a similar flag on
the policy/codec) and skip to the full decode path when that capability is
absent; update the check that sets useBodyFastPath (and the conditional around
rawPayloadContains) to consult that capability so only policies/codecs that
guarantee verbatim body bytes use the raw-byte prune.
In `@packages/stardb/src/utils.ts`:
- Around line 16-22: The hexToBytes function currently silently truncates
odd-length strings and produces 0 bytes for non-hex input; update hexToBytes to
validate input first by (1) rejecting odd-length hex strings and (2) ensuring
the input contains only 0-9a-fA-F characters (or validate each parsed pair
yields a valid number), and throw a clear error when validation fails; modify
the hexToBytes implementation (the exported function hexToBytes) to perform
these checks before creating the Uint8Array and parsing pairs so malformed IDs
fail fast.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 2ac0e19d-e80a-4f4a-8ea9-3338c9a8b92c
📒 Files selected for processing (17)
packages/o11ylogsdb/src/chunk.tspackages/o11ylogsdb/src/codec-columnar.tspackages/o11ylogsdb/src/codec-drain.tspackages/o11ylogsdb/src/codec-typed.tspackages/o11ylogsdb/src/codec-utils.tspackages/o11ylogsdb/src/query.tspackages/o11ylogsdb/test/chunk.test.tspackages/o11ytracesdb/src/bloom.tspackages/o11ytracesdb/src/chunk.tspackages/o11ytracesdb/src/codec-columnar.tspackages/o11ytracesdb/src/query.tspackages/o11ytracesdb/test/wire-format-edge-cases.test.tspackages/stardb/src/binary.tspackages/stardb/src/chunk-wire.tspackages/stardb/src/index.tspackages/stardb/src/utils.tspackages/stardb/test/public-api.test.ts
| constructor(initialCapacity = 4096) { | ||
| this.buf = new Uint8Array(initialCapacity); | ||
| this.view = new DataView(this.buf.buffer); | ||
| } | ||
|
|
||
| ensure(needed: number): void { | ||
| if (this.pos + needed <= this.buf.length) return; | ||
| let newCap = this.buf.length * 2; | ||
| while (newCap < this.pos + needed) newCap *= 2; | ||
| const next = new Uint8Array(newCap); |
There was a problem hiding this comment.
Handle zero-capacity buffers to avoid growth-loop lockup
If initialCapacity is 0, Line 28 sets newCap to 0; Line 29 then loops forever. Guard minimum capacity before doubling.
Proposed fix
constructor(initialCapacity = 4096) {
- this.buf = new Uint8Array(initialCapacity);
+ const cap = Math.max(1, initialCapacity);
+ this.buf = new Uint8Array(cap);
this.view = new DataView(this.buf.buffer);
}
ensure(needed: number): void {
if (this.pos + needed <= this.buf.length) return;
- let newCap = this.buf.length * 2;
+ let newCap = Math.max(1, this.buf.length * 2);
while (newCap < this.pos + needed) newCap *= 2;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| constructor(initialCapacity = 4096) { | |
| this.buf = new Uint8Array(initialCapacity); | |
| this.view = new DataView(this.buf.buffer); | |
| } | |
| ensure(needed: number): void { | |
| if (this.pos + needed <= this.buf.length) return; | |
| let newCap = this.buf.length * 2; | |
| while (newCap < this.pos + needed) newCap *= 2; | |
| const next = new Uint8Array(newCap); | |
| constructor(initialCapacity = 4096) { | |
| const cap = Math.max(1, initialCapacity); | |
| this.buf = new Uint8Array(cap); | |
| this.view = new DataView(this.buf.buffer); | |
| } | |
| ensure(needed: number): void { | |
| if (this.pos + needed <= this.buf.length) return; | |
| let newCap = Math.max(1, this.buf.length * 2); | |
| while (newCap < this.pos + needed) newCap *= 2; | |
| const next = new Uint8Array(newCap); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/src/binary.ts` around lines 21 - 30, The ensure method can
infinite-loop when initialCapacity is 0 because newCap starts at 0; update the
constructor or ensure to enforce a minimum capacity (e.g., set initialCapacity
to at least 1 or ensure newCap = Math.max(1, this.buf.length) before doubling).
Specifically modify the constructor (constructor(initialCapacity = 4096) /
this.buf) or the ensure method (ensure(needed: number) / newCap calculation) so
that newCap is initialized to a positive value before the while loop, then
proceed with doubling until newCap >= this.pos + needed.
| writeZigzagVarint(value: bigint): void { | ||
| this.ensure(10); | ||
| const zz = (value << 1n) ^ (value >> 63n); | ||
| let v = zz; | ||
| do { | ||
| let byte = Number(v & 0x7fn); | ||
| v >>= 7n; | ||
| if (v > 0n) byte |= 0x80; | ||
| this.buf[this.pos++] = byte; | ||
| } while (v > 0n); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
For JavaScript BigInt, does right-shift (>>) use sign extension with unbounded precision (not fixed 64-bit width)?
💡 Result:
Yes, the right-shift operator (>>) on JavaScript BigInt uses sign extension with unbounded precision, not fixed 64-bit width. For positive BigInts, it shifts in zeros from the left (infinite leading zeros). For negative BigInts, it shifts in ones from the left (infinite leading ones), preserving the sign and allowing arbitrary-precision arithmetic shifts.
Citations:
- 1: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Right_shift
- 2: https://github.com/mdn/content/blob/main/files/en-us/web/javascript/reference/operators/right_shift/index.md
- 3: https://github.com/tc39/proposal-bigint/blob/master/spec.html
- 4: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/bigint
🏁 Script executed:
# Check the method context and any type annotations
cd packages/stardb/src && head -n 100 binary.ts | tail -n 30Repository: strawgate/o11ykit
Length of output: 759
🏁 Script executed:
# Search for any type constraints or documentation about writeZigzagVarint
rg -A 5 -B 5 "writeZigzagVarint" packages/stardb/src/Repository: strawgate/o11ykit
Length of output: 699
🏁 Script executed:
# Check if there are tests for this method
fd -e test.ts -e spec.ts -e tests.ts packages/stardb | head -5Repository: strawgate/o11ykit
Length of output: 427
🏁 Script executed:
# Look for any other zigzag methods or related varint code
rg "zigzag|varint" packages/stardb/src/ -iRepository: strawgate/o11ykit
Length of output: 1217
🏁 Script executed:
# Check for usages of writeZigzagVarint
rg "writeZigzagVarint" packages/stardb/src/ -B 2 -A 2Repository: strawgate/o11ykit
Length of output: 380
🏁 Script executed:
# Check the readZigzagVarint to see how it handles bounds
rg -A 15 "readZigzagVarint" packages/stardb/src/binary.tsRepository: strawgate/o11ykit
Length of output: 606
🏁 Script executed:
# Look at the ByteBuf class structure to understand context
sed -n '1,100p' packages/stardb/src/binary.tsRepository: strawgate/o11ykit
Length of output: 2553
🏁 Script executed:
# Search for where writeZigzagVarint is called
rg -r "$0" --type ts "writeZigzagVarint\(" packages/stardb/Repository: strawgate/o11ykit
Length of output: 125
🏁 Script executed:
# Look for tests related to binary.ts
fd --full-path "packages/stardb.*test|spec" . -type f | head -10Repository: strawgate/o11ykit
Length of output: 232
🏁 Script executed:
# Calculate max bytes needed for zigzag-encoded int64
python3 <<'EOF'
# For a signed int64, range is [-2^63, 2^63-1]
# Zigzag encoding: (n << 1) ^ (n >> 63)
# For n = -2^63: zz = (-2^63 << 1) ^ (-2^63 >> 63)
# = -2^64 ^ -1 = 2^64 - 1
# For n = 2^63-1: zz = ((2^63-1) << 1) ^ 0 = 2^64 - 2
# In varint, each byte carries 7 bits. Max value 2^64-1 needs:
import math
max_val = (1 << 64) - 1
bytes_needed = math.ceil(math.log2(max_val + 1) / 7)
print(f"Max value for int64 zigzag: {max_val}")
print(f"Bytes needed in varint: {bytes_needed}")
# What about values OUTSIDE int64 range?
out_of_range = 1 << 65
bytes_for_out_of_range = math.ceil(math.log2(out_of_range + 1) / 7)
print(f"\nOut-of-range value (2^65): {out_of_range}")
print(f"Bytes needed in varint: {bytes_for_out_of_range}")
EOFRepository: strawgate/o11ykit
Length of output: 213
🏁 Script executed:
# Check if readZigzagVarint max 70-bit shift aligns with int64
python3 <<'EOF'
# readZigzagVarint throws if shift > 70n
# In varint, 10 bytes can carry: 10 * 7 = 70 bits
# So 10 bytes max shift is 70 bits, which encodes values up to 2^70 - 1
# But after zigzag decode: (result >> 1) ^ -(result & 1)
# For 2^70-1 >> 1 ≈ 2^69, which when decoded back could be larger than int64
# Let's verify int64 zigzag bounds
min_i64 = -(1 << 63)
max_i64 = (1 << 63) - 1
# Zigzag encode
zz_min = ((min_i64 << 1) ^ (min_i64 >> 63)) & ((1 << 64) - 1)
zz_max = ((max_i64 << 1) ^ (max_i64 >> 63)) & ((1 << 64) - 1)
print(f"Int64 range: [{min_i64}, {max_i64}]")
print(f"Zigzag range: [{zz_min}, {zz_max}]")
print(f"Zigzag max bits: {zz_max.bit_length()}")
# Varint encoding needs ceiling(bits/7) bytes
import math
bytes_needed = math.ceil(zz_max.bit_length() / 7)
print(f"Bytes needed for max int64 zigzag: {bytes_needed}")
EOFRepository: strawgate/o11ykit
Length of output: 235
Add int64 bounds validation to writeZigzagVarint
The method accepts unbounded bigint but uses zigzag encoding designed for signed 64-bit integers and pre-allocates exactly 10 bytes. Out-of-range inputs (e.g., 2^70) produce encoded values requiring more than 10 bytes, causing buffer overflow. Add explicit int64 range validation:
Fix
writeZigzagVarint(value: bigint): void {
+ const MIN_I64 = -(1n << 63n);
+ const MAX_I64 = (1n << 63n) - 1n;
+ if (value < MIN_I64 || value > MAX_I64) {
+ throw new RangeError("ByteBuf: zigzag varint value out of int64 range");
+ }
this.ensure(10);
const zz = (value << 1n) ^ (value >> 63n);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| writeZigzagVarint(value: bigint): void { | |
| this.ensure(10); | |
| const zz = (value << 1n) ^ (value >> 63n); | |
| let v = zz; | |
| do { | |
| let byte = Number(v & 0x7fn); | |
| v >>= 7n; | |
| if (v > 0n) byte |= 0x80; | |
| this.buf[this.pos++] = byte; | |
| } while (v > 0n); | |
| } | |
| writeZigzagVarint(value: bigint): void { | |
| const MIN_I64 = -(1n << 63n); | |
| const MAX_I64 = (1n << 63n) - 1n; | |
| if (value < MIN_I64 || value > MAX_I64) { | |
| throw new RangeError("ByteBuf: zigzag varint value out of int64 range"); | |
| } | |
| this.ensure(10); | |
| const zz = (value << 1n) ^ (value >> 63n); | |
| let v = zz; | |
| do { | |
| let byte = Number(v & 0x7fn); | |
| v >>= 7n; | |
| if (v > 0n) byte |= 0x80; | |
| this.buf[this.pos++] = byte; | |
| } while (v > 0n); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/src/binary.ts` around lines 83 - 93, writeZigzagVarint
accepts unbounded bigint but assumes signed 64-bit inputs and pre-allocates 10
bytes, which can overflow for values outside int64; add an explicit range check
at the start of writeZigzagVarint to verify value is within [-2n**63n, 2n**63n -
1n] and throw a RangeError if not, then proceed with ensure(10) and the existing
zigzag/varint encoding logic (refer to the writeZigzagVarint function name and
the constants used for shifting).
| readUvarint(): number { | ||
| let result = 0; | ||
| let shift = 0; | ||
| let byte: number; | ||
| do { | ||
| if (this.pos >= this.buf.length) throw new RangeError("ByteReader: unexpected end of buffer"); | ||
| byte = this.buf[this.pos++] as number; | ||
| result |= (byte & 0x7f) << shift; | ||
| shift += 7; | ||
| if (shift > 35) throw new RangeError("ByteReader: varint overflow"); | ||
| } while (byte & 0x80); | ||
| return result >>> 0; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In JavaScript/TypeScript, are bitwise operators (|, <<) evaluated using 32-bit signed integers, and can this truncate values above 32 bits?
💡 Result:
Yes, in JavaScript and TypeScript, bitwise operators such as | (bitwise OR) and << (left shift) evaluate their operands as 32-bit signed integers using the ToInt32 abstract operation, which truncates values above 32 bits by taking the number modulo 2^32 and adjusting for the sign bit if necessary. This effectively discards bits beyond the 32nd bit. For example: - Numbers larger than 2^32 - 1 have higher bits discarded: 2^32 + 1 becomes 1 after ToInt32. - Left shifts << use the operand converted to 32-bit signed int and shift by (right operand modulo 32) bits. - The result is always a Number representing a 32-bit signed integer value (-2^31 to 2^31 - 1). This behavior is identical in TypeScript since it compiles to JavaScript. For arbitrary-precision bitwise operations, use BigInt literals (e.g., 1n | 2n), but standard number bitwise ops remain 32-bit. Example demonstrating truncation: console.log((1n << 32n) === 0n); // true for BigInt, but for numbers: console.log(4294967296 | 1); // 1 (2^32 | 1 becomes 0 | 1 = 1) console.log(5 << 33); // 10 (shift amount 33 % 32 = 1)
Citations:
- 1: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators
- 2: https://www.w3schools.com/js/js_bitwise.asp
- 3: https://redrockcode.com/docs/javascript/developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators.html
- 4: https://tc39.es/ecma262/pr/3770/multipage/abstract-operations.html
- 5: https://tc39.es/ecma262/multipage/ecmascript-data-types-and-values.html
- 6: https://www.tektutorialshub.com/typescript/bitwise-operators-in-typescript/
🏁 Script executed:
# Check the overall structure and how readUvarint is used
cd packages/stardb/src && wc -l binary.ts && head -20 binary.tsRepository: strawgate/o11ykit
Length of output: 652
🏁 Script executed:
# Search for readUvarint usage in the codebase
rg "readUvarint" --type ts --type js -B 2 -A 2Repository: strawgate/o11ykit
Length of output: 19690
🏁 Script executed:
# Check if there are tests for readUvarint
rg "readUvarint|ByteReader" --type ts --type js --iglob "*test*" --iglob "*spec*"Repository: strawgate/o11ykit
Length of output: 117
🏁 Script executed:
# Examine the full context around readUvarint and writeUvarint
sed -n '170,200p' packages/stardb/src/binary.tsRepository: strawgate/o11ykit
Length of output: 935
🏁 Script executed:
# Check writeUvarint to understand the expected range
sed -n '160,175p' packages/stardb/src/binary.tsRepository: strawgate/o11ykit
Length of output: 414
🏁 Script executed:
# Look for writeUvarint implementation and tests
sed -n '1,100p' packages/stardb/src/binary.ts | grep -A 15 "writeUvarint"Repository: strawgate/o11ykit
Length of output: 466
🏁 Script executed:
# Search for any tests of readUvarint/writeUvarint
rg "readUvarint|writeUvarint" --type ts -B 2 -A 5 packages/stardb/testRepository: strawgate/o11ykit
Length of output: 43
🏁 Script executed:
# Check if there's any documentation or comments about varint bounds
rg "varint|32-bit|overflow" --type ts packages/stardb/src/binary.ts -B 1 -A 1Repository: strawgate/o11ykit
Length of output: 1196
Reject overflowing uvarints instead of silently truncating them
Line 182 uses 32-bit bitwise operations (|=, <<), which silently truncate values during the loop before the overflow check at line 185 can trigger. A 5-byte varint encoding a value > 2^31 will lose high bits. Although writeUvarint() pre-conditions input to uint32, malformed input (accidental or adversarial) can cause data corruption.
Use arithmetic operations instead of bitwise ops to preserve all bits, and validate the 5th byte more aggressively to reject overlong encodings.
Proposed fix
readUvarint(): number {
- let result = 0;
- let shift = 0;
- let byte: number;
- do {
- if (this.pos >= this.buf.length) throw new RangeError("ByteReader: unexpected end of buffer");
- byte = this.buf[this.pos++] as number;
- result |= (byte & 0x7f) << shift;
- shift += 7;
- if (shift > 35) throw new RangeError("ByteReader: varint overflow");
- } while (byte & 0x80);
- return result >>> 0;
+ let result = 0;
+ for (let i = 0; i < 5; i++) {
+ if (this.pos >= this.buf.length) throw new RangeError("ByteReader: unexpected end of buffer");
+ const byte = this.buf[this.pos++] as number;
+ if (i === 4 && (byte & 0xf0) !== 0) throw new RangeError("ByteReader: varint overflow");
+ result += (byte & 0x7f) * 2 ** (7 * i);
+ if ((byte & 0x80) === 0) return result;
+ }
+ throw new RangeError("ByteReader: varint overflow");
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/src/binary.ts` around lines 175 - 187, readUvarint currently
uses 32-bit bitwise ops which silently truncate high bits; change it to
accumulate using arithmetic (e.g., add (byte & 0x7f) * 2**shift or multiply by
Math.pow(2, shift)) instead of |= and << so all bits are preserved during the
loop in readUvarint, and then enforce strict validation: reject
overlong/overflowing encodings by checking when you reach the 5th byte (shift >=
28) that any high bits beyond 32 bits are zero and throw a RangeError for varint
overflow or malformed overlong encodings, and also ensure you reject any
continuation bytes beyond the maximum length; reference readUvarint (and ensure
compatibility with writeUvarint’s uint32 contract).
Coverage: - binary.ts: 1.8% → 100% (28 comprehensive tests) - chunk-wire.ts: 6.5% → 100% (14 tests: round-trip, errors, size) - any-value.ts: new, 100% (28 tests for JSON/equals/findAttribute) - utils.ts: 54.8% → 96.8% (fnv1aBytes + bytesEqual tests) - Total stardb: 131 tests across 7 test files New shared code in stardb: - anyValueToJson/jsonToAnyValue: moved from logsdb/codec-utils.ts - anyValueEquals: moved from tracesdb/query.ts (48 LOC) - findAttribute: new helper for KeyValue[] lookup by key - StreamKey type: consolidated from logsdb + tracesdb types.ts Dedup fixes: - tracesdb/query.ts: removed local bytesEqual + anyValueEquals (~55 LOC) - logsdb/compact.ts: removed local nowMillis (imports from stardb) - logsdb/codec-utils.ts: re-exports from stardb instead of reimplementing - logsdb/types.ts + tracesdb/types.ts: import StreamKey from stardb All 1129 tests pass, lint clean, typecheck clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/o11ylogsdb/src/types.ts`:
- Around line 18-29: The module header comment incorrectly implies StreamKey is
logs-specific; update the top-of-file documentation in types.ts to state that
StreamKey is re-exported from "stardb" (not defined by the o11y logs API),
remove or reword any phrasing that claims StreamKey is logs-only or owned by
this package, and clarify that the exported types (AnyValue,
InstrumentationScope, KeyValue, Resource, SeverityText, StreamId, StreamKey)
include a StreamKey alias sourced from stardb so consumers understand its
origin.
In `@packages/stardb/src/any-value.ts`:
- Around line 22-24: The object maps created with {} in
packages/stardb/src/any-value.ts can be polluted by keys like "__proto__";
replace those plain object literals with null-prototype objects (e.g., use
Object.create(null)) wherever maps are built—specifically in the block that
creates out and assigns anyValueToJson(val) for each Object.entries(v) and in
the other map construction around lines 39-41—to ensure keys cannot mutate the
prototype chain while preserving the existing assignment logic and return
values.
- Around line 36-37: The decoder currently treats any object with properties $bi
or $b as a tagged scalar, which breaks maps that include those keys alongside
others; update the checks in the decoding logic (the code referencing obj.$bi
and obj.$b) to only decode as a tagged scalar when the object contains exactly
that single key (e.g., Object.keys(obj).length === 1 && typeof obj.$bi ===
"string" for the BigInt branch, and similarly for $b using hexToBytes) so maps
like { "$bi": "x", other: 1 } are preserved as maps.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: f40e3dd7-130b-4b83-ae84-b74c83314e89
📒 Files selected for processing (13)
packages/o11ylogsdb/src/codec-utils.tspackages/o11ylogsdb/src/compact.tspackages/o11ylogsdb/src/types.tspackages/o11ytracesdb/src/query.tspackages/o11ytracesdb/src/types.tspackages/stardb/src/any-value.tspackages/stardb/src/index.tspackages/stardb/src/types.tspackages/stardb/test/any-value.test.tspackages/stardb/test/binary.test.tspackages/stardb/test/chunk-wire.test.tspackages/stardb/test/public-api.test.tspackages/stardb/test/stream-utils.test.ts
| StreamKey, | ||
| } from "stardb"; | ||
|
|
||
| export type { AnyValue, InstrumentationScope, KeyValue, Resource, SeverityText, StreamId }; | ||
| export type { | ||
| AnyValue, | ||
| InstrumentationScope, | ||
| KeyValue, | ||
| Resource, | ||
| SeverityText, | ||
| StreamId, | ||
| StreamKey, | ||
| }; |
There was a problem hiding this comment.
Update module header docs to match the new StreamKey source.
After re-exporting StreamKey from stardb, the top comment still implies StreamKey is logs-specific. Please align the comment to avoid API ownership confusion.
Suggested doc-only patch
- * paths. The remaining types in this file (LogRecord, BodyKind,
- * StreamKey) are specific to the logs engine.
+ * paths. The remaining types in this file (LogRecord, BodyKind)
+ * are specific to the logs engine.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/o11ylogsdb/src/types.ts` around lines 18 - 29, The module header
comment incorrectly implies StreamKey is logs-specific; update the top-of-file
documentation in types.ts to state that StreamKey is re-exported from "stardb"
(not defined by the o11y logs API), remove or reword any phrasing that claims
StreamKey is logs-only or owned by this package, and clarify that the exported
types (AnyValue, InstrumentationScope, KeyValue, Resource, SeverityText,
StreamId, StreamKey) include a StreamKey alias sourced from stardb so consumers
understand its origin.
| const out: Record<string, unknown> = {}; | ||
| for (const [k, val] of Object.entries(v)) out[k] = anyValueToJson(val); | ||
| return out; |
There was a problem hiding this comment.
Use null-prototype objects to avoid __proto__ key corruption/pollution.
Line 22-24 and Line 39-41 build maps with {} and assign arbitrary keys. Keys like "__proto__" can mutate object prototype and break data fidelity/security expectations.
Proposed fix
if (typeof v === "object") {
- const out: Record<string, unknown> = {};
+ const out = Object.create(null) as Record<string, unknown>;
for (const [k, val] of Object.entries(v)) out[k] = anyValueToJson(val);
return out;
}- const out: Record<string, AnyValue> = {};
+ const out = Object.create(null) as Record<string, AnyValue>;
for (const [k, v] of Object.entries(obj)) out[k] = jsonToAnyValue(v);
return out;Also applies to: 39-41
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/src/any-value.ts` around lines 22 - 24, The object maps
created with {} in packages/stardb/src/any-value.ts can be polluted by keys like
"__proto__"; replace those plain object literals with null-prototype objects
(e.g., use Object.create(null)) wherever maps are built—specifically in the
block that creates out and assigns anyValueToJson(val) for each
Object.entries(v) and in the other map construction around lines 39-41—to ensure
keys cannot mutate the prototype chain while preserving the existing assignment
logic and return values.
| if (typeof obj.$bi === "string") return BigInt(obj.$bi); | ||
| if (typeof obj.$b === "string") return hexToBytes(obj.$b); |
There was a problem hiding this comment.
Prevent $bi / $b tag collisions when decoding maps.
Line 36 and Line 37 currently treat any object containing $bi or $b as a tagged scalar, so maps like { "$bi": "x", other: 1 } cannot round-trip correctly.
Proposed fix
export function jsonToAnyValue(j: unknown): AnyValue {
if (j === null) return null;
if (typeof j === "object" && j !== null) {
const obj = j as Record<string, unknown>;
- if (typeof obj.$bi === "string") return BigInt(obj.$bi);
- if (typeof obj.$b === "string") return hexToBytes(obj.$b);
+ const keys = Object.keys(obj);
+ if (keys.length === 1 && typeof obj.$bi === "string") return BigInt(obj.$bi);
+ if (keys.length === 1 && typeof obj.$b === "string") return hexToBytes(obj.$b);
if (Array.isArray(j)) return j.map(jsonToAnyValue);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/src/any-value.ts` around lines 36 - 37, The decoder currently
treats any object with properties $bi or $b as a tagged scalar, which breaks
maps that include those keys alongside others; update the checks in the decoding
logic (the code referencing obj.$bi and obj.$b) to only decode as a tagged
scalar when the object contains exactly that single key (e.g.,
Object.keys(obj).length === 1 && typeof obj.$bi === "string" for the BigInt
branch, and similarly for $b using hexToBytes) so maps like { "$bi": "x", other:
1 } are preserved as maps.
New stardb utilities: - timeRangeOverlaps(min, max, from?, to?): shared chunk pruning predicate Used by logsdb/query.ts and tracesdb/query.ts - bytesToUuid(bytes): format 16 bytes as 8-4-4-4-12 UUID string - uuidToBytes(str): parse UUID (with or without dashes) into 16 bytes - Upgraded hexLookup table to match the hot-path-optimized version - hexToBytes now uses hexNibble (faster than parseInt for hex parsing) Sidecar NDJSON consolidation (logsdb internal): - encodeSidecar() in codec-utils.ts replaces 2 identical 50-line blocks in codec-columnar.ts and codec-typed.ts - applySidecar() + SidecarEntry type replaces 2 identical decode blocks - Net ~100 LOC removed from codec files UUID helper consolidation (logsdb): - Removed BYTE_TO_HEX table, hexNibble, bytesToUuid, bytesToUuidNodash, uuidToBytes, uuidNodashToBytes from codec-typed.ts (~100 LOC) - bytesToUuidNodash → bytesToHex (identical for 16-byte input) - uuidNodashToBytes → hexToBytes (identical for 32-char hex) Tests: 15 new tests (UUID round-trip, timeRangeOverlaps edge cases) All 1144 tests pass, typecheck clean, lint clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace .find((a) => a.key === ...) patterns with shared findAttribute() in query.ts, aggregate.ts, and correlate.ts. Removes ~30 LOC of attribute lookup boilerplate and removes unused KeyValue import from aggregate.ts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (3)
packages/o11ylogsdb/src/query.ts (1)
150-164:⚠️ Potential issue | 🔴 CriticalRaw-byte pruning is unsound for templated body encodings.
The raw byte scan at line 160 checks if
needleexists in the decompressed payload. For templated policies (Drain, ColumnarDrain, TypedColumnarDrain), the final body is reconstructed by combining template literals with variable values. If the needle spans a template boundary or exists only after reconstruction, this chunk is incorrectly pruned.Example: Template
"Error in <*>"with var"module foo"reconstructs to"Error in module foo". A search for"in module"won't match the raw payload.Consider adding a policy capability flag to gate this fast path:
- const useBodyFastPath = spec.bodyContains !== undefined && !spec.bodyLeafEquals; + const useBodyFastPath = + spec.bodyContains !== undefined && + !spec.bodyLeafEquals && + (policy?.supportsRawBodyScan?.() ?? true);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/o11ylogsdb/src/query.ts` around lines 150 - 164, The raw-byte fast path (guarded by useBodyFastPath and implemented via rawPayloadContains after codec.decode) is unsafe for templated body encodings (e.g., Drain, ColumnarDrain, TypedColumnarDrain) because a needle can appear only after template reconstruction; update the logic to skip the raw prune when the chunk's codec or policy indicates templating by adding a capability flag (e.g., supportsRawPruning or isTemplated) on the policy/codec metadata and gate the useBodyFastPath check (or the rawPayloadContains call) against that flag so templated encodings always reconstruct and search the rendered body instead of being pruned.packages/stardb/test/stream-utils.test.ts (1)
91-97:⚠️ Potential issue | 🟡 MinorError message assertions may fail.
Past review noted
StreamRegistrythrows"StreamRegistry: unknown id ${id}"but tests expect"unknown id 999". Verify the actual error message format or update assertions.#!/bin/bash # Check actual error message format in StreamRegistry ast-grep --pattern 'throw new Error($MSG)' rg -n "unknown id" packages/stardb/src/🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stardb/test/stream-utils.test.ts` around lines 91 - 97, The test asserts "unknown id 999" but StreamRegistry actually throws a different message format; update the assertions in the test for the methods resourceOf, scopeOf, appendChunk, and chunksOf to match the real thrown message (e.g. "StreamRegistry: unknown id 999") or use a regex/assertion that checks for the substring "unknown id 999" (toThrow(/unknown id 999/)) so the expectations align with StreamRegistry's actual error format.packages/stardb/src/utils.ts (1)
48-79:⚠️ Potential issue | 🟠 MajorReject malformed hex/UUID input instead of silently coercing it.
hexNibblecurrently maps invalid chars to0, souuidToBytes/hexToBytescan decode bad input into wrong bytes.hexToBytesalso accepts odd-length strings by truncation.Proposed fix
function hexNibble(ch: number): number { if (ch >= 0x30 && ch <= 0x39) return ch - 0x30; if (ch >= 0x61 && ch <= 0x66) return ch - 0x61 + 10; if (ch >= 0x41 && ch <= 0x46) return ch - 0x41 + 10; - return 0; + return -1; } /** Parse a UUID string (with or without dashes) into 16 bytes. */ export function uuidToBytes(s: string): Uint8Array { + const hex = s.replaceAll("-", ""); + if (hex.length !== 32) { + throw new Error("uuidToBytes: UUID must contain exactly 32 hex characters"); + } const out = new Uint8Array(16); - let cur = 0; - for (let i = 0; i < s.length; i++) { - const ch = s.charCodeAt(i); - if (ch === 0x2d) continue; // dash - const hi = hexNibble(ch); - i++; - const lo = hexNibble(s.charCodeAt(i)); - out[cur++] = (hi << 4) | lo; + for (let i = 0; i < 16; i++) { + const hi = hexNibble(hex.charCodeAt(i * 2)); + const lo = hexNibble(hex.charCodeAt(i * 2 + 1)); + if (hi < 0 || lo < 0) { + throw new Error("uuidToBytes: invalid hex character"); + } + out[i] = (hi << 4) | lo; } return out; } /** Convert a hex string to a Uint8Array. */ export function hexToBytes(hex: string): Uint8Array { + if ((hex.length & 1) !== 0) { + throw new Error("hexToBytes: hex string must have even length"); + } const len = hex.length >>> 1; const out = new Uint8Array(len); for (let i = 0; i < len; i++) { const hi = hexNibble(hex.charCodeAt(i * 2)); const lo = hexNibble(hex.charCodeAt(i * 2 + 1)); + if (hi < 0 || lo < 0) { + throw new Error("hexToBytes: invalid hex character"); + } out[i] = (hi << 4) | lo; } return out; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stardb/src/utils.ts` around lines 48 - 79, hexNibble currently maps invalid characters to 0 which silently corrupts outputs; change hexNibble to throw on non-hex characters and update callers to surface that error, validate input lengths in hexToBytes and uuidToBytes (throw on odd-length hex strings and on UUID strings that end prematurely after skipping dashes), and ensure uuidToBytes skips only '-' but otherwise checks for malformed characters using hexNibble so malformed hex/UUID input is rejected rather than coerced; refer to hexNibble, uuidToBytes, and hexToBytes when making these checks and error throws.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/o11ylogsdb/src/codec-columnar.ts`:
- Line 284: The call to encodeSidecar passes a Uint8Array (kinds) but
encodeSidecar's parameter type is readonly number[], causing a type error;
update the encodeSidecar signature in codec-utils.ts to accept Uint8Array (or a
union type like readonly number[] | Uint8Array) or change its parameter to
ArrayLike<number> so it accepts kinds without casting, and then ensure any
internal logic that iterates over the array still works with Uint8Array (use
standard indexing or for...of on the parameter) so the call
encodeSidecar(records, kinds, buf) typechecks.
In `@packages/o11ylogsdb/src/codec-typed.ts`:
- Line 554: The type error comes from calling encodeSidecar(records, kinds, buf)
where buf is a Uint8Array but encodeSidecar's signature expects readonly
number[]; update the encodeSidecar declaration in codec-utils.ts (and any
exported types it uses) to accept ArrayLike<number> (or more specifically
ArrayLike<number> | Uint8Array) instead of readonly number[] so both packages
(e.g., codec-typed.ts and codec-columnar.ts) can pass Uint8Array buffers without
type errors; ensure all imports/usages of encodeSidecar reflect the new
signature.
- Line 42: The import list in the top-level import from "./types.js" includes an
unused symbol SeverityText, causing a lint error; edit the import statement in
packages' codec-typed module to remove SeverityText (leave AnyValue and
LogRecord), then run the linter/tests to confirm the CI failure is resolved and
ensure no other references to SeverityText exist in the same file (if they do,
either remove or reintroduce a used import).
In `@packages/o11ylogsdb/src/codec-utils.ts`:
- Around line 130-131: The decode branch uses truthy checks so empty strings and
zeroes are lost; update the assignments in the function that decodes sidecar
(look for applySidecar and the block using rec and side) to test for undefined
explicitly (e.g. use side.e !== undefined or 'e' in side) before setting
rec.eventName and likewise check side.d !== undefined or 'd' in side before
setting rec.droppedAttributesCount so empty string and 0 are preserved.
- Around line 83-90: The code currently uses truthy checks on r.eventName and
r.droppedAttributesCount which causes falsy but valid values (e.g., empty string
"" or 0) to be skipped; change the checks to detect presence instead of
truthiness (e.g., test for undefined/null) so that when r.eventName is present
you still assign side.e = r.eventName and set sidecarHasContent = true, and
similarly when r.droppedAttributesCount is present assign side.d =
r.droppedAttributesCount and set sidecarHasContent = true; update the
conditional expressions around the references to r.eventName and
r.droppedAttributesCount in codec-utils.ts accordingly.
In `@packages/stardb/src/utils.ts`:
- Around line 23-45: The bytesToUuid function lacks a check that the input
Uint8Array is exactly 16 bytes, so pass-through of shorter/longer arrays yields
corrupted UUIDs; add an explicit precondition at the start of bytesToUuid that
verifies b.length === 16 and throws a clear RangeError (or TypeError) if not, so
callers fail fast and you avoid producing invalid strings; keep the rest of the
function (using hexLookup and the indexed accesses) unchanged.
---
Duplicate comments:
In `@packages/o11ylogsdb/src/query.ts`:
- Around line 150-164: The raw-byte fast path (guarded by useBodyFastPath and
implemented via rawPayloadContains after codec.decode) is unsafe for templated
body encodings (e.g., Drain, ColumnarDrain, TypedColumnarDrain) because a needle
can appear only after template reconstruction; update the logic to skip the raw
prune when the chunk's codec or policy indicates templating by adding a
capability flag (e.g., supportsRawPruning or isTemplated) on the policy/codec
metadata and gate the useBodyFastPath check (or the rawPayloadContains call)
against that flag so templated encodings always reconstruct and search the
rendered body instead of being pruned.
In `@packages/stardb/src/utils.ts`:
- Around line 48-79: hexNibble currently maps invalid characters to 0 which
silently corrupts outputs; change hexNibble to throw on non-hex characters and
update callers to surface that error, validate input lengths in hexToBytes and
uuidToBytes (throw on odd-length hex strings and on UUID strings that end
prematurely after skipping dashes), and ensure uuidToBytes skips only '-' but
otherwise checks for malformed characters using hexNibble so malformed hex/UUID
input is rejected rather than coerced; refer to hexNibble, uuidToBytes, and
hexToBytes when making these checks and error throws.
In `@packages/stardb/test/stream-utils.test.ts`:
- Around line 91-97: The test asserts "unknown id 999" but StreamRegistry
actually throws a different message format; update the assertions in the test
for the methods resourceOf, scopeOf, appendChunk, and chunksOf to match the real
thrown message (e.g. "StreamRegistry: unknown id 999") or use a regex/assertion
that checks for the substring "unknown id 999" (toThrow(/unknown id 999/)) so
the expectations align with StreamRegistry's actual error format.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: dec530d8-3d19-46ef-9120-f13331216863
📒 Files selected for processing (9)
packages/o11ylogsdb/src/codec-columnar.tspackages/o11ylogsdb/src/codec-typed.tspackages/o11ylogsdb/src/codec-utils.tspackages/o11ylogsdb/src/query.tspackages/o11ytracesdb/src/query.tspackages/stardb/src/index.tspackages/stardb/src/utils.tspackages/stardb/test/public-api.test.tspackages/stardb/test/stream-utils.test.ts
| } from "./codec-utils.js"; | ||
| import { Drain, PARAM_STR, tokenize } from "./drain.js"; | ||
| import type { AnyValue, KeyValue, LogRecord, SeverityText } from "./types.js"; | ||
| import type { AnyValue, LogRecord, SeverityText } from "./types.js"; |
There was a problem hiding this comment.
Unused import: SeverityText.
CI reports lint failure. Remove the unused import.
Proposed fix
-import type { AnyValue, LogRecord, SeverityText } from "./types.js";
+import type { AnyValue, LogRecord } from "./types.js";📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import type { AnyValue, LogRecord, SeverityText } from "./types.js"; | |
| import type { AnyValue, LogRecord } from "./types.js"; |
🧰 Tools
🪛 GitHub Actions: validate-o11ykit-monorepo
[error] 42-42: Biome check failed (lint/correctness/noUnusedImports). Several imports are unused, including: 'import type { AnyValue, LogRecord, SeverityText } from "./types.js"'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/o11ylogsdb/src/codec-typed.ts` at line 42, The import list in the
top-level import from "./types.js" includes an unused symbol SeverityText,
causing a lint error; edit the import statement in packages' codec-typed module
to remove SeverityText (leave AnyValue and LogRecord), then run the linter/tests
to confirm the CI failure is resolved and ensure no other references to
SeverityText exist in the same file (if they do, either remove or reintroduce a
used import).
| if (r.eventName) { | ||
| side.e = r.eventName; | ||
| sidecarHasContent = true; | ||
| } | ||
| if (r.droppedAttributesCount) { | ||
| side.d = r.droppedAttributesCount; | ||
| sidecarHasContent = true; | ||
| } |
There was a problem hiding this comment.
Falsy eventName and droppedAttributesCount are silently dropped.
PR objectives state "preserve falsy NDJSON fields (e.g., eventName: "" and droppedAttributesCount: 0)". However, lines 83 and 87 use truthy checks that skip eventName: "" and droppedAttributesCount: 0.
Proposed fix
- if (r.eventName) {
+ if (r.eventName !== undefined) {
side.e = r.eventName;
sidecarHasContent = true;
}
- if (r.droppedAttributesCount) {
+ if (r.droppedAttributesCount !== undefined) {
side.d = r.droppedAttributesCount;
sidecarHasContent = true;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (r.eventName) { | |
| side.e = r.eventName; | |
| sidecarHasContent = true; | |
| } | |
| if (r.droppedAttributesCount) { | |
| side.d = r.droppedAttributesCount; | |
| sidecarHasContent = true; | |
| } | |
| if (r.eventName !== undefined) { | |
| side.e = r.eventName; | |
| sidecarHasContent = true; | |
| } | |
| if (r.droppedAttributesCount !== undefined) { | |
| side.d = r.droppedAttributesCount; | |
| sidecarHasContent = true; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/o11ylogsdb/src/codec-utils.ts` around lines 83 - 90, The code
currently uses truthy checks on r.eventName and r.droppedAttributesCount which
causes falsy but valid values (e.g., empty string "" or 0) to be skipped; change
the checks to detect presence instead of truthiness (e.g., test for
undefined/null) so that when r.eventName is present you still assign side.e =
r.eventName and set sidecarHasContent = true, and similarly when
r.droppedAttributesCount is present assign side.d = r.droppedAttributesCount and
set sidecarHasContent = true; update the conditional expressions around the
references to r.eventName and r.droppedAttributesCount in codec-utils.ts
accordingly.
| if (side.e) rec.eventName = side.e; | ||
| if (side.d) rec.droppedAttributesCount = side.d; |
There was a problem hiding this comment.
Same falsy-check issue on decode side.
applySidecar also uses truthy checks, so eventName: "" and droppedAttributesCount: 0 won't be restored even if they were encoded.
Proposed fix
- if (side.e) rec.eventName = side.e;
- if (side.d) rec.droppedAttributesCount = side.d;
+ if (side.e !== undefined) rec.eventName = side.e;
+ if (side.d !== undefined) rec.droppedAttributesCount = side.d;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/o11ylogsdb/src/codec-utils.ts` around lines 130 - 131, The decode
branch uses truthy checks so empty strings and zeroes are lost; update the
assignments in the function that decodes sidecar (look for applySidecar and the
block using rec and side) to test for undefined explicitly (e.g. use side.e !==
undefined or 'e' in side) before setting rec.eventName and likewise check side.d
!== undefined or 'd' in side before setting rec.droppedAttributesCount so empty
string and 0 are preserved.
| export function bytesToUuid(b: Uint8Array): string { | ||
| return ( | ||
| hexLookup[b[0] as number]! + | ||
| hexLookup[b[1] as number]! + | ||
| hexLookup[b[2] as number]! + | ||
| hexLookup[b[3] as number]! + | ||
| "-" + | ||
| hexLookup[b[4] as number]! + | ||
| hexLookup[b[5] as number]! + | ||
| "-" + | ||
| hexLookup[b[6] as number]! + | ||
| hexLookup[b[7] as number]! + | ||
| "-" + | ||
| hexLookup[b[8] as number]! + | ||
| hexLookup[b[9] as number]! + | ||
| "-" + | ||
| hexLookup[b[10] as number]! + | ||
| hexLookup[b[11] as number]! + | ||
| hexLookup[b[12] as number]! + | ||
| hexLookup[b[13] as number]! + | ||
| hexLookup[b[14] as number]! + | ||
| hexLookup[b[15] as number]! | ||
| ); |
There was a problem hiding this comment.
Add an explicit 16-byte precondition in bytesToUuid.
Without a length check, non-16-byte inputs can produce corrupted UUID strings instead of failing fast.
Proposed fix
export function bytesToUuid(b: Uint8Array): string {
+ if (b.length !== 16) {
+ throw new Error("bytesToUuid: expected exactly 16 bytes");
+ }
return (
hexLookup[b[0] as number]! +🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/stardb/src/utils.ts` around lines 23 - 45, The bytesToUuid function
lacks a check that the input Uint8Array is exactly 16 bytes, so pass-through of
shorter/longer arrays yields corrupted UUIDs; add an explicit precondition at
the start of bytesToUuid that verifies b.length === 16 and throws a clear
RangeError (or TypeError) if not, so callers fail fast and you avoid producing
invalid strings; keep the rest of the function (using hexLookup and the indexed
accesses) unchanged.
Add decodeFilteredByBodyNeedle to TypedColumnarDrainPolicy. When a bodyContains query is active, the codec reconstructs bodies and tests them against the needle BEFORE parsing sidecar NDJSON. Only sidecar lines for matching records are JSON.parsed, saving ~35-60% of sidecar CPU (which was ~47% of full decode time). Template-literal shortcut: for space-free needles, templates whose literal tokens contain the needle are identified as definite matches without per-record .includes() checks. Query engine updated to use the filtered decode path via readRecordsFilteredFromRaw. Returns a sparse array with matching records at their original indices. Non-matching records are undefined. Includes 11 new tests covering filtered decode correctness, template literal shortcut, non-string body handling, and query engine integration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move reusable code from engine packages to stardb shared core: - Interner (o11ytsdb → stardb): high-perf string→int hash table - BackpressureController (o11ytsdb → stardb): async counting semaphore - lowerBound/upperBound (o11ytsdb → stardb): BigInt64Array binary search - Bloom filter (o11ytracesdb → stardb): BF8 with double hashing - buildDictWithIndex (o11ytracesdb → stardb): frequency-sorted dictionary - encodeAnyValue/decodeAnyValue (o11ytracesdb → stardb): binary AnyValue codec - uint8IndexOf (o11ylogsdb → stardb): SIMD-accelerated byte search Engine packages now re-export from stardb, preserving all public APIs. All 1155 tests pass, typecheck clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
33 new tests covering: - Interner: intern/resolve, size, bulkIntern, cardinality limit, invalid ID - BackpressureController: acquire/release, queuing, dispose, invalid args - lowerBound/upperBound: exact match, before/after range, boundary cases - Bloom filter: insert+query, FPR bounds, base64 roundtrip, dedup, edge cases - uint8IndexOf: found, not-found, empty needle, needle > haystack - buildDictWithIndex: frequency sort, empty input - AnyValue binary codec: null, string (raw+dict), bigint, double, bool, bytes, array, map, nested structures Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Delete thin re-export wrapper files (interner.ts, backpressure.ts, bloom.ts) from engine packages. All internal imports now point directly to stardb. These were never part of the engine public APIs consumed by the site or examples. Removes Interner, BackpressureController, InternId from o11ytsdb public exports. Removes bloom.ts wrapper from o11ytracesdb (bloom functions still re-exported from index.ts via stardb). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused readRecordsFromRaw import from logsdb query.ts - Fix noImplicitAnyLet in engine-extended.test.ts (type lastStats) - Direct imports to stardb (no wrapper indirection) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Query Performance & Correctness (logsdb)
eventName: ""anddroppedAttributesCount: 0no longer silently dropped during NDJSON round-tripShared Code Extraction (stardb)
.find((a) => a.key ===...)boilerplateencodeSidecar()/applySidecar()deduplicates 2×50 LOC encode/decode blocksstardb Test Coverage
CI & Tooling
test-fastandkniptargetsTest Expansion
Benchmark (500K syslog records)
Storage Efficiency (500K records)