Skip to content

perf(logsdb): query optimization, shared code extraction, CI improvements#219

Merged
strawgate merged 15 commits into
mainfrom
copilot/logsdb-perf-tests
Apr 28, 2026
Merged

perf(logsdb): query optimization, shared code extraction, CI improvements#219
strawgate merged 15 commits into
mainfrom
copilot/logsdb-perf-tests

Conversation

@strawgate
Copy link
Copy Markdown
Owner

@strawgate strawgate commented Apr 28, 2026

Summary

Query Performance & Correctness (logsdb)

  • Raw byte scan optimization: decompress → Buffer.indexOf scan → skip non-matching chunks entirely
  • Single-pass decode: eliminated double-decompression bug in body fast path
  • Correctness fixes: eventName: "" and droppedAttributesCount: 0 no longer silently dropped during NDJSON round-trip
  • Bounded needle cache: 64-entry cap prevents memory leak on high-cardinality queries

Shared Code Extraction (stardb)

  • StreamRegistry: generic class in stardb, subclassed by logsdb and tracesdb (~380 LOC dedup)
  • ByteBuf + ByteReader: generic growable buffer + reader in stardb (~400 LOC dedup from 3 copies)
  • Chunk wire format: generic serialize/deserialize parameterized by magic bytes
  • bytesToHex/hexToBytes: single optimized implementation, removed 9 copies across 6 files
  • bytesToUuid/uuidToBytes: hot-path optimized with hexNibble (was 11.4% of decode CPU)
  • FNV-1a hash: fnv1aBytes shared, verified equivalence across engines
  • timeRangeOverlaps: half-open interval overlap predicate used in all query engines
  • anyValueToJson/jsonToAnyValue/anyValueEquals: OTLP value serialization shared
  • findAttribute: attribute lookup replacing .find((a) => a.key ===...) boilerplate
  • Sidecar NDJSON: encodeSidecar()/applySidecar() deduplicates 2×50 LOC encode/decode blocks
  • StreamKey type: shared interface across engines
  • Net ~1400 LOC eliminated across all engines

stardb Test Coverage

  • binary.ts: 0% → 100% (28 tests)
  • chunk-wire.ts: 0% → 100% (14 tests)
  • any-value.ts: new module, 100% (28 tests)
  • utils.ts: 54% → 97% (UUID, timeRangeOverlaps, fnv1aBytes, bytesEqual)
  • stream.ts: 88% → 94%

CI & Tooling

  • Parallel CI: lint, typecheck, test, build now run concurrently (was sequential)
  • Pre-commit hooks: husky + lint-staged runs biome check on staged files
  • Dead code detection: knip configured for monorepo workspaces
  • Coverage expansion: all 3 engines now tracked (was only tsdb + stardb)
  • Makefile: added test-fast and knip targets

Test Expansion

  • +262 new tests (82 → 248 in logsdb, +131 in stardb, +various tracesdb)
  • 1144 total repo-wide tests, all passing
  • Coverage thresholds enforced across all packages

Benchmark (500K syslog records)

Query Latency Chunks Pruned
full_scan 745ms 0/489
bodyContains "ssh" (26% hit) 1001ms 0/489
bodyContains "error" (0% hit) 417ms 489/489
time first 10% 73ms 440/489
time+sev+body 468ms 244/489

Storage Efficiency (500K records)

Corpus B/log vs ZSTD-3 vs ZSTD-19
syslog 14.8 1.45× smaller 0.92×
structured 47.6 1.44× smaller 0.95×
high-cardinality 66.3 1.74× smaller 1.31× smaller
cloud-native 21.7 1.55× smaller 0.90×

strawgate and others added 5 commits April 28, 2026 09:54
Replace the unsound template-token pruning with a simpler, correct
optimization: decompress the chunk payload and check if the needle's
UTF-8 bytes exist anywhere in the raw buffer via indexOf.

If the bytes aren't found, no body in the chunk can contain the
needle — skip all string construction and template reconstruction.
This is always correct (sound as a negative filter).

Results at 500K records:
- bodyContains 'error' (0 hits): 417ms → 107ms (3.9× faster)
- bodyContains 'zzz_no_match': ~400ms → 87ms (4.6× faster)
- bodyContains 'ssh' (133K hits): unchanged (bytes found, falls through)

The raw scan uses Buffer.indexOf (SIMD-optimized in Node/V8) with a
portable Uint8Array fallback for browser environments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the double-decode body-only → full-decode path that caused
regression for high-match-rate queries (ssh: 1521ms → 613ms).

New approach: raw byte scan to eliminate non-matching chunks (65-110ms),
then single full decode + inline filter for matching chunks.

500K syslog benchmarks:
- Non-matching body: 65-110ms (was 417ms, 4-6× faster)
- Matching body 'ssh': 613ms (was 1001ms, 1.6× faster)
- body+limit=100: 1ms (was 12ms, 12× faster)
- time+sev+body: 407ms (was 468ms, 1.15× faster)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 4 new test files covering:
- query-correctness: body-only fast path equivalence, combined predicates,
  queryStream parity, limit boundaries, API signature regression
- query-scale: 10K records, hit count accuracy, pruning effectiveness,
  stats consistency (chunksScanned + chunksPruned = total)
- codec-typed-correctness: special chars, UUID/INT64/timestamp slot
  round-trips, structured bodies, toks metadata, mixed chunk shapes
- readBodiesOnly: per-policy body-only decode correctness, structured
  bodies, empty/long bodies, cross-policy consistency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New test files:
- classify.test.ts (20 tests): all BodyKind branches, TemplatedClassifier
- engine-extended.test.ts (14 tests): flush edge cases, iterRecords,
  rowsPerChunk extremes, TypedColumnar via engine
- stream-extended.test.ts (17 tests): error paths, scope/resource
  differentiation, complex attribute types, reference caching
- compact-extended.test.ts (7 tests): codec diversity (gzip↔zstd),
  non-string bodies, rich metadata, TypedColumnar compaction
- query-edge-cases.test.ts (16 tests): non-ASCII/CJK needles, leafGet
  paths, numeric bodyLeafEquals, time boundary precision, combined
  predicates, empty store

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ests

Query engine:
- Extract readRecordsFromRaw() to avoid re-decompressing payloads
  in the body fast path (was calling readRecords which decompress again)
- Remove unused readBodiesOnly import
- Cap needleCache at 64 entries to prevent unbounded memory growth
- Fix biome formatting

Chunk NDJSON serialization:
- Fix eventName='' being silently dropped (was using truthy check)
- Fix droppedAttributesCount=0 being silently dropped
- Use !== undefined checks instead of truthy for both encode and decode

New tests (23 tests):
- ndjson-roundtrip.test.ts (12): falsy field preservation, complex
  bodies (null, nested object, numeric, boolean, array), full rich
  record pipeline round-trip
- query-integration.test.ts (11): readRecordsFromRaw correctness,
  body fast path on NDJSON stores, resourceEquals stream pruning,
  queryStream generator early-termination behavior

Total: 239 tests across 19 files, all passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds exported readRecordsFromRaw(raw, header, policy?) to decode already‑decompressed chunk payloads and updates readRecords to delegate to it. Reworks the bodyContains query fast path to perform a UTF‑8 byte‑level substring scan and only decode records when the needle may be present. Replaces many local binary/hex/varint, chunk‑wire, and stream‑registry implementations with shared stardb utilities (e.g., ByteBuf/ByteReader, bytesToHex/hexToBytes, serializeChunkWire/deserializeChunkWire, StreamRegistry). Fixes NDJSON presence checks for falsy fields, centralizes AnyValue/template helpers in a new codec-utils module, and adds extensive tests and CI/tooling updates.

Possibly related PRs

  • PR 192: Introduces stardb primitives and utilities (byte/hex/UUID helpers, chunk‑wire functions, StreamRegistry) that this change imports and consumes.
  • PR 180: Overlaps changes to chunk.ts, index.ts, and query.ts, including new raw‑decode entrypoints and related decoding paths.
  • PR 205: Modifies chunk decoding and query fast‑path logic (body scanning and selective decoding) that directly relates to the bodyContains byte‑scan and raw‑decode flow.
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working performance Performance optimization labels Apr 28, 2026
Comment on lines 266 to +272
/**
* Template-token pruning for bodyContains. If the chunk header carries
* template literal tokens (TypedColumnarDrainPolicy stores these in
* codecMeta.toks), check if any token contains the needle as a
* substring. If no template token can match AND the chunk metadata
* confirms zero raw-string bodies, we can skip ZSTD decompression.
*
* SOUNDNESS: We can only prune when BOTH conditions hold:
* 1. No template literal token contains the needle
* 2. The chunk has zero raw-string bodies (rawCount === 0)
* Raw byte scan: check if the needle's UTF-8 bytes appear anywhere
* in the decompressed payload buffer. This is a sound negative filter:
* if the bytes aren't found, no body string in this chunk can contain
* the needle. False positives are possible (the needle bytes might
* appear in template dictionary metadata, slot type headers, etc.)
* but are rare and handled by the subsequent per-record check.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: Raw byte scan false negative for JSON-escaped characters in NDJSON

Even for NDJSON payloads, rawPayloadContains can produce false negatives when the bodyContains needle includes characters that get JSON-escaped (e.g., literal newlines ``, quotes ", backslashes ``). The raw bytes in the NDJSON buffer contain the escaped form (`\n`), not the original byte (`0x0A`), so a scan for the original character won't match.

This is a minor edge case because bodyContains searches in practice are almost always for printable ASCII/UTF-8 text that doesn't require JSON escaping, but it's worth documenting the limitation.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/o11ylogsdb/src/query.ts`:
- Around line 149-167: The raw-byte fast path (guarded by useBodyFastPath and
using rawPayloadContains before calling readRecordsFromRaw) is unsound for
policies that reconstruct or template bodies in postDecode; add an explicit
capability check on the policy (e.g., policy.allowsRawBodyScan or
!policy.postDecodeTransforms) before using useBodyFastPath, and if the policy
does not guarantee bodies are preserved as contiguous UTF-8 sequences, skip the
raw-byte scan and instead call decodeBodiesOnly or perform the full decode path
(readRecordsFromRaw) so templated/reconstructed bodies aren't incorrectly
pruned.

In `@packages/o11ylogsdb/test/compact-extended.test.ts`:
- Around line 145-163: The test "compact + query integration" is claiming to
exercise compaction but never calls the compaction routine; update the test to
actually compact stored chunks before running query by invoking the compaction
helper (e.g., compactChunk) for the relevant chunk(s) produced by the LogStore
instance (use the store variable and its chunks/IDs or API to get chunk
identifiers), then run query(store, ...) and assert on records; specifically,
after store.flush() call the compactChunk function (or store.compactChunk /
store.compactAll as available) on the flushed chunk(s) so the query truly runs
against compacted chunks.

In `@packages/o11ylogsdb/test/engine-extended.test.ts`:
- Around line 60-65: Declare lastStats with an explicit type to satisfy
noImplicitAnyLet: change the current untyped declaration for lastStats to use
the IngestStats type returned by store.append (e.g., IngestStats | undefined),
referring to the lastStats variable and the store.append(...) call in the
LogStore test so the assignment stays the same but the variable is explicitly
typed.

In `@packages/o11ylogsdb/test/query-correctness.test.ts`:
- Around line 123-138: The combined-predicate test ("combines time + severity +
bodyContains") is vulnerable to vacuous passes because it only asserts
properties inside a loop and will succeed if result.records is empty; update the
test that calls buildMultiChunkStore() and query(...) to include an explicit
cardinality assertion (e.g., expect(result.records.length).toBeGreaterThan(0) or
assert the exact expected count) before the for-loop so an empty result fails;
apply the same change to the other combined-predicate test around the query(...)
call at the later block (lines ~196-207) to assert result.records has the
expected number of entries before iterating.

In `@packages/o11ylogsdb/test/stream-extended.test.ts`:
- Around line 2-17: The test fixture fakeChunk() uses the old ChunkHeader shape
(minNano/maxNano) and is missing required fields; update fakeChunk() so its
header matches the current ChunkHeader shape in src/chunk.ts by adding
schemaVersion, timeRange (with minNano and maxNano), severityRange, resource and
scope entries, and keep payload/payloadBytes consistent; reuse the imported
Resource and InstrumentationScope types for the resource and scope fields and
ensure header.codecName, header.nLogs and header.payloadBytes remain present to
satisfy StreamRegistry/chunk consumers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 204fa5ea-5a0d-4708-b817-9bee9dab7d2c

📥 Commits

Reviewing files that changed from the base of the PR and between 14a1bcd and c98eece.

📒 Files selected for processing (14)
  • packages/o11ylogsdb/src/chunk.ts
  • packages/o11ylogsdb/src/index.ts
  • packages/o11ylogsdb/src/query.ts
  • packages/o11ylogsdb/test/classify.test.ts
  • packages/o11ylogsdb/test/codec-typed-correctness.test.ts
  • packages/o11ylogsdb/test/compact-extended.test.ts
  • packages/o11ylogsdb/test/engine-extended.test.ts
  • packages/o11ylogsdb/test/ndjson-roundtrip.test.ts
  • packages/o11ylogsdb/test/query-correctness.test.ts
  • packages/o11ylogsdb/test/query-edge-cases.test.ts
  • packages/o11ylogsdb/test/query-integration.test.ts
  • packages/o11ylogsdb/test/query-scale.test.ts
  • packages/o11ylogsdb/test/readBodiesOnly.test.ts
  • packages/o11ylogsdb/test/stream-extended.test.ts

Comment thread packages/o11ylogsdb/src/query.ts Outdated
Comment on lines +145 to +163
describe("compact + query integration", () => {
it("query works on compacted chunks", () => {
const store = new LogStore({
rowsPerChunk: 8,
policyFactory: () => new TypedColumnarDrainPolicy(),
});
for (let i = 0; i < 20; i++) {
store.append(resource, scope, rec(`user ${i % 2 === 0 ? "login" : "logout"} event`, i < 10 ? 9 : 17, BigInt(i * 100)));
}
store.flush();

const { records } = query(store, { bodyContains: "login", severityGte: 17 });
expect(records.length).toBeGreaterThan(0);
for (const r of records) {
expect(r.body).toContain("login");
expect(r.severityNumber).toBeGreaterThanOrEqual(17);
}
});
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

compact + query integration test does not exercise compaction.

Line 146 claims compacted-chunk coverage, but the test only writes to LogStore and runs query; compactChunk is never invoked. A compaction/query regression would be missed.

Minimal corrective diff (rename to match actual behavior)
-describe("compact + query integration", () => {
-  it("query works on compacted chunks", () => {
+describe("query integration with typed-columnar chunks", () => {
+  it("query works on flushed typed-columnar chunks", () => {

Based on learnings: Bug fixes should include a test that would fail without the fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/test/compact-extended.test.ts` around lines 145 - 163,
The test "compact + query integration" is claiming to exercise compaction but
never calls the compaction routine; update the test to actually compact stored
chunks before running query by invoking the compaction helper (e.g.,
compactChunk) for the relevant chunk(s) produced by the LogStore instance (use
the store variable and its chunks/IDs or API to get chunk identifiers), then run
query(store, ...) and assert on records; specifically, after store.flush() call
the compactChunk function (or store.compactChunk / store.compactAll as
available) on the flushed chunk(s) so the query truly runs against compacted
chunks.

Comment thread packages/o11ylogsdb/test/engine-extended.test.ts
Comment on lines +123 to +138
it("combines time + severity + bodyContains", () => {
const { store } = buildMultiChunkStore();
const result = query(store, {
range: { from: 1000n, to: 3000n }, // records 0..19
severityGte: 13,
bodyContains: "error",
});
// In the first 20 records (t=1000..2900), errors at index 0,7,14
// severity for error records is 17, which >= 13
for (const r of result.records) {
expect(Number(r.timeUnixNano)).toBeGreaterThanOrEqual(1000);
expect(Number(r.timeUnixNano)).toBeLessThan(3000);
expect(r.severityNumber).toBeGreaterThanOrEqual(13);
expect(r.body).toContain("error");
}
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prevent vacuous passes in combined-predicate tests.

Line 123 and Line 196 only validate properties inside loops; if a regression returns zero records, both tests still pass. Add explicit cardinality assertions.

Proposed fix
   it("combines time + severity + bodyContains", () => {
@@
     const result = query(store, {
       range: { from: 1000n, to: 3000n }, // records 0..19
       severityGte: 13,
       bodyContains: "error",
     });
+    expect(result.records.length).toBe(3);
@@
   it("combined time+bodyContains fast path still correct", () => {
@@
     const result = query(store, {
       range: { from: 2000n, to: 4000n },
       bodyContains: "gateway",
     });
+    expect(result.records.length).toBe(12);

Based on learnings: Bug fixes should include a test that would fail without the fix.

Also applies to: 196-207

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/test/query-correctness.test.ts` around lines 123 - 138,
The combined-predicate test ("combines time + severity + bodyContains") is
vulnerable to vacuous passes because it only asserts properties inside a loop
and will succeed if result.records is empty; update the test that calls
buildMultiChunkStore() and query(...) to include an explicit cardinality
assertion (e.g., expect(result.records.length).toBeGreaterThan(0) or assert the
exact expected count) before the for-loop so an empty result fails; apply the
same change to the other combined-predicate test around the query(...) call at
the later block (lines ~196-207) to assert result.records has the expected
number of entries before iterating.

Comment on lines +2 to +17
import { StreamRegistry } from "../src/stream.js";
import type { Resource, InstrumentationScope, StreamId } from "../src/types.js";
import type { Chunk } from "../src/chunk.js";

function fakeChunk(): Chunk {
return {
header: {
codecName: "ndjson+zstd-19",
nLogs: 1,
minNano: 1n,
maxNano: 2n,
codecMeta: undefined,
payloadBytes: 10,
},
payload: new Uint8Array(10),
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and read the chunk.ts file to understand the ChunkHeader type
find . -name "chunk.ts" -o -name "chunk.js" | head -20

Repository: strawgate/o11ykit

Length of output: 133


🏁 Script executed:

# Read the chunk.ts file to see the actual type definition
cat packages/o11ylogsdb/src/chunk.ts

Repository: strawgate/o11ykit

Length of output: 14934


🏁 Script executed:

# Read the test file to verify current fakeChunk() implementation
cat packages/o11ylogsdb/test/stream-extended.test.ts

Repository: strawgate/o11ykit

Length of output: 7128


Update fakeChunk() to match the current ChunkHeader shape.

The fixture uses the old minNano/maxNano fields and omits required fields (schemaVersion, timeRange, severityRange, resource, scope), causing type mismatches against packages/o11ylogsdb/src/chunk.ts.

Proposed fix
-import type { Chunk } from "../src/chunk.js";
+import { CHUNK_VERSION } from "../src/chunk.js";
+import type { Chunk } from "../src/chunk.js";
@@
 function fakeChunk(): Chunk {
   return {
     header: {
+      schemaVersion: CHUNK_VERSION,
       codecName: "ndjson+zstd-19",
       nLogs: 1,
-      minNano: 1n,
-      maxNano: 2n,
+      timeRange: { minNano: "1", maxNano: "2" },
+      severityRange: { min: 1, max: 1 },
+      resource: { attributes: [] },
+      scope: { name: "test" },
       codecMeta: undefined,
       payloadBytes: 10,
     },
     payload: new Uint8Array(10),
   };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import { StreamRegistry } from "../src/stream.js";
import type { Resource, InstrumentationScope, StreamId } from "../src/types.js";
import type { Chunk } from "../src/chunk.js";
function fakeChunk(): Chunk {
return {
header: {
codecName: "ndjson+zstd-19",
nLogs: 1,
minNano: 1n,
maxNano: 2n,
codecMeta: undefined,
payloadBytes: 10,
},
payload: new Uint8Array(10),
};
import { CHUNK_VERSION } from "../src/chunk.js";
import type { Resource, InstrumentationScope, StreamId } from "../src/types.js";
import type { Chunk } from "../src/chunk.js";
function fakeChunk(): Chunk {
return {
header: {
schemaVersion: CHUNK_VERSION,
codecName: "ndjson+zstd-19",
nLogs: 1,
timeRange: { minNano: "1", maxNano: "2" },
severityRange: { min: 1, max: 1 },
resource: { attributes: [] },
scope: { name: "test" },
codecMeta: undefined,
payloadBytes: 10,
},
payload: new Uint8Array(10),
};
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/test/stream-extended.test.ts` around lines 2 - 17, The
test fixture fakeChunk() uses the old ChunkHeader shape (minNano/maxNano) and is
missing required fields; update fakeChunk() so its header matches the current
ChunkHeader shape in src/chunk.ts by adding schemaVersion, timeRange (with
minNano and maxNano), severityRange, resource and scope entries, and keep
payload/payloadBytes consistent; reuse the imported Resource and
InstrumentationScope types for the resource and scope fields and ensure
header.codecName, header.nLogs and header.payloadBytes remain present to satisfy
StreamRegistry/chunk consumers.

- Move StreamRegistry to stardb as generic StreamRegistry<C> class
  - logsdb and tracesdb now subclass with their Chunk types
  - Includes removeChunk() with stream cleanup (from tracesdb)
  - Includes stale ref detection (WeakMap fast path validation)

- Extract bytesToHex/hexToBytes to stardb/utils.ts
  - Remove 9 duplicate implementations across 6 files
  - Lookup-table implementation for bytesToHex (consistent perf)

- Add nowMillis() utility to stardb

- Parallelize CI workflow: lint, typecheck, test, build run concurrently
  - Tests still matrix across Node 22 + 24
  - Faster overall CI time (~3× with parallel jobs)

- Add pre-commit hooks (husky + lint-staged)
  - Runs biome check on staged files only (fast)

- Add knip for dead code detection
  - Configured for monorepo workspaces, excludes bench/ dirs

- Expand coverage to all 3 engines (logsdb + tracesdb added)
  - Global thresholds: 65% branches, 75% functions, 70% lines

- Add make test-fast and make knip targets

- Add stardb stream/utils tests (16 tests)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 28, 2026
@strawgate strawgate changed the title perf(logsdb): query optimization, correctness fixes, comprehensive test expansion perf(logsdb): query optimization, shared code extraction, CI improvements Apr 28, 2026
Comment thread vitest.config.ts
Comment on lines +34 to 43
branches: 65,
functions: 75,
lines: 70,
statements: 70,
"packages/stardb/src/**/*.ts": {
branches: 90,
branches: 75,
functions: 100,
lines: 95,
statements: 95,
lines: 93,
statements: 93,
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Coverage thresholds significantly lowered for stardb

The stardb-specific branch coverage threshold dropped from 90% → 75%, and line/statement from 95% → 93%. The old code had an explicit comment: "Hold it to a strict threshold so regressions surface here instead of whatever *db package noticed first." Since new code (StreamRegistry, utils) was added to stardb in this PR, the thresholds should ideally go up or stay flat, not drop by 15 percentage points on branches. The global thresholds also dropped substantially (branches 71→65, functions 86→75, lines 82→70). While adding more source packages to coverage tracking naturally dilutes percentages, this large a drop may mask future regressions.

Suggested fix:

Consider running `npx vitest --coverage` to see actual current coverage, then set thresholds 1-2% below the current values rather than padding with a large buffer. For stardb specifically, keeping branches ≥ 85% would preserve the original intent.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/o11ylogsdb/src/codec-typed.ts (1)

854-859: ⚠️ Potential issue | 🟠 Major

Preserve falsy sidecar fields ("" and 0) during round-trip.

Line 854/858 and Line 1146/1147 still gate on truthiness, so eventName: "" and droppedAttributesCount: 0 are lost.

Suggested fix
-    if (r.eventName) {
+    if (r.eventName !== undefined) {
       side.e = r.eventName;
       sidecarHasContent = true;
     }
-    if (r.droppedAttributesCount) {
+    if (r.droppedAttributesCount !== undefined) {
       side.d = r.droppedAttributesCount;
       sidecarHasContent = true;
     }
@@
-    if (side.e) rec.eventName = side.e as string;
-    if (side.d) rec.droppedAttributesCount = side.d as number;
+    if (side.e !== undefined) rec.eventName = side.e as string;
+    if (side.d !== undefined) rec.droppedAttributesCount = side.d as number;

Also applies to: 1146-1147

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/codec-typed.ts` around lines 854 - 859, The checks
for r.eventName and r.droppedAttributesCount use truthiness and drop valid falsy
values ("" and 0); change the guards in the sidecar population to test for
undefined (e.g., r.eventName !== undefined and r.droppedAttributesCount !==
undefined) so empty string and zero are preserved, set side.e and side.d and
update sidecarHasContent accordingly; apply the same undefined-check fix to the
identical guards at the other location referenced (lines 1146/1147) so both
round-trip paths keep falsy values.
packages/o11ylogsdb/src/codec-columnar.ts (1)

434-439: ⚠️ Potential issue | 🟠 Major

Falsy eventName/droppedAttributesCount are still dropped.

Line 434/438 and Line 581/582 should use !== undefined; current truthy checks lose valid values ("", 0).

Suggested fix
-    if (r.eventName) {
+    if (r.eventName !== undefined) {
       side.e = r.eventName;
       sidecarHasContent = true;
     }
-    if (r.droppedAttributesCount) {
+    if (r.droppedAttributesCount !== undefined) {
       side.d = r.droppedAttributesCount;
       sidecarHasContent = true;
     }
@@
-    if (side.e) rec.eventName = side.e;
-    if (side.d) rec.droppedAttributesCount = side.d;
+    if (side.e !== undefined) rec.eventName = side.e;
+    if (side.d !== undefined) rec.droppedAttributesCount = side.d;

Also applies to: 581-582

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/codec-columnar.ts` around lines 434 - 439, Change the
truthy checks that drop valid falsy values: replace checks like "if
(r.eventName)" and "if (r.droppedAttributesCount)" with explicit undefined
checks ("if (r.eventName !== undefined)" and "if (r.droppedAttributesCount !==
undefined)") so empty-string eventName and zero droppedAttributesCount are
preserved when assigning to side.e and side.d and when updating
sidecarHasContent; apply the same change to the other occurrence later in the
file where r.eventName / r.droppedAttributesCount are conditionally assigned
(the second block around the other checks).
packages/stardb/src/index.ts (1)

16-25: ⚠️ Potential issue | 🟠 Major

Ensure @types/node is installed before building.

The stardb package manifest is correctly configured with proper exports. However, the TypeScript build fails because it cannot resolve the node type definitions (referenced by codec-baseline.ts for the node:zlib import). The build needs @types/node available during compilation.

Once @types/node is installed and the build completes, the dist/ folder will be generated and all consumer imports from "stardb" will resolve correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/index.ts` around lines 16 - 25, The TypeScript build for
the stardb package fails due to missing Node type definitions referenced by
codec-baseline.ts (it imports "node:zlib"); add `@types/node` as a devDependency
in the package.json for the stardb package and/or ensure tsconfig.json includes
"node" in the "types" array so the compiler can resolve Node ambient types
during build; after adding the dependency run an install and rebuild to produce
the dist/ output so exports from index.ts (e.g., StreamRegistry, bytesToHex,
nowMillis) resolve for consumers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 17-24: Replace mutable action tags with immutable commit SHAs:
locate usages like actions/checkout@v6 and actions/setup-node@v6 in
.github/workflows/ci.yml and the other workflow files (publish-octo11y.yml,
release.yml, pages.yml, benchmarks.yml, actions-dist.yml) and update each
reference to the corresponding 40-character commit SHA (e.g.,
actions/checkout@<sha>, actions/setup-node@<sha>) so all third-party actions are
pinned to a specific commit; ensure you verify the correct SHA from the action's
GitHub releases/tags page and update any other `@v`<number> occurrences
consistently.

In `@knip.json`:
- Line 2: The "$schema" entry in knip.json currently points to
"https://unpkg.com/knip@latest/schema.json" which can cause schema drift; change
the URL to pin the Knip major or exact version that matches your dependency
(e.g., "https://unpkg.com/knip@6/schema.json" or
"https://unpkg.com/knip@6.7.0/schema.json") so the "$schema" value aligns with
the installed ^6.7.0 dependency.

In `@packages/stardb/src/utils.ts`:
- Around line 16-23: The hexToBytes function currently silently coerces
malformed input; update hexToBytes to validate the input string first and throw
a descriptive Error when invalid: ensure hex is a non-null string, has even
length (hex.length % 2 === 0) and contains only hex characters (0-9, a-f, A-F) —
e.g., test with a regex like /^[0-9a-fA-F]+$/ — and only then proceed to parse
each byte (preserve existing logic using Number.parseInt with radix 16); include
a clear error message mentioning "hexToBytes" and the reason (odd length or
invalid characters).

In `@packages/stardb/test/stream-utils.test.ts`:
- Around line 81-87: The test fails because StreamRegistry methods throw errors
prefixed with "StreamRegistry: ..."; update the error messages in StreamRegistry
(the throw sites used by resourceOf, scopeOf, appendChunk, chunksOf) to produce
"unknown id {id}" (i.e., remove the "StreamRegistry: " prefix) so the thrown
message exactly matches the test expectation; locate the throw statements in the
StreamRegistry class and change the message formatting to `unknown id ${id}` for
those methods.

In `@vitest.config.ts`:
- Around line 34-43: The coverage config added new package patterns
"packages/o11ylogsdb/src/**/*.ts" and "packages/o11ytracesdb/src/**/*.ts" but no
package-scoped thresholds were set; update the coverage thresholds object (the
same structure used for "packages/stardb/src/**/*.ts") to include entries for
both "packages/o11ylogsdb/src/**/*.ts" and "packages/o11ytracesdb/src/**/*.ts"
with explicit numeric values for branches, functions, lines, and statements
(e.g., mirror the style used for the stardb entry), so each package has its own
minimums and coverage drift in one package cannot be masked by another.

---

Outside diff comments:
In `@packages/o11ylogsdb/src/codec-columnar.ts`:
- Around line 434-439: Change the truthy checks that drop valid falsy values:
replace checks like "if (r.eventName)" and "if (r.droppedAttributesCount)" with
explicit undefined checks ("if (r.eventName !== undefined)" and "if
(r.droppedAttributesCount !== undefined)") so empty-string eventName and zero
droppedAttributesCount are preserved when assigning to side.e and side.d and
when updating sidecarHasContent; apply the same change to the other occurrence
later in the file where r.eventName / r.droppedAttributesCount are conditionally
assigned (the second block around the other checks).

In `@packages/o11ylogsdb/src/codec-typed.ts`:
- Around line 854-859: The checks for r.eventName and r.droppedAttributesCount
use truthiness and drop valid falsy values ("" and 0); change the guards in the
sidecar population to test for undefined (e.g., r.eventName !== undefined and
r.droppedAttributesCount !== undefined) so empty string and zero are preserved,
set side.e and side.d and update sidecarHasContent accordingly; apply the same
undefined-check fix to the identical guards at the other location referenced
(lines 1146/1147) so both round-trip paths keep falsy values.

In `@packages/stardb/src/index.ts`:
- Around line 16-25: The TypeScript build for the stardb package fails due to
missing Node type definitions referenced by codec-baseline.ts (it imports
"node:zlib"); add `@types/node` as a devDependency in the package.json for the
stardb package and/or ensure tsconfig.json includes "node" in the "types" array
so the compiler can resolve Node ambient types during build; after adding the
dependency run an install and rebuild to produce the dist/ output so exports
from index.ts (e.g., StreamRegistry, bytesToHex, nowMillis) resolve for
consumers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 5b6fea2a-1610-49aa-8895-bb75b7dd32d5

📥 Commits

Reviewing files that changed from the base of the PR and between c98eece and c01a536.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (19)
  • .github/workflows/ci.yml
  • .husky/pre-commit
  • Makefile
  • knip.json
  • package.json
  • packages/o11ylogsdb/src/chunk.ts
  • packages/o11ylogsdb/src/codec-columnar.ts
  • packages/o11ylogsdb/src/codec-typed.ts
  • packages/o11ylogsdb/src/stream.ts
  • packages/o11ytracesdb/src/chunk.ts
  • packages/o11ytracesdb/src/correlate.ts
  • packages/o11ytracesdb/src/query.ts
  • packages/o11ytracesdb/src/stream.ts
  • packages/stardb/src/index.ts
  • packages/stardb/src/stream.ts
  • packages/stardb/src/utils.ts
  • packages/stardb/test/public-api.test.ts
  • packages/stardb/test/stream-utils.test.ts
  • vitest.config.ts

Comment thread .github/workflows/ci.yml
Comment on lines +17 to +24
uses: actions/checkout@v6

- name: Setup Node
uses: actions/setup-node@v6
with:
node-version: 24
cache: npm
cache-dependency-path: package-lock.json
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether workflow actions are SHA-pinned (40-hex) or tag-pinned.
# Expected: third-party `uses:` lines should use @<40-hex-sha> for hardening.

echo "All action references:"
rg -nP '^\s*uses:\s*[^@]+@[^ \t]+' .github/workflows/*.yml

echo
echo "SHA-pinned action references:"
rg -nP '^\s*uses:\s*[^@]+@[a-f0-9]{40}\b' .github/workflows/*.yml

Repository: strawgate/o11ykit

Length of output: 1913


Pin all third-party GitHub Actions to commit SHAs.

All action usages throughout the repository are pinned to mutable tags (@v6, @v5, etc.) instead of immutable commit SHAs. This weakens CI supply-chain guarantees and affects not only .github/workflows/ci.yml but also publish-octo11y.yml, release.yml, pages.yml, benchmarks.yml, and actions-dist.yml. Replace all @v<number> references with @<40-character-sha>.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 17 - 24, Replace mutable action tags
with immutable commit SHAs: locate usages like actions/checkout@v6 and
actions/setup-node@v6 in .github/workflows/ci.yml and the other workflow files
(publish-octo11y.yml, release.yml, pages.yml, benchmarks.yml, actions-dist.yml)
and update each reference to the corresponding 40-character commit SHA (e.g.,
actions/checkout@<sha>, actions/setup-node@<sha>) so all third-party actions are
pinned to a specific commit; ensure you verify the correct SHA from the action's
GitHub releases/tags page and update any other `@v`<number> occurrences
consistently.

Comment thread knip.json
@@ -0,0 +1,46 @@
{
"$schema": "https://unpkg.com/knip@latest/schema.json",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current schema pinning and installed knip version.
# Expected: schema URL should not use `@latest` and should align with installed knip major.

echo "knip devDependency:"
jq -r '.devDependencies.knip' package.json

echo "knip schema URL:"
jq -r '.["$schema"]' knip.json

Repository: strawgate/o11ykit

Length of output: 147


Pin Knip schema to a specific version instead of @latest.

The schema URL currently uses @latest, which can introduce silent schema drift between local development and CI/CD environments. Align the schema version with the installed Knip dependency (^6.7.0). Use a pinned major or exact version—for example, https://unpkg.com/knip@6/schema.json or https://unpkg.com/knip@6.7.0/schema.json.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knip.json` at line 2, The "$schema" entry in knip.json currently points to
"https://unpkg.com/knip@latest/schema.json" which can cause schema drift; change
the URL to pin the Knip major or exact version that matches your dependency
(e.g., "https://unpkg.com/knip@6/schema.json" or
"https://unpkg.com/knip@6.7.0/schema.json") so the "$schema" value aligns with
the installed ^6.7.0 dependency.

Comment on lines +16 to +23
export function hexToBytes(hex: string): Uint8Array {
const len = hex.length >>> 1;
const out = new Uint8Array(len);
for (let i = 0; i < len; i++) {
out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16);
}
return out;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

hexToBytes should reject malformed hex instead of silently coercing.

Current behavior accepts odd-length or non-hex content and can produce corrupted output bytes without throwing.

Suggested fix
 export function hexToBytes(hex: string): Uint8Array {
+  if ((hex.length & 1) !== 0) {
+    throw new Error("hexToBytes: hex string must have even length");
+  }
   const len = hex.length >>> 1;
   const out = new Uint8Array(len);
   for (let i = 0; i < len; i++) {
-    out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16);
+    const hi = hexNibble(hex.charCodeAt(i * 2));
+    const lo = hexNibble(hex.charCodeAt(i * 2 + 1));
+    if (hi < 0 || lo < 0) throw new Error("hexToBytes: invalid hex character");
+    out[i] = (hi << 4) | lo;
   }
   return out;
 }
+
+function hexNibble(ch: number): number {
+  if (ch >= 0x30 && ch <= 0x39) return ch - 0x30;
+  if (ch >= 0x61 && ch <= 0x66) return ch - 0x61 + 10;
+  if (ch >= 0x41 && ch <= 0x46) return ch - 0x41 + 10;
+  return -1;
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function hexToBytes(hex: string): Uint8Array {
const len = hex.length >>> 1;
const out = new Uint8Array(len);
for (let i = 0; i < len; i++) {
out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16);
}
return out;
}
export function hexToBytes(hex: string): Uint8Array {
if ((hex.length & 1) !== 0) {
throw new Error("hexToBytes: hex string must have even length");
}
const len = hex.length >>> 1;
const out = new Uint8Array(len);
for (let i = 0; i < len; i++) {
const hi = hexNibble(hex.charCodeAt(i * 2));
const lo = hexNibble(hex.charCodeAt(i * 2 + 1));
if (hi < 0 || lo < 0) throw new Error("hexToBytes: invalid hex character");
out[i] = (hi << 4) | lo;
}
return out;
}
function hexNibble(ch: number): number {
if (ch >= 0x30 && ch <= 0x39) return ch - 0x30;
if (ch >= 0x61 && ch <= 0x66) return ch - 0x61 + 10;
if (ch >= 0x41 && ch <= 0x46) return ch - 0x41 + 10;
return -1;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/utils.ts` around lines 16 - 23, The hexToBytes function
currently silently coerces malformed input; update hexToBytes to validate the
input string first and throw a descriptive Error when invalid: ensure hex is a
non-null string, has even length (hex.length % 2 === 0) and contains only hex
characters (0-9, a-f, A-F) — e.g., test with a regex like /^[0-9a-fA-F]+$/ — and
only then proceed to parse each byte (preserve existing logic using
Number.parseInt with radix 16); include a clear error message mentioning
"hexToBytes" and the reason (odd length or invalid characters).

Comment on lines +81 to +87
it("throws on unknown id", () => {
const reg = new StreamRegistry();
expect(() => reg.resourceOf(999)).toThrow("unknown id 999");
expect(() => reg.scopeOf(999)).toThrow("unknown id 999");
expect(() => reg.appendChunk(999, {})).toThrow("unknown id 999");
expect(() => reg.chunksOf(999)).toThrow("unknown id 999");
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Error message format mismatch.

Tests expect "unknown id 999" but StreamRegistry throws "StreamRegistry: unknown id ${id}". These assertions will fail.

Proposed fix
   it("throws on unknown id", () => {
     const reg = new StreamRegistry();
-    expect(() => reg.resourceOf(999)).toThrow("unknown id 999");
-    expect(() => reg.scopeOf(999)).toThrow("unknown id 999");
-    expect(() => reg.appendChunk(999, {})).toThrow("unknown id 999");
-    expect(() => reg.chunksOf(999)).toThrow("unknown id 999");
+    expect(() => reg.resourceOf(999)).toThrow("StreamRegistry: unknown id 999");
+    expect(() => reg.scopeOf(999)).toThrow("StreamRegistry: unknown id 999");
+    expect(() => reg.appendChunk(999, {})).toThrow("StreamRegistry: unknown id 999");
+    expect(() => reg.chunksOf(999)).toThrow("StreamRegistry: unknown id 999");
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it("throws on unknown id", () => {
const reg = new StreamRegistry();
expect(() => reg.resourceOf(999)).toThrow("unknown id 999");
expect(() => reg.scopeOf(999)).toThrow("unknown id 999");
expect(() => reg.appendChunk(999, {})).toThrow("unknown id 999");
expect(() => reg.chunksOf(999)).toThrow("unknown id 999");
});
it("throws on unknown id", () => {
const reg = new StreamRegistry();
expect(() => reg.resourceOf(999)).toThrow("StreamRegistry: unknown id 999");
expect(() => reg.scopeOf(999)).toThrow("StreamRegistry: unknown id 999");
expect(() => reg.appendChunk(999, {})).toThrow("StreamRegistry: unknown id 999");
expect(() => reg.chunksOf(999)).toThrow("StreamRegistry: unknown id 999");
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/test/stream-utils.test.ts` around lines 81 - 87, The test
fails because StreamRegistry methods throw errors prefixed with "StreamRegistry:
..."; update the error messages in StreamRegistry (the throw sites used by
resourceOf, scopeOf, appendChunk, chunksOf) to produce "unknown id {id}" (i.e.,
remove the "StreamRegistry: " prefix) so the thrown message exactly matches the
test expectation; locate the throw statements in the StreamRegistry class and
change the message formatting to `unknown id ${id}` for those methods.

Comment thread vitest.config.ts
Comment on lines +34 to 43
branches: 65,
functions: 75,
lines: 70,
statements: 70,
"packages/stardb/src/**/*.ts": {
branches: 90,
branches: 75,
functions: 100,
lines: 95,
statements: 95,
lines: 93,
statements: 93,
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Verify which package-scoped coverage thresholds are currently configured.
rg -n -C2 '"packages/.*/src/\*\*/\*\.ts": \{' vitest.config.ts

Repository: strawgate/o11ykit

Length of output: 210


🏁 Script executed:

cat -n vitest.config.ts | head -50

Repository: strawgate/o11ykit

Length of output: 2004


Add package-scoped thresholds for newly included packages.

Lines 24-25 add packages/o11ylogsdb/src/**/*.ts and packages/o11ytracesdb/src/**/*.ts to coverage without scoped thresholds. Add explicit thresholds for both so coverage drift in one package cannot be masked by another.

Suggested config
       thresholds: {
         branches: 65,
         functions: 75,
         lines: 70,
         statements: 70,
         "packages/stardb/src/**/*.ts": {
           branches: 75,
           functions: 100,
           lines: 93,
           statements: 93,
         },
+        "packages/o11ylogsdb/src/**/*.ts": {
+          branches: 65,
+          functions: 75,
+          lines: 70,
+          statements: 70,
+        },
+        "packages/o11ytracesdb/src/**/*.ts": {
+          branches: 65,
+          functions: 75,
+          lines: 70,
+          statements: 70,
+        },
       },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
branches: 65,
functions: 75,
lines: 70,
statements: 70,
"packages/stardb/src/**/*.ts": {
branches: 90,
branches: 75,
functions: 100,
lines: 95,
statements: 95,
lines: 93,
statements: 93,
},
branches: 65,
functions: 75,
lines: 70,
statements: 70,
"packages/stardb/src/**/*.ts": {
branches: 75,
functions: 100,
lines: 93,
statements: 93,
},
"packages/o11ylogsdb/src/**/*.ts": {
branches: 65,
functions: 75,
lines: 70,
statements: 70,
},
"packages/o11ytracesdb/src/**/*.ts": {
branches: 65,
functions: 75,
lines: 70,
statements: 70,
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vitest.config.ts` around lines 34 - 43, The coverage config added new package
patterns "packages/o11ylogsdb/src/**/*.ts" and
"packages/o11ytracesdb/src/**/*.ts" but no package-scoped thresholds were set;
update the coverage thresholds object (the same structure used for
"packages/stardb/src/**/*.ts") to include entries for both
"packages/o11ylogsdb/src/**/*.ts" and "packages/o11ytracesdb/src/**/*.ts" with
explicit numeric values for branches, functions, lines, and statements (e.g.,
mirror the style used for the stardb entry), so each package has its own
minimums and coverage drift in one package cannot be masked by another.

…stardb

Deep shared code extraction — net -620 LOC across engines:

**ByteBuf + ByteReader (stardb/binary.ts)**
- Generic growable write buffer and sequential reader
- Supports u8/u16/u32/u64, float64, unsigned varint, signed
  zigzag varint, length-prefixed strings, section bookkeeping
- Replaces 3 copies (2 in logsdb codecs, 1 in tracesdb codec)
- ~400 LOC of duplication eliminated

**Chunk wire format (stardb/chunk-wire.ts)**
- Generic serialize/deserialize parameterized by magic bytes
- Both logsdb (OLDB) and tracesdb (OTDB) now delegate to it
- Consistent error handling, buffer-offset-safe DataView

**FNV-1a byte hash (stardb/utils.ts)**
- fnv1aBytes() replaces copy in tracesdb bloom.ts
- bytesEqual() utility also added

**Codec utils (o11ylogsdb/codec-utils.ts)**
- anyValueToJson/jsonToAnyValue — removed 3 identical copies
- extractVarsAgainstTemplate — removed 3 identical copies
- Internal dedup within logsdb (shared by chunk + 3 codecs)

**Additional hex cleanup**
- Removed hexFromBytes duplicate in tracesdb/query.ts
- Removed bufToHex duplicate in tracesdb/bloom.ts
- Removed nowMillis duplicate in logsdb/query.ts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Apr 28, 2026
Comment on lines +146 to +155
readU16(): number {
const v = this.view.getUint16(this.pos, true);
this.pos += 2;
return v;
}

readU32(): number {
const v = this.view.getUint32(this.pos, true);
this.pos += 4;
return v;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: ByteReader.readU16/readU32/readFloat64 missing bounds checks

readU8() and readU64() have explicit bounds checks that throw a descriptive RangeError("ByteReader: unexpected end of buffer"), but readU16() (line 146), readU32() (line 152), and readFloat64() (line 166) do not. On truncated input, the DataView will throw a generic RangeError from the engine (e.g., "Offset is outside the bounds of the DataView") with no context about the reader position or buffer size.

This is inconsistent with the other read methods and makes debugging truncated-chunk issues harder. readU32 in particular is used in the hot readSection() path and in chunk-wire deserialization.

The old ByteCursor code had explicit bounds checks on every read method.

Suggested fix:

readU16(): number {
  if (this.pos + 2 > this.buf.length)
    throw new RangeError("ByteReader: unexpected end of buffer");
  const v = this.view.getUint16(this.pos, true);
  this.pos += 2;
  return v;
}
// Same pattern for readU32 (+4) and readFloat64 (+8)

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 28, 2026

Note

Your trial team has used its Gitar budget, so automatic reviews are paused. Upgrade now to unlock full capacity. Comment "Gitar review" to trigger a review manually.
Learn more about usage limits

Code Review 👍 Approved with suggestions 0 resolved / 3 findings

Query optimization and CI improvements effectively centralize shared logic. Please address missing bounds checks in ByteReader, potential false negatives for JSON-escaped NDJSON characters, and the reduced stardb coverage thresholds.

💡 Edge Case: Raw byte scan false negative for JSON-escaped characters in NDJSON

📄 packages/o11ylogsdb/src/query.ts:266-272 📄 packages/o11ylogsdb/src/query.ts:149-163 📄 packages/o11ylogsdb/src/query.ts:266-280

Even for NDJSON payloads, rawPayloadContains can produce false negatives when the bodyContains needle includes characters that get JSON-escaped (e.g., literal newlines ``, quotes ", backslashes ``). The raw bytes in the NDJSON buffer contain the escaped form (`\n`), not the original byte (`0x0A`), so a scan for the original character won't match.

This is a minor edge case because bodyContains searches in practice are almost always for printable ASCII/UTF-8 text that doesn't require JSON escaping, but it's worth documenting the limitation.

💡 Quality: Coverage thresholds significantly lowered for stardb

📄 vitest.config.ts:34-43

The stardb-specific branch coverage threshold dropped from 90% → 75%, and line/statement from 95% → 93%. The old code had an explicit comment: "Hold it to a strict threshold so regressions surface here instead of whatever *db package noticed first." Since new code (StreamRegistry, utils) was added to stardb in this PR, the thresholds should ideally go up or stay flat, not drop by 15 percentage points on branches. The global thresholds also dropped substantially (branches 71→65, functions 86→75, lines 82→70). While adding more source packages to coverage tracking naturally dilutes percentages, this large a drop may mask future regressions.

Suggested fix
Consider running `npx vitest --coverage` to see actual current coverage, then set thresholds 1-2% below the current values rather than padding with a large buffer. For stardb specifically, keeping branches ≥ 85% would preserve the original intent.
💡 Edge Case: ByteReader.readU16/readU32/readFloat64 missing bounds checks

📄 packages/stardb/src/binary.ts:146-155 📄 packages/stardb/src/binary.ts:166-170

readU8() and readU64() have explicit bounds checks that throw a descriptive RangeError("ByteReader: unexpected end of buffer"), but readU16() (line 146), readU32() (line 152), and readFloat64() (line 166) do not. On truncated input, the DataView will throw a generic RangeError from the engine (e.g., "Offset is outside the bounds of the DataView") with no context about the reader position or buffer size.

This is inconsistent with the other read methods and makes debugging truncated-chunk issues harder. readU32 in particular is used in the hot readSection() path and in chunk-wire deserialization.

The old ByteCursor code had explicit bounds checks on every read method.

Suggested fix
readU16(): number {
  if (this.pos + 2 > this.buf.length)
    throw new RangeError("ByteReader: unexpected end of buffer");
  const v = this.view.getUint16(this.pos, true);
  this.pos += 2;
  return v;
}
// Same pattern for readU32 (+4) and readFloat64 (+8)
🤖 Prompt for agents
Code Review: Query optimization and CI improvements effectively centralize shared logic. Please address missing bounds checks in ByteReader, potential false negatives for JSON-escaped NDJSON characters, and the reduced stardb coverage thresholds.

1. 💡 Edge Case: Raw byte scan false negative for JSON-escaped characters in NDJSON
   Files: packages/o11ylogsdb/src/query.ts:266-272, packages/o11ylogsdb/src/query.ts:149-163, packages/o11ylogsdb/src/query.ts:266-280

   Even for NDJSON payloads, `rawPayloadContains` can produce false negatives when the `bodyContains` needle includes characters that get JSON-escaped (e.g., literal newlines `
   `, quotes `"`, backslashes `\`). The raw bytes in the NDJSON buffer contain the escaped form (`\n`), not the original byte (`0x0A`), so a scan for the original character won't match.
   
   This is a minor edge case because `bodyContains` searches in practice are almost always for printable ASCII/UTF-8 text that doesn't require JSON escaping, but it's worth documenting the limitation.

2. 💡 Quality: Coverage thresholds significantly lowered for stardb
   Files: vitest.config.ts:34-43

   The stardb-specific branch coverage threshold dropped from 90% → 75%, and line/statement from 95% → 93%. The old code had an explicit comment: "Hold it to a strict threshold so regressions surface here instead of whatever *db package noticed first." Since new code (StreamRegistry, utils) was added to stardb in this PR, the thresholds should ideally go up or stay flat, not drop by 15 percentage points on branches. The global thresholds also dropped substantially (branches 71→65, functions 86→75, lines 82→70). While adding more source packages to coverage tracking naturally dilutes percentages, this large a drop may mask future regressions.

   Suggested fix:
   Consider running `npx vitest --coverage` to see actual current coverage, then set thresholds 1-2% below the current values rather than padding with a large buffer. For stardb specifically, keeping branches ≥ 85% would preserve the original intent.

3. 💡 Edge Case: ByteReader.readU16/readU32/readFloat64 missing bounds checks
   Files: packages/stardb/src/binary.ts:146-155, packages/stardb/src/binary.ts:166-170

   readU8() and readU64() have explicit bounds checks that throw a descriptive `RangeError("ByteReader: unexpected end of buffer")`, but readU16() (line 146), readU32() (line 152), and readFloat64() (line 166) do not. On truncated input, the DataView will throw a generic `RangeError` from the engine (e.g., "Offset is outside the bounds of the DataView") with no context about the reader position or buffer size.
   
   This is inconsistent with the other read methods and makes debugging truncated-chunk issues harder. readU32 in particular is used in the hot `readSection()` path and in chunk-wire deserialization.
   
   The old `ByteCursor` code had explicit bounds checks on every read method.

   Suggested fix:
   readU16(): number {
     if (this.pos + 2 > this.buf.length)
       throw new RangeError("ByteReader: unexpected end of buffer");
     const v = this.view.getUint16(this.pos, true);
     this.pos += 2;
     return v;
   }
   // Same pattern for readU32 (+4) and readFloat64 (+8)

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Important

Your trial ends in 5 days — upgrade now to keep code review, CI analysis, auto-apply, custom automations, and more.

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
packages/stardb/src/utils.ts (1)

16-22: ⚠️ Potential issue | 🟠 Major

Reject malformed hex instead of silently coercing it.

Odd-length input is truncated here, and non-hex pairs end up as 0 bytes via Uint8Array coercion. Since this helper is now shared by chunk decoders, malformed IDs get silently corrupted instead of failing fast.

Proposed fix
 export function hexToBytes(hex: string): Uint8Array {
+  if ((hex.length & 1) !== 0) {
+    throw new Error("hexToBytes: hex string must have even length");
+  }
   const len = hex.length >>> 1;
   const out = new Uint8Array(len);
   for (let i = 0; i < len; i++) {
-    out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16);
+    const byte = hex.slice(i * 2, i * 2 + 2);
+    if (!/^[0-9a-fA-F]{2}$/.test(byte)) {
+      throw new Error("hexToBytes: invalid hex character");
+    }
+    out[i] = Number.parseInt(byte, 16);
   }
   return out;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/utils.ts` around lines 16 - 22, The hexToBytes function
currently silently truncates odd-length strings and produces 0 bytes for non-hex
input; update hexToBytes to validate input first by (1) rejecting odd-length hex
strings and (2) ensuring the input contains only 0-9a-fA-F characters (or
validate each parsed pair yields a valid number), and throw a clear error when
validation fails; modify the hexToBytes implementation (the exported function
hexToBytes) to perform these checks before creating the Uint8Array and parsing
pairs so malformed IDs fail fast.
packages/o11ylogsdb/src/query.ts (1)

150-168: ⚠️ Potential issue | 🔴 Critical

Gate the raw-byte prune behind a policy capability.

This miss-path is still unsound for policies that reconstruct bodies from columns/templates. ColumnarDrainPolicy / TypedColumnarDrainPolicy can match after decode even when needle never appears contiguously in raw, and NDJSON bodies can also miss escaped substrings. A false negative here drops the whole chunk. Restrict this optimization to payloads that preserve body bytes verbatim, otherwise fall back to the normal decode path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/query.ts` around lines 150 - 168, The raw-byte
fast-path (useBodyFastPath branch) can produce false negatives for policies that
reconstruct bodies (e.g., ColumnarDrainPolicy, TypedColumnarDrainPolicy) or for
formats that escape/transform body bytes; before running rawPayloadContains(raw,
needle) or calling codec.decode + readRecordsFromRaw, gate this optimization on
a clear policy/codec capability (e.g., policy.preservesBodyBytes() or a similar
flag on the policy/codec) and skip to the full decode path when that capability
is absent; update the check that sets useBodyFastPath (and the conditional
around rawPayloadContains) to consult that capability so only policies/codecs
that guarantee verbatim body bytes use the raw-byte prune.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/o11ylogsdb/src/codec-utils.ts`:
- Around line 32-41: jsonToAnyValue currently treats any object with a $bi or $b
key as a sentinel and eagerly converts it, which can corrupt normal objects and
throw on invalid inputs; change the logic in jsonToAnyValue to only treat an
object as a sentinel when the object has exactly one own property named "$bi" or
"$b", the value is a string, and the string passes simple validation (e.g. $bi
matches /^-?\d+$/ before BigInt() and $b matches hex /^[0-9a-fA-F]+$/ with even
length before hexToBytes()); if validation fails, fall back to decoding the
object as a normal map; also wrap BigInt conversion in try/catch to avoid
crashes and return the object map on error; refer to the jsonToAnyValue function
and the sentinel keys "$bi" and "$b" for where to update the checks.

In `@packages/stardb/src/binary.ts`:
- Around line 21-30: The ensure method can infinite-loop when initialCapacity is
0 because newCap starts at 0; update the constructor or ensure to enforce a
minimum capacity (e.g., set initialCapacity to at least 1 or ensure newCap =
Math.max(1, this.buf.length) before doubling). Specifically modify the
constructor (constructor(initialCapacity = 4096) / this.buf) or the ensure
method (ensure(needed: number) / newCap calculation) so that newCap is
initialized to a positive value before the while loop, then proceed with
doubling until newCap >= this.pos + needed.
- Around line 83-93: writeZigzagVarint accepts unbounded bigint but assumes
signed 64-bit inputs and pre-allocates 10 bytes, which can overflow for values
outside int64; add an explicit range check at the start of writeZigzagVarint to
verify value is within [-2n**63n, 2n**63n - 1n] and throw a RangeError if not,
then proceed with ensure(10) and the existing zigzag/varint encoding logic
(refer to the writeZigzagVarint function name and the constants used for
shifting).
- Around line 175-187: readUvarint currently uses 32-bit bitwise ops which
silently truncate high bits; change it to accumulate using arithmetic (e.g., add
(byte & 0x7f) * 2**shift or multiply by Math.pow(2, shift)) instead of |= and <<
so all bits are preserved during the loop in readUvarint, and then enforce
strict validation: reject overlong/overflowing encodings by checking when you
reach the 5th byte (shift >= 28) that any high bits beyond 32 bits are zero and
throw a RangeError for varint overflow or malformed overlong encodings, and also
ensure you reject any continuation bytes beyond the maximum length; reference
readUvarint (and ensure compatibility with writeUvarint’s uint32 contract).

---

Duplicate comments:
In `@packages/o11ylogsdb/src/query.ts`:
- Around line 150-168: The raw-byte fast-path (useBodyFastPath branch) can
produce false negatives for policies that reconstruct bodies (e.g.,
ColumnarDrainPolicy, TypedColumnarDrainPolicy) or for formats that
escape/transform body bytes; before running rawPayloadContains(raw, needle) or
calling codec.decode + readRecordsFromRaw, gate this optimization on a clear
policy/codec capability (e.g., policy.preservesBodyBytes() or a similar flag on
the policy/codec) and skip to the full decode path when that capability is
absent; update the check that sets useBodyFastPath (and the conditional around
rawPayloadContains) to consult that capability so only policies/codecs that
guarantee verbatim body bytes use the raw-byte prune.

In `@packages/stardb/src/utils.ts`:
- Around line 16-22: The hexToBytes function currently silently truncates
odd-length strings and produces 0 bytes for non-hex input; update hexToBytes to
validate input first by (1) rejecting odd-length hex strings and (2) ensuring
the input contains only 0-9a-fA-F characters (or validate each parsed pair
yields a valid number), and throw a clear error when validation fails; modify
the hexToBytes implementation (the exported function hexToBytes) to perform
these checks before creating the Uint8Array and parsing pairs so malformed IDs
fail fast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 2ac0e19d-e80a-4f4a-8ea9-3338c9a8b92c

📥 Commits

Reviewing files that changed from the base of the PR and between c01a536 and bd6d0d9.

📒 Files selected for processing (17)
  • packages/o11ylogsdb/src/chunk.ts
  • packages/o11ylogsdb/src/codec-columnar.ts
  • packages/o11ylogsdb/src/codec-drain.ts
  • packages/o11ylogsdb/src/codec-typed.ts
  • packages/o11ylogsdb/src/codec-utils.ts
  • packages/o11ylogsdb/src/query.ts
  • packages/o11ylogsdb/test/chunk.test.ts
  • packages/o11ytracesdb/src/bloom.ts
  • packages/o11ytracesdb/src/chunk.ts
  • packages/o11ytracesdb/src/codec-columnar.ts
  • packages/o11ytracesdb/src/query.ts
  • packages/o11ytracesdb/test/wire-format-edge-cases.test.ts
  • packages/stardb/src/binary.ts
  • packages/stardb/src/chunk-wire.ts
  • packages/stardb/src/index.ts
  • packages/stardb/src/utils.ts
  • packages/stardb/test/public-api.test.ts

Comment thread packages/o11ylogsdb/src/codec-utils.ts Outdated
Comment on lines +21 to +30
constructor(initialCapacity = 4096) {
this.buf = new Uint8Array(initialCapacity);
this.view = new DataView(this.buf.buffer);
}

ensure(needed: number): void {
if (this.pos + needed <= this.buf.length) return;
let newCap = this.buf.length * 2;
while (newCap < this.pos + needed) newCap *= 2;
const next = new Uint8Array(newCap);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle zero-capacity buffers to avoid growth-loop lockup

If initialCapacity is 0, Line 28 sets newCap to 0; Line 29 then loops forever. Guard minimum capacity before doubling.

Proposed fix
   constructor(initialCapacity = 4096) {
-    this.buf = new Uint8Array(initialCapacity);
+    const cap = Math.max(1, initialCapacity);
+    this.buf = new Uint8Array(cap);
     this.view = new DataView(this.buf.buffer);
   }

   ensure(needed: number): void {
     if (this.pos + needed <= this.buf.length) return;
-    let newCap = this.buf.length * 2;
+    let newCap = Math.max(1, this.buf.length * 2);
     while (newCap < this.pos + needed) newCap *= 2;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
constructor(initialCapacity = 4096) {
this.buf = new Uint8Array(initialCapacity);
this.view = new DataView(this.buf.buffer);
}
ensure(needed: number): void {
if (this.pos + needed <= this.buf.length) return;
let newCap = this.buf.length * 2;
while (newCap < this.pos + needed) newCap *= 2;
const next = new Uint8Array(newCap);
constructor(initialCapacity = 4096) {
const cap = Math.max(1, initialCapacity);
this.buf = new Uint8Array(cap);
this.view = new DataView(this.buf.buffer);
}
ensure(needed: number): void {
if (this.pos + needed <= this.buf.length) return;
let newCap = Math.max(1, this.buf.length * 2);
while (newCap < this.pos + needed) newCap *= 2;
const next = new Uint8Array(newCap);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/binary.ts` around lines 21 - 30, The ensure method can
infinite-loop when initialCapacity is 0 because newCap starts at 0; update the
constructor or ensure to enforce a minimum capacity (e.g., set initialCapacity
to at least 1 or ensure newCap = Math.max(1, this.buf.length) before doubling).
Specifically modify the constructor (constructor(initialCapacity = 4096) /
this.buf) or the ensure method (ensure(needed: number) / newCap calculation) so
that newCap is initialized to a positive value before the while loop, then
proceed with doubling until newCap >= this.pos + needed.

Comment on lines +83 to +93
writeZigzagVarint(value: bigint): void {
this.ensure(10);
const zz = (value << 1n) ^ (value >> 63n);
let v = zz;
do {
let byte = Number(v & 0x7fn);
v >>= 7n;
if (v > 0n) byte |= 0x80;
this.buf[this.pos++] = byte;
} while (v > 0n);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

For JavaScript BigInt, does right-shift (>>) use sign extension with unbounded precision (not fixed 64-bit width)?

💡 Result:

Yes, the right-shift operator (>>) on JavaScript BigInt uses sign extension with unbounded precision, not fixed 64-bit width. For positive BigInts, it shifts in zeros from the left (infinite leading zeros). For negative BigInts, it shifts in ones from the left (infinite leading ones), preserving the sign and allowing arbitrary-precision arithmetic shifts.

Citations:


🏁 Script executed:

# Check the method context and any type annotations
cd packages/stardb/src && head -n 100 binary.ts | tail -n 30

Repository: strawgate/o11ykit

Length of output: 759


🏁 Script executed:

# Search for any type constraints or documentation about writeZigzagVarint
rg -A 5 -B 5 "writeZigzagVarint" packages/stardb/src/

Repository: strawgate/o11ykit

Length of output: 699


🏁 Script executed:

# Check if there are tests for this method
fd -e test.ts -e spec.ts -e tests.ts packages/stardb | head -5

Repository: strawgate/o11ykit

Length of output: 427


🏁 Script executed:

# Look for any other zigzag methods or related varint code
rg "zigzag|varint" packages/stardb/src/ -i

Repository: strawgate/o11ykit

Length of output: 1217


🏁 Script executed:

# Check for usages of writeZigzagVarint
rg "writeZigzagVarint" packages/stardb/src/ -B 2 -A 2

Repository: strawgate/o11ykit

Length of output: 380


🏁 Script executed:

# Check the readZigzagVarint to see how it handles bounds
rg -A 15 "readZigzagVarint" packages/stardb/src/binary.ts

Repository: strawgate/o11ykit

Length of output: 606


🏁 Script executed:

# Look at the ByteBuf class structure to understand context
sed -n '1,100p' packages/stardb/src/binary.ts

Repository: strawgate/o11ykit

Length of output: 2553


🏁 Script executed:

# Search for where writeZigzagVarint is called
rg -r "$0" --type ts "writeZigzagVarint\(" packages/stardb/

Repository: strawgate/o11ykit

Length of output: 125


🏁 Script executed:

# Look for tests related to binary.ts
fd --full-path "packages/stardb.*test|spec" . -type f | head -10

Repository: strawgate/o11ykit

Length of output: 232


🏁 Script executed:

# Calculate max bytes needed for zigzag-encoded int64
python3 <<'EOF'
# For a signed int64, range is [-2^63, 2^63-1]
# Zigzag encoding: (n << 1) ^ (n >> 63)
# For n = -2^63: zz = (-2^63 << 1) ^ (-2^63 >> 63)
#              = -2^64 ^ -1 = 2^64 - 1
# For n = 2^63-1: zz = ((2^63-1) << 1) ^ 0 = 2^64 - 2

# In varint, each byte carries 7 bits. Max value 2^64-1 needs:
import math
max_val = (1 << 64) - 1
bytes_needed = math.ceil(math.log2(max_val + 1) / 7)
print(f"Max value for int64 zigzag: {max_val}")
print(f"Bytes needed in varint: {bytes_needed}")

# What about values OUTSIDE int64 range?
out_of_range = 1 << 65
bytes_for_out_of_range = math.ceil(math.log2(out_of_range + 1) / 7)
print(f"\nOut-of-range value (2^65): {out_of_range}")
print(f"Bytes needed in varint: {bytes_for_out_of_range}")
EOF

Repository: strawgate/o11ykit

Length of output: 213


🏁 Script executed:

# Check if readZigzagVarint max 70-bit shift aligns with int64
python3 <<'EOF'
# readZigzagVarint throws if shift > 70n
# In varint, 10 bytes can carry: 10 * 7 = 70 bits
# So 10 bytes max shift is 70 bits, which encodes values up to 2^70 - 1
# But after zigzag decode: (result >> 1) ^ -(result & 1)
# For 2^70-1 >> 1 ≈ 2^69, which when decoded back could be larger than int64

# Let's verify int64 zigzag bounds
min_i64 = -(1 << 63)
max_i64 = (1 << 63) - 1

# Zigzag encode
zz_min = ((min_i64 << 1) ^ (min_i64 >> 63)) & ((1 << 64) - 1)
zz_max = ((max_i64 << 1) ^ (max_i64 >> 63)) & ((1 << 64) - 1)

print(f"Int64 range: [{min_i64}, {max_i64}]")
print(f"Zigzag range: [{zz_min}, {zz_max}]")
print(f"Zigzag max bits: {zz_max.bit_length()}")

# Varint encoding needs ceiling(bits/7) bytes
import math
bytes_needed = math.ceil(zz_max.bit_length() / 7)
print(f"Bytes needed for max int64 zigzag: {bytes_needed}")
EOF

Repository: strawgate/o11ykit

Length of output: 235


Add int64 bounds validation to writeZigzagVarint

The method accepts unbounded bigint but uses zigzag encoding designed for signed 64-bit integers and pre-allocates exactly 10 bytes. Out-of-range inputs (e.g., 2^70) produce encoded values requiring more than 10 bytes, causing buffer overflow. Add explicit int64 range validation:

Fix
  writeZigzagVarint(value: bigint): void {
+   const MIN_I64 = -(1n << 63n);
+   const MAX_I64 = (1n << 63n) - 1n;
+   if (value < MIN_I64 || value > MAX_I64) {
+     throw new RangeError("ByteBuf: zigzag varint value out of int64 range");
+   }
    this.ensure(10);
    const zz = (value << 1n) ^ (value >> 63n);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
writeZigzagVarint(value: bigint): void {
this.ensure(10);
const zz = (value << 1n) ^ (value >> 63n);
let v = zz;
do {
let byte = Number(v & 0x7fn);
v >>= 7n;
if (v > 0n) byte |= 0x80;
this.buf[this.pos++] = byte;
} while (v > 0n);
}
writeZigzagVarint(value: bigint): void {
const MIN_I64 = -(1n << 63n);
const MAX_I64 = (1n << 63n) - 1n;
if (value < MIN_I64 || value > MAX_I64) {
throw new RangeError("ByteBuf: zigzag varint value out of int64 range");
}
this.ensure(10);
const zz = (value << 1n) ^ (value >> 63n);
let v = zz;
do {
let byte = Number(v & 0x7fn);
v >>= 7n;
if (v > 0n) byte |= 0x80;
this.buf[this.pos++] = byte;
} while (v > 0n);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/binary.ts` around lines 83 - 93, writeZigzagVarint
accepts unbounded bigint but assumes signed 64-bit inputs and pre-allocates 10
bytes, which can overflow for values outside int64; add an explicit range check
at the start of writeZigzagVarint to verify value is within [-2n**63n, 2n**63n -
1n] and throw a RangeError if not, then proceed with ensure(10) and the existing
zigzag/varint encoding logic (refer to the writeZigzagVarint function name and
the constants used for shifting).

Comment on lines +175 to +187
readUvarint(): number {
let result = 0;
let shift = 0;
let byte: number;
do {
if (this.pos >= this.buf.length) throw new RangeError("ByteReader: unexpected end of buffer");
byte = this.buf[this.pos++] as number;
result |= (byte & 0x7f) << shift;
shift += 7;
if (shift > 35) throw new RangeError("ByteReader: varint overflow");
} while (byte & 0x80);
return result >>> 0;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In JavaScript/TypeScript, are bitwise operators (|, <<) evaluated using 32-bit signed integers, and can this truncate values above 32 bits?

💡 Result:

Yes, in JavaScript and TypeScript, bitwise operators such as | (bitwise OR) and << (left shift) evaluate their operands as 32-bit signed integers using the ToInt32 abstract operation, which truncates values above 32 bits by taking the number modulo 2^32 and adjusting for the sign bit if necessary. This effectively discards bits beyond the 32nd bit. For example: - Numbers larger than 2^32 - 1 have higher bits discarded: 2^32 + 1 becomes 1 after ToInt32. - Left shifts << use the operand converted to 32-bit signed int and shift by (right operand modulo 32) bits. - The result is always a Number representing a 32-bit signed integer value (-2^31 to 2^31 - 1). This behavior is identical in TypeScript since it compiles to JavaScript. For arbitrary-precision bitwise operations, use BigInt literals (e.g., 1n | 2n), but standard number bitwise ops remain 32-bit. Example demonstrating truncation: console.log((1n << 32n) === 0n); // true for BigInt, but for numbers: console.log(4294967296 | 1); // 1 (2^32 | 1 becomes 0 | 1 = 1) console.log(5 << 33); // 10 (shift amount 33 % 32 = 1)

Citations:


🏁 Script executed:

# Check the overall structure and how readUvarint is used
cd packages/stardb/src && wc -l binary.ts && head -20 binary.ts

Repository: strawgate/o11ykit

Length of output: 652


🏁 Script executed:

# Search for readUvarint usage in the codebase
rg "readUvarint" --type ts --type js -B 2 -A 2

Repository: strawgate/o11ykit

Length of output: 19690


🏁 Script executed:

# Check if there are tests for readUvarint
rg "readUvarint|ByteReader" --type ts --type js --iglob "*test*" --iglob "*spec*"

Repository: strawgate/o11ykit

Length of output: 117


🏁 Script executed:

# Examine the full context around readUvarint and writeUvarint
sed -n '170,200p' packages/stardb/src/binary.ts

Repository: strawgate/o11ykit

Length of output: 935


🏁 Script executed:

# Check writeUvarint to understand the expected range
sed -n '160,175p' packages/stardb/src/binary.ts

Repository: strawgate/o11ykit

Length of output: 414


🏁 Script executed:

# Look for writeUvarint implementation and tests
sed -n '1,100p' packages/stardb/src/binary.ts | grep -A 15 "writeUvarint"

Repository: strawgate/o11ykit

Length of output: 466


🏁 Script executed:

# Search for any tests of readUvarint/writeUvarint
rg "readUvarint|writeUvarint" --type ts -B 2 -A 5 packages/stardb/test

Repository: strawgate/o11ykit

Length of output: 43


🏁 Script executed:

# Check if there's any documentation or comments about varint bounds
rg "varint|32-bit|overflow" --type ts packages/stardb/src/binary.ts -B 1 -A 1

Repository: strawgate/o11ykit

Length of output: 1196


Reject overflowing uvarints instead of silently truncating them

Line 182 uses 32-bit bitwise operations (|=, <<), which silently truncate values during the loop before the overflow check at line 185 can trigger. A 5-byte varint encoding a value > 2^31 will lose high bits. Although writeUvarint() pre-conditions input to uint32, malformed input (accidental or adversarial) can cause data corruption.

Use arithmetic operations instead of bitwise ops to preserve all bits, and validate the 5th byte more aggressively to reject overlong encodings.

Proposed fix
   readUvarint(): number {
-    let result = 0;
-    let shift = 0;
-    let byte: number;
-    do {
-      if (this.pos >= this.buf.length) throw new RangeError("ByteReader: unexpected end of buffer");
-      byte = this.buf[this.pos++] as number;
-      result |= (byte & 0x7f) << shift;
-      shift += 7;
-      if (shift > 35) throw new RangeError("ByteReader: varint overflow");
-    } while (byte & 0x80);
-    return result >>> 0;
+    let result = 0;
+    for (let i = 0; i < 5; i++) {
+      if (this.pos >= this.buf.length) throw new RangeError("ByteReader: unexpected end of buffer");
+      const byte = this.buf[this.pos++] as number;
+      if (i === 4 && (byte & 0xf0) !== 0) throw new RangeError("ByteReader: varint overflow");
+      result += (byte & 0x7f) * 2 ** (7 * i);
+      if ((byte & 0x80) === 0) return result;
+    }
+    throw new RangeError("ByteReader: varint overflow");
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/binary.ts` around lines 175 - 187, readUvarint currently
uses 32-bit bitwise ops which silently truncate high bits; change it to
accumulate using arithmetic (e.g., add (byte & 0x7f) * 2**shift or multiply by
Math.pow(2, shift)) instead of |= and << so all bits are preserved during the
loop in readUvarint, and then enforce strict validation: reject
overlong/overflowing encodings by checking when you reach the 5th byte (shift >=
28) that any high bits beyond 32 bits are zero and throw a RangeError for varint
overflow or malformed overlong encodings, and also ensure you reject any
continuation bytes beyond the maximum length; reference readUvarint (and ensure
compatibility with writeUvarint’s uint32 contract).

Coverage:
- binary.ts: 1.8% → 100% (28 comprehensive tests)
- chunk-wire.ts: 6.5% → 100% (14 tests: round-trip, errors, size)
- any-value.ts: new, 100% (28 tests for JSON/equals/findAttribute)
- utils.ts: 54.8% → 96.8% (fnv1aBytes + bytesEqual tests)
- Total stardb: 131 tests across 7 test files

New shared code in stardb:
- anyValueToJson/jsonToAnyValue: moved from logsdb/codec-utils.ts
- anyValueEquals: moved from tracesdb/query.ts (48 LOC)
- findAttribute: new helper for KeyValue[] lookup by key
- StreamKey type: consolidated from logsdb + tracesdb types.ts

Dedup fixes:
- tracesdb/query.ts: removed local bytesEqual + anyValueEquals (~55 LOC)
- logsdb/compact.ts: removed local nowMillis (imports from stardb)
- logsdb/codec-utils.ts: re-exports from stardb instead of reimplementing
- logsdb/types.ts + tracesdb/types.ts: import StreamKey from stardb

All 1129 tests pass, lint clean, typecheck clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/o11ylogsdb/src/types.ts`:
- Around line 18-29: The module header comment incorrectly implies StreamKey is
logs-specific; update the top-of-file documentation in types.ts to state that
StreamKey is re-exported from "stardb" (not defined by the o11y logs API),
remove or reword any phrasing that claims StreamKey is logs-only or owned by
this package, and clarify that the exported types (AnyValue,
InstrumentationScope, KeyValue, Resource, SeverityText, StreamId, StreamKey)
include a StreamKey alias sourced from stardb so consumers understand its
origin.

In `@packages/stardb/src/any-value.ts`:
- Around line 22-24: The object maps created with {} in
packages/stardb/src/any-value.ts can be polluted by keys like "__proto__";
replace those plain object literals with null-prototype objects (e.g., use
Object.create(null)) wherever maps are built—specifically in the block that
creates out and assigns anyValueToJson(val) for each Object.entries(v) and in
the other map construction around lines 39-41—to ensure keys cannot mutate the
prototype chain while preserving the existing assignment logic and return
values.
- Around line 36-37: The decoder currently treats any object with properties $bi
or $b as a tagged scalar, which breaks maps that include those keys alongside
others; update the checks in the decoding logic (the code referencing obj.$bi
and obj.$b) to only decode as a tagged scalar when the object contains exactly
that single key (e.g., Object.keys(obj).length === 1 && typeof obj.$bi ===
"string" for the BigInt branch, and similarly for $b using hexToBytes) so maps
like { "$bi": "x", other: 1 } are preserved as maps.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f40e3dd7-130b-4b83-ae84-b74c83314e89

📥 Commits

Reviewing files that changed from the base of the PR and between bd6d0d9 and 1966013.

📒 Files selected for processing (13)
  • packages/o11ylogsdb/src/codec-utils.ts
  • packages/o11ylogsdb/src/compact.ts
  • packages/o11ylogsdb/src/types.ts
  • packages/o11ytracesdb/src/query.ts
  • packages/o11ytracesdb/src/types.ts
  • packages/stardb/src/any-value.ts
  • packages/stardb/src/index.ts
  • packages/stardb/src/types.ts
  • packages/stardb/test/any-value.test.ts
  • packages/stardb/test/binary.test.ts
  • packages/stardb/test/chunk-wire.test.ts
  • packages/stardb/test/public-api.test.ts
  • packages/stardb/test/stream-utils.test.ts

Comment on lines +18 to +29
StreamKey,
} from "stardb";

export type { AnyValue, InstrumentationScope, KeyValue, Resource, SeverityText, StreamId };
export type {
AnyValue,
InstrumentationScope,
KeyValue,
Resource,
SeverityText,
StreamId,
StreamKey,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update module header docs to match the new StreamKey source.

After re-exporting StreamKey from stardb, the top comment still implies StreamKey is logs-specific. Please align the comment to avoid API ownership confusion.

Suggested doc-only patch
- * paths. The remaining types in this file (LogRecord, BodyKind,
- * StreamKey) are specific to the logs engine.
+ * paths. The remaining types in this file (LogRecord, BodyKind)
+ * are specific to the logs engine.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/types.ts` around lines 18 - 29, The module header
comment incorrectly implies StreamKey is logs-specific; update the top-of-file
documentation in types.ts to state that StreamKey is re-exported from "stardb"
(not defined by the o11y logs API), remove or reword any phrasing that claims
StreamKey is logs-only or owned by this package, and clarify that the exported
types (AnyValue, InstrumentationScope, KeyValue, Resource, SeverityText,
StreamId, StreamKey) include a StreamKey alias sourced from stardb so consumers
understand its origin.

Comment on lines +22 to +24
const out: Record<string, unknown> = {};
for (const [k, val] of Object.entries(v)) out[k] = anyValueToJson(val);
return out;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use null-prototype objects to avoid __proto__ key corruption/pollution.

Line 22-24 and Line 39-41 build maps with {} and assign arbitrary keys. Keys like "__proto__" can mutate object prototype and break data fidelity/security expectations.

Proposed fix
   if (typeof v === "object") {
-    const out: Record<string, unknown> = {};
+    const out = Object.create(null) as Record<string, unknown>;
     for (const [k, val] of Object.entries(v)) out[k] = anyValueToJson(val);
     return out;
   }
-    const out: Record<string, AnyValue> = {};
+    const out = Object.create(null) as Record<string, AnyValue>;
     for (const [k, v] of Object.entries(obj)) out[k] = jsonToAnyValue(v);
     return out;

Also applies to: 39-41

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/any-value.ts` around lines 22 - 24, The object maps
created with {} in packages/stardb/src/any-value.ts can be polluted by keys like
"__proto__"; replace those plain object literals with null-prototype objects
(e.g., use Object.create(null)) wherever maps are built—specifically in the
block that creates out and assigns anyValueToJson(val) for each
Object.entries(v) and in the other map construction around lines 39-41—to ensure
keys cannot mutate the prototype chain while preserving the existing assignment
logic and return values.

Comment on lines +36 to +37
if (typeof obj.$bi === "string") return BigInt(obj.$bi);
if (typeof obj.$b === "string") return hexToBytes(obj.$b);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prevent $bi / $b tag collisions when decoding maps.

Line 36 and Line 37 currently treat any object containing $bi or $b as a tagged scalar, so maps like { "$bi": "x", other: 1 } cannot round-trip correctly.

Proposed fix
 export function jsonToAnyValue(j: unknown): AnyValue {
   if (j === null) return null;
   if (typeof j === "object" && j !== null) {
     const obj = j as Record<string, unknown>;
-    if (typeof obj.$bi === "string") return BigInt(obj.$bi);
-    if (typeof obj.$b === "string") return hexToBytes(obj.$b);
+    const keys = Object.keys(obj);
+    if (keys.length === 1 && typeof obj.$bi === "string") return BigInt(obj.$bi);
+    if (keys.length === 1 && typeof obj.$b === "string") return hexToBytes(obj.$b);
     if (Array.isArray(j)) return j.map(jsonToAnyValue);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/any-value.ts` around lines 36 - 37, The decoder currently
treats any object with properties $bi or $b as a tagged scalar, which breaks
maps that include those keys alongside others; update the checks in the decoding
logic (the code referencing obj.$bi and obj.$b) to only decode as a tagged
scalar when the object contains exactly that single key (e.g.,
Object.keys(obj).length === 1 && typeof obj.$bi === "string" for the BigInt
branch, and similarly for $b using hexToBytes) so maps like { "$bi": "x", other:
1 } are preserved as maps.

strawgate and others added 2 commits April 28, 2026 11:48
New stardb utilities:
- timeRangeOverlaps(min, max, from?, to?): shared chunk pruning predicate
  Used by logsdb/query.ts and tracesdb/query.ts
- bytesToUuid(bytes): format 16 bytes as 8-4-4-4-12 UUID string
- uuidToBytes(str): parse UUID (with or without dashes) into 16 bytes
- Upgraded hexLookup table to match the hot-path-optimized version
- hexToBytes now uses hexNibble (faster than parseInt for hex parsing)

Sidecar NDJSON consolidation (logsdb internal):
- encodeSidecar() in codec-utils.ts replaces 2 identical 50-line blocks
  in codec-columnar.ts and codec-typed.ts
- applySidecar() + SidecarEntry type replaces 2 identical decode blocks
- Net ~100 LOC removed from codec files

UUID helper consolidation (logsdb):
- Removed BYTE_TO_HEX table, hexNibble, bytesToUuid, bytesToUuidNodash,
  uuidToBytes, uuidNodashToBytes from codec-typed.ts (~100 LOC)
- bytesToUuidNodash → bytesToHex (identical for 16-byte input)
- uuidNodashToBytes → hexToBytes (identical for 32-char hex)

Tests: 15 new tests (UUID round-trip, timeRangeOverlaps edge cases)
All 1144 tests pass, typecheck clean, lint clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace .find((a) => a.key === ...) patterns with shared findAttribute()
in query.ts, aggregate.ts, and correlate.ts. Removes ~30 LOC of attribute
lookup boilerplate and removes unused KeyValue import from aggregate.ts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (3)
packages/o11ylogsdb/src/query.ts (1)

150-164: ⚠️ Potential issue | 🔴 Critical

Raw-byte pruning is unsound for templated body encodings.

The raw byte scan at line 160 checks if needle exists in the decompressed payload. For templated policies (Drain, ColumnarDrain, TypedColumnarDrain), the final body is reconstructed by combining template literals with variable values. If the needle spans a template boundary or exists only after reconstruction, this chunk is incorrectly pruned.

Example: Template "Error in <*>" with var "module foo" reconstructs to "Error in module foo". A search for "in module" won't match the raw payload.

Consider adding a policy capability flag to gate this fast path:

-  const useBodyFastPath = spec.bodyContains !== undefined && !spec.bodyLeafEquals;
+  const useBodyFastPath =
+    spec.bodyContains !== undefined &&
+    !spec.bodyLeafEquals &&
+    (policy?.supportsRawBodyScan?.() ?? true);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/query.ts` around lines 150 - 164, The raw-byte fast
path (guarded by useBodyFastPath and implemented via rawPayloadContains after
codec.decode) is unsafe for templated body encodings (e.g., Drain,
ColumnarDrain, TypedColumnarDrain) because a needle can appear only after
template reconstruction; update the logic to skip the raw prune when the chunk's
codec or policy indicates templating by adding a capability flag (e.g.,
supportsRawPruning or isTemplated) on the policy/codec metadata and gate the
useBodyFastPath check (or the rawPayloadContains call) against that flag so
templated encodings always reconstruct and search the rendered body instead of
being pruned.
packages/stardb/test/stream-utils.test.ts (1)

91-97: ⚠️ Potential issue | 🟡 Minor

Error message assertions may fail.

Past review noted StreamRegistry throws "StreamRegistry: unknown id ${id}" but tests expect "unknown id 999". Verify the actual error message format or update assertions.

#!/bin/bash
# Check actual error message format in StreamRegistry
ast-grep --pattern 'throw new Error($MSG)'
rg -n "unknown id" packages/stardb/src/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/test/stream-utils.test.ts` around lines 91 - 97, The test
asserts "unknown id 999" but StreamRegistry actually throws a different message
format; update the assertions in the test for the methods resourceOf, scopeOf,
appendChunk, and chunksOf to match the real thrown message (e.g.
"StreamRegistry: unknown id 999") or use a regex/assertion that checks for the
substring "unknown id 999" (toThrow(/unknown id 999/)) so the expectations align
with StreamRegistry's actual error format.
packages/stardb/src/utils.ts (1)

48-79: ⚠️ Potential issue | 🟠 Major

Reject malformed hex/UUID input instead of silently coercing it.

hexNibble currently maps invalid chars to 0, so uuidToBytes/hexToBytes can decode bad input into wrong bytes. hexToBytes also accepts odd-length strings by truncation.

Proposed fix
 function hexNibble(ch: number): number {
   if (ch >= 0x30 && ch <= 0x39) return ch - 0x30;
   if (ch >= 0x61 && ch <= 0x66) return ch - 0x61 + 10;
   if (ch >= 0x41 && ch <= 0x46) return ch - 0x41 + 10;
-  return 0;
+  return -1;
 }
 
 /** Parse a UUID string (with or without dashes) into 16 bytes. */
 export function uuidToBytes(s: string): Uint8Array {
+  const hex = s.replaceAll("-", "");
+  if (hex.length !== 32) {
+    throw new Error("uuidToBytes: UUID must contain exactly 32 hex characters");
+  }
   const out = new Uint8Array(16);
-  let cur = 0;
-  for (let i = 0; i < s.length; i++) {
-    const ch = s.charCodeAt(i);
-    if (ch === 0x2d) continue; // dash
-    const hi = hexNibble(ch);
-    i++;
-    const lo = hexNibble(s.charCodeAt(i));
-    out[cur++] = (hi << 4) | lo;
+  for (let i = 0; i < 16; i++) {
+    const hi = hexNibble(hex.charCodeAt(i * 2));
+    const lo = hexNibble(hex.charCodeAt(i * 2 + 1));
+    if (hi < 0 || lo < 0) {
+      throw new Error("uuidToBytes: invalid hex character");
+    }
+    out[i] = (hi << 4) | lo;
   }
   return out;
 }
 
 /** Convert a hex string to a Uint8Array. */
 export function hexToBytes(hex: string): Uint8Array {
+  if ((hex.length & 1) !== 0) {
+    throw new Error("hexToBytes: hex string must have even length");
+  }
   const len = hex.length >>> 1;
   const out = new Uint8Array(len);
   for (let i = 0; i < len; i++) {
     const hi = hexNibble(hex.charCodeAt(i * 2));
     const lo = hexNibble(hex.charCodeAt(i * 2 + 1));
+    if (hi < 0 || lo < 0) {
+      throw new Error("hexToBytes: invalid hex character");
+    }
     out[i] = (hi << 4) | lo;
   }
   return out;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/utils.ts` around lines 48 - 79, hexNibble currently maps
invalid characters to 0 which silently corrupts outputs; change hexNibble to
throw on non-hex characters and update callers to surface that error, validate
input lengths in hexToBytes and uuidToBytes (throw on odd-length hex strings and
on UUID strings that end prematurely after skipping dashes), and ensure
uuidToBytes skips only '-' but otherwise checks for malformed characters using
hexNibble so malformed hex/UUID input is rejected rather than coerced; refer to
hexNibble, uuidToBytes, and hexToBytes when making these checks and error
throws.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/o11ylogsdb/src/codec-columnar.ts`:
- Line 284: The call to encodeSidecar passes a Uint8Array (kinds) but
encodeSidecar's parameter type is readonly number[], causing a type error;
update the encodeSidecar signature in codec-utils.ts to accept Uint8Array (or a
union type like readonly number[] | Uint8Array) or change its parameter to
ArrayLike<number> so it accepts kinds without casting, and then ensure any
internal logic that iterates over the array still works with Uint8Array (use
standard indexing or for...of on the parameter) so the call
encodeSidecar(records, kinds, buf) typechecks.

In `@packages/o11ylogsdb/src/codec-typed.ts`:
- Line 554: The type error comes from calling encodeSidecar(records, kinds, buf)
where buf is a Uint8Array but encodeSidecar's signature expects readonly
number[]; update the encodeSidecar declaration in codec-utils.ts (and any
exported types it uses) to accept ArrayLike<number> (or more specifically
ArrayLike<number> | Uint8Array) instead of readonly number[] so both packages
(e.g., codec-typed.ts and codec-columnar.ts) can pass Uint8Array buffers without
type errors; ensure all imports/usages of encodeSidecar reflect the new
signature.
- Line 42: The import list in the top-level import from "./types.js" includes an
unused symbol SeverityText, causing a lint error; edit the import statement in
packages' codec-typed module to remove SeverityText (leave AnyValue and
LogRecord), then run the linter/tests to confirm the CI failure is resolved and
ensure no other references to SeverityText exist in the same file (if they do,
either remove or reintroduce a used import).

In `@packages/o11ylogsdb/src/codec-utils.ts`:
- Around line 130-131: The decode branch uses truthy checks so empty strings and
zeroes are lost; update the assignments in the function that decodes sidecar
(look for applySidecar and the block using rec and side) to test for undefined
explicitly (e.g. use side.e !== undefined or 'e' in side) before setting
rec.eventName and likewise check side.d !== undefined or 'd' in side before
setting rec.droppedAttributesCount so empty string and 0 are preserved.
- Around line 83-90: The code currently uses truthy checks on r.eventName and
r.droppedAttributesCount which causes falsy but valid values (e.g., empty string
"" or 0) to be skipped; change the checks to detect presence instead of
truthiness (e.g., test for undefined/null) so that when r.eventName is present
you still assign side.e = r.eventName and set sidecarHasContent = true, and
similarly when r.droppedAttributesCount is present assign side.d =
r.droppedAttributesCount and set sidecarHasContent = true; update the
conditional expressions around the references to r.eventName and
r.droppedAttributesCount in codec-utils.ts accordingly.

In `@packages/stardb/src/utils.ts`:
- Around line 23-45: The bytesToUuid function lacks a check that the input
Uint8Array is exactly 16 bytes, so pass-through of shorter/longer arrays yields
corrupted UUIDs; add an explicit precondition at the start of bytesToUuid that
verifies b.length === 16 and throws a clear RangeError (or TypeError) if not, so
callers fail fast and you avoid producing invalid strings; keep the rest of the
function (using hexLookup and the indexed accesses) unchanged.

---

Duplicate comments:
In `@packages/o11ylogsdb/src/query.ts`:
- Around line 150-164: The raw-byte fast path (guarded by useBodyFastPath and
implemented via rawPayloadContains after codec.decode) is unsafe for templated
body encodings (e.g., Drain, ColumnarDrain, TypedColumnarDrain) because a needle
can appear only after template reconstruction; update the logic to skip the raw
prune when the chunk's codec or policy indicates templating by adding a
capability flag (e.g., supportsRawPruning or isTemplated) on the policy/codec
metadata and gate the useBodyFastPath check (or the rawPayloadContains call)
against that flag so templated encodings always reconstruct and search the
rendered body instead of being pruned.

In `@packages/stardb/src/utils.ts`:
- Around line 48-79: hexNibble currently maps invalid characters to 0 which
silently corrupts outputs; change hexNibble to throw on non-hex characters and
update callers to surface that error, validate input lengths in hexToBytes and
uuidToBytes (throw on odd-length hex strings and on UUID strings that end
prematurely after skipping dashes), and ensure uuidToBytes skips only '-' but
otherwise checks for malformed characters using hexNibble so malformed hex/UUID
input is rejected rather than coerced; refer to hexNibble, uuidToBytes, and
hexToBytes when making these checks and error throws.

In `@packages/stardb/test/stream-utils.test.ts`:
- Around line 91-97: The test asserts "unknown id 999" but StreamRegistry
actually throws a different message format; update the assertions in the test
for the methods resourceOf, scopeOf, appendChunk, and chunksOf to match the real
thrown message (e.g. "StreamRegistry: unknown id 999") or use a regex/assertion
that checks for the substring "unknown id 999" (toThrow(/unknown id 999/)) so
the expectations align with StreamRegistry's actual error format.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: dec530d8-3d19-46ef-9120-f13331216863

📥 Commits

Reviewing files that changed from the base of the PR and between 1966013 and 064cb0e.

📒 Files selected for processing (9)
  • packages/o11ylogsdb/src/codec-columnar.ts
  • packages/o11ylogsdb/src/codec-typed.ts
  • packages/o11ylogsdb/src/codec-utils.ts
  • packages/o11ylogsdb/src/query.ts
  • packages/o11ytracesdb/src/query.ts
  • packages/stardb/src/index.ts
  • packages/stardb/src/utils.ts
  • packages/stardb/test/public-api.test.ts
  • packages/stardb/test/stream-utils.test.ts

Comment thread packages/o11ylogsdb/src/codec-columnar.ts
} from "./codec-utils.js";
import { Drain, PARAM_STR, tokenize } from "./drain.js";
import type { AnyValue, KeyValue, LogRecord, SeverityText } from "./types.js";
import type { AnyValue, LogRecord, SeverityText } from "./types.js";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Unused import: SeverityText.

CI reports lint failure. Remove the unused import.

Proposed fix
-import type { AnyValue, LogRecord, SeverityText } from "./types.js";
+import type { AnyValue, LogRecord } from "./types.js";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import type { AnyValue, LogRecord, SeverityText } from "./types.js";
import type { AnyValue, LogRecord } from "./types.js";
🧰 Tools
🪛 GitHub Actions: validate-o11ykit-monorepo

[error] 42-42: Biome check failed (lint/correctness/noUnusedImports). Several imports are unused, including: 'import type { AnyValue, LogRecord, SeverityText } from "./types.js"'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/codec-typed.ts` at line 42, The import list in the
top-level import from "./types.js" includes an unused symbol SeverityText,
causing a lint error; edit the import statement in packages' codec-typed module
to remove SeverityText (leave AnyValue and LogRecord), then run the linter/tests
to confirm the CI failure is resolved and ensure no other references to
SeverityText exist in the same file (if they do, either remove or reintroduce a
used import).

Comment thread packages/o11ylogsdb/src/codec-typed.ts
Comment on lines +83 to +90
if (r.eventName) {
side.e = r.eventName;
sidecarHasContent = true;
}
if (r.droppedAttributesCount) {
side.d = r.droppedAttributesCount;
sidecarHasContent = true;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Falsy eventName and droppedAttributesCount are silently dropped.

PR objectives state "preserve falsy NDJSON fields (e.g., eventName: "" and droppedAttributesCount: 0)". However, lines 83 and 87 use truthy checks that skip eventName: "" and droppedAttributesCount: 0.

Proposed fix
-    if (r.eventName) {
+    if (r.eventName !== undefined) {
       side.e = r.eventName;
       sidecarHasContent = true;
     }
-    if (r.droppedAttributesCount) {
+    if (r.droppedAttributesCount !== undefined) {
       side.d = r.droppedAttributesCount;
       sidecarHasContent = true;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (r.eventName) {
side.e = r.eventName;
sidecarHasContent = true;
}
if (r.droppedAttributesCount) {
side.d = r.droppedAttributesCount;
sidecarHasContent = true;
}
if (r.eventName !== undefined) {
side.e = r.eventName;
sidecarHasContent = true;
}
if (r.droppedAttributesCount !== undefined) {
side.d = r.droppedAttributesCount;
sidecarHasContent = true;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/codec-utils.ts` around lines 83 - 90, The code
currently uses truthy checks on r.eventName and r.droppedAttributesCount which
causes falsy but valid values (e.g., empty string "" or 0) to be skipped; change
the checks to detect presence instead of truthiness (e.g., test for
undefined/null) so that when r.eventName is present you still assign side.e =
r.eventName and set sidecarHasContent = true, and similarly when
r.droppedAttributesCount is present assign side.d = r.droppedAttributesCount and
set sidecarHasContent = true; update the conditional expressions around the
references to r.eventName and r.droppedAttributesCount in codec-utils.ts
accordingly.

Comment on lines +130 to +131
if (side.e) rec.eventName = side.e;
if (side.d) rec.droppedAttributesCount = side.d;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same falsy-check issue on decode side.

applySidecar also uses truthy checks, so eventName: "" and droppedAttributesCount: 0 won't be restored even if they were encoded.

Proposed fix
-  if (side.e) rec.eventName = side.e;
-  if (side.d) rec.droppedAttributesCount = side.d;
+  if (side.e !== undefined) rec.eventName = side.e;
+  if (side.d !== undefined) rec.droppedAttributesCount = side.d;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/o11ylogsdb/src/codec-utils.ts` around lines 130 - 131, The decode
branch uses truthy checks so empty strings and zeroes are lost; update the
assignments in the function that decodes sidecar (look for applySidecar and the
block using rec and side) to test for undefined explicitly (e.g. use side.e !==
undefined or 'e' in side) before setting rec.eventName and likewise check side.d
!== undefined or 'd' in side before setting rec.droppedAttributesCount so empty
string and 0 are preserved.

Comment on lines +23 to +45
export function bytesToUuid(b: Uint8Array): string {
return (
hexLookup[b[0] as number]! +
hexLookup[b[1] as number]! +
hexLookup[b[2] as number]! +
hexLookup[b[3] as number]! +
"-" +
hexLookup[b[4] as number]! +
hexLookup[b[5] as number]! +
"-" +
hexLookup[b[6] as number]! +
hexLookup[b[7] as number]! +
"-" +
hexLookup[b[8] as number]! +
hexLookup[b[9] as number]! +
"-" +
hexLookup[b[10] as number]! +
hexLookup[b[11] as number]! +
hexLookup[b[12] as number]! +
hexLookup[b[13] as number]! +
hexLookup[b[14] as number]! +
hexLookup[b[15] as number]!
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add an explicit 16-byte precondition in bytesToUuid.

Without a length check, non-16-byte inputs can produce corrupted UUID strings instead of failing fast.

Proposed fix
 export function bytesToUuid(b: Uint8Array): string {
+  if (b.length !== 16) {
+    throw new Error("bytesToUuid: expected exactly 16 bytes");
+  }
   return (
     hexLookup[b[0] as number]! +
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stardb/src/utils.ts` around lines 23 - 45, The bytesToUuid function
lacks a check that the input Uint8Array is exactly 16 bytes, so pass-through of
shorter/longer arrays yields corrupted UUIDs; add an explicit precondition at
the start of bytesToUuid that verifies b.length === 16 and throws a clear
RangeError (or TypeError) if not, so callers fail fast and you avoid producing
invalid strings; keep the rest of the function (using hexLookup and the indexed
accesses) unchanged.

strawgate and others added 2 commits April 28, 2026 12:16
Add decodeFilteredByBodyNeedle to TypedColumnarDrainPolicy. When a
bodyContains query is active, the codec reconstructs bodies and tests
them against the needle BEFORE parsing sidecar NDJSON. Only sidecar
lines for matching records are JSON.parsed, saving ~35-60% of sidecar
CPU (which was ~47% of full decode time).

Template-literal shortcut: for space-free needles, templates whose
literal tokens contain the needle are identified as definite matches
without per-record .includes() checks.

Query engine updated to use the filtered decode path via
readRecordsFilteredFromRaw. Returns a sparse array with matching
records at their original indices. Non-matching records are undefined.

Includes 11 new tests covering filtered decode correctness, template
literal shortcut, non-string body handling, and query engine integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move reusable code from engine packages to stardb shared core:

- Interner (o11ytsdb → stardb): high-perf string→int hash table
- BackpressureController (o11ytsdb → stardb): async counting semaphore
- lowerBound/upperBound (o11ytsdb → stardb): BigInt64Array binary search
- Bloom filter (o11ytracesdb → stardb): BF8 with double hashing
- buildDictWithIndex (o11ytracesdb → stardb): frequency-sorted dictionary
- encodeAnyValue/decodeAnyValue (o11ytracesdb → stardb): binary AnyValue codec
- uint8IndexOf (o11ylogsdb → stardb): SIMD-accelerated byte search

Engine packages now re-export from stardb, preserving all public APIs.
All 1155 tests pass, typecheck clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
strawgate and others added 3 commits April 28, 2026 12:34
33 new tests covering:
- Interner: intern/resolve, size, bulkIntern, cardinality limit, invalid ID
- BackpressureController: acquire/release, queuing, dispose, invalid args
- lowerBound/upperBound: exact match, before/after range, boundary cases
- Bloom filter: insert+query, FPR bounds, base64 roundtrip, dedup, edge cases
- uint8IndexOf: found, not-found, empty needle, needle > haystack
- buildDictWithIndex: frequency sort, empty input
- AnyValue binary codec: null, string (raw+dict), bigint, double, bool,
  bytes, array, map, nested structures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Delete thin re-export wrapper files (interner.ts, backpressure.ts,
bloom.ts) from engine packages. All internal imports now point
directly to stardb. These were never part of the engine public APIs
consumed by the site or examples.

Removes Interner, BackpressureController, InternId from o11ytsdb
public exports. Removes bloom.ts wrapper from o11ytracesdb (bloom
functions still re-exported from index.ts via stardb).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused readRecordsFromRaw import from logsdb query.ts
- Fix noImplicitAnyLet in engine-extended.test.ts (type lastStats)
- Direct imports to stardb (no wrapper indirection)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@strawgate strawgate merged commit 4fb53bc into main Apr 28, 2026
4 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working performance Performance optimization size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant