perf(metrics): Increase output token counting chunk size from 100KB to 200KB #1415
Conversation
📝 Walkthrough

The pull request increases the character-per-chunk threshold in the token counting metrics calculation from 100,000 to 200,000 characters. This adjusts how content is segmented for parallel token counting operations while maintaining the same control flow and error handling approach.
⚡ Performance Benchmark
History

ba4b0b0 fix(metrics): Use 'characters' instead of 'KB' in chunk size comments
dd94bfb perf(metrics): Increase output token counting chunk size from 100KB to 200KB
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #1415   +/-   ##
=======================================
  Coverage   87.26%   87.26%
=======================================
  Files         117      117
  Lines        4420     4420
  Branches     1021     1021
=======================================
  Hits         3857     3857
  Misses        563      563
```

☔ View full report in Codecov by Sentry.
perf(metrics): Increase output token counting chunk size from 100KB to 200KB

Benchmarks show 200KB chunks are optimal for output token counting, reducing worker round-trips while maintaining good parallelism across available CPU cores. For a 3.9MB output (typical large repo), this reduces chunks from 39 to 20, saving ~46ms per run due to fewer structured-clone round-trips.

Benchmark results (repomix self-pack, 996 files, 3.8M chars, 5 runs):

- Before (100K chunks): 1384ms median
- After (200K chunks): 1293ms median
- Improvement: ~91ms = ~6.6%

Combined with the existing batch IPC optimization, the total improvement vs baseline is ~156ms = ~10.8%.

https://claude.ai/code/session_01NjmXXUzBrB2oe4FD82NpGe
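The 39 → 20 chunk reduction follows directly from the ceiling division described above. A small illustrative sketch (the constant name follows the PR description, but `chunkCount` is a hypothetical helper, not repomix code):

```typescript
// Target chunk size in characters, per this PR.
const TARGET_CHARS_PER_CHUNK = 200_000;

// Number of chunks needed to cover `totalChars` characters.
function chunkCount(totalChars: number, target: number): number {
  return Math.ceil(totalChars / target);
}

// A ~3.9M-character output: 20 chunks at 200K vs 39 at 100K.
console.log(chunkCount(3_900_000, TARGET_CHARS_PER_CHUNK)); // 20
console.log(chunkCount(3_900_000, 100_000)); // 39
```

Each chunk costs a structured-clone round-trip to a worker, which is why halving the chunk count yields the ~46ms saving cited above.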
JS strings use UTF-16 encoding where character count != byte count. Use 'K characters' for technical accuracy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
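The UTF-16 point above can be demonstrated in a few lines (a standalone illustration, not repomix code): `String.prototype.length` counts UTF-16 code units, not bytes.

```typescript
const s = "héllo 🚀";

// .length counts UTF-16 code units; "🚀" is a surrogate pair (2 units).
console.log(s.length); // 8

// Byte count depends on the encoding chosen; UTF-8 here via TextEncoder.
console.log(new TextEncoder().encode(s).length); // 11
```

This is why the comment was reworded to "K characters" rather than "KB": 200K characters is not 200KB of memory.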
Force-pushed ba4b0b0 to 9e19916
Increase `TARGET_CHARS_PER_CHUNK` from 100KB to 200KB for output token counting.

200KB chunks reduce worker round-trips while maintaining good parallelism across available CPU cores. For a ~4MB output (typical large repo), this reduces chunks from 39 to 20.
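The chunking step the description refers to can be sketched as a split by character offset (a minimal sketch; `splitIntoChunks` is a hypothetical name, not the repomix implementation):

```typescript
// Target chunk size in characters, per this PR.
const TARGET_CHARS_PER_CHUNK = 200_000;

// Split content into pieces of at most `target` characters, so each
// piece can be token-counted by a separate worker in parallel.
function splitIntoChunks(
  content: string,
  target: number = TARGET_CHARS_PER_CHUNK,
): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < content.length; i += target) {
    chunks.push(content.slice(i, i + target));
  }
  return chunks;
}
```

A larger `target` means fewer chunks and fewer worker round-trips, at the cost of coarser parallelism; the benchmarks below suggest 200K characters is the sweet spot for typical large-repo outputs.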
Benchmark (repomix self-pack, `node bin/repomix.cjs`, 5 runs): see results above.

Checklist

- `npm run test`
- `npm run lint`

🤖 Generated with Claude Code