
perf(core): Estimate output tokens via sampling with CV-based fallback #1397

Closed

yamadashy wants to merge 2 commits into main from perf/sampling-estimation-with-cv-fallback

Conversation

@yamadashy (Owner) commented Apr 5, 2026

For outputs larger than 500KB, estimate the total token count by sampling 10 evenly spaced 100KB portions and extrapolating the chars-per-token ratio. This avoids running full BPE tokenization on the entire output.

To guard against worst-case scenarios (periodic structure resonating with the sample stride, or mixed CJK/ASCII content), compute the coefficient of variation (CV) of per-sample chars/token ratios. If CV exceeds 0.15, fall back to full tokenization to maintain accuracy.

How it works

  1. For outputs > 500KB, take 10 evenly spaced 100KB samples
  2. Tokenize each sample and compute per-sample chars/token ratios
  3. Calculate the coefficient of variation (CV = stddev / mean) of the ratios
  4. If CV ≤ 0.15 → extrapolate total tokens from the sample ratio (fast path)
  5. If CV > 0.15 → fall back to full tokenization (safe path)
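
The steps above can be sketched as follows (illustrative names and simplified types, not the exact code in this PR):

```typescript
// Sketch of the sampling estimator: compute per-sample chars/token ratios,
// gate on the coefficient of variation, and extrapolate when stable.
const SAMPLING_CV_THRESHOLD = 0.15;

function coefficientOfVariation(ratios: number[]): number {
  const mean = ratios.reduce((s, r) => s + r, 0) / ratios.length;
  const variance = ratios.reduce((s, r) => s + (r - mean) ** 2, 0) / ratios.length;
  return Math.sqrt(variance) / mean;
}

// samples are (chars, tokens) pairs from tokenizing each 100KB portion.
// Returns the extrapolated total, or null to signal the full-tokenization fallback.
function estimateTokens(
  totalChars: number,
  samples: { chars: number; tokens: number }[],
): number | null {
  const ratios = samples.map((s) => s.chars / s.tokens);
  if (coefficientOfVariation(ratios) > SAMPLING_CV_THRESHOLD) {
    return null; // heterogeneous content: fall back to full tokenization
  }
  const sampleChars = samples.reduce((s, x) => s + x.chars, 0);
  const sampleTokens = samples.reduce((s, x) => s + x.tokens, 0);
  return Math.round((totalChars / sampleChars) * sampleTokens);
}
```

Uniform samples (e.g. all ~4 chars/token) take the fast path; unevenly mixed CJK/ASCII samples push the CV past 0.15 and return null.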

Worst-case handling

  • Periodic resonance: If file sizes align with the sample stride, all samples may hit the same structural position (e.g. all markup or all code). The CV check detects this non-uniformity.
  • Mixed content (CJK/ASCII): Different scripts have very different chars/token ratios. If the output mixes them unevenly, the CV will be high and trigger fallback.

Checklist

  • Run npm run test
  • Run npm run lint


@coderabbitai bot (Contributor) commented Apr 5, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea6e0be8-6326-4b84-852a-551c28c6d5fd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Introduced sampling-based token estimation for very large content outputs that attempts extrapolation before falling back to full tokenization when sampling variance exceeds thresholds. Refactored tokenization logic into separate helper functions and removed the prior parallel-execution decision from main control flow.

Changes

Cohort / File(s) | Summary

Core Token Estimation Refactor — src/core/metrics/calculateOutputMetrics.ts
Introduced tryEstimateBySampling to extrapolate token counts from fixed-size samples using character-to-token ratios; added a fullTokenize helper consolidating the prior chunk-splitting and parallel aggregation logic; replaced the shouldRunInParallel decision with conditional routing: sampling for large outputs (> threshold) or direct execution for smaller outputs; includes CV-based fallback to full tokenization and sampling-specific trace logging.

Test Coverage for Sampling Behavior — tests/core/metrics/calculateOutputMetrics.test.ts
Enhanced existing "large content" chunking tests with sample-task detection (via path containing -sample-) and intentional CV-forcing mock responses; added a new describe('sampling estimation') block validating: sampling usage for ~600KB content (≤10 runs, ±5% accuracy), fallback on high-variance samples (>10 runs), and no sampling for ~400KB content (exactly 1 run).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks — ✅ 3 passed

  • Title check — ✅ Passed: The title clearly and concisely summarizes the main change: introducing performance optimization through sampling-based token estimation with CV-based fallback for large outputs.
  • Description check — ✅ Passed: The description covers the key implementation details, includes the 'How it works' section explaining the sampling strategy and CV logic, addresses worst-case handling scenarios, and confirms checklist items completed.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@github-actions bot (Contributor) commented Apr 5, 2026

⚡ Performance Benchmark

Latest commit: 309fbf2 perf(core): Raise sampling threshold from 500KB to 3MB
Status: ✅ Benchmark complete!
Ubuntu: 1.63s (±0.05s) → 1.75s (±0.04s) · +0.12s (+7.3%)
macOS: 1.34s (±0.31s) → 1.35s (±0.27s) · +0.02s (+1.2%)
Windows: 2.22s (±0.14s) → 1.83s (±0.13s) · -0.39s (-17.4%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
  • Workflow run
History

e27732b perf(core): Raise sampling threshold from 500KB to 3MB

Ubuntu: 1.58s (±0.03s) → 1.69s (±0.03s) · +0.11s (+7.0%)
macOS: 1.31s (±0.19s) → 1.51s (±0.20s) · +0.20s (+15.2%)
Windows: 1.97s (±0.18s) → 1.60s (±0.12s) · -0.37s (-18.7%)

23923aa perf(core): Estimate output tokens via sampling with CV-based fallback

Ubuntu: 1.56s (±0.03s) → 1.67s (±0.04s) · +0.10s (+6.6%)
macOS: 0.92s (±0.07s) → 1.00s (±0.06s) · +0.08s (+8.5%)
Windows: 1.96s (±0.06s) → 1.60s (±0.04s) · -0.35s (-17.9%)

@codecov bot commented Apr 5, 2026

Codecov Report

❌ Patch coverage is 96.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.47%. Comparing base (4acbbc0) to head (309fbf2).

Files with missing lines Patch % Lines
src/core/metrics/calculateOutputMetrics.ts 96.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1397      +/-   ##
==========================================
+ Coverage   87.40%   87.47%   +0.06%     
==========================================
  Files         116      116              
  Lines        4392     4431      +39     
  Branches     1018     1026       +8     
==========================================
+ Hits         3839     3876      +37     
- Misses        553      555       +2     

☔ View full report in Codecov by Sentry.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a sampling-based estimation mechanism for calculating token counts in large outputs, aimed at improving performance by avoiding full tokenization when content is sufficiently uniform. It includes a coefficient of variation (CV) check to detect heterogeneous content and fall back to full tokenization when necessary. Feedback indicates that the current sampling threshold of 500KB is too low relative to the parallel processing threshold, potentially causing overhead and redundant processing for files in the 500KB to 1MB range; increasing this threshold to 2MB is recommended.


// Sampling constants for token count estimation on large outputs.
// Instead of full BPE tokenization, we sample evenly spaced portions and extrapolate.
const OUTPUT_SAMPLING_THRESHOLD = 500_000; // 500KB - outputs below this are fully tokenized

medium

The OUTPUT_SAMPLING_THRESHOLD of 500KB is lower than the MIN_CONTENT_LENGTH_FOR_PARALLEL threshold (1MB). This introduces several inefficiencies for files in the 500KB - 1MB range:

  1. Increased Overhead: Files in this range were previously processed in a single task. Now, they trigger tryEstimateBySampling, which launches up to 10 parallel tasks. The overhead of worker communication for 10 small tasks likely outweighs any benefit, especially since the total amount of data tokenized remains the same or even increases due to overlaps.
  2. Overlapping Samples: Since OUTPUT_SAMPLE_COUNT * OUTPUT_SAMPLE_SIZE is 1MB, any file smaller than 1MB will have overlapping samples (because the stride will be less than the sample size). This results in redundant tokenization of the same characters across different worker tasks.
  3. Inaccuracy Risk: For files where sampling doesn't actually reduce the workload (e.g., 600KB), you are introducing potential estimation error for no performance gain.

Consider increasing OUTPUT_SAMPLING_THRESHOLD to a value where sampling provides a significant reduction in work, such as 2MB.

Suggested change
const OUTPUT_SAMPLING_THRESHOLD = 500_000; // 500KB - outputs below this are fully tokenized
const OUTPUT_SAMPLING_THRESHOLD = 2_000_000; // 2MB - outputs below this are fully tokenized
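
The overlap in point 2 follows directly from the constants; a quick check (constants as stated in the review, helper name hypothetical):

```typescript
const OUTPUT_SAMPLE_COUNT = 10;
const OUTPUT_SAMPLE_SIZE = 100_000;

// Samples overlap whenever the stride between sample starts is smaller than
// the sample size — i.e. whenever content is shorter than COUNT * SIZE (1MB).
function samplesOverlap(contentLength: number): boolean {
  const stride = Math.floor(contentLength / OUTPUT_SAMPLE_COUNT);
  return stride < OUTPUT_SAMPLE_SIZE;
}
```

A 600KB output (stride 60KB) re-tokenizes 40KB of each consecutive sample; at 2MB and above, the stride comfortably exceeds the sample size.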

@cloudflare-workers-and-pages bot commented Apr 5, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 309fbf2
Status: ✅  Deploy successful!
Preview URL: https://993ae52b.repomix.pages.dev
Branch Preview URL: https://perf-sampling-estimation-wit.repomix.pages.dev


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/core/metrics/calculateOutputMetrics.test.ts (1)

192-272: Please add a low-CV aliasing regression.

These cases cover uniform data and intentionally noisy samples, but they do not exercise the deterministic aliasing case where every 100KB sample lands on the same phase and CV still looks healthy. An alternating-band fixture would lock in the fix for that failure mode.
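
A minimal way to build the alternating-band fixture described above (band size assumed to match the 100KB sample size; characters illustrative):

```typescript
// Repeating 100KB bands: sparse band A ("a") then dense band B ("あ"), so
// stride-aligned samples can all land on the same band while CV stays low.
const BAND_SIZE = 100_000;

function alternatingBands(totalBands: number): string {
  let out = "";
  for (let i = 0; i < totalBands; i++) {
    out += (i % 2 === 0 ? "a" : "\u3042").repeat(BAND_SIZE);
  }
  return out;
}
```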

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/metrics/calculateOutputMetrics.test.ts` around lines 192 - 272,
Add a new test that reproduces the low-CV aliasing regression by constructing
content made of repeating alternating-band blocks (e.g., 100KB block A then
100KB block B repeated) so every 100KB sample lands on the same phase and
produces an artificially low CV; call calculateOutputMetrics with a mock
TaskRunner (mocking TokenCountTask run and using the task.path sample naming
that includes '-sample-') where sample runs return a consistent token/char ratio
(so sampling appears stable) but full-chunk runs return a different ratio, then
assert that the function detects aliasing and falls back to full tokenization
(i.e., the mock runCallCount should be >10 and the final result should reflect
the full-chunk ratio). Ensure the test references calculateOutputMetrics,
TokenCountTask, and uses the same sample path pattern ('-sample-') so it targets
the sampling logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/core/metrics/calculateOutputMetrics.ts`:
- Around line 34-42: The branch in calculateOutputMetrics.ts that returns an
extrapolated numeric value from tryEstimateBySampling hides that the count is
estimated; change the public contract returned by calculateOutputMetrics (and
values produced by tryEstimateBySampling and fullTokenize) from a bare number to
a small result object such as { count: number, estimated: boolean } (or add an
explicit isEstimated flag) so callers can distinguish exact vs extrapolated
counts, update calculateMetrics.ts to aggregate using .count and to
track/propagate estimation metadata, and ensure fullTokenize returns
estimated=false while tryEstimateBySampling returns estimated=true when
extrapolating.
- Around line 106-149: The sampling start calculation using start = i * stride
biases samples to each band edge and can miss phase/resonance patterns; change
the sampling in the Array.from mapping (where stride, start, and sampleContent
are computed inside the sampleResults creation) to distribute starts across the
full [0, content.length - OUTPUT_SAMPLE_SIZE] range instead of using i * stride
— e.g., compute maxStart = Math.max(0, content.length - OUTPUT_SAMPLE_SIZE) and
set start = Math.floor(((i + 0.5) / sampleCount) * maxStart) (or add a small
per-bucket jitter) to center/jitter each bucket, keeping the rest of the logic
(validSamples, ratios, cv) unchanged and ensuring start is clamped to valid
bounds.

---

Nitpick comments:
In `@tests/core/metrics/calculateOutputMetrics.test.ts`:
- Around line 192-272: Add a new test that reproduces the low-CV aliasing
regression by constructing content made of repeating alternating-band blocks
(e.g., 100KB block A then 100KB block B repeated) so every 100KB sample lands on
the same phase and produces an artificially low CV; call calculateOutputMetrics
with a mock TaskRunner (mocking TokenCountTask run and using the task.path
sample naming that includes '-sample-') where sample runs return a consistent
token/char ratio (so sampling appears stable) but full-chunk runs return a
different ratio, then assert that the function detects aliasing and falls back
to full tokenization (i.e., the mock runCallCount should be >10 and the final
result should reflect the full-chunk ratio). Ensure the test references
calculateOutputMetrics, TokenCountTask, and uses the same sample path pattern
('-sample-') so it targets the sampling logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5d9c24ee-a764-4e26-9d42-e36f127b01f8

📥 Commits

Reviewing files that changed from the base of the PR and between 611b88a and 23923aa.

📒 Files selected for processing (2)
  • src/core/metrics/calculateOutputMetrics.ts
  • tests/core/metrics/calculateOutputMetrics.test.ts

Comment on lines +34 to 42
if (content.length > OUTPUT_SAMPLING_THRESHOLD) {
// For large outputs, try sampling estimation first
const estimated = await tryEstimateBySampling(content, encoding, path, deps);
if (estimated !== null) {
result = estimated;
} else {
// Sampling variance too high, fall back to full tokenization
result = await fullTokenize(content, encoding, path, deps);
}

⚠️ Potential issue | 🟠 Major

Surface when this count is estimated.

This branch now returns an extrapolated value through the same Promise<number> contract, but src/core/metrics/calculateMetrics.ts still reduces these numbers into totalTokens as if they were exact. That silently changes the semantics of large-output metrics for every downstream caller. Please propagate estimate metadata or keep the exact path on the public result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/metrics/calculateOutputMetrics.ts` around lines 34 - 42, The branch
in calculateOutputMetrics.ts that returns an extrapolated numeric value from
tryEstimateBySampling hides that the count is estimated; change the public
contract returned by calculateOutputMetrics (and values produced by
tryEstimateBySampling and fullTokenize) from a bare number to a small result
object such as { count: number, estimated: boolean } (or add an explicit
isEstimated flag) so callers can distinguish exact vs extrapolated counts,
update calculateMetrics.ts to aggregate using .count and to track/propagate
estimation metadata, and ensure fullTokenize returns estimated=false while
tryEstimateBySampling returns estimated=true when extrapolating.

Comment on lines +106 to +149
const stride = Math.floor(content.length / sampleCount);

const sampleResults = await Promise.all(
Array.from({ length: sampleCount }, (_, i) => {
const start = i * stride;
const sampleContent = content.slice(start, start + OUTPUT_SAMPLE_SIZE);
return deps.taskRunner
.run({
content: sampleContent,
encoding,
path: path ? `${path}-sample-${i}` : undefined,
})
.then((tokens) => ({ chars: sampleContent.length, tokens }));
}),
);

const validSamples = sampleResults.filter((s) => s.tokens > 0 && s.chars > 0);
if (validSamples.length < 2) {
return null;
}

// Compute per-sample chars/token ratios and check coefficient of variation (CV = stddev / mean).
// High CV indicates the content is heterogeneous (e.g. mixed CJK/ASCII, or periodic structure
// resonating with the sample stride), making the extrapolation unreliable.
const ratios = validSamples.map((s) => s.chars / s.tokens);
const mean = ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
const variance = ratios.reduce((sum, r) => sum + (r - mean) ** 2, 0) / ratios.length;
const cv = Math.sqrt(variance) / mean;

if (cv > SAMPLING_CV_THRESHOLD) {
logger.trace(
`Sampling CV ${cv.toFixed(3)} exceeds threshold ${SAMPLING_CV_THRESHOLD}, falling back to full tokenization`,
);
return null;
}

// Extrapolate total token count from the overall sample ratio
const totalSampleTokens = validSamples.reduce((sum, s) => sum + s.tokens, 0);
const totalSampleChars = validSamples.reduce((sum, s) => sum + s.chars, 0);
const estimated = Math.round((content.length / totalSampleChars) * totalSampleTokens);

logger.trace(
`Estimated output tokens from ${validSamples.length} samples: ${estimated} (CV=${cv.toFixed(3)}, ${(totalSampleChars / totalSampleTokens).toFixed(2)} chars/token)`,
);

⚠️ Potential issue | 🟠 Major

The current sample placement can miss the exact resonance case the CV gate is supposed to catch.

start = i * stride always samples the leading edge of each band and can leave the tail unsampled. On alternating 100KB ASCII/CJK blocks, every sample can land on the same phase, CV stays near zero, and the estimate is still badly wrong. Please distribute starts across the full [0, content.length - OUTPUT_SAMPLE_SIZE] range—ideally centered or jittered per bucket—before relying on CV as the safety check.

🩹 One way to de-bias the sample starts
-  const stride = Math.floor(content.length / sampleCount);
+  const lastStart = Math.max(0, content.length - OUTPUT_SAMPLE_SIZE);
+  const bucketSize = content.length / sampleCount;

   const sampleResults = await Promise.all(
     Array.from({ length: sampleCount }, (_, i) => {
-      const start = i * stride;
+      const rawStart =
+        i === 0
+          ? 0
+          : i === sampleCount - 1
+            ? lastStart
+            : (i + 0.5) * bucketSize - OUTPUT_SAMPLE_SIZE / 2;
+      const start = Math.max(0, Math.min(lastStart, Math.round(rawStart)));
       const sampleContent = content.slice(start, start + OUTPUT_SAMPLE_SIZE);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/metrics/calculateOutputMetrics.ts` around lines 106 - 149, The
sampling start calculation using start = i * stride biases samples to each band
edge and can miss phase/resonance patterns; change the sampling in the
Array.from mapping (where stride, start, and sampleContent are computed inside
the sampleResults creation) to distribute starts across the full [0,
content.length - OUTPUT_SAMPLE_SIZE] range instead of using i * stride — e.g.,
compute maxStart = Math.max(0, content.length - OUTPUT_SAMPLE_SIZE) and set
start = Math.floor(((i + 0.5) / sampleCount) * maxStart) (or add a small
per-bucket jitter) to center/jitter each bucket, keeping the rest of the logic
(validSamples, ratios, cv) unchanged and ensuring start is clamped to valid
bounds.

@devin-ai-integration bot (Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


@claude bot (Contributor) commented Apr 5, 2026

Code Review

Overall this is a well-structured PR with clean code, good separation of concerns (fullTokenize / tryEstimateBySampling), thorough comments, and solid test coverage. The CV-based fallback is a thoughtful safety mechanism. I have a few observations worth considering before merge.

Key Findings

1. Wasted work on CV fallback (Medium)

When tryEstimateBySampling returns null due to high CV, the 10 sample worker calls are pure overhead on top of the subsequent full tokenization. For a 3.5MB output this means 10 + 35 = 45 worker calls instead of 35. Mixed-language repos (CJK + ASCII, code + markdown) are likely to trigger the fallback consistently, making these outputs ~29% more expensive than before.

Consider an early-exit strategy: run 2–3 pilot samples first, check CV, and only commit to all 10 if the pilot looks stable. This would cap the wasted work to 2–3 calls instead of 10.
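
A sketch of that pilot gate (hypothetical names; not code from this branch):

```typescript
// Early-exit idea: tokenize a few pilot samples first and only commit to the
// full sample set when the pilot chars/token ratios already look stable.
const PILOT_CV_THRESHOLD = 0.15;

function cv(ratios: number[]): number {
  const mean = ratios.reduce((s, r) => s + r, 0) / ratios.length;
  const variance = ratios.reduce((s, r) => s + (r - mean) ** 2, 0) / ratios.length;
  return Math.sqrt(variance) / mean;
}

// pilotRatios are chars/token ratios from the 2-3 pilot samples.
function pilotLooksStable(pilotRatios: number[]): boolean {
  return pilotRatios.length >= 2 && cv(pilotRatios) <= PILOT_CV_THRESHOLD;
}
```

On a heterogeneous output the pilot already fails, so only 2-3 sample calls are wasted before falling back to full tokenization.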

2. Benchmark results need validation at 3MB threshold (Medium)

The benchmark on commit 1 (500KB threshold) showed regressions on Ubuntu (+6.6%) and macOS (+8.5%). The threshold was raised to 3MB to escape this regression, but there are no benchmark results confirming the 3MB threshold actually improves performance for real-world large outputs. The repomix self-pack benchmark likely produces output well under 3MB, meaning the optimization never fires in the benchmark. It would be valuable to benchmark against a genuinely large mono-repo output (>3MB) to validate the win.

3. Last segment is never sampled (Low)

Details

With stride = floor(content.length / sampleCount), the last sample starts at offset 9 * stride and reads 100KB forward. For a 3MB output (stride = 300KB), sample 9 covers bytes 2.7M–2.8M, leaving the final 200KB unsampled. Repomix outputs typically end with the last packed file's content — if that file has very different token density (e.g., a minified JS bundle), the estimate will be silently skewed without triggering the CV check.

Anchoring the last sample to content.length - OUTPUT_SAMPLE_SIZE would close this blind spot.
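
One way to anchor the tail as suggested (illustrative; the PR computes start = i * stride for all samples):

```typescript
const OUTPUT_SAMPLE_SIZE = 100_000;

// Keep stride-based starts for all but the last sample, which is pinned to
// the end of the content so the final bytes are always covered.
function sampleStart(i: number, sampleCount: number, contentLength: number): number {
  const stride = Math.floor(contentLength / sampleCount);
  if (i === sampleCount - 1) {
    return Math.max(0, contentLength - OUTPUT_SAMPLE_SIZE);
  }
  return i * stride;
}
```

For the 3MB example above, the last sample then covers bytes 2.9M-3.0M instead of 2.7M-2.8M.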

4. Test assertion could be tighter (Low)

Details

In the "should fall back to full tokenization when sampling CV is too high" test, expect(runCallCount).toBeGreaterThan(10) is weak — it would pass even if only 11 calls happened. toBe(10 + 35) (10 samples + 35 full-tokenize chunks for 3.5MB) would lock in the expected behavior and catch regressions where the fallback path changes.

5. Missing test for validSamples < 2 branch (Low)

Details

Codecov flags 3 uncovered lines. The sampleCount < 2 guard is structurally unreachable (3MB / 100KB = 30, always >= 2), but validSamples < 2 is reachable when the tokenizer returns 0 for most samples. A test with a mock returning 0 tokens would cover this degenerate-worker path.

What Looks Good

  • Clean refactor: extracting fullTokenize and tryEstimateBySampling improves readability
  • Well-documented constants with rationale for each threshold
  • CV-based fallback is a sound statistical approach for detecting heterogeneous content
  • Dependency injection pattern preserved throughout
  • Conventional Commits with meaningful bodies explaining the "why"
  • Test coverage at 94% patch coverage is solid

Summary

The approach is sound for genuinely large outputs. The main concern is validating that the 3MB threshold actually produces measurable speedups for real-world large repos, and mitigating the wasted-work cost on the CV fallback path. The code quality itself is high.


🤖 Generated with Claude Code

yamadashy and others added 2 commits April 5, 2026 15:18
For outputs larger than 500KB, estimate the total token count by sampling
10 evenly spaced 100KB portions and extrapolating the chars-per-token ratio.
This avoids running full BPE tokenization on the entire output.

To guard against worst-case scenarios (periodic structure resonating with
the sample stride, or mixed CJK/ASCII content), compute the coefficient of
variation (CV) of per-sample chars/token ratios. If CV exceeds 0.15,
fall back to full tokenization to maintain accuracy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 500KB threshold caused regression on Ubuntu (+6.6%) and macOS (+8.5%)
because outputs in the 500KB-1MB range went from 1 worker call (direct
processing) to 10 calls (sampling), adding significant overhead.

Raising the threshold to 3MB ensures sampling only kicks in when
full parallel chunking would require 30+ tasks, making the reduction
to 10 samples worthwhile. The else branch now uses fullTokenize to
preserve the existing parallel chunking path for 1-3MB outputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yamadashy force-pushed the perf/sampling-estimation-with-cv-fallback branch from e27732b to 309fbf2 (April 5, 2026 06:18)
@yamadashy (Owner, Author)

Closing: sampling-based estimation regresses performance on Ubuntu/macOS. The parallel worker pool is already efficient enough on Unix that reducing 39 chunks to 10 samples doesn't help — it only adds overhead. Windows benefits (-18.7%) but doesn't justify the Unix regression (+7-15%). May revisit with a different approach (e.g., platform-aware thresholds or single-sample ratio estimation).

@yamadashy yamadashy closed this Apr 5, 2026
