perf(formatter): pre-allocate enough space for the FormatElement buffer #15422
CodSpeed Performance Report: merging #15422 will improve performance by 11.99%.
Pull Request Overview
This PR optimizes buffer capacity pre-allocation in the formatter by replacing the previous heuristic (based on the number of arguments) with an empirically-derived formula based on source text length. The new approach pre-allocates 0.4x the source text size to minimize reallocations during formatting.
- Replaces argument count-based capacity with source text length-based calculation
- Adds detailed comments explaining the empirical analysis behind the 0.4x multiplier
- Uses `(source_len * 2) / 5` to avoid floating-point arithmetic
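
A minimal sketch of the integer-arithmetic capacity calculation described in the overview. The names `initial_capacity` and `new_buffer`, the `u32` element type, and the use of `saturating_mul` are assumptions made for illustration; the PR itself only shows the expression `(source_len * 2) / 5`.

```rust
/// Hypothetical helper: initial capacity as 2/5 (0.4x) of the source length.
/// Multiplying before dividing keeps the calculation in integers, so no
/// conversion to and from `f64` is needed.
fn initial_capacity(source_len: usize) -> usize {
    // `saturating_mul` is an addition in this sketch to avoid overflow on
    // extremely large inputs; the PR's expression uses a plain `*`.
    source_len.saturating_mul(2) / 5
}

/// Stand-in for creating the formatter's element buffer.
/// `u32` is a placeholder for the real element type.
fn new_buffer(source_text: &str) -> Vec<u32> {
    Vec::with_capacity(initial_capacity(source_text.len()))
}
```

Integer division truncates, so a 7-byte source yields a capacity of 2 rather than 2.8; at realistic file sizes the difference is negligible.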
…er (#15422)

# VecBuffer Capacity Analysis

## Overview

This document explains the empirical analysis that determined the optimal buffer capacity allocation for the formatter's `VecBuffer`.

## Data Source

Analysis of **4,891 files** from the **VSCode repository** formatter test runs, measuring:

- Source text length (input)
- Formatted document length (output buffer requirement)

The VSCode repository provides a comprehensive real-world dataset with diverse JavaScript/TypeScript patterns, file sizes, and coding styles, making it an ideal benchmark for formatter capacity optimization.

## Key Findings

### Overall Statistics

| Metric | Value | Interpretation |
|--------|-------|----------------|
| **Median ratio** | 0.194 (19.4%) | Half of files need ≤19.4% of source length |
| **Average ratio** | 0.189 (18.9%) | Typical formatted size |
| **75th percentile** | 0.254 (25.4%) | 75% of files need ≤25.4% |
| **90th percentile** | 0.314 (31.4%) | 90% of files need ≤31.4% |
| **95th percentile** | 0.355 (35.5%) | 95% of files need ≤35.5% |
| **99th percentile** | 0.477 (47.7%) | 99% of files need ≤47.7% |
| **Max observed** | 0.947 (94.7%) | Extreme outlier case |

### Buffer Requirements by File Size

| File Size Range | Files | Median | 95th Percentile | 99th Percentile | Example (95th) |
|-----------------|-------|--------|-----------------|-----------------|----------------|
| **< 1KB** | 277 | 0.126 | **0.300** | 0.779 | 500B → 150B |
| **1KB - 5KB** | 1,772 | 0.190 | **0.360** | 0.462 | 3KB → 1.08KB |
| **5KB - 10KB** | 1,002 | 0.206 | **0.377** | 0.454 | 7.5KB → 2.83KB |
| **10KB - 50KB** | 1,628 | 0.202 | **0.346** | 0.482 | 30KB → 10.38KB |
| **> 50KB** | 212 | 0.193 | **0.302** | 0.348 | 100KB → 30.2KB |

**Key Insight**: The 95th percentile ranges from 0.30 to 0.38 across all file sizes, showing consistent behavior regardless of file size.
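
The ratio and percentile figures above could be computed along the following lines. This is a sketch under stated assumptions: the analysis does not say which percentile method was used, so the nearest-rank method here is a guess, and the function names `ratio` and `percentile` are hypothetical.

```rust
/// Ratio of formatted document length to source text length for one file.
fn ratio(source_len: usize, formatted_len: usize) -> f64 {
    formatted_len as f64 / source_len as f64
}

/// Nearest-rank percentile of a non-empty slice, with `p` in 0.0..=100.0.
/// Assumption: the original analysis may have used a different method
/// (for example, linear interpolation between ranks).
fn percentile(values: &[f64], p: f64) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the smallest value with at least p% of samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    // Ranks are 1-based; clamp so that p = 0 maps to the smallest value.
    sorted[rank.clamp(1, sorted.len()) - 1]
}
```

Feeding one `ratio` per file into `percentile` with `p` set to 50, 95, or 99 would give figures comparable to the table rows, subject to the percentile-method assumption.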

## New Implementation

### Chosen Formula

```rust
let capacity = (context.source_text().len() * 2) / 5; // 0.4 multiplier
```

### How 0.4 Was Derived

1. **Identified worst-case 95th percentile**: 0.377 (5KB-10KB files)
2. **Added safety margin**: 0.377 → 0.40
3. **Verified universal coverage**:
   - All size ranges have 95th percentile ≤ 0.377
   - 0.4 > 0.377, so it covers 95%+ of files in every size range
4. **Chose clean fraction**: `2/5` for efficient integer arithmetic

### Benefits

| Aspect | Improvement |
|--------|-------------|
| **Small files** | ~3.3x memory reduction (from 133% to 40%) |
| **Large files** | Slight increase (from 33% to 40%, +21%) |
| **Coverage** | 95%+ files avoid reallocation |
| **Code simplicity** | No branching needed |
| **Universality** | Single formula for all file sizes |

## Performance Characteristics

- **Memory efficiency**: Allocates only ~2x actual need (40% vs 19% median)
- **Reallocation rate**: <5% of files will need buffer growth
- **Safety margin**: ~6% headroom above the worst-case per-size 95th percentile (0.377), ~13% above the overall 95th percentile (0.355)
- **Trade-off**: Accepts rare reallocations for fewer than 5% of files to save memory on the other 95%+

## Validation

The formula was validated across:

- 277 tiny files (<1KB)
- 1,772 small files (1-5KB)
- 1,002 medium files (5-10KB)
- 1,628 large files (10-50KB)
- 212 very large files (>50KB)

All size ranges showed consistent 95th percentile requirements between 0.30-0.38, confirming that a universal 0.4 multiplier covers at least 95% of files in each range.

## Conclusion

The **0.4 multiplier** (`capacity = source_len * 2 / 5`) provides the best balance between:

- Memory efficiency (about 70% savings vs old small-file allocation, from 133% to 40%)
- Performance (95%+ hit rate without reallocation)
- Code simplicity (no conditional logic)
- Universal applicability (works for all file sizes)

This is a data-driven optimization based on real-world formatter usage across thousands of files from the VSCode repository, representing production-grade JavaScript/TypeScript code patterns.
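
The coverage and headroom claims can be checked with a small calculation like the one below. This is an illustrative sketch only: `coverage` and `headroom` are hypothetical names, and any sample pairs passed in would need to come from real measurements, which are not included in this PR.

```rust
/// Fraction of samples whose required length fits in `source_len * num / den`.
/// Each sample is `(source_len, required_len)`; the slice must be non-empty.
fn coverage(samples: &[(usize, usize)], num: usize, den: usize) -> f64 {
    let fits = samples
        .iter()
        .filter(|&&(source_len, required)| required <= source_len * num / den)
        .count();
    fits as f64 / samples.len() as f64
}

/// Relative headroom of a multiplier over an observed ratio,
/// e.g. 0.40 over a 95th percentile of 0.377.
fn headroom(multiplier: f64, observed: f64) -> f64 {
    multiplier / observed - 1.0
}
```

With the figures quoted in the analysis, `headroom(0.4, 0.377)` is roughly 0.06 and `headroom(0.4, 0.355)` is roughly 0.13, which is why the margin depends on which 95th percentile is taken as the baseline.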