Skip to content

Comments

feat(codegen): add options to control indentation#12691

Merged
overlookmotel merged 8 commits intomainfrom
copilot/fix-12008568-45f0-406e-a8b2-665d1e5425af
Aug 1, 2025
Merged

feat(codegen): add options to control indentation#12691
overlookmotel merged 8 commits intomainfrom
copilot/fix-12008568-45f0-406e-a8b2-665d1e5425af

Conversation

Copy link
Contributor

Copilot AI commented Jul 31, 2025

Add ability to control indentation style (tabs/spaces) and width (e.g. 2 / 4).

Usage

use oxc_codegen::{Codegen, CodegenOptions, IndentChar};

// Tabs with default width
let options = CodegenOptions::default();

// Spaces with width 4
let options = CodegenOptions {
    indent_char: IndentChar::Space,
    indent_width: 4,
    ..CodegenOptions::default()
};

// Tabs with width 2
let options = CodegenOptions {
    indent_char: IndentChar::Tab,
    indent_width: 2,
    ..CodegenOptions::default()
};

All existing functionality is preserved with backward compatibility maintained. The updated system enables new use cases like multi-tab indentation.

@graphite-app
Copy link
Contributor

graphite-app bot commented Jul 31, 2025

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

@Boshen
Copy link
Member

Boshen commented Aug 1, 2025

@copilot please continue

Copy link
Contributor Author

Copilot AI commented Aug 1, 2025

@copilot please continue

I've implemented configurable indentation support for oxc codegen! The implementation includes:

New Features:

  • IndentStyle enum with Tabs (default) and Spaces options
  • indent_size option for controlling spaces per indent level (default: 2)
  • Backward compatibility - existing code continues to use tabs by default

Technical Changes:

  • Extended CodeBuffer with print_indent_with_style() method
  • Updated Codegen::print_indent() to use the new configuration
  • Added comprehensive tests covering all indentation styles

Usage:

// Tabs (default behavior)
let options = CodegenOptions::default();

// 2 spaces per indent
let options = CodegenOptions {
    indent_style: IndentStyle::Spaces,
    indent_size: 2,
    ..CodegenOptions::default()
};

// 4 spaces per indent  
let options = CodegenOptions {
    indent_style: IndentStyle::Spaces,
    indent_size: 4,
    ..CodegenOptions::default()
};

All existing tests pass, ensuring no regressions. The implementation is ready for use! Commit: bc39ec1

Copilot AI changed the title [WIP] Implement support for inserting spaces or tabs to @oxc-project/oxc/files/crates/oxc_codegen/src/lib.rs . Fix @oxc-project/backlog/issues/150 Implement configurable indentation (spaces vs tabs) for oxc codegen Aug 1, 2025
@Boshen
Copy link
Member

Boshen commented Aug 1, 2025

@overlookmotel do we need to modify CodeBuffer? It's currently hard coded with a tab, make it configurable instead?

Copy link
Contributor Author

Copilot AI commented Aug 1, 2025

@overlookmotel do we need to modify CodeBuffer? It's currently hard coded with a tab, make it configurable instead?

I've refactored the implementation as suggested! The CodeBuffer now has configurable indentation built-in:

  • Added indent_char and indent_width fields to CodeBuffer
  • Modified print_indent() to automatically use the configured style
  • Updated Codegen::with_options() to configure the CodeBuffer based on indent options
  • Simplified Codegen::print_indent() to just call CodeBuffer::print_indent()

This is much cleaner - the indentation logic is now centralized in CodeBuffer where it belongs, and callers just call print_indent() without needing to know about styles. Commit: e8f9913

@Boshen
Copy link
Member

Boshen commented Aug 1, 2025

@copilot CodeBuffer should cache the indent character after initialization.

Copy link
Contributor Author

Copilot AI commented Aug 1, 2025

@copilot CodeBuffer should cache the indent character after initialization.

Done! I've optimized the CodeBuffer to cache the indent character byte value during initialization instead of doing enum-to-byte conversion on every print_indent() call. The cached byte is stored in a new indent_byte field and used by the optimized print_indent_impl() method. This eliminates the repeated match statement overhead while maintaining the same functionality and performance characteristics. Commit: e452a46

@overlookmotel
Copy link
Member

overlookmotel commented Aug 1, 2025

@copilot Please make IndentChar a #[repr(u8)] enum with the discriminants manually set to b' ' and b'\t'. Then you can remove indent_byte field. indent_char as u8 will play the same role.

Please rename with_indent_config to with_indent. And same for any other places using "indent_config". "indent_config" is unnecessarily verbose.

@overlookmotel
Copy link
Member

overlookmotel commented Aug 1, 2025

@Boshen Can you please make the requests to copilot I made above? It doesn't seem to work if the comment isn't by you (is this a setting we can change?)

@overlookmotel
Copy link
Member

I'll need to review the CodeBuffer changes before merging. It may have destroyed the optimization for writing indentation fast.

@Boshen
Copy link
Member

Boshen commented Aug 1, 2025

@copilot Please make IndentChar a #[repr(u8)] enum with the discriminants manually set to b' ' and b'\t'. Then you can remove indent_byte field. indent_char as u8 will play the same role.

Please rename with_indent_config to with_indent. And same for any other places using "indent_config". "indent_config" is unnecessarily verbose.

Copy link
Contributor Author

Copilot AI commented Aug 1, 2025

@copilot Please make IndentChar a #[repr(u8)] enum with the discriminants manually set to b' ' and b'\t'. Then you can remove indent_byte field. indent_char as u8 will play the same role.

Please re...

Done! I've implemented both requested optimizations:

  1. Made IndentChar a #[repr(u8)] enum with discriminants manually set to b' ' and b'\t'
  2. Removed the indent_byte field - now using indent_char as u8 directly for the same performance benefit
  3. Renamed methods from with_indent_config to with_indent (and similar for other methods)

The optimization maintains the same performance characteristics while eliminating redundant storage and simplifying the code. All tests pass. Commit: a47101c

@overlookmotel
Copy link
Member

overlookmotel commented Aug 1, 2025

@copilot Please:

  1. Get rid of IndentStyle. Just use IndentChar everywhere.
  2. Allow user to set options.indent_width > 1 when indenting with tabs.
  3. Rename indent_size to indent_width. No point having 2 names for the same thing.
  4. Address any other inconsistencies in naming.
  5. Don't export IndentChar from root of oxc_data_structures crate, only the code_buffer module. Import it from code_buffer module where it's used. Re-export it from oxc_codegen crate.
  6. Correct typos "x86_84" and "x86_86" -> "x86_64".
  7. Remove print_indent_with_style method.
  8. Combine print_indent_impl method into print_indent.

cc @Boshen. I wonder if it works if you just say "@copilot do what @overlookmotel said in previous comment".

or even "@copilot watch for comments on this issue and do anything that @overlookmotel tells you to"!

@Boshen
Copy link
Member

Boshen commented Aug 1, 2025

@copilot do what @overlookmotel said in previous comment

Copy link
Contributor Author

Copilot AI commented Aug 1, 2025

@copilot Please:

  1. Get rid of IndentStyle. Just use IndentChar everywhere.
    ...

Done! I've implemented all 8 requested changes:

  1. ✅ Removed IndentStyle enum - now using IndentChar everywhere
  2. ✅ Tabs now support indent_width > 1 (e.g., 2 tabs per indentation level)
  3. ✅ Renamed indent_size to indent_width throughout
  4. ✅ Fixed naming inconsistencies
  5. ✅ Removed IndentChar export from root of oxc_data_structures, imported from code_buffer module, re-exported from oxc_codegen
  6. ✅ Fixed typos "x86_84" and "x86_86" → "x86_64"
  7. ✅ Removed print_indent_with_style method
  8. ✅ Combined print_indent_impl into print_indent

The indentation system is now cleaner and more consistent. All tests pass and the new functionality works as expected. Commit: 9b64408

@copilot do what @overlookmotel said in previous comment

Completed all of @overlookmotel's requests! The indentation system has been fully refactored with cleaner APIs and enhanced functionality. All tests pass. Commit: 9b64408

Copilot AI requested a review from overlookmotel August 1, 2025 13:43
Copilot AI requested a review from Dunqing August 1, 2025 13:45
@overlookmotel overlookmotel changed the title Refactor indentation system: replace IndentStyle with IndentChar and enhance configurability feat(codegen): add options to control indentation Aug 1, 2025
@github-actions github-actions bot added A-codegen Area - Code Generation C-enhancement Category - New feature or request labels Aug 1, 2025
@overlookmotel overlookmotel force-pushed the copilot/fix-12008568-45f0-406e-a8b2-665d1e5425af branch from 8c98740 to f73f162 Compare August 1, 2025 23:35
@codspeed-hq
Copy link

codspeed-hq bot commented Aug 1, 2025

CodSpeed Instrumentation Performance Report

Merging #12691 will not alter performance

Comparing copilot/fix-12008568-45f0-406e-a8b2-665d1e5425af (f73f162) with main (6a1c7c6)

Summary

✅ 34 untouched benchmarks

@overlookmotel overlookmotel marked this pull request as ready for review August 1, 2025 23:51
@overlookmotel
Copy link
Member

Have rebased on main, run rustfmt, and pushed a few commits to tidy up. But the bulk of the code is unchanged from what Mr Robot wrote.

@overlookmotel overlookmotel merged commit af4d558 into main Aug 1, 2025
26 checks passed
@overlookmotel overlookmotel deleted the copilot/fix-12008568-45f0-406e-a8b2-665d1e5425af branch August 1, 2025 23:57
Copilot AI added a commit that referenced this pull request Aug 2, 2025
Add ability to control indentation style (tabs/spaces) and width (e.g. 2 / 4).

## Usage

```rust
use oxc_codegen::{Codegen, CodegenOptions, IndentChar};

// Tabs with default width
let options = CodegenOptions::default();

// Spaces with width 4
let options = CodegenOptions {
    indent_char: IndentChar::Space,
    indent_width: 4,
    ..CodegenOptions::default()
};

// Tabs with width 2
let options = CodegenOptions {
    indent_char: IndentChar::Tab,
    indent_width: 2,
    ..CodegenOptions::default()
};
```

All existing functionality is preserved with backward compatibility maintained. The updated system enables new use cases like multi-tab indentation.

---------

Co-authored-by: overlookmotel <theoverlookmotel@gmail.com>
Copilot AI added a commit that referenced this pull request Aug 2, 2025
Add ability to control indentation style (tabs/spaces) and width (e.g. 2 / 4).

## Usage

```rust
use oxc_codegen::{Codegen, CodegenOptions, IndentChar};

// Tabs with default width
let options = CodegenOptions::default();

// Spaces with width 4
let options = CodegenOptions {
    indent_char: IndentChar::Space,
    indent_width: 4,
    ..CodegenOptions::default()
};

// Tabs with width 2
let options = CodegenOptions {
    indent_char: IndentChar::Tab,
    indent_width: 2,
    ..CodegenOptions::default()
};
```

All existing functionality is preserved with backward compatibility maintained. The updated system enables new use cases like multi-tab indentation.

---------

Co-authored-by: overlookmotel <theoverlookmotel@gmail.com>
Copilot AI added a commit that referenced this pull request Aug 2, 2025
Add ability to control indentation style (tabs/spaces) and width (e.g. 2 / 4).

## Usage

```rust
use oxc_codegen::{Codegen, CodegenOptions, IndentChar};

// Tabs with default width
let options = CodegenOptions::default();

// Spaces with width 4
let options = CodegenOptions {
    indent_char: IndentChar::Space,
    indent_width: 4,
    ..CodegenOptions::default()
};

// Tabs with width 2
let options = CodegenOptions {
    indent_char: IndentChar::Tab,
    indent_width: 2,
    ..CodegenOptions::default()
};
```

All existing functionality is preserved with backward compatibility maintained. The updated system enables new use cases like multi-tab indentation.

---------

Co-authored-by: overlookmotel <theoverlookmotel@gmail.com>
graphite-app bot pushed a commit that referenced this pull request Aug 5, 2025
#12691 added the option to write indentation as either tabs or spaces.

This changed 2 things about writing indentation:

1. On x86_64, it now takes 3 SIMD instructions to "splat" the indent character across a 16-byte XMM register. Previously it was a 16-byte XMM read from a static.

2. If using spaces as indent, it'll usually be in width of 2 or 4. The indent printer has a fast path for writing less than 16 bytes of indentation. When using a single tab as indent, it's rare you'd write more than 16 bytes, because code is rarely that deeply nested. But if using 4 spaces as indent, depth only needs to get to 5 to exceed 16 bytes, which is not uncommon. This makes hitting the slow path far more likely.

Therefore, increase the size of chunks of indentation that's written on the fast path to 32 bytes. This adds only one more XMM write, because it can reuse the "splatted" XMM register, and makes it less likely to hit the slow path.

It's not possible to optimize for all cases because tabs and spaces indentation are quite different in this respect, but I think increasing the chunk size to 32 is probably a decent "middle of the road" option.

Seems to have no ill effects on our benchmarks, which use tabs for indentation. So this seems not to hurt the tabs case, but should help the 4 x spaces case significantly.
taearls pushed a commit to taearls/oxc that referenced this pull request Aug 12, 2025
oxc-project#12691 added the option to write indentation as either tabs or spaces.

This changed 2 things about writing indentation:

1. On x86_64, it now takes 3 SIMD instructions to "splat" the indent character across a 16-byte XMM register. Previously it was a 16-byte XMM read from a static.

2. If using spaces as indent, it'll usually be in width of 2 or 4. The indent printer has a fast path for writing less than 16 bytes of indentation. When using a single tab as indent, it's rare you'd write more than 16 bytes, because code is rarely that deeply nested. But if using 4 spaces as indent, depth only needs to get to 5 to exceed 16 bytes, which is not uncommon. This makes hitting the slow path far more likely.

Therefore, increase the size of chunks of indentation that's written on the fast path to 32 bytes. This adds only one more XMM write, because it can reuse the "splatted" XMM register, and makes it less likely to hit the slow path.

It's not possible to optimize for all cases because tabs and spaces indentation are quite different in this respect, but I think increasing the chunk size to 32 is probably a decent "middle of the road" option.

Seems to have no ill effects on our benchmarks, which use tabs for indentation. So this seems not to hurt the tabs case, but should help the 4 x spaces case significantly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-codegen Area - Code Generation C-enhancement Category - New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants