Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

perf(rome_js_formatter): Reduce the String allocations for Tokens #2462

Merged
merged 3 commits into from
Apr 20, 2022

Conversation

MichaReiser
Copy link
Contributor

This PR reduces the amount of String allocation necessary for FormatElement::Tokens by
making use of the observation that most tokens match the text of a SyntaxToken. For example,
identifiers, or punctuation tokens are kept by the formatter as is. This is even true for string literal tokens if they already use the right quotes.

The way this is implemented is by introducing a new SyntaxTokenText that is Send + Sync and allows referencing a slice in a SyntaxToken without worrying about the &str's lifetime. The PR further extends FormatElement::Token to make use of this new introduced SyntaxTokenText.

This change reduces overall memory consumption and improves performance:

group                                    format-element                         token
-----                                    --------------                         -----
formatter/checker.ts                     1.04    250.2±5.03ms    10.4 MB/sec    1.00    239.5±1.76ms    10.9 MB/sec
formatter/compiler.js                    1.07    145.6±1.30ms     7.2 MB/sec    1.00    136.5±1.43ms     7.7 MB/sec
formatter/d3.min.js                      1.07    117.4±3.70ms     2.2 MB/sec    1.00    109.6±1.24ms     2.4 MB/sec
formatter/dojo.js                        1.03      7.4±0.15ms     9.2 MB/sec    1.00      7.2±0.03ms     9.5 MB/sec
formatter/ios.d.ts                       1.05    181.2±1.95ms    10.3 MB/sec    1.00    172.8±2.23ms    10.8 MB/sec
formatter/jquery.min.js                  1.02     29.1±0.55ms     2.8 MB/sec    1.00     28.5±0.07ms     2.9 MB/sec
formatter/math.js                        1.05    233.1±4.69ms     2.8 MB/sec    1.00    222.8±1.79ms     2.9 MB/sec
formatter/parser.ts                      1.03      5.3±0.15ms     9.2 MB/sec    1.00      5.1±0.01ms     9.5 MB/sec
formatter/pixi.min.js                    1.10    131.0±7.11ms     3.3 MB/sec    1.00    119.3±2.12ms     3.7 MB/sec
formatter/react-dom.production.min.js    1.07     37.0±0.82ms     3.1 MB/sec    1.00     34.5±0.21ms     3.3 MB/sec
formatter/react.production.min.js        1.08  1825.1±57.85µs     3.4 MB/sec    1.00  1683.8±30.49µs     3.7 MB/sec
formatter/router.ts                      1.02      3.7±0.09ms    16.2 MB/sec    1.00      3.6±0.01ms    16.6 MB/sec
formatter/tex-chtml-full.js              1.05    288.3±5.19ms     3.2 MB/sec    1.00    273.4±1.29ms     3.3 MB/sec
formatter/three.min.js                   1.11    155.7±3.79ms     3.8 MB/sec    1.00    139.7±1.76ms     4.2 MB/sec
formatter/typescript.js                  1.04    945.2±6.64ms    10.1 MB/sec    1.00    909.3±7.16ms    10.4 MB/sec
formatter/vue.global.prod.js             1.07     49.1±1.49ms     2.5 MB/sec    1.00     45.8±0.20ms     2.6 MB/sec

@MichaReiser MichaReiser requested a review from leops April 19, 2022 06:10
@@ -1125,6 +1127,13 @@ pub enum Token {
// The position of the dynamic token in the unformatted source code
source_position: TextSize,
},
// A token that is taken 1:1 from the source code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// A token that is taken 1:1 from the source code
/// A token that is taken 1:1 from the source code

What does "taken" mean exactly here? Computed? Extracted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to improve the existing documentation. It was neither computed nor extracted. It just means it's a text that we keep as is, the formatting doesn't change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "borrowed" could be a more familiar wording, although in Rust it implies some lifetime semantics that are not present here

crates/rome_rowan/src/syntax_token_text.rs Show resolved Hide resolved
crates/rome_rowan/src/syntax/token.rs Show resolved Hide resolved
crates/rome_rowan/src/cursor/token.rs Show resolved Hide resolved
crates/rome_formatter/src/format_element.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/format_element.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/format_element.rs Show resolved Hide resolved
crates/rome_formatter/src/format_element.rs Outdated Show resolved Hide resolved
crates/rome_js_formatter/src/utils/mod.rs Show resolved Hide resolved
@@ -1125,6 +1127,13 @@ pub enum Token {
// The position of the dynamic token in the unformatted source code
source_position: TextSize,
},
// A token that is taken 1:1 from the source code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "borrowed" could be a more familiar wording, although in Rust it implies some lifetime semantics that are not present here

crates/rome_formatter/src/format_element.rs Outdated Show resolved Hide resolved
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Apr 19, 2022

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: a93522b
Status: ✅  Deploy successful!
Preview URL: https://57a4bc55.tools-8rn.pages.dev

View logs

@MichaReiser MichaReiser requested review from ematipico and leops April 20, 2022 12:28
Base automatically changed from perf/concat-elements to main April 20, 2022 14:01
This PR reduces the amount of `String` allocation necessary for `FormatElement::Token`s by
making use of the observation that most tokens match the text of a `SyntaxToken`. For example,
identifiers, or punctuation tokens are kept by the formatter as is. This is even true for string literal tokens if they already use the right quotes.

The way this is implemented is by introducing a new `SyntaxTokenText` that is `Send + Sync` and allows referencing a slice in a `SyntaxToken` without worrying about the `&str`'s lifetime. The PR further extends `FormatElement::Token` to make use of this new introduced `SyntaxTokenText`.

This change reduces overall memory consumption and improves performance:

```
group                                    format-element                         token
-----                                    --------------                         -----
formatter/checker.ts                     1.04    250.2±5.03ms    10.4 MB/sec    1.00    239.5±1.76ms    10.9 MB/sec
formatter/compiler.js                    1.07    145.6±1.30ms     7.2 MB/sec    1.00    136.5±1.43ms     7.7 MB/sec
formatter/d3.min.js                      1.07    117.4±3.70ms     2.2 MB/sec    1.00    109.6±1.24ms     2.4 MB/sec
formatter/dojo.js                        1.03      7.4±0.15ms     9.2 MB/sec    1.00      7.2±0.03ms     9.5 MB/sec
formatter/ios.d.ts                       1.05    181.2±1.95ms    10.3 MB/sec    1.00    172.8±2.23ms    10.8 MB/sec
formatter/jquery.min.js                  1.02     29.1±0.55ms     2.8 MB/sec    1.00     28.5±0.07ms     2.9 MB/sec
formatter/math.js                        1.05    233.1±4.69ms     2.8 MB/sec    1.00    222.8±1.79ms     2.9 MB/sec
formatter/parser.ts                      1.03      5.3±0.15ms     9.2 MB/sec    1.00      5.1±0.01ms     9.5 MB/sec
formatter/pixi.min.js                    1.10    131.0±7.11ms     3.3 MB/sec    1.00    119.3±2.12ms     3.7 MB/sec
formatter/react-dom.production.min.js    1.07     37.0±0.82ms     3.1 MB/sec    1.00     34.5±0.21ms     3.3 MB/sec
formatter/react.production.min.js        1.08  1825.1±57.85µs     3.4 MB/sec    1.00  1683.8±30.49µs     3.7 MB/sec
formatter/router.ts                      1.02      3.7±0.09ms    16.2 MB/sec    1.00      3.6±0.01ms    16.6 MB/sec
formatter/tex-chtml-full.js              1.05    288.3±5.19ms     3.2 MB/sec    1.00    273.4±1.29ms     3.3 MB/sec
formatter/three.min.js                   1.11    155.7±3.79ms     3.8 MB/sec    1.00    139.7±1.76ms     4.2 MB/sec
formatter/typescript.js                  1.04    945.2±6.64ms    10.1 MB/sec    1.00    909.3±7.16ms    10.4 MB/sec
formatter/vue.global.prod.js             1.07     49.1±1.49ms     2.5 MB/sec    1.00     45.8±0.20ms     2.6 MB/sec

```
@MichaReiser MichaReiser force-pushed the perf/syntax-token-slice branch from 245c0ea to a93522b Compare April 20, 2022 14:03
@MichaReiser MichaReiser temporarily deployed to aws April 20, 2022 14:04 Inactive
@github-actions
Copy link

Parser conformance results on ubuntu-latest

js/262

Test result main count This PR count Difference
Total 45323 45323 0
Passed 44383 44383 0
Failed 940 940 0
Panics 0 0 0
Coverage 97.93% 97.93% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 39 39 0
Passed 36 36 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.31% 92.31% 0.00%

ts/babel

Test result main count This PR count Difference
Total 588 588 0
Passed 519 519 0
Failed 69 69 0
Panics 0 0 0
Coverage 88.27% 88.27% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 16171 16171 0
Passed 12315 12315 0
Failed 3856 3856 0
Panics 0 0 0
Coverage 76.15% 76.15% 0.00%

@github-actions
Copy link

@MichaReiser MichaReiser merged commit 6cdcad6 into main Apr 20, 2022
@MichaReiser MichaReiser deleted the perf/syntax-token-slice branch April 20, 2022 14:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants