Simple lexer for formatter #4922

MichaReiser · 2023-06-07T10:45:35Z

Summary

This PR introduces a simple zero-allocation lexer tailored to the needs of the formatter.
It deliberately supports only a small set of tokens to keep it fast.
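Readers unfamiliar with the approach may find it useful to see what a zero-allocation lexer can look like. The sketch below is not the code from this PR. The names `Kind`, `Token` and `Tokenizer` and the exact token set are assumptions made for illustration. The tokenizer borrows the source string and yields each token as a kind plus a byte range, so it never copies or allocates strings.

```rust
/// The small set of token kinds this sketch recognizes (hypothetical,
/// not ruff's actual `TokenKind`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Whitespace,
    Newline,
    Comment,
    Continuation,
    LParen,
    RParen,
    Comma,
    Other,
}

/// A token is only a kind plus a byte range into the source: no owned text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: Kind,
    start: usize,
    end: usize,
}

/// Borrows the source and tracks the current byte offset.
struct Tokenizer<'a> {
    source: &'a str,
    offset: usize,
}

impl<'a> Tokenizer<'a> {
    fn new(source: &'a str) -> Self {
        Tokenizer { source, offset: 0 }
    }
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        let rest = &self.source[self.offset..];
        let first = rest.chars().next()?;
        let (kind, len) = match first {
            // A run of horizontal whitespace becomes a single token.
            ' ' | '\t' | '\x0c' => {
                let len = rest
                    .find(|c: char| !matches!(c, ' ' | '\t' | '\x0c'))
                    .unwrap_or(rest.len());
                (Kind::Whitespace, len)
            }
            '\n' => (Kind::Newline, 1),
            // "\r\n" counts as one newline token.
            '\r' => (Kind::Newline, if rest.starts_with("\r\n") { 2 } else { 1 }),
            // A comment runs up to (not including) the end of the line.
            '#' => {
                let len = rest
                    .find(|c: char| c == '\n' || c == '\r')
                    .unwrap_or(rest.len());
                (Kind::Comment, len)
            }
            '\\' => (Kind::Continuation, 1),
            '(' => (Kind::LParen, 1),
            ')' => (Kind::RParen, 1),
            ',' => (Kind::Comma, 1),
            // Anything else is reported one character at a time.
            c => (Kind::Other, c.len_utf8()),
        };
        let start = self.offset;
        self.offset += len;
        Some(Token { kind, start, end: self.offset })
    }
}
```

Since tokens are only `(kind, start, end)` triples, callers slice the original source whenever they need a token's text, and the iterator itself lives entirely on the stack.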

The main motivation is that Charlie correctly pointed out that the current check for whether an expression is parenthesized is insufficient.
A proper lexer makes it simpler to test for the right tokens.
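The following sketch shows why a lexer helps with the parenthesization check. It scans forward from a position assumed to lie outside any string, for example the end of the previous sibling node or just after the parent's own opening bracket. Along the way it skips trivia: whitespace, newlines, line continuations and `#` comments. The function names and the `search_start` parameter are hypothetical. The sketch also ignores string literals entirely.

```rust
/// Yields the significant (non-trivia) characters of `text`, skipping
/// whitespace, newlines, `\` continuations and `#` comments.
/// Assumes `text` contains no string literals.
fn significant_chars(text: &str) -> impl Iterator<Item = char> + '_ {
    let mut chars = text.chars();
    std::iter::from_fn(move || {
        while let Some(c) = chars.next() {
            match c {
                ' ' | '\t' | '\x0c' | '\n' | '\r' | '\\' => continue,
                '#' => {
                    // Skip the comment body up to and including the line end.
                    for c in chars.by_ref() {
                        if c == '\n' || c == '\r' {
                            break;
                        }
                    }
                }
                other => return Some(other),
            }
        }
        None
    })
}

/// Hypothetical check: the expression `expr_start..expr_end` is parenthesized
/// if the last significant character between `search_start` and the
/// expression is `(` and the first significant character after it is `)`.
fn is_parenthesized(source: &str, search_start: usize, expr_start: usize, expr_end: usize) -> bool {
    let before = significant_chars(&source[search_start..expr_start]).last();
    let after = significant_chars(&source[expr_end..]).next();
    before == Some('(') && after == Some(')')
}
```

Scanning forward is what makes it safe to skip comments. A `#` in the trivia region is always recognized as the start of a comment, so a `(` or `)` inside a comment cannot be mistaken for a real parenthesis.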

Test Plan

cargo test, including new unit tests

MichaReiser · 2023-06-07T10:45:53Z

Current dependencies on/for this PR:

main
- PR Add basic Constant formatting #4954
  - PR Trailing own line comments before func or class #4921
    - PR Simple lexer for formatter #4922 👈
      - PR Format Function definitions #4951
        - PR Format if statements #4961
        - PR Fix binary expression formatting with leading comments #4964
        - PR Track formatted comments #4979
        - PR perf(formatter): Skip bodies without comments #4978

This comment was auto-generated by Graphite.

github-actions · 2023-06-07T11:06:23Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      5.9±0.01ms     6.9 MB/sec    1.01      5.9±0.01ms     6.9 MB/sec
formatter/numpy/ctypeslib.py               1.00   1168.5±1.88µs    14.2 MB/sec    1.00   1174.1±3.30µs    14.2 MB/sec
formatter/numpy/globals.py                 1.00    137.6±0.24µs    21.4 MB/sec    1.01    138.9±1.45µs    21.2 MB/sec
formatter/pydantic/types.py                1.00      2.6±0.01ms     9.7 MB/sec    1.00      2.6±0.01ms     9.7 MB/sec
linter/all-rules/large/dataset.py          1.01     15.2±0.05ms     2.7 MB/sec    1.00     15.0±0.16ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      3.6±0.00ms     4.6 MB/sec    1.00      3.6±0.02ms     4.6 MB/sec
linter/all-rules/numpy/globals.py          1.00    366.9±1.05µs     8.0 MB/sec    1.00    365.4±7.79µs     8.1 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.3±0.02ms     4.0 MB/sec    1.00      6.2±0.02ms     4.1 MB/sec
linter/default-rules/large/dataset.py      1.03      7.6±0.01ms     5.4 MB/sec    1.00      7.3±0.01ms     5.5 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02   1572.0±2.69µs    10.6 MB/sec    1.00   1536.9±2.72µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.02    167.9±0.37µs    17.6 MB/sec    1.00    165.4±0.51µs    17.8 MB/sec
linter/default-rules/pydantic/types.py     1.03      3.4±0.01ms     7.6 MB/sec    1.00      3.3±0.00ms     7.8 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.34     11.1±0.38ms     3.7 MB/sec    1.00      8.3±0.31ms     4.9 MB/sec
formatter/numpy/ctypeslib.py               1.29      2.1±0.08ms     8.0 MB/sec    1.00  1615.3±68.19µs    10.3 MB/sec
formatter/numpy/globals.py                 1.15    220.5±9.28µs    13.4 MB/sec    1.00   192.2±13.95µs    15.3 MB/sec
formatter/pydantic/types.py                1.27      4.7±0.20ms     5.5 MB/sec    1.00      3.7±0.17ms     6.9 MB/sec
linter/all-rules/large/dataset.py          1.00     20.1±0.78ms     2.0 MB/sec    1.09     22.0±0.87ms  1892.6 KB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      5.2±0.25ms     3.2 MB/sec    1.03      5.3±0.19ms     3.1 MB/sec
linter/all-rules/numpy/globals.py          1.00   602.2±29.52µs     4.9 MB/sec    1.03   617.3±54.08µs     4.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      8.5±0.34ms     3.0 MB/sec    1.06      9.0±0.27ms     2.8 MB/sec
linter/default-rules/large/dataset.py      1.00      9.8±0.28ms     4.1 MB/sec    1.16     11.3±0.42ms     3.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00      2.1±0.07ms     8.1 MB/sec    1.13      2.3±0.09ms     7.2 MB/sec
linter/default-rules/numpy/globals.py      1.00   248.7±12.40µs    11.9 MB/sec    1.05   259.9±11.77µs    11.4 MB/sec
linter/default-rules/pydantic/types.py     1.00      4.4±0.16ms     5.8 MB/sec    1.12      4.9±0.20ms     5.2 MB/sec

MichaReiser · 2023-06-07T11:08:43Z

crates/ruff_python_formatter/src/trivia.rs

+    Other,
+
+    /// Returned for each character after [`TokenKind::Other`] has been returned once.
+    Bogus,


Should we name this Unknown instead?
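The question above concerns the `Other`/`Bogus` behaviour described in the doc comment, which this sketch demonstrates. Once the tokenizer meets a character it does not recognize, it reports `Other` once and then `Bogus` for every remaining character. It does this because it can no longer tell where tokens start: the unknown character may, for example, open a string. The enum keeps the PR's variant names. The tokenizer is a simplified, hypothetical stand-in that emits one token per character.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Whitespace,
    Comma,
    LParen,
    RParen,
    /// The first character the tokenizer does not recognize.
    Other,
    /// Returned for each character after `Other` has been returned once.
    Bogus,
}

/// Hypothetical per-character tokenizer illustrating the Other/Bogus rule.
struct Tokenizer<'a> {
    rest: std::str::Chars<'a>,
    bogus: bool,
}

impl<'a> Tokenizer<'a> {
    fn new(source: &'a str) -> Self {
        Tokenizer { rest: source.chars(), bogus: false }
    }
}

impl Iterator for Tokenizer<'_> {
    type Item = TokenKind;

    fn next(&mut self) -> Option<TokenKind> {
        let c = self.rest.next()?;
        if self.bogus {
            // After an unrecognized character we cannot know where tokens
            // start (we may be inside a string), so stop interpreting input.
            return Some(TokenKind::Bogus);
        }
        let kind = match c {
            ' ' | '\t' => TokenKind::Whitespace,
            ',' => TokenKind::Comma,
            '(' => TokenKind::LParen,
            ')' => TokenKind::RParen,
            _ => {
                self.bogus = true;
                TokenKind::Other
            }
        };
        Some(kind)
    }
}
```

For example, lexing `( 'a)'` yields `LParen`, `Whitespace`, `Other` and then three `Bogus` tokens. As a result, the `)` inside the string literal is never reported as an `RParen`.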

MichaReiser · 2023-06-08T05:54:51Z

crates/ruff_python_formatter/src/trivia.rs

+    let tokens = SimpleTokenizer::up_to(offset, code);
+    let mut newlines = 0u32;
+
+    for token in tokens.rev() {


One of the main benefits is that we no longer repeat the same logic in several places, and that cases such as line-continuation tokens are now handled correctly.
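The reviewed snippet counts newlines by iterating a tokenizer backwards from an offset. The sketch below implements the same idea at the character level and is not the PR's implementation. The name `lines_before` and its exact rules are assumptions. It walks backwards over whitespace and counts each newline, treating `\r\n` as one. A newline directly preceded by a `\` continuation is not counted. Comments are not handled, because recognizing them while scanning backwards needs more context than this sketch has.

```rust
/// Hypothetical helper: counts the newlines between `offset` and the closest
/// preceding non-whitespace character. A newline directly preceded by `\`
/// is a line continuation and is not counted. Comments are not recognized.
fn lines_before(offset: usize, code: &str) -> u32 {
    let mut chars = code[..offset].chars().rev().peekable();
    let mut newlines = 0u32;

    while let Some(c) = chars.next() {
        match c {
            ' ' | '\t' | '\x0c' => {}
            '\n' | '\r' => {
                // Walking backwards, "\r\n" shows up as '\n' then '\r': one newline.
                if c == '\n' && chars.peek() == Some(&'\r') {
                    chars.next();
                }
                if chars.peek() == Some(&'\\') {
                    // `\` + newline is a continuation: consume the backslash, don't count.
                    chars.next();
                } else {
                    newlines += 1;
                }
            }
            // Stop at the first significant character.
            _ => break,
        }
    }
    newlines
}
```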

MichaReiser · 2023-06-08T13:39:05Z

@charliermarsh, let me know if you have time to look at this PR. Otherwise, I'll go ahead and merge it, as a lot of code already depends on it.

charliermarsh · 2023-06-08T13:39:50Z

@MichaReiser - Go for it, I won't have time for a few hours.

This was referenced Jun 7, 2023

Format binary expressions #4862

Merged

Correctly handle newlines after/before comments #4895

Merged

Replace verbatim text with NOT_YET_IMPLEMENTED #4904

Merged

Trailing own line comments before func or class #4921

Merged

MichaReiser force-pushed the simple-lexer branch from bb61932 to 581f2ed June 7, 2023 10:51

MichaReiser force-pushed the simple-lexer branch from 581f2ed to 3b99dfc June 7, 2023 11:08

MichaReiser commented Jun 7, 2023

MichaReiser force-pushed the trailing-comment-before-func-or-class branch from bb0517f to 2448287 June 7, 2023 14:14

MichaReiser force-pushed the simple-lexer branch from 3b99dfc to 0943ad1 June 7, 2023 14:14

MichaReiser closed this Jun 7, 2023

MichaReiser reopened this Jun 7, 2023

MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 2448287 to 5a9709d June 8, 2023 05:51

MichaReiser force-pushed the simple-lexer branch from 0943ad1 to 34fcbec June 8, 2023 05:52

MichaReiser mentioned this pull request Jun 8, 2023

Upgrade RustPython #4900

Merged

MichaReiser added internal An internal refactor or improvement formatter Related to the formatter labels Jun 8, 2023

MichaReiser commented Jun 8, 2023

MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 5a9709d to 894b045 June 8, 2023 05:57

MichaReiser force-pushed the simple-lexer branch from 34fcbec to 62f994d June 8, 2023 05:57

MichaReiser mentioned this pull request Jun 8, 2023

Format Function definitions #4951

Merged

MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 894b045 to f5240f7 June 8, 2023 10:58

MichaReiser force-pushed the simple-lexer branch from 62f994d to a01666a June 8, 2023 10:58

MichaReiser mentioned this pull request Jun 8, 2023

Add basic Constant formatting #4954

Merged

MichaReiser force-pushed the trailing-comment-before-func-or-class branch from f5240f7 to 1b58a53 June 8, 2023 11:05

MichaReiser force-pushed the simple-lexer branch from a01666a to 9df0377 June 8, 2023 11:05

MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 1b58a53 to 6a13e21 June 8, 2023 11:44

Base automatically changed from trailing-comment-before-func-or-class to main June 8, 2023 12:50

MichaReiser force-pushed the simple-lexer branch from 9df0377 to e6a75c7 June 8, 2023 13:04

MichaReiser requested a review from charliermarsh June 8, 2023 13:16

MichaReiser mentioned this pull request Jun 8, 2023

Format if statements #4961

Merged

Simple lexer for formatter

153fd8c

MichaReiser force-pushed the simple-lexer branch from e6a75c7 to 153fd8c June 8, 2023 15:27

MichaReiser mentioned this pull request Jun 8, 2023

Fix binary expression formatting with leading comments #4964

Merged

MichaReiser merged commit 9c3fb23 into main Jun 8, 2023

MichaReiser deleted the simple-lexer branch June 8, 2023 15:37

This was referenced Jun 9, 2023

perf(formatter): Skip bodies without comments #4978

Merged

Track formatted comments #4979

Merged

konstin pushed a commit that referenced this pull request Jun 13, 2023

Simple lexer for formatter (#4922)

4cbe1a4

zanieb mentioned this pull request Jun 29, 2024

Should cursor credit rustc? #12107

Closed
