Simple lexer for formatter #4922

Merged
merged 1 commit into from
Jun 8, 2023

MichaReiser
Member

Summary

This PR introduces a simple, zero-allocation lexer tailored to the needs of the formatter.
It supports only a very limited set of tokens, which keeps it fast.

The main reason for introducing a lexer is that Charlie correctly pointed out that the current implementation for testing whether an expression is parenthesized is insufficient.
A proper lexer will make it simpler to test for the right tokens.
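
As a rough illustration of the idea (not the code in this PR), a zero-allocation lexer can be an iterator that borrows the source and yields small `Copy` tokens holding only a kind and a byte range. Every name below (`Kind`, `Token`, `Lexer`, `starts_at`, `followed_by_rparen`) is hypothetical, and the token set is an assumption chosen for the example. A real parenthesization check would also have to deal with comments and string contents, which this sketch ignores.

```rust
/// Hypothetical token kinds; a deliberately tiny set for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Whitespace,
    Newline,
    LParen,
    RParen,
    Comma,
    /// Catch-all for any character the lexer does not recognize.
    Other,
}

/// A token stores only its kind and byte range, so nothing is heap-allocated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: Kind,
    start: usize,
    end: usize,
}

/// Borrows the source text; `offset` must lie on a char boundary.
struct Lexer<'a> {
    source: &'a str,
    offset: usize,
}

impl<'a> Lexer<'a> {
    fn starts_at(offset: usize, source: &'a str) -> Self {
        Self { source, offset }
    }
}

impl Iterator for Lexer<'_> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        let rest = &self.source[self.offset..];
        let first = rest.chars().next()?;
        let (kind, len) = match first {
            '(' => (Kind::LParen, 1),
            ')' => (Kind::RParen, 1),
            ',' => (Kind::Comma, 1),
            '\n' => (Kind::Newline, 1),
            // Treat "\r\n" as a single newline token.
            '\r' => (Kind::Newline, if rest.starts_with("\r\n") { 2 } else { 1 }),
            ' ' | '\t' => {
                // Merge a run of spaces and tabs into one token.
                let len = rest
                    .find(|c: char| c != ' ' && c != '\t')
                    .unwrap_or(rest.len());
                (Kind::Whitespace, len)
            }
            c => (Kind::Other, c.len_utf8()),
        };
        let start = self.offset;
        self.offset += len;
        Some(Token { kind, start, end: self.offset })
    }
}

/// Returns true if the first non-trivia token at or after `expr_end` is `)`.
fn followed_by_rparen(source: &str, expr_end: usize) -> bool {
    let next_kind = Lexer::starts_at(expr_end, source)
        .map(|token| token.kind)
        .find(|kind| !matches!(kind, Kind::Whitespace | Kind::Newline));
    next_kind == Some(Kind::RParen)
}
```

With this sketch, `followed_by_rparen("(a + b  )", 6)` skips the two spaces after `b` and reports the closing parenthesis, while `followed_by_rparen("a, b", 1)` stops at the comma and returns `false`.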

Test Plan

cargo test, new unit tests

@github-actions
Contributor

github-actions bot commented Jun 7, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      5.9±0.01ms     6.9 MB/sec    1.01      5.9±0.01ms     6.9 MB/sec
formatter/numpy/ctypeslib.py               1.00   1168.5±1.88µs    14.2 MB/sec    1.00   1174.1±3.30µs    14.2 MB/sec
formatter/numpy/globals.py                 1.00    137.6±0.24µs    21.4 MB/sec    1.01    138.9±1.45µs    21.2 MB/sec
formatter/pydantic/types.py                1.00      2.6±0.01ms     9.7 MB/sec    1.00      2.6±0.01ms     9.7 MB/sec
linter/all-rules/large/dataset.py          1.01     15.2±0.05ms     2.7 MB/sec    1.00     15.0±0.16ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      3.6±0.00ms     4.6 MB/sec    1.00      3.6±0.02ms     4.6 MB/sec
linter/all-rules/numpy/globals.py          1.00    366.9±1.05µs     8.0 MB/sec    1.00    365.4±7.79µs     8.1 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.3±0.02ms     4.0 MB/sec    1.00      6.2±0.02ms     4.1 MB/sec
linter/default-rules/large/dataset.py      1.03      7.6±0.01ms     5.4 MB/sec    1.00      7.3±0.01ms     5.5 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02   1572.0±2.69µs    10.6 MB/sec    1.00   1536.9±2.72µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.02    167.9±0.37µs    17.6 MB/sec    1.00    165.4±0.51µs    17.8 MB/sec
linter/default-rules/pydantic/types.py     1.03      3.4±0.01ms     7.6 MB/sec    1.00      3.3±0.00ms     7.8 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.34     11.1±0.38ms     3.7 MB/sec    1.00      8.3±0.31ms     4.9 MB/sec
formatter/numpy/ctypeslib.py               1.29      2.1±0.08ms     8.0 MB/sec    1.00  1615.3±68.19µs    10.3 MB/sec
formatter/numpy/globals.py                 1.15    220.5±9.28µs    13.4 MB/sec    1.00   192.2±13.95µs    15.3 MB/sec
formatter/pydantic/types.py                1.27      4.7±0.20ms     5.5 MB/sec    1.00      3.7±0.17ms     6.9 MB/sec
linter/all-rules/large/dataset.py          1.00     20.1±0.78ms     2.0 MB/sec    1.09     22.0±0.87ms  1892.6 KB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      5.2±0.25ms     3.2 MB/sec    1.03      5.3±0.19ms     3.1 MB/sec
linter/all-rules/numpy/globals.py          1.00   602.2±29.52µs     4.9 MB/sec    1.03   617.3±54.08µs     4.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      8.5±0.34ms     3.0 MB/sec    1.06      9.0±0.27ms     2.8 MB/sec
linter/default-rules/large/dataset.py      1.00      9.8±0.28ms     4.1 MB/sec    1.16     11.3±0.42ms     3.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00      2.1±0.07ms     8.1 MB/sec    1.13      2.3±0.09ms     7.2 MB/sec
linter/default-rules/numpy/globals.py      1.00   248.7±12.40µs    11.9 MB/sec    1.05   259.9±11.77µs    11.4 MB/sec
linter/default-rules/pydantic/types.py     1.00      4.4±0.16ms     5.8 MB/sec    1.12      4.9±0.20ms     5.2 MB/sec

Other,

/// Returned for each character after [`TokenKind::Other`] has been returned once.
Bogus,
Member Author

Should we name this Unknown instead?
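
For readers unfamiliar with the behaviour the doc comment describes, here is a minimal sketch of such a "latching" lexer: once an unrecognized character has produced `Other`, every later character yields `Bogus`, presumably because the lexer can no longer tell what the following characters mean. Only `Other` and `Bogus` come from the excerpt above; `LatchingLexer`, `kinds`, and the `LParen`/`RParen` variants are assumptions made for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    // Hypothetical known tokens, included only to make the example concrete.
    LParen,
    RParen,
    /// First character the lexer does not recognize.
    Other,
    /// Every character after `Other` has been returned once.
    Bogus,
}

struct LatchingLexer<'a> {
    chars: std::str::Chars<'a>,
    /// Set once `Other` has been returned; never reset.
    seen_other: bool,
}

impl Iterator for LatchingLexer<'_> {
    type Item = TokenKind;

    fn next(&mut self) -> Option<TokenKind> {
        let c = self.chars.next()?;
        if self.seen_other {
            return Some(TokenKind::Bogus);
        }
        Some(match c {
            '(' => TokenKind::LParen,
            ')' => TokenKind::RParen,
            _ => {
                self.seen_other = true;
                TokenKind::Other
            }
        })
    }
}

/// Convenience helper that collects the kinds for a whole string.
fn kinds(source: &str) -> Vec<TokenKind> {
    LatchingLexer { chars: source.chars(), seen_other: false }.collect()
}
```

In this sketch, lexing `"()x)("` yields `LParen`, `RParen`, `Other`, and then `Bogus` for each of the two remaining characters, even though those characters are parentheses.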

@MichaReiser MichaReiser force-pushed the trailing-comment-before-func-or-class branch from bb0517f to 2448287 Compare June 7, 2023 14:14
@MichaReiser MichaReiser closed this Jun 7, 2023
@MichaReiser MichaReiser reopened this Jun 7, 2023
@MichaReiser MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 2448287 to 5a9709d Compare June 8, 2023 05:51
@MichaReiser MichaReiser mentioned this pull request Jun 8, 2023
@MichaReiser MichaReiser added internal An internal refactor or improvement formatter Related to the formatter labels Jun 8, 2023
let tokens = SimpleTokenizer::up_to(offset, code);
let mut newlines = 0u32;

for token in tokens.rev() {
Member Author

One of the main benefits is that we no longer repeat the same logic over and over again (and that we now, for example, handle continuation tokens correctly).
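
To make the kind of logic being consolidated concrete, here is a self-contained sketch of counting the line breaks that precede an offset by walking backwards over the text. It does not use the PR's `SimpleTokenizer`. The function name `lines_before` and the exact set of skipped characters are assumptions for this example, and comments are not handled.

```rust
/// Counts the line breaks directly before `offset`, skipping whitespace and
/// line-continuation backslashes, and stopping at the first other character.
/// `offset` must lie on a char boundary of `code`.
fn lines_before(offset: usize, code: &str) -> u32 {
    let mut newlines = 0u32;
    let mut chars = code[..offset].chars().rev().peekable();

    while let Some(c) = chars.next() {
        match c {
            '\n' => {
                // Walking backwards, "\r\n" shows up as '\n' then '\r';
                // consume the '\r' so the pair counts once.
                if chars.peek() == Some(&'\r') {
                    chars.next();
                }
                newlines += 1;
            }
            '\r' => newlines += 1,
            // Plain whitespace and continuation backslashes are skipped.
            ' ' | '\t' | '\x0C' | '\\' => continue,
            _ => break,
        }
    }

    newlines
}
```

Without a shared tokenizer, each call site needs its own copy of this loop, including the `\r\n` and continuation cases, which is the duplication the comment refers to.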

@MichaReiser MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 5a9709d to 894b045 Compare June 8, 2023 05:57
@MichaReiser MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 894b045 to f5240f7 Compare June 8, 2023 10:58
@MichaReiser MichaReiser force-pushed the trailing-comment-before-func-or-class branch from f5240f7 to 1b58a53 Compare June 8, 2023 11:05
@MichaReiser MichaReiser force-pushed the trailing-comment-before-func-or-class branch from 1b58a53 to 6a13e21 Compare June 8, 2023 11:44
Base automatically changed from trailing-comment-before-func-or-class to main June 8, 2023 12:50
@MichaReiser
Member Author

@charliermarsh let me know if you have time to look at this PR. Otherwise I'll go ahead and merge it, as a lot of code already depends on it.

@charliermarsh
Member

@MichaReiser - Go for it, I won't have time for a few hours.

@MichaReiser MichaReiser mentioned this pull request Jun 8, 2023
Labels
formatter Related to the formatter internal An internal refactor or improvement
2 participants