Skip to content

Use narrow type for string parsing patterns#7211

Merged
dhruvmanila merged 20 commits intodhruv/pep-701from
dhruv/fstring-parser-2
Sep 14, 2023
Merged

Use narrow type for string parsing patterns#7211
dhruvmanila merged 20 commits intodhruv/pep-701from
dhruv/fstring-parser-2

Conversation

@dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Sep 7, 2023

Summary

This PR adds a new enum type StringType which is either a string literal, byte literal or f-string. The motivation behind this is to have a narrow type which is accepted in concatenate_strings as that function is only applicable for the mentioned 3 types. This makes the code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split the current string literal pattern in LALRPOP definition into two parts:

  1. A single string literal or a f-string: This means no checking for bytes / non-bytes and other unnecessary compution
  2. Two or more of string/byte/f-string: This will call the concatenate_strings function.

The reason for removing the second change is because of how ranges work. The range for an individual string/byte is the entire range which includes the quotes as well but if the same string/byte is part of a f-string, then it only includes the range for the content (without the quotes / inner range). The current string parser returns with the former range.

To give an example, for "foo", the range of the string would be 0..5, but for f"foo" the range of the string would be 2..5 while the range for the f-string expression would be 0..6. The ranges are correct but they differ in the context the string constant itself is being used for. Is it part of a f-string or is it a standalone string?

Test Plan

cargo test

@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-parser-2 branch 3 times, most recently from 90844fd to bddcd81 Compare September 7, 2023 07:21
@dhruvmanila dhruvmanila added the parser Related to the parser label Sep 7, 2023
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-parser-2 branch 2 times, most recently from 0f3792b to c1dd71e Compare September 7, 2023 12:19
@dhruvmanila dhruvmanila changed the title Use narrow types for string parsing patterns Use narrow type for string parsing patterns Sep 7, 2023
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-parser-2 branch 2 times, most recently from 0d6dd65 to b4505db Compare September 7, 2023 12:30
@dhruvmanila dhruvmanila marked this pull request as ready for review September 7, 2023 12:31
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 14, 2023

CodSpeed Performance Report

Merging #7211 will degrade performances by 9.7%

⚠️ No base runs were found

Falling back to comparing dhruv/fstring-parser-2 (5f17468) with main (04183b0)

Summary

❌ 8 regressions
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main dhruv/fstring-parser-2 Change
lexer[numpy/globals.py] 233.2 µs 252.9 µs -7.8%
lexer[large/dataset.py] 9.8 ms 10.7 ms -8.4%
lexer[pydantic/types.py] 4.1 ms 4.6 ms -9.7%
parser[numpy/ctypeslib.py] 12.4 ms 12.7 ms -2.31%
parser[large/dataset.py] 68.4 ms 70 ms -2.29%
lexer[numpy/ctypeslib.py] 2 ms 2.1 ms -7.71%
lexer[unicode/pypinyin.py] 620.2 µs 673.4 µs -7.89%
parser[unicode/pypinyin.py] 4.3 ms 4.4 ms -2.53%

Base automatically changed from dhruv/fstring-parser to dhruv/pep-701 September 14, 2023 02:00
@dhruvmanila dhruvmanila merged commit 8993443 into dhruv/pep-701 Sep 14, 2023
@dhruvmanila dhruvmanila deleted the dhruv/fstring-parser-2 branch September 14, 2023 02:07
@dhruvmanila dhruvmanila linked an issue Sep 15, 2023 that may be closed by this pull request
dhruvmanila added a commit that referenced this pull request Sep 18, 2023
## Summary

This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 19, 2023
## Summary

This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 20, 2023
## Summary

This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 26, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 27, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 28, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement parser Related to the parser python312 Related to Python 3.12

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for PEP 701 in the parser

2 participants