Pre-compile regexps #338

Merged
merged 1 commit into from
May 2, 2024

olejorgenb (Contributor) commented on Apr 25, 2024

Note: This is untested and I don't have time/energy to do any more on this, so if no one wants to wrap it up, feel free to close it.

Noticed this while profiling ics-py, which felt really slow (even considering large ics files and it being Python).

This would speed up https://github.com/ics-py/ics-py by about 3% to 7%, so not that much, but maybe worth a few lines of changes.

[image] (the first 4 call sites are related to the comment regexes)

(GitHub refuses to upload the py-spy profile.)
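For readers unfamiliar with the idea, here is a minimal sketch of what pre-compiling means in general. It is not the code from this PR: the pattern and both function names are made up for illustration. Python's `re` module keeps an internal cache of compiled patterns, so the per-call cost is mostly a cache lookup, which is consistent with a modest gain rather than a dramatic one.

```python
import re
import timeit

# Hypothetical block-comment pattern, used only for illustration.
COMMENT_SOURCE = r"\(\*(?:.|\n)*?\*\)"

# Compiled once at import time and reused on every call.
COMMENT_RE = re.compile(COMMENT_SOURCE)


def skip_comment_on_the_fly(text: str, pos: int) -> int:
    """Compile the pattern (or hit re's internal cache) on every call."""
    m = re.compile(COMMENT_SOURCE).match(text, pos)
    return m.end() if m else pos


def skip_comment_precompiled(text: str, pos: int) -> int:
    """Reuse the module-level compiled pattern."""
    m = COMMENT_RE.match(text, pos)
    return m.end() if m else pos


if __name__ == "__main__":
    sample = "(* a comment *) rest"
    for fn in (skip_comment_on_the_fly, skip_comment_precompiled):
        elapsed = timeit.timeit(lambda: fn(sample, 0), number=100_000)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions return the same position; only the per-call overhead differs, and the actual timings depend on the machine and Python version.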

apalala merged commit 62e2597 into neogeny:master on May 2, 2024
0 of 2 checks passed
vfazio added a commit to vfazio/TatSu that referenced this pull request Dec 29, 2024
Previously, when scanning for matches to a regex, if the type of the
pattern was `str`, the pattern was always compiled with `re.MULTILINE`.

Recent changes to `ParserConfig` [0] changed the type used for regex
matches in generated code from `str` to `re.Pattern` which could lead to
a difference in behavior from previous versions where a defined comments
or eol_comments may have been implicitly relying on the `re.MULTILINE`
flag.

After discussion [1], it has been determined that usage of `re` flags
within TatSu should be deprecated in favor of users specifying the
necessary flags within patterns.

As such, drop the `re.MULTILINE` flag for strings compiled on the fly.

[0]: neogeny#338
[1]: neogeny#351 (comment)
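
The behavior change described in this commit message can be illustrated with plain `re` calls. This is only a sketch using a made-up pattern and input, not TatSu code: it compares a `str` pattern compiled with a forced `re.MULTILINE`, the same pattern compiled as written, and the pattern with the flag moved inline via `(?m)`.

```python
import re

text = "first\nsecond\n"
pattern = r"^second$"  # illustrative pattern that relies on per-line anchors

# Old behavior as described above: the str pattern is compiled with re.MULTILINE.
forced = re.compile(pattern, re.MULTILINE)

# New behavior: the pattern is compiled as written, with no added flags.
plain = re.compile(pattern)

# The flag expressed inside the pattern itself via the inline (?m) modifier.
inline = re.compile(r"(?m)" + pattern)

forced_hit = forced.search(text) is not None
plain_hit = plain.search(text) is not None
inline_hit = inline.search(text) is not None
print(forced_hit, plain_hit, inline_hit)  # True False True
```

A pattern that silently depended on the forced flag stops matching once the flag is dropped, and adding `(?m)` to the pattern restores the earlier result.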
vfazio added a commit to vfazio/TatSu that referenced this pull request Dec 29, 2024
Make the default eol_comments regex use multiline matching.

Recent changes to `ParserConfig` [0] now use a precompiled regex (an
`re.Pattern`) instead of compiling the `str` regex on the fly.

The `Tokenizer` previously compiled every `str` regex with
`re.MULTILINE`, regardless of any options defined in the regex itself.
That flag is no longer applied automatically, so configurations must
specify the option in the pattern.

[0]: neogeny#338
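
To see why an end-of-line comment pattern needs multiline matching, consider matching at a fixed position in a multi-line buffer, roughly the way a tokenizer would. The patterns below are hypothetical examples and are not claimed to be TatSu's defaults.

```python
import re

source = "x = 1  # first comment\ny = 2  # second comment\n"

# Hypothetical end-of-line comment patterns, not TatSu's defaults.
without_flag = re.compile(r"#[^\n]*$")
with_flag = re.compile(r"(?m)#[^\n]*$")

pos = source.index("#")  # start of the first comment

# Without (?m), `$` only matches at the very end of the buffer (or just
# before a final newline), so a comment on an earlier line does not match.
m_without = without_flag.match(source, pos)

# With (?m), `$` also matches before each newline, so the comment matches.
m_with = with_flag.match(source, pos)

print(m_without)
print(m_with.group())
```

Only the last comment in the buffer would match the flagless pattern, which is why the flag has to live in the pattern once it is no longer added implicitly.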
apalala pushed a commit that referenced this pull request Dec 29, 2024
…fig` (#352)

* [buffering] drop forced multiline match for string patterns

Previously, when scanning for matches to a regex, if the type of the
pattern was `str`, the pattern was always compiled with `re.MULTILINE`.

Recent changes to `ParserConfig` [0] changed the type used for regex
matches in generated code from `str` to `re.Pattern` which could lead to
a difference in behavior from previous versions where a defined comments
or eol_comments may have been implicitly relying on the `re.MULTILINE`
flag.

After discussion [1], it has been determined that usage of `re` flags
within TatSu should be deprecated in favor of users specifying the
necessary flags within patterns.

As such, drop the `re.MULTILINE` flag for strings compiled on the fly.

[0]: #338
[1]: #351 (comment)

* [grammar] make eol_comments multiline match

Make the default eol_comments regex use multiline matching.

Recent changes to `ParserConfig` [0] now use a precompiled regex (an
`re.Pattern`) instead of compiling the `str` regex on the fly.

The `Tokenizer` previously compiled every `str` regex with
`re.MULTILINE`, regardless of any options defined in the regex itself.
That flag is no longer applied automatically, so configurations must
specify the option in the pattern.

[0]: #338

* [infos] make {eol_}comments_re read-only attributes

Previously, the `eol_comments_re` and `comments_re` attributes were
public init arguments, were modifiable, and could thus become out of
sync with the `eol_comments` and `comments` attributes.

Also, with recent changes to `ParserConfig` [0], there were two ways to
initialize the regex values for comments and eol_comments directives;
either via the constructor using the *_re variables or by using the
sister string arguments and relying on `__post_init__` to compile the
values which trumped the explicit *_re argument values.

Now, the constructor interface has been simplified to not take either
`eol_comments_re` or `comments_re` as arguments. Callers may only use
`eol_comments` and `comments`.

The `eol_comments_re` and `comments_re` attributes are still
public, but are read-only so they are always a reflection of their
sister string values passed into the constructor.

[0]: #200
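
One way to get compiled attributes that cannot drift from their string counterparts is to derive them through read-only properties. The class below is a hypothetical stand-in written for illustration. It borrows the attribute names from the commit message but is not TatSu's `ParserConfig` implementation, and the cached `_compile` helper is an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=None)
def _compile(pattern: str) -> "re.Pattern[str]":
    """Hypothetical helper: compile each distinct pattern string once."""
    return re.compile(pattern)


@dataclass
class CommentConfig:
    """Illustrative stand-in for a parser config; not TatSu's actual class."""

    comments: Optional[str] = None
    eol_comments: Optional[str] = None

    # No setters are defined, so these are read-only and always derived
    # from the current string values.
    @property
    def comments_re(self) -> "Optional[re.Pattern[str]]":
        return _compile(self.comments) if self.comments else None

    @property
    def eol_comments_re(self) -> "Optional[re.Pattern[str]]":
        return _compile(self.eol_comments) if self.eol_comments else None


cfg = CommentConfig(eol_comments=r"(?m)#[^\n]*$")
print(cfg.eol_comments_re.pattern)
print(cfg.comments_re)
```

Because the properties are not dataclass fields, passing `comments_re=` or `eol_comments_re=` to the constructor raises `TypeError`, and assigning to them raises `AttributeError`.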

* [codegen] migrate to {eol_}comments

* [ngcodegen] migrate to {eol_}comments

* [bootstrap] migrate to {eol_}comments

* [lint] resolve errors

* [docs] note {eol_}comments directive behavior changes

* [docs] update syntax to reflect {eol_}comments arguments

* [test] fix test_parse_hash to use eol_comments

* [test] explicitly use multiline match in test_patterns_with_newlines