Make UCP option the default for regex matching #27189

Merged: 2 commits, May 22, 2018

@Keno Keno commented May 21, 2018

Fixes #27084. Regexes now match based on Unicode character properties,
rather than just ASCII character properties, e.g. `match(r"\w+", "café")`
will now match the entire word (and not just `caf`). This behavior can
be disabled with the `a` flag to the regex string macro (e.g. `r"\w+"a`).
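
A minimal sketch of the behavior described above, assuming a Julia version that includes this change. The variable names are illustrative only.

```julia
# Default (UCP) matching: \w uses Unicode character properties,
# so the accented "é" counts as a word character.
unicode_match = match(r"\w+", "café")

# With the `a` flag, \w is restricted to ASCII word characters,
# so the match stops before "é".
ascii_match = match(r"\w+"a, "café")

println(unicode_match.match)  # café
println(ascii_match.match)    # caf
```

Code that relied on the old ASCII-only behavior can opt back in per regex by appending `a`.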

@Keno Keno added this to the 1.0 milestone May 21, 2018
@mbauman mbauman added the breaking (This change will break code) and needs news (A NEWS entry is required for this change) labels May 21, 2018
base/regex.jl Outdated
@@ -72,8 +76,12 @@ after the ending quote, to change its behaviour:
- `s` allows the `.` modifier to match newlines.
- `x` enables "comment mode": whitespace is enabled except when escaped with `\\`, and `#`
is treated as starting a comment.
- `a` disables `UCP` mode (enables ASCII mode). By default `\\B`, `\\b`, `\\D`, `\\d`, `\\S`,
`\\s`, `\\W`, `\\w`, etc match based on unicode character properties. With this option,

This should read "etc." and "Unicode". Also, "are recognized" -> "these sequences only match".
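
The docstring under review says the change applies to the other class escapes as well, not only `\w`. A small sketch for `\d`, assuming a Julia version that includes this change. The sample string is an illustration and does not come from the PR.

```julia
# Three Arabic-Indic digits (U+0661, U+0662, U+0663). They are Unicode
# decimal digits but not ASCII digits.
arabic_indic = "١٢٣"

# Default (UCP) mode: \d matches any Unicode decimal digit.
ucp_hit = occursin(r"\d", arabic_indic)

# `a` flag: \d only matches the ASCII digits 0-9.
ascii_hit = occursin(r"\d"a, arabic_indic)

println(ucp_hit)    # true
println(ascii_hit)  # false
```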

Keno added 2 commits May 21, 2018 15:34
`\w` matches digits as well (which are not valid drive letters) and will match non-ASCII
Unicode characters once we turn on UCP.
Fixes #27084. Regexes now match based on Unicode character properties,
rather than just ASCII character properties, e.g. `match(r"\w+", "café")`
will now match the entire word (and not just `caf`). This behavior can
be disabled with the `a` flag to the regex string macro (e.g. `r"\w+"a`).
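
The first commit's point about `\w` and drive letters can be sketched as follows. The two helper functions and their patterns are hypothetical and are not taken from the PR's diff. They only contrast `\w` with an explicit ASCII letter class.

```julia
# Hypothetical helpers for illustration only.
# \w also accepts digits and underscore and, with UCP on, non-ASCII letters.
loose_drive(s) = occursin(r"^\w:", s)

# An explicit character class accepts only the ASCII letters A-Z and a-z.
strict_drive(s) = occursin(r"^[a-zA-Z]:", s)

for s in ["C:", "1:", "é:"]
    println(s, " loose=", loose_drive(s), " strict=", strict_drive(s))
end
```

Only `"C:"` passes the strict check, while all three inputs pass the `\w` version once UCP is the default.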
@Keno Keno removed the needs news (A NEWS entry is required for this change) label May 21, 2018
@JeffBezanson JeffBezanson merged commit 2f728b8 into master May 22, 2018
@JeffBezanson JeffBezanson deleted the kf/regexucp branch May 22, 2018 16:13
Labels
breaking This change will break code