PCRE compilation error for patterns ending with an escaped backslash #28175
Comments
This also seems to be a problem with substitution strings, e.g. in 0.6:
while in 0.7:
but:
In general, I'd say that any
I got a similar problem while testing a package on a Windows VM as well; the error message was
The offending string was something like
Was this being passed to
a Regex(...). Here's a contrived example that would lead to problems on Windows but not on other systems; I'm definitely not saying it's the right way to do stuff. The aim of the function is to merge two folders.
function mergefolders(src, dst)
    for (root, _, files) ∈ walkdir(src)
        for file ∈ files
            newpath = replace(root, Regex("^$(escape_string(src))")=>"$dst")
            isdir(newpath) || mkpath(newpath)
            newpathfile = joinpath(newpath, file)
            cp(joinpath(root, file), newpathfile; force=true)
        end
    end
end
with the
I wouldn't say it's a bug in this case, but it definitely surprised me, and the error message made little sense to me until I found this error and realised that there may be
I think @tlienart's problem really has very little to do with this issue beyond also involving regex compilation errors and escaping. I would say that any time you're interpolating a string into a regex with the intent that it should match exactly and not be interpreted as a regex itself, and you're not absolutely sure that that string contains no special regex characters like
That's fair enough (and I mentioned that I did not believe what I encountered was a bug), the comment was more pointing out that the error message was (to me) rather cryptic.
All you have to do to fix this is to replace
PCRE compiles regexes this way, so it's not an issue with Julia itself.
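For the mergefolders case above, one way to avoid the escaping question is to not build a regex from the path at all. Below is a minimal sketch, assuming the goal is simply to mirror the tree under src into dst. The names `mergefolders_relpath` and `literal_prefix` are hypothetical and not from this thread. The second helper relies on PCRE's `\Q...\E` literal quoting and assumes the interpolated string does not itself contain `\E`.

```julia
# Sketch: merge src into dst using path arithmetic instead of a Regex.
function mergefolders_relpath(src, dst)
    for (root, _, files) in walkdir(src)
        # relpath gives root relative to src, so no pattern matching is involved
        newpath = normpath(joinpath(dst, relpath(root, src)))
        mkpath(newpath)  # no error if the directory already exists
        for file in files
            cp(joinpath(root, file), joinpath(newpath, file); force=true)
        end
    end
end

# Sketch: if a regex is really wanted, quote the interpolated text so that
# PCRE treats it literally (assumes s does not contain the sequence \E).
literal_prefix(s) = Regex("^\\Q" * s * "\\E")
```

With this, a Windows-style path such as `"C:\\Users\\me"` is matched character for character, and its backslashes are never seen as regex escapes.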
@zdroid just because there is a workaround for the issue doesn't mean there's no issue (you'll note I included a different workaround in the original description of this issue). And it's not an issue with PCRE. Perl itself doesn't complain about the equivalent Perl code:
#!/usr/bin/perl
("\\" =~ qr"\\") && print "match\n";
The equivalent Julia program doesn't compile:
occursin(r"\\", "\\") && print("match\n")
I didn't say there is no issue, just that there is a simple fix. About PCRE, well, look at the source code yourself: https://github.com/JuliaLang/julia/blob/master/base/pcre.jl#L124. It's the line that produces the error. Either the file was substantially changed for Julia 0.7 (which https://github.com/JuliaLang/julia/commits/master/base/pcre.jl doesn't agree with, as far as I can tell), or PCRE was updated to a newer version.
I don't think what changed in Julia 0.7 to cause this was anything to do with the actual call into the PCRE library that you point out. Rather it was a change in how custom string literals (which include regexes, substitutions, and others) are processed in general. The issue that change was supposed to fix is #22926, the change is f356869, and another (still open) issue that change seems to have caused besides this one is #28261. It might be that a fix for the present issue involves changing the definition of
Another example:
julia> Regex("\\\\")
r"\\"
julia> r"\\"
ERROR: LoadError: PCRE compilation error: \ at end of pattern at offset 1
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compile(::String, ::UInt32) at ./pcre.jl:128
[3] compile(::Regex) at ./regex.jl:72
[4] Regex(::String, ::UInt32, ::UInt32) at ./regex.jl:37
[5] Regex(::String) at ./regex.jl:60
[6] @r_str(::LineNumberNode, ::Module, ::Any) at ./regex.jl:109
in expression starting at REPL[5]:1
Julia 1.5.3
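The behaviour reported above is consistent with the documented rule for raw and custom string literals: a run of backslashes is halved only when it directly precedes a double quote. The sketch below shows what text reaches the macro in each case, plus two spellings that do compile. It assumes Julia 0.7 or later, and the variable names are purely illustrative.

```julia
# Text handed to a custom string literal follows the raw"..." rule:
# a run of backslashes is halved only when it directly precedes a double quote.
at_end   = raw"\\"     # two typed backslashes before the closing quote
mid_text = raw"\\a"    # the same two backslashes, but followed by `a`

println(length(at_end))     # number of characters the macro would receive
println(length(mid_text))

# Spellings that compile and match a single literal backslash:
re_literal = r"\\\\"        # four typed backslashes at the end of the literal
re_ctor    = Regex("\\\\")  # ordinary string escaping, then the constructor

println(re_literal.pattern == re_ctor.pattern)
println(occursin(re_literal, "\\"))
println(occursin(r"\\a", "\\a"))  # mid-pattern: no doubling needed
```

This is why only a trailing escaped backslash is affected: anywhere else in the pattern the backslashes are passed through to PCRE unchanged.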
The various parsing and printing issues mentioned here have been fixed
Have they, though? I just got around to testing this out, but the original example still didn't work how I expected it to in the latest release (1.6.0). So I got the nightly build and tested it too, and got the same results:
julia> r"\\"
ERROR: LoadError: PCRE compilation error: \ at end of pattern at offset 1
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] compile(pattern::String, options::UInt32)
@ Base.PCRE ./pcre.jl:155
[3] compile(regex::Regex)
@ Base ./regex.jl:82
[4] Regex(pattern::String, compile_options::UInt32, match_options::UInt32)
@ Base ./regex.jl:47
[5] Regex(pattern::String)
@ Base ./regex.jl:70
[6] var"@r_str"(__source__::LineNumberNode, __module__::Module, pattern::Any, flags::Vararg{Any})
@ Base ./regex.jl:119
in expression starting at REPL[1]:1
julia> versioninfo()
Julia Version 1.7.0-DEV.847
Commit fedefe913a* (2021-04-06 03:03 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, nehalem)
and some of the later examples:
julia> replace("foo", r"(o+)" => s"/\1\\")
ERROR: Bad replacement string: /\1\
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] replace_err(repl::String)
@ Base ./regex.jl:527
[3] _replace(io::IOBuffer, repl_s::SubstitutionString{String}, str::String, r::UnitRange{Int64}, re::Base.RegexAndMatchData)
@ Base ./regex.jl:552
[4] replace(str::String, pat_repl::Pair{Regex, SubstitutionString{String}}; count::Int64)
@ Base ./strings/util.jl:542
[5] replace(str::String, pat_repl::Pair{Regex, SubstitutionString{String}})
@ Base ./strings/util.jl:525
[6] top-level scope
@ REPL[3]:1
julia> occursin(r"\\", "\\") && print("match\n")
ERROR: LoadError: PCRE compilation error: \ at end of pattern at offset 1
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] compile(pattern::String, options::UInt32)
@ Base.PCRE ./pcre.jl:155
[3] compile(regex::Regex)
@ Base ./regex.jl:82
[4] Regex(pattern::String, compile_options::UInt32, match_options::UInt32)
@ Base ./regex.jl:47
[5] Regex(pattern::String)
@ Base ./regex.jl:70
[6] var"@r_str"(__source__::LineNumberNode, __module__::Module, pattern::Any, flags::Vararg{Any})
@ Base ./regex.jl:119
in expression starting at REPL[4]:1
julia> Regex("\\\\")
r"\\\\"
This last one seems to be the only one that really changed, and I'm not sure that's even the right change. It does at least mean that the printed representation of a Regex can be parsed back as the same Regex. But I would prefer to see
Perl:
("\\" =~ qr"\\") && print "match\n";
# or more idiomatically:
("\\" =~ /\\/) && print "match\n";
Ruby:
("\\" =~ %r"\\") and puts "match"
("\\" =~ /\\/) and puts "match"
JavaScript:
/\\/.test("\\") && console.log("match")
All of the above lines print "match", and don't cause compilation errors. Also, the equivalents of
Ruby:
irb(main):001:0> Regexp.new("\\\\")
=> /\\/
JavaScript:
> new RegExp("\\\\")
/\\/
That is not a parser issue, though it is a possible breaking change.
Maybe I should have said "evaluate to" instead of "parse as" in my last comment. But in the original description of this issue I didn't say it was only an issue with parsing and printing. So why did you close the issue when you considered only the parsing and printing issues to be fixed?
If a Regex literal ends with a (singly) escaped backslash, you'll get this error:
It's possible to get around this by doubly escaping the backslash:
But that's surprising to me for two reasons. One is that it's only for backslashes at the end of the pattern:
The other is that even at the end of the pattern this double escaping was unnecessary in 0.6: