-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to PCRE2 #11447
Upgrade to PCRE2 #11447
Conversation
@@ -12,13 +12,13 @@ else | |||
CPP_STDOUT = $(CPP) -E | |||
endif | |||
|
|||
all: pcre_h.jl errno_h.jl build_h.jl.phony fenv_constants.jl file_constants.jl uv_constants.jl version_git.jl.phony | |||
all: errno_h.jl build_h.jl.phony fenv_constants.jl file_constants.jl uv_constants.jl version_git.jl.phony |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why deleting the pcre_h.jl
dependency here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just temporary, since the pcre_h.jl autogenerator was failing on pcre2.h - it would run without error, but output a blank pcre_h.jl. I'm looking into it.
8424882
to
504f78b
Compare
This is ready @tkelman, with two caveats:
|
Since PCRE is relatively small and all C, we could just build it within this PR until it gets merged and into the nightlies. Try adding julia/contrib/windows/msys_build.sh Line 187 in e97588d
|
function __init__() | ||
JIT_STACK_START_SIZE = 32768 | ||
JIT_STACK_MAX_SIZE = 1048576 | ||
global JIT_STACK = ccall((:pcre_jit_stack_alloc, :libpcre), Ptr{Void}, | ||
(Cint, Cint), JIT_STACK_START_SIZE, JIT_STACK_MAX_SIZE) | ||
global JIT_STACK = ccall((:pcre2_jit_stack_create_8, "libpcre2-8"), Ptr{Void}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this libpcre2-8
likely to change in the future? If so, would be best to make a single const variable for it.
That's a bit annoying. Is there some optional configuration flag we could set when we build the library to get that back? |
I found a match flag that re-enables it, intuitively named
Done, although there is still the issue of |
The Travis failure seems to be caused by a random parallel-processing error not related to this PR. |
Try commenting out these 4 lines - https://github.com/malmaud/julia/blob/afa14048a5551fec9ee79f9472a16fb10e260b56/contrib/windows/msys_build.sh#L94-L97 If you're downloading pcre sources, then we'll have the headers. Using |
If we can get this passing on AppVeyor, then is there any concern that this could break any regex-using packages in subtle ways? If it should be more or less compatible from the Julia side and it's just about ready, @StefanKarpinski could you take a look and make a judgment on whether or not this should be 0.4 material? |
The API on the Julia side is totally unchanged, and at least officially the regex syntax and semantics of PCRE2 are not different than PCRE1. So the risk of breakage seems minimal. |
I'll give it a try. |
Looks like that did the trick on appveyor, and this should be good to go now. No clue what's wrong with Travis though, that failure is pretty bizarre.
Takes about 90 seconds or so to download and build the library on appveyor, so once this gets merged and into the nightlies I'll put the appveyor script back the way it was to cut down on build time. If you need to update the library down the line in some way that needs a from-scratch version of I'll let Stefan make the final call, but thumbs up from me. Thanks for this! |
Thanks for working through it with me! I'm looking forward to the improved regex functionality this should enable down the line. |
Oh yeah @nalimilan any concerns about getting pcre2 packaged on any older rhel/fedora distros? |
Bump @StefanKarpinski could you take a look at this? The Travis error is unrelated. |
👍 |
Just to be paranoid, I'm restarting that failed job (although I do believe that it's unrelated). After that, we can merge this. Thanks for putting in the elbow grease to make the upgrade happen, @malmaud! |
Ok, tests pass but I'm seeing this on startup:
Investigating... |
function exec(re,subject,offset,options,match_data) | ||
rc = ccall((:pcre2_match_8, PCRE_LIB), Cint, | ||
(Ptr{Void}, Cstring, Csize_t, Csize_t, Cuint, Ptr{Void}, Ptr{Void}), | ||
re, subject, sizeof(subject), offset, options, match_data, MATCH_CONTEXT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is subtle but I think that using Cstring
here may be inappropriate since we're a) passing the length of the data as another argument, rather than relying on NUL-termination of the data and b) apparently sometimes passing data that contains NUL bytes? cc: @stevengj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @StefanKarpinski , it should not use Cstring, if the interface is with the length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be revealing a more troubling problem, which is that junk strings seem to be getting passed in here – such as "\t:(Int^M:N)\0|>dump\n"
– note the embedded NUL byte. The real issue, of course, is what the heck is this data and why is it being passed into the exec
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is not as nefarious as I thought it was – subject
is just the string we're looking for matches in. But Cstring
is inappropriate here and possibly in a few other places in this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You guys are totally right. And when you're right you're right.
On Monday, June 1, 2015, Stefan Karpinski [email protected] wrote:
In base/pcre.jl
#11447 (comment):end
-function exec(regex::Ptr{Void}, extra::Ptr{Void}, str::ByteString, offset::Integer,
options::Integer, ovec::Vector{Int32})
- return exec(regex, extra, str, 0, offset, sizeof(str), options, ovec)
+function exec(re,subject,offset,options,match_data)- rc = ccall((:pcre2_match_8, PCRE_LIB), Cint,
(Ptr{Void}, Cstring, Csize_t, Csize_t, Cuint, Ptr{Void}, Ptr{Void}),
re, subject, sizeof(subject), offset, options, match_data, MATCH_CONTEXT)
Ok, this is not as nefarious as I thought it was – subject is just the
string we're looking for matches in. But Cstring is inappropriate here
and possibly in a few other places in this change.—
Reply to this email directly or view it on GitHub
https://github.com/JuliaLang/julia/pull/11447/files#r31456497.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on a patch that fixes this and a variety of other ccall signature problems. Some of which are probably my fault since they're inherited from the original PCRE code. We didn't have all the necessary Cfoo type aliases back then (or any of them, actually), so I just used the types that happened to be correct on my platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the ccall problems and merging this, Stefan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. Thanks again for pushing it forward.
I haven't checked yet, but I guess it is parallel-installable with PCRE1, so shouldn't be an issue. |
The relevant change to Regex was made in JuliaLang/julia#11447
This will hopefully lay the groundwork for supporting named subpatterns (#11362) and general improvements in the regex functionality.