Support regex backreferences in replace() function #1820

Closed
GlenHertz opened this issue Dec 22, 2012 · 7 comments

@GlenHertz
Contributor

Hi,

It is very useful to be able to replace a string using matches from a regex capture group. For example:

julia> replace("a=5", r"(\w+)=(\d+)", L"\1=35")
a=35

perlre also allows backreferences in the pattern, but that isn't as commonly used as backreferences in the replacement string. Search for "backreference" here for more info:

http://perldoc.perl.org/perlre.html

Glen
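
For readers trying this on a current release, here is a minimal sketch of the requested behavior. It assumes Julia 1.x, where (unlike the `L"..."` form proposed above) the replacement is written as a substitution string `s"..."` and passed in a `pattern => replacement` pair; the variable names are only illustrative.

```julia
# Assumes Julia 1.x. `s"..."` builds a SubstitutionString in which
# \1, \2, ... refer to capture groups of the pattern.
result = replace("a=5", r"(\w+)=(\d+)" => s"\1=35")
println(result)

# Named groups can be referenced with \g<name>.
named = replace("a=5", r"(?<key>\w+)=(?<val>\d+)" => s"\g<key>=35")
println(named)

# Backreferences inside the pattern itself are handled by PCRE:
# this pattern matches any doubled word character, such as "ll".
has_double = occursin(r"(\w)\1", "hello")
println(has_double)
```

The pair form keeps the pattern and the replacement as separate values, so an ordinary string replacement and a backreference-aware one differ only in the literal prefix.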

@quinnj
Member

quinnj commented Jun 25, 2014

First of all, sorry for such a delayed response (!!).

With regard to backreferences, I agree it would probably be nice to support, but I'm not sure about the interface. Perl uses the syntax s/regex/replacement/modifiers, so we could possibly support replace(s::String, re::Regex) and somehow check that the regex has both match and replacement parts. Otherwise, we might need a re"..." regex string that would be used for Perl-style search and replace. I'm not super familiar with the regex internals, so I'm not even sure how the interface to PCRE would (or could) be tweaked to handle this. @dcjones, I know you recently did some work on regex performance; do you have any opinion/input on this?

@kmsquire
Member

@quinnj, I created a PR for this a while back (on my phone, so it's hard to find right now). The interface I proposed wasn't liked, but it might provide some inspiration for how this could be done.

@quinnj
Member

quinnj commented Jun 25, 2014

@kmsquire, you've made a lot of pull requests! I think #3146 is the one you're referring to. I really like that interface, but it sounds like we should have #1289 first.

@dhoegh
Contributor

dhoegh commented Feb 28, 2015

This feature would be nice to have. I actually thought Julia already supported this, but discovered it did not. The solution would probably be to change from PCRE to PCRE2, which seems to support this feature natively with the pcre2_substitute function, see: http://pcre.org/current/doc/html/. Is there any reason not to use PCRE2, other than that it requires a rewrite of the regex part of Julia?

@kmsquire
Member

@StefanKarpinski threatened to rewrite regular expressions in pure Julia at some point, but I'm pretty sure he's been busy with other things. ;-)

> Are there any reason to not use PCRE2 except it requires a rewrite of the reqex part in Julia?

I suspect it's because no one in the know has found/made the time to do so. If you're up to it, why don't you open a separate feeler issue to see if there is interest and, if so (and if you have time), submit such a PR?

@kmsquire
Member

I haven't had a strong need for regular expressions recently, but while you're at it, the interface itself could also use some love. It's one of the oldest interfaces, and it's type unstable--returning nothing when nothing is found, and RegexMatch when something is found--so you have to check the return type against nothing. Compared to the rest of Julia, it feels a little awkward.
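
To illustrate the awkwardness described here, the following is a small sketch of the check that callers end up writing. The helper name `second_capture` is hypothetical and used only for this example; `match` and the `captures` field are the existing interface.

```julia
# Hypothetical helper: returns the second capture group, or `nothing`
# when the pattern does not match at all.
function second_capture(pattern::Regex, text::AbstractString)
    m = match(pattern, text)
    # `match` returns either a RegexMatch or `nothing`,
    # so the result has to be tested before it is used.
    m === nothing && return nothing
    return m.captures[2]
end

println(second_capture(r"(\w+)=(\d+)", "a=5"))
println(second_capture(r"(\w+)=(\d+)", "no match here"))
```

Every call site needs the same `=== nothing` guard, which is the inconvenience the comment points at.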

@dhoegh
Contributor

dhoegh commented Jun 6, 2015

Since Julia recently switched to PCRE2, I have wrapped the substitute function from PCRE2 and thought I would post it here. I may turn it into a pull request if I have time and can figure out a consistent interface that fits with the existing replace function.

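# Thin wrapper around pcre2_substitute, written against the Base.PCRE internals
# available at the time of this comment.
function my_replace(s::AbstractString, pat::Regex, rep_in; limit=0)
    # Only "replace all" (limit == 0) or "replace first" (limit == 1) is supported.
    (limit == 0 || limit == 1) || error("Regex can only substitute all or first occurrence")
    options = pat.match_options
    (limit == 0) && (options |= Base.PCRE.SUBSTITUTE_GLOBAL) # substitute every match
    offset = 0
    subject = bytestring(s)
    rep = bytestring(rep_in)
    buffer = Array(UInt8, sizeof(subject)*2) # output buffer; the size is a guess
    re = pat.regex
    outlen = Ref{Csize_t}(sizeof(buffer)) # in: buffer capacity, out: bytes written
    rc = ccall((:pcre2_substitute_8, Base.PCRE.PCRE_LIB), Cint,
               (Ptr{Void}, Ptr{UInt8}, Csize_t, Csize_t, Cuint, Ptr{Void},
                    Ptr{Void}, Ptr{UInt8}, Csize_t, Ptr{UInt8}, Ref{Csize_t}),
                re, subject, sizeof(subject), offset, options, pat.match_data,
                    Base.PCRE.MATCH_CONTEXT, rep, sizeof(rep), buffer, outlen)
    println(rc) # debug: number of substitutions, or a negative error code
    if rc == -48 # PCRE2_ERROR_NOMEMORY
        error("buffer is too small")
    elseif rc == -35 || rc == -49 # PCRE2_ERROR_BADREPLACEMENT, PCRE2_ERROR_NOSUBSTRING
        error("substitution pattern is wrong")
    elseif rc < 0 # any other PCRE2 error
        error("pcre2_substitute failed with code $rc")
    end

    println(outlen[]) # debug: how much of the buffer is used
    bytestring(pointer(buffer)) # the output written by pcre2_substitute is NUL-terminated
end
my_replace("quote 10 digits", r"(\d+)", "\"\${1}\"")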
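
For comparison, here is a rough sketch of how the `limit` idea above maps onto the `replace` function in later releases. It assumes Julia 1.x, where `replace` accepts a `count` keyword and `s"..."` substitution strings; it is not part of the wrapper above, and the variable names are illustrative.

```julia
# Assumes Julia 1.x. `count` caps the number of replacements,
# which plays the role of `limit` in the wrapper above.
text = "1 and 2 and 3"

# Replace only the first match.
first_only = replace(text, r"(\d+)" => s"<\1>"; count=1)
println(first_only)

# Without `count`, every match is replaced.
all_hits = replace(text, r"(\d+)" => s"<\1>")
println(all_hits)
```

Unlike the wrapper, which allows only "first" or "all", `count` accepts any non-negative integer.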
