fixed Replacer trait to work more like BufRead::read_line #151

eminence · 2016-01-16T16:39:09Z

I'm trying to write something like the following:

struct MyReplacer;

impl Replacer for MyReplacer {
    fn reg_replace(&mut self, caps: &regex::Captures) -> Cow<str>  {
        let cap: &str = caps.at(1).unwrap();

        Cow::Borrowed(cap)

    }
}

fn main() {
    let re = regex::Regex::new("foo(.*)bar").unwrap();
    let my_data = "foocatsbar";
    let result = re.replace_all(my_data, MyReplacer);
}

But I'm running into the following lifetime error:

src/lib.rs:11:30: 11:35 error: cannot infer an appropriate lifetime for lifetime parameter `'t` due to conflicting requirements [E0495]
src/lib.rs:11         let cap: &str = caps.at(1).unwrap();
                                           ^~~~~

There are some suggestions about adding explicit lifetime parameters, but it seems like nothing I tried compiled.

What I'm trying to do is replace some parts of a String, but it's not possible to write a regex that will exactly represent what I want. So in my MyReplacer, I want to extract the capture group, and based on the value of the capture group, either build a new string to return as a Cow::Owned(String), or return the capture group as Cow::Borrowed(&str) as to avoid an allocation.

Do you have any hints about how to do this? I'm on rust nightly (1.7)

Thanks!

The text was updated successfully, but these errors were encountered:

BurntSushi · 2016-01-16T17:36:42Z

This is a tricky one. I think the fundamental problem you're running into is in the definition of the Replacer trait:

pub trait Replacer {
    fn reg_replace(&mut self, caps: &Captures) -> Cow<str>;

    fn no_expand(&mut self) -> Option<Cow<str>> { None }
}

If we apply lifetime elision rules, then I think it looks like this:

pub trait Replacer {
    fn reg_replace<'a, 'b, 'c>(&'a mut self, caps: &'b Captures<'c>) -> Cow<'a, str>;

    fn no_expand<'a>(&'a mut self) -> Option<Cow<'a, str>> { None }
}

This means that the returned string has a lifetime tied to self, but your code is trying to return something with a lifetime tied to the return value of caps.at, which translates to the lifetime 'c in this code. I claim that this indeed what I intended when I initially wrote the trait definition. Namely, 'a refers to the lifetime of the string that is used as the replacement, while 'c in this definition refers to the lifetime of the text that is being modified (i.e., the original search text). This configuration is important with respect to replacement expansion. For example, if your replacement string is "$name", then a new copy must be made for each match since $name is replaced with the named sub-capture match name (e.g., (?P<name>[0-9]+)). If a replacement string is wrapped in NoExpand, then it never needs to have expansion done, so a copy of it is unnecessary. Hence, the reason why Cow is returned with a lifetime tied to self.

This means that, given the Replacer definition, it is impossible to return a Cow with a lifetime tied to the search text. That's unfortunate, because it does seem like something you might occasionally want to do. I'm not sure whether that's easy to fix or not. I think we would have to at least abandon Cow for something with three variants and two parameterized lifetimes:

enum Replacement<'r, 't> {
    /// A borrow of the replacement string. Matches current semantics.
    ReplaceBorrowed(&'r str),
    /// A borrow of the input text string.
    TextBorrowed(&'t str),
    /// An owned string.
    Owned(String),
}

This would let you return one of three types of strings: one tied to the replacement string (available today), one tied to the original search text (unavailable today) or one that is owned (available today).

Of course, this doesn't help you today. Fortunately, the replacement functions are relatively straight-forward to implement. Here's your example expanded with the above definition, a better trait definition and tweaked replacement functions to make everything work:

extern crate regex;

use std::borrow::Cow;
use std::ops::Deref;

use regex::{Captures, Regex};

trait BetterReplacer {
    fn reg_replace<'r, 't>(
        &'r mut self,
        caps: &Captures<'t>,
    ) -> CowReplacement<'r, 't>;

    fn no_expand(&mut self) -> Option<Cow<str>> { None }
}

struct MyReplacer;

impl BetterReplacer for MyReplacer {
    fn reg_replace<'r, 't>(
        &'r mut self,
        caps: &regex::Captures<'t>,
    ) -> CowReplacement<'r, 't>  {
        CowReplacement::BorrowedSearchText(caps.at(1).unwrap())
    }
}

#[allow(dead_code)]
enum CowReplacement<'r, 't> {
    /// A borrow of the replacement string. Matches current semantics.
    BorrowedReplacement(&'r str),
    /// A borrow of the searched string.
    BorrowedSearchText(&'t str),
    /// An owned string.
    Owned(String),
}

impl<'r, 't> Deref for CowReplacement<'r, 't> {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        use self::CowReplacement::*;
        match *self {
            BorrowedReplacement(s) => s,
            BorrowedSearchText(s) => s,
            Owned(ref s) => &**s,
        }
    }
}

fn replace_all<R: BetterReplacer>(re: &Regex, text: &str, rep: R) -> String {
    replacen(re, text, 0, rep)
}

fn replacen<R: BetterReplacer>(
    re: &Regex,
    text: &str,
    limit: usize,
    mut rep: R,
) -> String {
    let mut new = String::with_capacity(text.len());
    let mut last_match = 0;

    if rep.no_expand().is_some() {
        // borrow checker pains. `rep` is borrowed mutably in the `else`
        // branch below.
        let rep = rep.no_expand().unwrap();
        for (i, (s, e)) in re.find_iter(text).enumerate() {
            if limit > 0 && i >= limit {
                break
            }
            new.push_str(&text[last_match..s]);
            new.push_str(&rep);
            last_match = e;
        }
    } else {
        for (i, cap) in re.captures_iter(text).enumerate() {
            if limit > 0 && i >= limit {
                break
            }
            // unwrap on 0 is OK because captures only reports matches
            let (s, e) = cap.pos(0).unwrap();
            new.push_str(&text[last_match..s]);
            new.push_str(&rep.reg_replace(&cap));
            last_match = e;
        }
    }
    new.push_str(&text[last_match..]);
    new
}

fn main() {
    let re = Regex::new("foo(.*)bar").unwrap();
    let my_data = "foocatsbar";
    println!("{}", replace_all(&re, my_data, MyReplacer));
}

Thanks very much for this report. Hopefully we'll be able to get a fix into the library proper! (Although it will be a breaking change.)

eminence · 2016-01-16T18:41:08Z

Thanks for the clear and detailed reply! I suspected something like this was going on, but without rigorously applying the lifetime elision rules, I fell short of a complete understanding.

I look forward to seeing this use-case handled in the future!

BurntSushi · 2016-02-28T23:07:20Z

I came up with a much better solution to this problem that both avoids the lifetime issue and gives more explicit control over allocation. Instead of controlling allocation indirectly via borrows, we should just have the implementor write to the destination buffer explicitly. Here's the new trait definition:

pub trait Replacer {
    fn replace_append(&mut self, caps: &Captures, dst: &mut Vec<u8>);

    fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, [u8]>> {
        None
    }
}

In your case, all you'd need to do is write the contents of cap.at(1) directly to dst and then never worry about whether the borrowing is right or not.

This removes the need for an extra type and makes implementations of Replacer quite a bit simpler!

BurntSushi · 2016-02-28T23:13:21Z

(Sorry, that code should read &mut String. I'm working on the bytes API at the moment.)

BurntSushi · 2016-03-13T19:03:28Z

Note that this is fixed in the bytes::Regex API (using the replace_append variant of the Replacer trait).

This uses the new Replacer trait essentially as defined in the `bytes` sub-module and described in #151. Fixes #151

regex 0.2 0.2.0 ===== This is a new major release of the regex crate, and is an implementation of the [regex 1.0 RFC](https://github.com/rust-lang/rfcs/blob/master/text/1620-regex-1.0.md). We are releasing a `0.2` first, and if there are no major problems, we will release a `1.0` shortly. For `0.2`, the minimum *supported* Rust version is 1.12. There are a number of **breaking changes** in `0.2`. They are split into two types. The first type correspond to breaking changes in regular expression syntax. The second type correspond to breaking changes in the API. Breaking changes for regex syntax: * POSIX character classes now require double bracketing. Previously, the regex `[:upper:]` would parse as the `upper` POSIX character class. Now it parses as the character class containing the characters `:upper:`. The fix to this change is to use `[[:upper:]]` instead. Note that variants like `[[:upper:][:blank:]]` continue to work. * The character `[` must always be escaped inside a character class. * The characters `&`, `-` and `~` must be escaped if any one of them are repeated consecutively. For example, `[&]`, `[\&]`, `[\&\&]`, `[&-&]` are all equivalent while `[&&]` is illegal. (The motivation for this and the prior change is to provide a backwards compatible path for adding character class set notation.) * A `bytes::Regex` now has Unicode mode enabled by default (like the main `Regex` type). This means regexes compiled with `bytes::Regex::new` that don't have the Unicode flag set should add `(?-u)` to recover the original behavior. Breaking changes for the regex API: * `find` and `find_iter` now **return `Match` values instead of `(usize, usize)`.** `Match` values have `start` and `end` methods, which return the match offsets. `Match` values also have an `as_str` method, which returns the text of the match itself. * The `Captures` type now only provides a single iterator over all capturing matches, which should replace uses of `iter` and `iter_pos`. Uses of `iter_named` should use the `capture_names` method on `Regex`. * The `replace` methods now return `Cow` values. The `Cow::Borrowed` variant is returned when no replacements are made. * The `Replacer` trait has been completely overhauled. This should only impact clients that implement this trait explicitly. Standard uses of the `replace` methods should continue to work unchanged. * The `quote` free function has been renamed to `escape`. * The `Regex::with_size_limit` method has been removed. It is replaced by `RegexBuilder::size_limit`. * The `RegexBuilder` type has switched from owned `self` method receivers to `&mut self` method receivers. Most uses will continue to work unchanged, but some code may require naming an intermediate variable to hold the builder. * The free `is_match` function has been removed. It is replaced by compiling a `Regex` and calling its `is_match` method. * The `PartialEq` and `Eq` impls on `Regex` have been dropped. If you relied on these impls, the fix is to define a wrapper type around `Regex`, impl `Deref` on it and provide the necessary impls. * The `is_empty` method on `Captures` has been removed. This always returns `false`, so its use is superfluous. * The `Syntax` variant of the `Error` type now contains a string instead of a `regex_syntax::Error`. If you were examining syntax errors more closely, you'll need to explicitly use the `regex_syntax` crate to re-parse the regex. * The `InvalidSet` variant of the `Error` type has been removed since it is no longer used. * Most of the iterator types have been renamed to match conventions. If you were using these iterator types explicitly, please consult the documentation for its new name. For example, `RegexSplits` has been renamed to `Split`. A number of bugs have been fixed: * [BUG #151](#151): The `Replacer` trait has been changed to permit the caller to control allocation. * [BUG #165](#165): Remove the free `is_match` function. * [BUG #166](#166): Expose more knobs (available in `0.1`) and remove `with_size_limit`. * [BUG #168](#168): Iterators produced by `Captures` now have the correct lifetime parameters. * [BUG #175](#175): Fix a corner case in the parsing of POSIX character classes. * [BUG #178](#178): Drop the `PartialEq` and `Eq` impls on `Regex`. * [BUG #179](#179): Remove `is_empty` from `Captures` since it always returns false. * [BUG #276](#276): Position of named capture can now be retrieved from a `Captures`. * [BUG #296](#296): Remove winapi/kernel32-sys dependency on UNIX. * [BUG #307](#307): Fix error on emscripten.

BurntSushi added bug question labels Jan 16, 2016

BurntSushi modified the milestone: 1.0 Jan 31, 2016

BurntSushi mentioned this issue Feb 1, 2016

Storing mapping from names to group indices into Regex #158

Merged

BurntSushi mentioned this issue Mar 10, 2016

Add regex matching for &[u8]. #183

Merged

BurntSushi changed the title ~~Need help with Replacer and Captures lifetimes~~ fixed Replacer trait to work more like BufRead::read_line Mar 16, 2016

BurntSushi added a commit that referenced this issue May 7, 2016

Update Replacer trait for Unicode regexes.

30d6b93

This uses the new Replacer trait essentially as defined in the `bytes` sub-module and described in #151. Fixes #151

BurntSushi added a commit that referenced this issue May 18, 2016

Update Replacer trait for Unicode regexes.

1b3f519

This uses the new Replacer trait essentially as defined in the `bytes` sub-module and described in #151. Fixes #151

BurntSushi mentioned this issue May 27, 2016

Lifetime issue with Replacer #242

Closed

BurntSushi mentioned this issue Jul 13, 2016

Update Replacer to return a Result<_, T> #267

Closed

BurntSushi added a commit that referenced this issue Aug 5, 2016

Update Replacer trait for Unicode regexes.

8b18b29

This uses the new Replacer trait essentially as defined in the `bytes` sub-module and described in #151. Fixes #151

BurntSushi mentioned this issue Dec 31, 2016

regex 0.2 #310

Merged

bors closed this as completed in ebd26e9 Dec 31, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fixed Replacer trait to work more like BufRead::read_line #151

fixed Replacer trait to work more like BufRead::read_line #151

eminence commented Jan 16, 2016

BurntSushi commented Jan 16, 2016

eminence commented Jan 16, 2016

BurntSushi commented Feb 28, 2016

BurntSushi commented Feb 28, 2016

BurntSushi commented Mar 13, 2016

fixed Replacer trait to work more like BufRead::read_line #151

fixed Replacer trait to work more like BufRead::read_line #151

Comments

eminence commented Jan 16, 2016

BurntSushi commented Jan 16, 2016

eminence commented Jan 16, 2016

BurntSushi commented Feb 28, 2016

BurntSushi commented Feb 28, 2016

BurntSushi commented Mar 13, 2016