Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method String::retain #43500

Merged
merged 1 commit into from
Aug 15, 2017
Merged

Add method String::retain #43500

merged 1 commit into from
Aug 15, 2017

Conversation

murarth
Copy link
Contributor

@murarth murarth commented Jul 27, 2017

Behaves like Vec::retain, accepting a predicate FnMut(char) -> bool
and reducing the string to only characters for which the predicate
returns true.

@rust-highfive
Copy link
Collaborator

r? @BurntSushi

(rust_highfive has picked a reviewer for you, use r? to override)

@murarth
Copy link
Contributor Author

murarth commented Jul 27, 2017

The motivation for this method is similar to that of Vec::retain. It transforms in-place for efficiency. It would be useful when one wants to strip a set of illegal characters from an input string.

#[inline]
#[unstable(feature = "string_retain", issue = "0")]
pub fn retain<F>(&mut self, mut f: F)
where F: FnMut(char) -> bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In HashMap::retain, we used F: FnMut(&K, &mut V) -> bool, in case we need to mutate the value in iteration. @alexcrichton also thinks it is better if we can improve Vec::retain to be F: FnMut(&mut T) -> bool, though it is too late to do this. So I guess it could be better if we use F: FnMut(&mut char)->bool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Vec and HashMap, it's trivial to pass a &mut instead of &. For String, however, the code would have to be changed to write the decoded char back into the buffer. Because the new char could be larger than the original, it would significantly increase the complexity and perhaps require the allocation of a second buffer to manage it. I don't think that's the right direction to go in, in this case.

Copy link
Contributor

@F001 F001 Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that we can't just simply modify chars in place. It is not worth to do this. Please ignore my comments.

@alexcrichton alexcrichton added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 27, 2017
@aidanhs
Copy link
Member

aidanhs commented Aug 3, 2017

I've pinged @BurntSushi on irc for a review - we'll check in now and again to make sure this gets reviewed soon @murarth!

@BurntSushi
Copy link
Member

Sorry I missed this! This seems like a reasonable API to have to me. With that said, it does seem like &mut char would be a nice thing to have if possible. It seems like it should be possible to implement, but would certainly be more complex than the implementation in this PR. Would it also entail a performance trade off too?

cc @rust-lang/libs

@murarth
Copy link
Contributor Author

murarth commented Aug 4, 2017

@BurntSushi: I wrote a &mut char implementation and a few benchmarks. Here's the comparison doing no mutation on the mutate-incapable versus mutate-capable implementations:

test bench_retain_alpha_simple              ... bench:         149 ns/iter (+/- 0)
test bench_retain_alpha_mutate_check        ... bench:         238 ns/iter (+/- 18)

Here's the full implementation: https://gist.github.com/murarth/ac22d58c75602ceb1cc5f79e87474c37

Edit: Updated benchmark results. Ha ha, forgot to compile the benchmarks with -O. I'm used to cargo bench doing that for me.

@arielb1
Copy link
Contributor

arielb1 commented Aug 8, 2017

Hi @BurntSushi - are you reviewing this? Just a friendly ping to keep this on your radar.

@carols10cents
Copy link
Member

ping for review @BurntSushi @rust-lang/libs !

@alexcrichton
Copy link
Member

Ah sorry for the delay @murarth! I think that not having &mut char here makes sense to me at least because there's nothing to get a reference to, and mutation seems best done through other APIs perhaps. @murarth want to file a tracking issue for this? This is still unstable so we can debate &mut char as well on the tracking issue, and otherwise this seems ok to land to me.

Behaves like `Vec::retain`, accepting a predicate `FnMut(char) -> bool`
and reducing the string to only characters for which the predicate
returns `true`.
@murarth
Copy link
Contributor Author

murarth commented Aug 15, 2017

@alexcrichton: Created tracking issue #43874 and updated the code.


while idx < len {
let ch = unsafe {
self.slice_unchecked(idx, len).chars().next().unwrap()
Copy link

@leonardo-m leonardo-m Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a self.slice_unchecked(idx, len).chars().next().unwrap() followed by a ch.len_utf8() generating a small enough amount of code for the optimizer to optimize away? Or is it better to use/add something leaner that gets compiled faster?

@alexcrichton
Copy link
Member

@bors: r+

Thanks @murarth! I figure we can probably update perf and such as this is unstable, so no need to block longer!

@bors
Copy link
Contributor

bors commented Aug 15, 2017

📌 Commit 618ac89 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Aug 15, 2017

⌛ Testing commit 618ac89 with merge 82be83c...

bors added a commit that referenced this pull request Aug 15, 2017
Add method `String::retain`

Behaves like `Vec::retain`, accepting a predicate `FnMut(char) -> bool`
and reducing the string to only characters for which the predicate
returns `true`.
@bors
Copy link
Contributor

bors commented Aug 15, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 82be83c to master...

@bors bors merged commit 618ac89 into rust-lang:master Aug 15, 2017
@murarth murarth deleted the string-retain branch August 15, 2017 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants