Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .as_str() to str::Chars and str::CharIndices #27900

Merged
merged 1 commit into from
Aug 29, 2015

Conversation

SimonSapin
Copy link
Contributor

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@SimonSapin
Copy link
Contributor Author

#27775 proposes removing slice::Iter::as_slice. I argued there that while this functionality is rarely useful, it sometimes is but is impossible to reasonably reproduce outside of std (where we don’t have access to private slice::Iter fields).

I think that not only this functionality should be stabilized as-is, but should also be extended to str iterators.

@@ -292,6 +292,13 @@ impl<'a> DoubleEndedIterator for Chars<'a> {
}
}

impl<'a> Chars<'a> {
#[unstable(feature = "iter_to_slice", issue = "27775")]
pub fn as_str(&self) -> &str {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be -> &'a str, so that you can continue modifying the iterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed.

@SimonSapin SimonSapin force-pushed the Chars_as_str branch 2 times, most recently from 2a8b926 to bc57162 Compare August 19, 2015 16:04
@alexcrichton
Copy link
Member

I would personally rather hold off on these kinds of conversions until we have our story a bit more thought out. Right now it's pretty piecemeal what iterators can be converted back to the original item, and it would be nice to have consistency across the board instead of adding one-off unstable iterators here and there.

I feel that in the case of #27775 this kind of functionality is nice, but so rarely used that it's not worth it right now. Almost no iterators have actual methods on the iterators (they're all just straight implementors of Iterator), so this is an idiom we don't have much experience with moving forward.

@bluss
Copy link
Member

bluss commented Aug 20, 2015

The slice iterator is kind of more than fundamental, it's one of the most important types. Sure, we don't have a great story figured out in particular for random access. I'd love to see these methods on both slice and str iterators (str will not have random access of course, but maybe iteration mixed with other searching methods).

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 21, 2015
@alexcrichton
Copy link
Member

We discussed this at the libs meeting today and the conclusion was to merge this and push forward with these sorts of conversions. @SimonSapin could you make sure that the other string iterators in std::str have this sort of conversion as well (where it makes sense)?

@SimonSapin
Copy link
Contributor Author

Ok, I think that’s all of them. However to make it happen I’ve had to:

  • Make the iter field of std::iter::Filter and std::iter::Map public. An accessor method (possibly &I instead of &mut I) would also work. If we want to go in that direction, we may want to do the same across all iterator adaptors.
  • Add a remaining_haystack(&self) -> &'a str method to std::str::pattern::Searcher<'a>, with the same semantics as Chars::as_str.

What do you all think?

StrSearcherImpl::Empty(EmptyNeedle { position, end, .. }) |
StrSearcherImpl::TwoWay(TwoWaySearcher { position, end, .. }) => {
unsafe {
self.haystack.slice_unchecked(position, end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't work for the StrSearcher: StrSearcher is not a double ended searcher*, just a Searcher and RevereseSearcher: the searching ends are independent. This means that position and end may be at arbitrary positions inside haystack i.e not position <= end.

The remaining haystack is both haystack[position..] and haystack[..end].

Oh and the position and end may end up on non-character boundaries depending on how the algorithm is used (next() vs .next_match() etc), even if Match and Reject are only emitted with character boundary offsets.

(*) It's not double ended because it would be a very expensive algorithm. Right now, "aa" in "aaa" returns a single match (0, 2) from the front and a single match (1, 3) from the back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can work if it has separate methods for the front and back case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I would like to not do this, it's another restriction on the string search algorithm (that it can seek to an appropriate place to cut the remaining part, and that it hasn't jumped far ahead for some reason).

@SimonSapin
Copy link
Contributor Author

One option is to add the method to Chars, CharsIndices, and Bytes, but not Lines, LinesAny, Split, RSplit, SplitTerminator, RSplitTerminator, SplitN, RSplitN, MatchIndices, RMatchIndices, Matches, RMatches, or SplitWhitespace.

/// iterator can continue to be used while this exists.
#[unstable(feature = "iter_to_slice", issue = "27775")]
#[inline]
pub fn as_str(&self) -> &'a str {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this safely be done? We don't know that iteration isn't in the middle of a multi byte character, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that’s a good point. I’ll make this one return &'a [u8].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… or remove it entirely because it accesses the inner iterator in Map.

@alexcrichton
Copy link
Member

I think for now it's best to start off with the "easy" iterators like chars/charindices, we can hold off on the inherently more complicated pattern-like iterators for now I think.

@SimonSapin SimonSapin changed the title Add Chars::as_str. Add .as_str() to str::Chars and str::CharIndices Aug 28, 2015
@SimonSapin
Copy link
Contributor Author

Ok, we’re back to just str::Chars and str::CharIndices.

@alexcrichton
Copy link
Member

@bors: r+ e33650c

@bors
Copy link
Contributor

bors commented Aug 28, 2015

⌛ Testing commit e33650c with merge 3c100de...

@bors bors merged commit e33650c into rust-lang:master Aug 29, 2015
@SimonSapin SimonSapin deleted the Chars_as_str branch August 29, 2015 06:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants