Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EscapeAscii no longer panics due to incorrect ExactSizeIterator #99913

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions library/core/src/slice/ascii.rs
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,6 @@ impl [u8] {
/// # Examples
///
/// ```
///
/// let s = b"0\t\r\n'\"\\\x9d";
/// let escaped = s.escape_ascii().to_string();
/// assert_eq!(escaped, "0\\t\\r\\n\\'\\\"\\\\\\x9d");
Expand All @@ -76,7 +75,10 @@ impl [u8] {
without modifying the original"]
#[stable(feature = "inherent_ascii_escape", since = "1.60.0")]
pub fn escape_ascii(&self) -> EscapeAscii<'_> {
EscapeAscii { inner: self.iter().flat_map(EscapeByte) }
EscapeAscii {
inner: self.iter().flat_map(EscapeByte),
len: self.iter().map(|c| EscapeByte(c).len()).sum()
}
}

/// Returns a byte slice with leading ASCII whitespace bytes removed.
Expand Down Expand Up @@ -174,6 +176,9 @@ impl_fn_for_zst! {
#[must_use = "iterators are lazy and do nothing unless consumed"]
pub struct EscapeAscii<'a> {
inner: iter::FlatMap<super::Iter<'a, u8>, ascii::EscapeDefault, EscapeByte>,

/// Cached length of the iterator to implement `ExactSizeIterator`.
len: usize,
Copy link
Member

@the8472 the8472 Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't cache the value because len() must be updated every time an item is consumed from the iterator.

https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint

Returns the bounds on the remaining length of the iterator.

This was also recently clarified for ExactsizeIterator, as you can see in the nightly docs

Returns the exact remaining length of the iterator.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out my oversight. I will recommit.

Copy link
Member

@the8472 the8472 Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, caching may still be cheaper than recomputing it every time, but you'll at least have to update the cached value every time an item is consumed.
But the problem with that is that it makes every call to next() more expensive, so which approach is beneficial depends on how often len() is called.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel caching is the better choice even with the extra overhead of an decrement each next() or next_back() and an extra usize for each EscapeAscii. Caching avoids degenerate worst cases like users calling len() or size_hint() after each byte received from a slow source.

It's unfortunately to even have to go through the source iterator more than once but there isn't any other way to implement ExactSizeIterator correctly and it can't be removed because it's been released in a stable version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it can't be removed because it's been released in a stable version.

The stability policy does have exceptions for accidental stabilizations and critical bugfixes. So removing it is an option. The crater run in #99880 will tell us whether it's worthwhile taking that option.

}

#[stable(feature = "inherent_ascii_escape", since = "1.60.0")]
Expand All @@ -185,7 +190,7 @@ impl<'a> iter::Iterator for EscapeAscii<'a> {
}
#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
self.inner.size_hint()
(self.len, Some(self.len))
}
#[inline]
fn try_fold<Acc, Fold, R>(&mut self, init: Acc, fold: Fold) -> R
Expand Down
9 changes: 9 additions & 0 deletions library/core/tests/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2245,6 +2245,7 @@ fn test_copy_within_panics_src_inverted() {
// 2 is greater than 1, so this range is invalid.
bytes.copy_within(2..1, 0);
}

#[test]
#[should_panic(expected = "attempted to index slice up to maximum usize")]
fn test_copy_within_panics_src_out_of_bounds() {
Expand Down Expand Up @@ -2574,3 +2575,11 @@ fn test_flatten_mut_size_overflow() {
let x = &mut [[(); usize::MAX]; 2][..];
let _ = x.flatten_mut();
}

#[test]
fn test_escape_ascii() {
assert_eq!([].escape_ascii().len(), 0);
assert_eq!([b'X'].escape_ascii().len(), 1);
assert_eq!([b'\t'].escape_ascii().to_string(), "\\t");
assert_eq!([b'\t'].escape_ascii().size_hint(), (2, Some(2)));
}