Skip to content

Commit

Permalink
Auto merge of #33090 - bluss:special-zip-2, r=aturon
Browse files Browse the repository at this point in the history
Specialize .zip() for efficient slice and slice iteration

The idea is to introduce a private trait TrustedRandomAccess and specialize .zip() for random access iterators into a counted loop.

The implementation in the PR is internal and has no visible effect in the API

Why a counted loop? To have each slice iterator compile to just a pointer, and both pointers are indexed with the same loop counter value in the generated code. When this succeeds, copying loops are readily recognized and replaced with memcpy and addition loops autovectorize well.

The TrustedRandomAccess approach works very well on the surface. Microbenchmarks optimize well, following the ideas above, and that is a dramatic improvement of .zip()'s codegen.

```rust
// old zip before this PR: bad, byte-for-byte loop
// with specialized zip: memcpy
pub fn copy_zip(xs: &[u8], ys: &mut [u8]) {
    for (a, b) in ys.iter_mut().zip(xs) {
        *a = *b;
    }
}

// old zip before this PR: single addition per iteration
// with specialized zip: vectorized
pub fn add_zip(xs: &[f32], ys: &mut [f32]) {
    for (a, b) in ys.iter_mut().zip(xs) { *a += *b; }
}

// old zip before this PR: single addition per iteration
// with specialized zip: vectorized (!!)
pub fn add_zip3(xs: &[f32], ys: &[f32], zs: &mut [f32]) {
    for ((a, b), c) in zs.iter_mut().zip(xs).zip(ys) { *a += *b * *c; }
}
```

Yet in more complex situations, the .zip() loop can still fall back to its old behavior where phantom null checks throw in fake premature end of the loop conditionals. Remember that a NULL inside
Option<(&T, &T)> makes it a `None` value and a premature (in this case)
end of the loop.

So even if we have 1) an explicit `Some` in the code and 2) the types of the pointers are `&T` or `&mut T` which are nonnull, we can still get a phantom null check at that point.

One example that illustrates the difference is `copy_zip` with slice versus Vec arguments. The involved iterator types are exactly the same, but the Vec version doesn't compile down to memcpy. Investigating into this, the function argument metadata emitted to llvm plays the biggest role. As eddyb summarized, we need nonnull for the loop to autovectorize and noalias for it to replace with memcpy.

There was an experiment to use `assume` to add a non-null assumption on each of the two elements in the specialized zip iterator, but this only helped in some of the test cases and regressed others. Instead I think the nonnull/noalias metadata issue is something we need to solve separately anyway.

These have conditionally implemented TrustedRandomAccess

- Enumerate
- Zip

These have not implemented it

- Map is sideeffectful. The forward case would be workable, but the double ended case is complicated.
- Chain, exact length semantics unclear
- Filter, FilterMap, FlatMap and many others don't offer random access and/or exact length
  • Loading branch information
bors authored Jun 17, 2016
2 parents be203ac + 5df05c6 commit c8eff68
Show file tree
Hide file tree
Showing 7 changed files with 263 additions and 21 deletions.
3 changes: 2 additions & 1 deletion src/libcore/iter/iterator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ use super::{Chain, Cycle, Cloned, Enumerate, Filter, FilterMap, FlatMap, Fuse,
use super::ChainState;
use super::{DoubleEndedIterator, ExactSizeIterator, Extend, FromIterator,
IntoIterator};
use super::ZipImpl;

fn _assert_is_object_safe(_: &Iterator<Item=()>) {}

Expand Down Expand Up @@ -383,7 +384,7 @@ pub trait Iterator {
fn zip<U>(self, other: U) -> Zip<Self, U::IntoIter> where
Self: Sized, U: IntoIterator
{
Zip{a: self, b: other.into_iter()}
Zip::new(self, other.into_iter())
}

/// Takes a closure and creates an iterator which calls that closure on each
Expand Down
185 changes: 165 additions & 20 deletions src/libcore/iter/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -301,7 +301,9 @@

use clone::Clone;
use cmp;
use default::Default;
use fmt;
use iter_private::TrustedRandomAccess;
use ops::FnMut;
use option::Option::{self, Some, None};
use usize;
Expand Down Expand Up @@ -622,7 +624,8 @@ impl<A, B> DoubleEndedIterator for Chain<A, B> where
#[stable(feature = "rust1", since = "1.0.0")]
pub struct Zip<A, B> {
a: A,
b: B
b: B,
spec: <(A, B) as ZipImplData>::Data,
}

#[stable(feature = "rust1", since = "1.0.0")]
Expand All @@ -631,29 +634,13 @@ impl<A, B> Iterator for Zip<A, B> where A: Iterator, B: Iterator
type Item = (A::Item, B::Item);

#[inline]
fn next(&mut self) -> Option<(A::Item, B::Item)> {
self.a.next().and_then(|x| {
self.b.next().and_then(|y| {
Some((x, y))
})
})
fn next(&mut self) -> Option<Self::Item> {
ZipImpl::next(self)
}

#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
let (a_lower, a_upper) = self.a.size_hint();
let (b_lower, b_upper) = self.b.size_hint();

let lower = cmp::min(a_lower, b_lower);

let upper = match (a_upper, b_upper) {
(Some(x), Some(y)) => Some(cmp::min(x,y)),
(Some(x), None) => Some(x),
(None, Some(y)) => Some(y),
(None, None) => None
};

(lower, upper)
ZipImpl::size_hint(self)
}
}

Expand All @@ -664,6 +651,61 @@ impl<A, B> DoubleEndedIterator for Zip<A, B> where
{
#[inline]
fn next_back(&mut self) -> Option<(A::Item, B::Item)> {
ZipImpl::next_back(self)
}
}

// Zip specialization trait
#[doc(hidden)]
trait ZipImpl<A, B> {
type Item;
fn new(a: A, b: B) -> Self;
fn next(&mut self) -> Option<Self::Item>;
fn size_hint(&self) -> (usize, Option<usize>);
fn next_back(&mut self) -> Option<Self::Item>
where A: DoubleEndedIterator + ExactSizeIterator,
B: DoubleEndedIterator + ExactSizeIterator;
}

// Zip specialization data members
#[doc(hidden)]
trait ZipImplData {
type Data: 'static + Clone + Default + fmt::Debug;
}

#[doc(hidden)]
impl<T> ZipImplData for T {
default type Data = ();
}

// General Zip impl
#[doc(hidden)]
impl<A, B> ZipImpl<A, B> for Zip<A, B>
where A: Iterator, B: Iterator
{
type Item = (A::Item, B::Item);
default fn new(a: A, b: B) -> Self {
Zip {
a: a,
b: b,
spec: Default::default(), // unused
}
}

#[inline]
default fn next(&mut self) -> Option<(A::Item, B::Item)> {
self.a.next().and_then(|x| {
self.b.next().and_then(|y| {
Some((x, y))
})
})
}

#[inline]
default fn next_back(&mut self) -> Option<(A::Item, B::Item)>
where A: DoubleEndedIterator + ExactSizeIterator,
B: DoubleEndedIterator + ExactSizeIterator
{
let a_sz = self.a.len();
let b_sz = self.b.len();
if a_sz != b_sz {
Expand All @@ -680,12 +722,106 @@ impl<A, B> DoubleEndedIterator for Zip<A, B> where
_ => unreachable!(),
}
}

#[inline]
default fn size_hint(&self) -> (usize, Option<usize>) {
let (a_lower, a_upper) = self.a.size_hint();
let (b_lower, b_upper) = self.b.size_hint();

let lower = cmp::min(a_lower, b_lower);

let upper = match (a_upper, b_upper) {
(Some(x), Some(y)) => Some(cmp::min(x,y)),
(Some(x), None) => Some(x),
(None, Some(y)) => Some(y),
(None, None) => None
};

(lower, upper)
}
}

#[doc(hidden)]
#[derive(Default, Debug, Clone)]
struct ZipImplFields {
index: usize,
len: usize,
}

#[doc(hidden)]
impl<A, B> ZipImplData for (A, B)
where A: TrustedRandomAccess, B: TrustedRandomAccess
{
type Data = ZipImplFields;
}

#[doc(hidden)]
impl<A, B> ZipImpl<A, B> for Zip<A, B>
where A: TrustedRandomAccess, B: TrustedRandomAccess
{
fn new(a: A, b: B) -> Self {
let len = cmp::min(a.len(), b.len());
Zip {
a: a,
b: b,
spec: ZipImplFields {
index: 0,
len: len,
}
}
}

#[inline]
fn next(&mut self) -> Option<(A::Item, B::Item)> {
if self.spec.index < self.spec.len {
let i = self.spec.index;
self.spec.index += 1;
unsafe {
Some((self.a.get_unchecked(i), self.b.get_unchecked(i)))
}
} else {
None
}
}

#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
let len = self.spec.len - self.spec.index;
(len, Some(len))
}

#[inline]
fn next_back(&mut self) -> Option<(A::Item, B::Item)>
where A: DoubleEndedIterator + ExactSizeIterator,
B: DoubleEndedIterator + ExactSizeIterator
{
if self.spec.index < self.spec.len {
self.spec.len -= 1;
let i = self.spec.len;
unsafe {
Some((self.a.get_unchecked(i), self.b.get_unchecked(i)))
}
} else {
None
}
}
}

#[stable(feature = "rust1", since = "1.0.0")]
impl<A, B> ExactSizeIterator for Zip<A, B>
where A: ExactSizeIterator, B: ExactSizeIterator {}

#[doc(hidden)]
unsafe impl<A, B> TrustedRandomAccess for Zip<A, B>
where A: TrustedRandomAccess,
B: TrustedRandomAccess,
{
unsafe fn get_unchecked(&mut self, i: usize) -> (A::Item, B::Item) {
(self.a.get_unchecked(i), self.b.get_unchecked(i))
}

}

/// An iterator that maps the values of `iter` with `f`.
///
/// This `struct` is created by the [`map()`] method on [`Iterator`]. See its
Expand Down Expand Up @@ -982,6 +1118,15 @@ impl<I> DoubleEndedIterator for Enumerate<I> where
#[stable(feature = "rust1", since = "1.0.0")]
impl<I> ExactSizeIterator for Enumerate<I> where I: ExactSizeIterator {}

#[doc(hidden)]
unsafe impl<I> TrustedRandomAccess for Enumerate<I>
where I: TrustedRandomAccess
{
unsafe fn get_unchecked(&mut self, i: usize) -> (usize, I::Item) {
(self.count + i, self.iter.get_unchecked(i))
}
}

/// An iterator with a `peek()` that returns an optional reference to the next
/// element.
///
Expand Down
27 changes: 27 additions & 0 deletions src/libcore/iter_private.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
// Copyright 2016 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.


use iter::ExactSizeIterator;

/// An iterator whose items are random accessible efficiently
///
/// # Safety
///
/// The iterator's .len() and size_hint() must be exact.
///
/// .get_unchecked() must return distinct mutable references for distinct
/// indices (if applicable), and must return a valid reference if index is in
/// 0..self.len().
#[doc(hidden)]
pub unsafe trait TrustedRandomAccess : ExactSizeIterator {
unsafe fn get_unchecked(&mut self, i: usize) -> Self::Item;
}

1 change: 1 addition & 0 deletions src/libcore/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -156,4 +156,5 @@ pub mod hash;
pub mod fmt;

// note: does not need to be public
mod iter_private;
mod tuple;
15 changes: 15 additions & 0 deletions src/libcore/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ use result::Result::{Ok, Err};
use ptr;
use mem;
use marker::{Copy, Send, Sync, self};
use iter_private::TrustedRandomAccess;

#[repr(C)]
struct Repr<T> {
Expand Down Expand Up @@ -1942,3 +1943,17 @@ macro_rules! impl_marker_for {

impl_marker_for!(BytewiseEquality,
u8 i8 u16 i16 u32 i32 u64 i64 usize isize char bool);

#[doc(hidden)]
unsafe impl<'a, T> TrustedRandomAccess for Iter<'a, T> {
unsafe fn get_unchecked(&mut self, i: usize) -> &'a T {
&*self.ptr.offset(i as isize)
}
}

#[doc(hidden)]
unsafe impl<'a, T> TrustedRandomAccess for IterMut<'a, T> {
unsafe fn get_unchecked(&mut self, i: usize) -> &'a mut T {
&mut *self.ptr.offset(i as isize)
}
}
31 changes: 31 additions & 0 deletions src/libcoretest/iter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ use core::{i8, i16, isize};
use core::usize;

use test::Bencher;
use test::black_box;

#[test]
fn test_lt() {
Expand Down Expand Up @@ -1030,3 +1031,33 @@ fn bench_max(b: &mut Bencher) {
it.map(scatter).max()
})
}

pub fn copy_zip(xs: &[u8], ys: &mut [u8]) {
for (a, b) in ys.iter_mut().zip(xs) {
*a = *b;
}
}

pub fn add_zip(xs: &[f32], ys: &mut [f32]) {
for (a, b) in ys.iter_mut().zip(xs) {
*a += *b;
}
}

#[bench]
fn bench_zip_copy(b: &mut Bencher) {
let source = vec![0u8; 16 * 1024];
let mut dst = black_box(vec![0u8; 16 * 1024]);
b.iter(|| {
copy_zip(&source, &mut dst)
})
}

#[bench]
fn bench_zip_add(b: &mut Bencher) {
let source = vec![1.; 16 * 1024];
let mut dst = vec![0.; 16 * 1024];
b.iter(|| {
add_zip(&source, &mut dst)
});
}
22 changes: 22 additions & 0 deletions src/test/codegen/zip.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
// Copyright 2016 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

// compile-flags: -C no-prepopulate-passes -O

#![crate_type = "lib"]

// CHECK-LABEL: @zip_copy
#[no_mangle]
pub fn zip_copy(xs: &[u8], ys: &mut [u8]) {
// CHECK: memcpy
for (x, y) in xs.iter().zip(ys) {
*y = *x;
}
}

0 comments on commit c8eff68

Please sign in to comment.