Generic storage #3

vbarrielle · 2015-09-28T18:27:37Z

Hi,

I've been experimenting with ndarrays in rust recently, and I figured I should probably try and contribute to your crate rather than create yet another experimental library in rust.

I'm a bit concerned about requiring the storage for ndarrays to be Rc<Vec>. Indeed, most usages of ndarrays don't require shared ownership. Also it makes working with views over foreign data a bit harder in my opinion. Also, I find the implicit copy on write semantics surprising, and would prefer having an explicit Cow if I need these semantics.

So here's my proposal: have the Array struct generic over its storage, provided that the storage can be Deref to a slice.

This then gives three main Array types: type ArrayOwned = Array<Vec>, type ArrayView = Array<&[A]>, and type ArrayViewMut = Array<&mut [A]>.
This makes the ownership clear, and it doesn't seem to overcomplicate the API.

What do you think?

bluss · 2015-09-28T21:40:14Z

Wow, this is very interesting!

The whole thing is just inspired from numpy's ndarray, so that's where the copy on write comes from. For sure I have contemplated how to create array views and making the shared data optional.

Rust has a different model than Python for sure, and our slices and shared references may be the saner way to have cheap array views. I wouldn't want to rip out Rc-sharing myself, but I don't think I need it. We can probably do something like this..

If you look around, I bet you can see that the whole crate needs modernization. It's pretty much all pre-Rust 1.0, with only changes to keep it compiling.

Apart from cheap view types like this, my other big priority would be efficient iteration. I don't think the current is quite where we want it to be.

vbarrielle · 2015-10-01T17:56:57Z

The whole thing is just inspired from numpy's ndarray, so that's where the copy on write comes from. For sure I have contemplated how to create array views and making the shared data optional.

Yes I guessed this came from numpy, though I haven't witnessed copy on write behavior in numpy yet.

If you look around, I bet you can see that the whole crate needs modernization. It's pretty much all pre-Rust 1.0, with only changes to keep it compiling.

There's probably one point that feels tricky to me: right now arrays have a pointer member, which was probably necessary with an rc storage, but I don't think it's necessary once the Rc storage is removed (slice indexing can be used instead). That would remove quite a lot of unsafe code.

Apart from cheap view types like this, my other big priority would be efficient iteration. I don't think the current is quite where we want it to be.

I haven't looked at it yet, what are the problems with the current implementation?

bluss · 2015-10-01T18:58:18Z

The pointer is needed for general slicing / stride computations. The pointer is recalculated if any dimension is sliced with a negative stride. It also avoids double indirection in case of a Vec<A> or Rc<A> storage.

I haven't looked at it yet, what are the problems with the current implementation?

It calculates indices for each lap of an iteration, ideally iteration over a regular layout 2d array should be just as fast as over a 1d array, as a first measure.

vbarrielle · 2015-10-01T20:39:57Z

I see, I had not thought about negative strides.

For iteration, it seems there are two different cases: when the storage is contiguous (or not contiguous only on the dimension the array is iterated on), it can be done by simply offseting a pointer.
However, if eg slicing has produced an array that is not contiguous over many dimensions, it's a lot less clear to me but I think there's no better solution than computing the offset to the base pointer at each iteration.

bluss · 2015-10-02T12:21:04Z

You understand that I can't outright merge this, since it would break all client code.

I'll make sure to publish the current version on crates.io so that it's possible for us and clients to version our changes properly.

vbarrielle · 2015-10-02T12:36:45Z

Of course, no problem with that.

Also I've been thinking about it some more, and I see that some nice to have features coming from numpy can't be achieved without reference counting.

So maybe I could enhance the state of this pull request by re-introducing the Rc based storage (and maybe an Arc based storage as well?) to take care of these cases.

For instance:

from a vec based storage, one could get a Rc based storage using fn share(self) -> ArrayRc.
ArrayRc would have the method fn reshape(&self, ...) -> ArrayRc, that would work on all cases
all Array versions would have the method fn reshape_view(&self, ...) -> ArrayView that can panic under bad cases (or maybe it's better to return a Result here)
ArrayRc can't be mutated, but an owning copy can be obtained (fn to_owned(&self) -> ArrayOwned).

That way maybe client code can opt into Rc for the more ergonomic API, and drop down to finer grained control where needed.

If that's ok with you I'll perform the changes and append them to the pull request.

bluss · 2015-10-02T12:53:19Z

I'm not convinced the Rc features are that useful, but they are interesting. They provide the versatility of the current solution (Array can be both an "owner" or just a view of array data). Moving to explicit owner / view dichotomy takes us closer to idiomatic rust, but it actually complicates the user interface for the most part.

I'm thinking it would be nice to generalize this with traits and explicit view types if it's possible. The Array<Storage> generalization is nice, I wonder if it's a good solution or if we should prefer encapsulation.

vbarrielle · 2015-10-06T11:49:56Z

Moving to explicit owner / view dichotomy takes us closer to idiomatic rust, but it actually complicates the user interface for the most part.

In my opinion, the parts where the owner/view dichotomy make the API complicated are tricky points in the numpy API as well. For instance, in numpy writing to a reshaped array can either overwrite the original data, or write into a copy, silently. This means that for lots of programs, there is a need to be sure whether a reshape will cause a copy or not. With a reshape returning an array view, this is made explicit in the API.
However there is no such problem for immutable data, so if Rc storage is here, it's probably better as a read-only storage (ie I don't like the implicit copies numpy might make behind my back).

I've been experimenting with more storages in the Array<Storage> scheme, with Rc and Arc storages. You can have a look on my branch.
I've encountered a pain point with the Array<Storage> scheme in the process: Rc<Vec<A>> does not deref to [A], which means some of the methods available on other storages are not available for Rc based arrays. I'm not sure whether this can be expressed by having a more complex bound on the Storage type.

I'm thinking it would be nice to generalize this with traits and explicit view types if it's possible. The Array<Storage> generalization is nice, I wonder if it's a good solution or if we should prefer encapsulation.

Apart from the pain point above, I find the Array<Storage> approach nice from a code factorization point of view: this enables implementing methods once for all Array types. I'm not sure that would be possible with an explicit view type.

bluss · 2015-12-15T00:53:40Z

I'm looking at your PR while remaking it to a large part. I'll go the route of preserving the shared / cow array, so that it's a smooth changeover.

bluss · 2015-12-15T01:09:40Z

FWIW, using Deref is not possible anyway due to the usual trait trap: Deref is not an unsafe trait, so you can't trust any raw pointer you take from using it: malicious Deref impls can then crash your program. 😦 So as usual, you need a custom unsafe marked trait anyway, to signify that you trust Vec, slices etc to be sensible backends. This is touched upon here: rust-lang/rust/issues/29701

bluss · 2015-12-16T06:31:47Z

Thank you for all the ideas. I do believe this idea proved itself as soon as I tried implementing it.

The view types also have some benefits of representing states that the Cow array could not, for example, the read only broadcasted to larger size array view, or a mutable subview.

Improve pairwise summation

A different approach

vbarrielle and others added 4 commits September 25, 2015 18:48

start removing Rc WIP (lib compiles but not tests)

355154e

generic storage: made tests compile, not yet passing

e5e308a

fix typo on Array::view()

efa7b40

generic storage: all tests passing

fd7f03a

bluss self-assigned this Sep 28, 2015

bluss mentioned this pull request Dec 15, 2015

Introduce ArrayBase and provide both Array, OwnedArray, ArrayView and ArrayViewMut #8

Merged

bluss closed this Dec 16, 2015

bluss mentioned this pull request Apr 18, 2016

Rayon experiment with matrix multiplication #190

Closed

jturner314 pushed a commit to jturner314/ndarray that referenced this pull request Feb 3, 2019

Merge pull request rust-ndarray#3 from jturner314/pairwise-summation

1d51f70

Improve pairwise summation

LukeMathWalker pushed a commit that referenced this pull request Apr 26, 2019

Merge pull request #3 from LukeMathWalker/refactoring-smarter-debugging

72e05d7

A different approach

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generic storage #3

Generic storage #3

vbarrielle commented Sep 28, 2015

bluss commented Sep 28, 2015

vbarrielle commented Oct 1, 2015

bluss commented Oct 1, 2015

vbarrielle commented Oct 1, 2015

bluss commented Oct 2, 2015

vbarrielle commented Oct 2, 2015

bluss commented Oct 2, 2015

vbarrielle commented Oct 6, 2015

bluss commented Dec 15, 2015

bluss commented Dec 15, 2015

bluss commented Dec 16, 2015

Generic storage #3

Generic storage #3

Conversation

vbarrielle commented Sep 28, 2015

bluss commented Sep 28, 2015

vbarrielle commented Oct 1, 2015

bluss commented Oct 1, 2015

vbarrielle commented Oct 1, 2015

bluss commented Oct 2, 2015

vbarrielle commented Oct 2, 2015

bluss commented Oct 2, 2015

vbarrielle commented Oct 6, 2015

bluss commented Dec 15, 2015

bluss commented Dec 15, 2015

bluss commented Dec 16, 2015