Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic storage #3

Closed
wants to merge 4 commits into from
Closed

Conversation

vbarrielle
Copy link
Contributor

Hi,

I've been experimenting with ndarrays in rust recently, and I figured I should probably try and contribute to your crate rather than create yet another experimental library in rust.

I'm a bit concerned about requiring the storage for ndarrays to be Rc<Vec>. Indeed, most usages of ndarrays don't require shared ownership. Also it makes working with views over foreign data a bit harder in my opinion. Also, I find the implicit copy on write semantics surprising, and would prefer having an explicit Cow if I need these semantics.

So here's my proposal: have the Array struct generic over its storage, provided that the storage can be Deref to a slice.

This then gives three main Array types: type ArrayOwned = Array<Vec>, type ArrayView = Array<&[A]>, and type ArrayViewMut = Array<&mut [A]>.
This makes the ownership clear, and it doesn't seem to overcomplicate the API.

What do you think?

@bluss
Copy link
Member

bluss commented Sep 28, 2015

Wow, this is very interesting!

The whole thing is just inspired from numpy's ndarray, so that's where the copy on write comes from. For sure I have contemplated how to create array views and making the shared data optional.

Rust has a different model than Python for sure, and our slices and shared references may be the saner way to have cheap array views. I wouldn't want to rip out Rc-sharing myself, but I don't think I need it. We can probably do something like this..

If you look around, I bet you can see that the whole crate needs modernization. It's pretty much all pre-Rust 1.0, with only changes to keep it compiling.

Apart from cheap view types like this, my other big priority would be efficient iteration. I don't think the current is quite where we want it to be.

@bluss bluss self-assigned this Sep 28, 2015
@vbarrielle
Copy link
Contributor Author

The whole thing is just inspired from numpy's ndarray, so that's where the copy on write comes from. For sure I have contemplated how to create array views and making the shared data optional.

Yes I guessed this came from numpy, though I haven't witnessed copy on write behavior in numpy yet.

If you look around, I bet you can see that the whole crate needs modernization. It's pretty much all pre-Rust 1.0, with only changes to keep it compiling.

There's probably one point that feels tricky to me: right now arrays have a pointer member, which was probably necessary with an rc storage, but I don't think it's necessary once the Rc storage is removed (slice indexing can be used instead). That would remove quite a lot of unsafe code.

Apart from cheap view types like this, my other big priority would be efficient iteration. I don't think the current is quite where we want it to be.

I haven't looked at it yet, what are the problems with the current implementation?

@bluss
Copy link
Member

bluss commented Oct 1, 2015

The pointer is needed for general slicing / stride computations. The pointer is recalculated if any dimension is sliced with a negative stride. It also avoids double indirection in case of a Vec<A> or Rc<A> storage.

I haven't looked at it yet, what are the problems with the current implementation?

It calculates indices for each lap of an iteration, ideally iteration over a regular layout 2d array should be just as fast as over a 1d array, as a first measure.

@vbarrielle
Copy link
Contributor Author

I see, I had not thought about negative strides.

For iteration, it seems there are two different cases: when the storage is contiguous (or not contiguous only on the dimension the array is iterated on), it can be done by simply offseting a pointer.
However, if eg slicing has produced an array that is not contiguous over many dimensions, it's a lot less clear to me but I think there's no better solution than computing the offset to the base pointer at each iteration.

@bluss
Copy link
Member

bluss commented Oct 2, 2015

You understand that I can't outright merge this, since it would break all client code.

I'll make sure to publish the current version on crates.io so that it's possible for us and clients to version our changes properly.

@vbarrielle
Copy link
Contributor Author

Of course, no problem with that.

Also I've been thinking about it some more, and I see that some nice to have features coming from numpy can't be achieved without reference counting.

So maybe I could enhance the state of this pull request by re-introducing the Rc based storage (and maybe an Arc based storage as well?) to take care of these cases.

For instance:

  • from a vec based storage, one could get a Rc based storage using fn share(self) -> ArrayRc.
  • ArrayRc would have the method fn reshape(&self, ...) -> ArrayRc, that would work on all cases
  • all Array versions would have the method fn reshape_view(&self, ...) -> ArrayView that can panic under bad cases (or maybe it's better to return a Result here)
  • ArrayRc can't be mutated, but an owning copy can be obtained (fn to_owned(&self) -> ArrayOwned).

That way maybe client code can opt into Rc for the more ergonomic API, and drop down to finer grained control where needed.

If that's ok with you I'll perform the changes and append them to the pull request.

@bluss
Copy link
Member

bluss commented Oct 2, 2015

I'm not convinced the Rc features are that useful, but they are interesting. They provide the versatility of the current solution (Array can be both an "owner" or just a view of array data). Moving to explicit owner / view dichotomy takes us closer to idiomatic rust, but it actually complicates the user interface for the most part.

I'm thinking it would be nice to generalize this with traits and explicit view types if it's possible. The Array<Storage> generalization is nice, I wonder if it's a good solution or if we should prefer encapsulation.

@vbarrielle
Copy link
Contributor Author

Moving to explicit owner / view dichotomy takes us closer to idiomatic rust, but it actually complicates the user interface for the most part.

In my opinion, the parts where the owner/view dichotomy make the API complicated are tricky points in the numpy API as well. For instance, in numpy writing to a reshaped array can either overwrite the original data, or write into a copy, silently. This means that for lots of programs, there is a need to be sure whether a reshape will cause a copy or not. With a reshape returning an array view, this is made explicit in the API.
However there is no such problem for immutable data, so if Rc storage is here, it's probably better as a read-only storage (ie I don't like the implicit copies numpy might make behind my back).

I've been experimenting with more storages in the Array<Storage> scheme, with Rc and Arc storages. You can have a look on my branch.
I've encountered a pain point with the Array<Storage> scheme in the process: Rc<Vec<A>> does not deref to [A], which means some of the methods available on other storages are not available for Rc based arrays. I'm not sure whether this can be expressed by having a more complex bound on the Storage type.

I'm thinking it would be nice to generalize this with traits and explicit view types if it's possible. The Array<Storage> generalization is nice, I wonder if it's a good solution or if we should prefer encapsulation.

Apart from the pain point above, I find the Array<Storage> approach nice from a code factorization point of view: this enables implementing methods once for all Array types. I'm not sure that would be possible with an explicit view type.

@bluss
Copy link
Member

bluss commented Dec 15, 2015

I'm looking at your PR while remaking it to a large part. I'll go the route of preserving the shared / cow array, so that it's a smooth changeover.

@bluss
Copy link
Member

bluss commented Dec 15, 2015

FWIW, using Deref is not possible anyway due to the usual trait trap: Deref is not an unsafe trait, so you can't trust any raw pointer you take from using it: malicious Deref impls can then crash your program. 😦 So as usual, you need a custom unsafe marked trait anyway, to signify that you trust Vec, slices etc to be sensible backends. This is touched upon here: rust-lang/rust/issues/29701

@bluss
Copy link
Member

bluss commented Dec 16, 2015

Thank you for all the ideas. I do believe this idea proved itself as soon as I tried implementing it.

The view types also have some benefits of representing states that the Cow array could not, for example, the read only broadcasted to larger size array view, or a mutable subview.

@bluss bluss closed this Dec 16, 2015
jturner314 pushed a commit to jturner314/ndarray that referenced this pull request Feb 3, 2019
LukeMathWalker pushed a commit that referenced this pull request Apr 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants