Berlin discussion follow up #1
Ready to start creating a project scaffold!
typing....
...
Hi all, I just uploaded a bit of a scaffold for building an FsTensor package, and I wrote up a summary of our discussions in the Wiki. At this stage it probably makes more sense to keep playing around with scripting, but once the implementation details become more concrete I will iron out the last details to make it a package ready for publishing. The last 3 days were fun, and I hope next year we have something to show for it :)
Tagging @WhiteBlackGoose and @Happypig375 because I remember you guys discussing this topic some time back; maybe you have some thoughts on this?
I remember @pkese and @matthewcrews also having a lot of inspiration :) |
Can recommend the DiffSharp RawTensor implementation. @muehlhaus knows the gig, I went over it with him.
```fsharp
let x: Tensor<int, 2, 3> = tensor [[1; 2; 3]; [4; 5; 6]]
let y = x |> Tensor.map ((+) 1) // tensor [[2; 3; 4]; [5; 6; 7]]
let z = y[3,3] // Error: constant indexing out of bounds
```
This can further be improved by fsharp/fslang-suggestions#1086:
```fsharp
let x = [[1; 2; 3]; [4; 5; 6]] // type inferred as tensor
let y = x |> Tensor.map ((+) 1) // tensor [[2; 3; 4]; [5; 6; 7]]
let z = y[3,3] // Error: constant indexing out of bounds
```
I think one of the important things that we need to flesh out is the access patterns we want to support for the various operations. The way we need to access data will determine the API we want to create and the underlying storage format. My ideal scenario for the base Tensor library is to describe how we need to be able to access data (item lookup, iteration, slicing, subsetting, etc.). We can provide a default implementation that is entirely managed, and Providers can provide optimized implementations based on the underlying constraints.

Another critical thing is deciding whether we consider a Tensor a mutable or immutable collection. I can argue either way, but I strongly veer toward immutability, given the ethos of the F# language. Supporting random updates of a Tensor would make the backend much more complex. Can we do it? Yes. Do I want to? No 😂.

Another aspect that I would like to support is surfacing the dimensionality of the Tensor in the type system, but without having to hand-code N different Tensor types (Tensor1D, Tensor2D, Tensor3D, ... TensorND). I've taken this approach in the past, and it leads to a significant amount of boilerplate that is difficult to maintain. The true ideal would be for the F# compiler to check the dimensionality and the length of the dimensions. I do not believe this is feasible, though; I believe it would require something more exotic than what is currently in F#, something like Dependent Types.

I also believe we should start with having the programmer specify the type of Tensor, Dense vs. Sparse. Later, we could possibly abstract this away and allow the library to choose a format based on the shape of the data, but I don't think that is a good place to start. This means we would start with a Tensor<'T>, which describes the API for accessing values. DenseTensor<'T> and SparseTensor<'T> would be concrete implementations that actually know how to store data. Personally, I almost exclusively work with sparse data, but I know a large part of the community works with dense.

Feel free to disagree with any of this. I've already bled quite a bit on this problem, so my opinion is informed by scars that may no longer be correct.
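A minimal, hypothetical F# sketch of that split ("describe the access pattern, let providers pick the storage"). The member names, the Tensor<'T>/DenseTensor<'T>/SparseTensor<'T> shapes and the storage layouts below are illustrations only, not a proposed API:
```fsharp
// Hypothetical sketch: an abstract access API with dense and sparse storage behind it.

[<AbstractClass>]
type Tensor<'T>(shape: int[]) =
    member _.Shape = shape
    /// Item lookup: one index per dimension.
    abstract GetItem: int[] -> 'T
    /// Iteration over (index, value) pairs; sparse backends may skip default entries.
    abstract Enumerate: unit -> seq<int[] * 'T>

/// Dense storage: one flat array in row-major order.
type DenseTensor<'T>(shape: int[], data: 'T[]) =
    inherit Tensor<'T>(shape)
    // Row-major flattening of a multi-dimensional index.
    let flatten (idx: int[]) =
        Array.fold2 (fun acc i d -> acc * d + i) 0 idx shape
    // Inverse of flatten, used when enumerating.
    let unflatten (flat: int) =
        let idx = Array.zeroCreate shape.Length
        let mutable rem = flat
        for k in shape.Length - 1 .. -1 .. 0 do
            idx.[k] <- rem % shape.[k]
            rem <- rem / shape.[k]
        idx
    override _.GetItem(idx) = data.[flatten idx]
    override _.Enumerate() =
        seq { for flat in 0 .. data.Length - 1 -> unflatten flat, data.[flat] }

/// Sparse storage: only explicitly-set entries are stored; everything else is `zero`.
type SparseTensor<'T>(shape: int[], entries: Map<int list, 'T>, zero: 'T) =
    inherit Tensor<'T>(shape)
    override _.GetItem(idx) =
        entries |> Map.tryFind (List.ofArray idx) |> Option.defaultValue zero
    override _.Enumerate() =
        entries |> Seq.map (fun kv -> Array.ofList kv.Key, kv.Value)
```
With this kind of split, the sparse implementation can hold any 'T (including expression types), while optimized providers would only need to override the access members for the element types they support.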
My experience from TorchSharp and DiffSharp is different.
I know this is all counter-cultural, but I've been down this rabbit hole more times than I care to count, and what we did in DiffSharp RawTensor was the best solution I've come up with at the base level once you balance the factors in this difficult space. All the above are essential, and pushing any more complexity or concerns into the base layer kills you.
If this is the direction we decide to go (which I have no problem with), it limits the utility for my applications. Again, this is absolutely fine, but if we restrict the types that a Tensor can contain to a finite set, it's not useful for me, because I work with things like expressions (LinearExpression, BooleanExpression, etc.). If that is what is best for the community, though, that's what should be done. I've long maintained my own collection, so I'm no worse off. If I'm misunderstanding this statement, please let me know 😊. I believe I'm the only person who would receive any value from this, so I propose that we drop it as a requirement.
(DiffSharp RawTensor doesn't actually use Python naming - it's not really user-facing - but both the user-facing TorchSharp and DiffSharp Tensor types do)
Tensor programming is popular for one main reason: the enormous efficiency of batch processing of videos (4D), images (3D), and matrices (2D) with GPUs - the batch adding one dimension to each of these, giving 5D, 4D, and 3D objects. GPUs work over very limited ranges of datatypes. There's some value in 3D, 4D, 5D numerically-indexed collections of other things that can't be directly processed by GPUs. This applies particularly when applying the typesafe-indexing techniques you mentioned in your talk in Berlin, though the contents of the tensors can in theory be indexes into another table.
It's OK to have a derived library giving a type like that on top. What I'm really recommending is that, as you build this from the ground up, the "backend" dimension of extensibility should be solved first, at the foundational layer, and then hidden - and this is best done by different implementations of a dynamically-typed-and-shape-checked-and-device-moved RawTensor. The DiffSharp RawTensor code is here btw. It's not perfect, but it does allow a LibTorch implementation. The definition order is:
The rest is programming, but some notable parts are:
All of this is defined as backend-neutral. The backends are separate.
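To make the shape of that layering concrete, here is a simplified, hypothetical F# sketch; the type names and members are illustrative only, and the real definitions are the DiffSharp RawTensor code referenced above:
```fsharp
// Hypothetical sketch of a RawTensor-style, backend-neutral base layer.
// Dtype, shape, device and backend are ordinary values, checked per operation.

type Dtype = Float32 | Float64 | Int32
type Device = CPU | GPU of index: int
type Backend = Reference | LibTorch

/// Dynamically typed, shape-checked, device-aware base tensor.
[<AbstractClass>]
type RawTensor(shape: int[], dtype: Dtype, device: Device, backend: Backend) =
    member _.Shape = shape
    member _.Dtype = dtype
    member _.Device = device
    member _.Backend = backend
    abstract Add: RawTensor -> RawTensor
    abstract MatMul: RawTensor -> RawTensor
    abstract MoveTo: Device -> RawTensor

/// Each backend supplies a factory object the (hidden) front end dispatches to.
[<AbstractClass>]
type BackendTensorStatics() =
    abstract Zeros: shape: int[] * dtype: Dtype * device: Device -> RawTensor
    abstract FromFloatArray: data: float[] * shape: int[] -> RawTensor
```
A statically-typed, shape-indexed Tensor<'T, ...> front end can then be layered on top of this dynamic core without the backends knowing anything about it.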
On mutable v. immutable - I went back and forth on this for DiffSharp. It's painful: some specific tensors absolutely must be mutable, notably the enormous number of "model" parameters in any serious training, and several "local" tensors in loops accumulating sums or adjoints (in back-prop) etc. Equally, mutability is corrosive, and every single use of mutation on tensors needs extreme justification.

I proposed a system where tensors were immutable by default, with a need to specifically convert (unsafely) to a mutable tensor "register", so you could at least track and reason about where mutation was happening. We didn't end up checking it in, partly because you quickly end up duplicating a lot of operations, and partly because my collaborator was fundamentally OK with tensors being mutable, coming from PyTorch and all (and having bigger things to worry about).

Adding a dynamic flag, akin to shape/backend/dtype, indicating mutability is probably possible, then building things up on top of that.
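For what it's worth, a hypothetical F# sketch of the "immutable by default, explicit register" idea; the Tensor/TensorRegister names and the float-only flat storage are purely for illustration:
```fsharp
// Hypothetical sketch: mutation only happens through a visibly different
// register type, so every use of it is easy to find and justify.

/// Immutable view: no in-place operations exposed.
type Tensor(data: float[], shape: int[]) =
    member _.Shape = shape
    member _.Item with get (i: int) = data.[i]          // flat indexing, for brevity
    member _.Map f = Tensor(Array.map f data, shape)    // always returns a new tensor
    /// The explicit, greppable conversion into mutability.
    member _.AsRegister() = TensorRegister(Array.copy data, shape)

/// Mutable accumulator, e.g. model parameters or adjoint sums in back-prop.
and TensorRegister(data: float[], shape: int[]) =
    member _.AddInPlace(t: Tensor) =
        for i in 0 .. data.Length - 1 do data.[i] <- data.[i] + t.[i]
    /// Back to the immutable world once accumulation is done.
    member _.Freeze() = Tensor(Array.copy data, shape)
```
The dynamic-flag alternative mentioned above would instead carry an IsMutable value on the raw tensor and check it at each in-place operation, trading compile-time visibility for fewer duplicated operations.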
As an example to help you think about where you might want to go with this, I stripped out the bespoke forward/backward differentiation from DiffSharp, leaving just a fully-fledged "raw" tensor library with a Reference and a LibTorch backend, in this branch: https://github.com/dsyme/DiffSharp/tree/dsyme/tensors. The parts stripped out are:
The backends actually don't change at all. The rest is the same: so this is now a fully-fledged tensor library and API with two backends (one Reference, one LibTorch), just without any differentiation/gradient/optimization support (unless you're using the LibTorch backend, in which case you could in theory use the gradients integrated into LibTorch tensors). You could easily add other backends for slimmer C++ tensor libraries that don't provide any gradient capabilities, and the reference backend could be progressed to a much faster managed implementation.

I hope it's useful to you all. If you take this shape of thing it should use a different name, and of course if you use any of this code it should respect the license etc.

(BTW the diff effectively shows exactly what it means to add DiffSharp-style differentiation to a tensor API - it's both impressively minimal and impressively subtle, and in particular requires that each primitive binary and unary tensor operation supported - either for necessity or performance - declare its necessary derivatives.)

(Note, the tests may be independently useful for you all too.)

After stripping back a bit more, this is the size of the resulting DLLs:
and for the backends:
plus of course the vast TorchSharp and LibTorch binaries.
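As a rough illustration of what "each primitive operation declares its necessary derivatives" can look like - a hypothetical F# sketch, not DiffSharp's actual API - each op bundles its forward computation with the rule for propagating an adjoint backwards:
```fsharp
// Hypothetical sketch: a primitive unary op carries its forward computation
// plus the derivative rule a differentiation layer would need.

type UnaryOp<'T> =
    { /// Forward computation on the underlying value.
      Forward  : 'T -> 'T
      /// Backward rule: given the original input and the incoming adjoint,
      /// produce the adjoint for the input.
      Backward : 'T -> 'T -> 'T }

// Example over plain floats: exp, whose derivative is exp itself.
let expOp : UnaryOp<float> =
    { Forward  = exp
      Backward = fun input adjoint -> adjoint * exp input }

// Example: negation, whose derivative just flips the adjoint's sign.
let negOp : UnaryOp<float> =
    { Forward  = fun x -> -x
      Backward = fun _ adjoint -> -adjoint }
```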
Linking another .NET tensor-related discussion
And a mention in the RC2 announcement about .NET tensor primitives: https://devblogs.microsoft.com/dotnet/announcing-dotnet-8-rc2/#introducing-tensor-primitives-for-net