C# bindings #152

Open
bramhoven opened this issue Sep 28, 2023 · 20 comments
Labels
new language: Adding support for a new programming language

Comments

@bramhoven

Looks like a very interesting project especially if it has bindings for most of the popular programming languages.

I saw on the site that you were planning to have bindings for C#. I have never created such a project to bind Rust to C#, but I would love to give it a shot.

@VivekPanyam
Owner

Thanks, that sounds great to me!

If you haven't already taken a look, the contributing and architecture docs are probably worth checking out.

I'm not super familiar with C# so I don't know whether Rust <-> C/C++ <-> C# would be preferable to Rust <-> C# directly.

I'm working on C++ bindings so either approach could work. Do you (or anyone else reading) have thoughts on the best way to do this?

@bramhoven
Author

I am not super familiar with Rust, so maybe we can learn from each other!
From what I read, Rust <-> C# should be fairly easy. I was looking for a binding file I could reuse for the C# binding lib, but as far as I can tell the Python and Node.js binding libs are pretty specific and rely on other libraries to integrate more easily into those languages. Is there already a binding file that is not specific to one language?

I came across https://blog.datalust.co/rust-at-datalust-how-we-integrate-rust-with-csharp/ which also integrates with C#'s GC. Could be useful.

Something like this would be how we can interact with a Rust library.
image
Of course the arguments and return types would have to be correct too, but that is something I am not too familiar with yet and am still researching.
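
Since the screenshot is not preserved here, the following is a minimal sketch of what the Rust side of such an interaction could look like. The function names (`example_add`, `example_strlen`) are made up for illustration and are not part of Carton. The C# side would declare matching `extern` methods with `[DllImport]`, using `int`/`long` and a string or pointer type for the arguments.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical example, not part of Carton's API.
/// In a real `cdylib`, an exported function would also carry the `no_mangle`
/// attribute so that C# can look the symbol up by name.
pub extern "C" fn example_add(a: i32, b: i32) -> i32 {
    // Wrapping arithmetic avoids a panic unwinding across the FFI boundary.
    a.wrapping_add(b)
}

/// Returns the byte length of a NUL-terminated string, or -1 for a null pointer.
///
/// # Safety
/// `s` must be null or point to a valid NUL-terminated string.
pub unsafe extern "C" fn example_strlen(s: *const c_char) -> i64 {
    if s.is_null() {
        return -1;
    }
    // Borrow the caller's bytes without taking ownership of them.
    let bytes = unsafe { CStr::from_ptr(s) }.to_bytes();
    bytes.len() as i64
}
```

Only C-compatible types (integers, raw pointers) appear in the signatures, which is what keeps the argument and return types "correct" on both sides.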

@VivekPanyam
Owner

> I am not super familiar with Rust, so maybe we can learn from each other!

Sounds good :)

> From what I read Rust <-> C# should be easily possible.

Good! A few things to keep in mind that may make bindings for Carton a little tricky:

  1. The interface is largely async. This generally can't be directly exposed to another language and requires either some deeper integration (e.g. pyo3-asyncio, cxx-async) or callbacks over the language boundary. My approach for the C++ bindings (in progress) is a hybrid between a callback-like thing and event notifications. I'll link to it here once I have something basic ready.
  2. Passing Tensors over the language boundary requires making sure that the "owner" of the tensor data doesn't deallocate it until there are no references left. This usually isn't too complex, it just requires a bit of care.
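
To make the first point concrete, here is a minimal sketch of the "callbacks over the language boundary" option. It is not Carton's actual C interface: `ExampleCallback`, `example_double_async`, and the demo helpers are hypothetical names, and a plain thread stands in for real async work.

```rust
use std::ffi::c_void;
use std::sync::mpsc;
use std::thread;

/// Completion callback supplied by the foreign caller (hypothetical signature).
pub type ExampleCallback = extern "C" fn(result: i32, user_data: *mut c_void);

/// Lets the opaque pointer move into the worker thread.
/// Assumption: the caller guarantees the pointer may be used from any thread.
struct SendPtr(*mut c_void);
unsafe impl Send for SendPtr {}
impl SendPtr {
    fn into_inner(self) -> *mut c_void {
        self.0
    }
}

/// Starts the work and returns immediately; `on_done` fires later on a worker thread.
/// A real export would also carry the `no_mangle` attribute.
pub extern "C" fn example_double_async(input: i32, on_done: ExampleCallback, user_data: *mut c_void) {
    let user_data = SendPtr(user_data);
    thread::spawn(move || {
        let result = input.wrapping_mul(2); // stand-in for the real async operation
        on_done(result, user_data.into_inner());
    });
}

/// Demo callback: `user_data` is a `Box<Sender<i32>>` that was turned into a raw pointer.
extern "C" fn send_result(result: i32, user_data: *mut c_void) {
    // Reclaim ownership of the sender; it is dropped when this function returns.
    let tx = unsafe { Box::from_raw(user_data as *mut mpsc::Sender<i32>) };
    let _ = tx.send(result);
}

/// Plays the role of the foreign caller: start the call, then wait for the callback.
pub fn demo_roundtrip(input: i32) -> i32 {
    let (tx, rx) = mpsc::channel::<i32>();
    let user_data = Box::into_raw(Box::new(tx)) as *mut c_void;
    example_double_async(input, send_result, user_data);
    rx.recv().expect("callback never fired")
}
```

In C#, the callback could plausibly complete a `TaskCompletionSource<T>` instead of sending on a channel, which would let callers `await` the result; that part is an assumption and is not shown here.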

> I was looking for a binding file I could reuse for the C# binding lib, but as far as I can tell the Python and Node.js binding libs are pretty specific and rely on other libraries to integrate more easily into those languages. Is there already a binding file that is not specific to one language?

I'm not entirely sure what you mean by a "binding file". Do you mean something like what SWIG uses?

If all the functions and methods in this file can be exposed to C#, that basically gives us complete bindings. In order to do that, you'll need to expose many of the data types in these files because they are used in function signatures.

I'm actively working on C++ bindings (using CXX) so another option is for you to build on top of those to create an idiomatic C# library.

> I came across https://blog.datalust.co/rust-at-datalust-how-we-integrate-rust-with-csharp/ which also integrates with C#'s GC. Could be useful.

I skimmed it and it looks pretty comprehensive.

Also, just to make sure you're aware of this: Carton currently doesn't have Windows support. If that's important/interesting to you, maybe that's another thing to explore.

Thanks for looking into this!

@bramhoven
Author

> 1. The interface is largely async. This generally can't be directly exposed to another language and requires some deeper integration in order to work (e.g. pyo3-asyncio, cxx-async) or using callbacks over the language boundary. My approach for the C++ bindings (in progress) is a hybrid between a callback-like thing and event notifications. I'll link to it here once I have something basic ready.

That will make it more difficult, but I'm guessing your C++ approach will work for C# as well, so it might be smart to wait for you to finish that.

> 2. Passing Tensors over the language boundary requires making sure that the "owner" of the tensor data doesn't deallocate it until there are no references left. This usually isn't too complex, it just requires a bit of care.

If I read it properly, the blog post I mentioned has a solution for that. Good to keep in mind indeed, and it should not be too hard.

> I'm not entirely sure what you mean by a "binding file". Do you mean something like what SWIG uses?

By "binding file" I meant lib.rs. To me it felt like the "binding" between Carton and other languages, which is why I called it that.

> If all the functions and methods in this file can be exposed to C#, that basically gives us complete bindings. In order to do that, you'll need to expose many of the data types in these files because they are used in function signatures.

That's what I was looking for!

> Also, just to make sure you're aware of this: Carton currently doesn't have Windows support. If that's important/interesting to you, maybe that's another thing to explore.

Regarding this: I have no clue yet what it would take, but since I am running Windows I will try to look into it.

@bramhoven
Author

Btw I found csbindgen. That will probably make it a little easier :)

@VivekPanyam
Owner

> That will make it more difficult, but I'm guessing your C++ approach will work for C# as well, so it might be smart to wait for you to finish that.

Yeah something similar should work for C#. Hopefully I'll have something ready to share for C and/or C++ tonight or tomorrow.

> If I read it properly, the blog post I mentioned has a solution for that. Good to keep in mind indeed, and it should not be too hard.

Yeah it did talk about that problem; I just wanted to highlight it since Tensor data is basically a large chunk of data shared across language boundaries (possibly several times).
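
As a rough illustration of the ownership concern, here is a sketch in which every handle given to the foreign side owns one reference to the tensor data, so the buffer is only freed when the last handle is released. The names (`ExampleTensor`, `example_tensor_*`) are hypothetical and Carton's real tensor types and C API differ.

```rust
use std::mem::ManuallyDrop;
use std::sync::Arc;

/// Hypothetical tensor storage used only for this sketch.
pub struct ExampleTensor {
    data: Vec<f32>,
}

/// Opaque handle for the foreign side; each live handle owns one strong reference.
pub type ExampleTensorHandle = *const ExampleTensor;

/// Allocates a zero-filled tensor and returns the first handle to it.
pub extern "C" fn example_tensor_new(len: usize) -> ExampleTensorHandle {
    Arc::into_raw(Arc::new(ExampleTensor { data: vec![0.0; len] }))
}

/// Creates another handle to the same data (reference count + 1).
///
/// # Safety
/// `h` must be a live handle returned by `example_tensor_new` or `example_tensor_retain`.
pub unsafe extern "C" fn example_tensor_retain(h: ExampleTensorHandle) -> ExampleTensorHandle {
    unsafe { Arc::increment_strong_count(h) };
    h
}

/// Gives up one handle; the data is freed when the last handle is released.
///
/// # Safety
/// `h` must be a live handle, and it must not be used again after this call.
pub unsafe extern "C" fn example_tensor_release(h: ExampleTensorHandle) {
    drop(unsafe { Arc::from_raw(h) });
}

/// Borrowed pointer to the elements; valid only while some handle is still live.
///
/// # Safety
/// `h` must be a live handle.
pub unsafe extern "C" fn example_tensor_data(h: ExampleTensorHandle) -> *const f32 {
    unsafe { (*h).data.as_ptr() }
}

/// Number of elements in the tensor.
///
/// # Safety
/// `h` must be a live handle.
pub unsafe extern "C" fn example_tensor_len(h: ExampleTensorHandle) -> usize {
    unsafe { (*h).data.len() }
}

/// Test helper: current number of live handles, read without consuming one.
///
/// # Safety
/// `h` must be a live handle.
pub unsafe fn example_tensor_refcount(h: ExampleTensorHandle) -> usize {
    let arc = ManuallyDrop::new(unsafe { Arc::from_raw(h) });
    Arc::strong_count(&arc)
}
```

On the C# side, a handle like this is commonly wrapped in a `SafeHandle` subclass whose release method calls the `release` function, so the garbage collector ends up driving the last step.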

> That's what I was looking for!

Great!

> Regarding this: I have no clue yet what it would take, but since I am running Windows I will try to look into it.

Windows support will probably take a bit of work to ship. I mentioned a few details in this comment on another issue: #159 (comment)

You could probably prototype something fairly quickly, but getting it merged might take a bit of iteration since some of the changes will touch the internals of the runner interface (i.e. the IPC system). There's some overhead to making breaking changes to the runner interface so we need to be pretty thoughtful about design.

> Btw I found csbindgen. That will probably make it a little easier :)

Oh, cool! That looks helpful.

@VivekPanyam
Owner

Here are the C bindings: #169 (the PR is larger than I hoped; sorry :/)

The included README.md provides an overview of the async approach for C.

My C++ PR will talk about design tradeoffs a little more, but the short version is that I'm likely going to have async Rust functions return std::futures in C++ and also provide something like CartonAsyncNotifier from the C interface (because polling several std::futures is inefficient).

It would be cool if we can provide native async functions in C#. This seems possible.

Once you get a chance to look at the PR I linked, I'm curious what your thoughts on the following are:

I'm using cbindgen in the C PR. It seems like the Rust side of things for using csbindgen is nearly identical so we could possibly reuse that code. However, csbindgen seems to do a lot more than just generating a header file. I wonder whether the resulting code will be more efficient if we manually implement an idiomatic C# API on top of the C API rather than implementing an idiomatic C# API on top of a csbindgen generated C# API.

@bramhoven
Author

Cool to see you finalized the initial implementation for C!
I will take a look at it tomorrow.

> My C++ PR will talk about design tradeoffs a little more, but the short version is that I'm likely going to have async Rust functions return std::futures in C++ and also provide something like CartonAsyncNotifier from the C interface (because polling several std::futures is inefficient).

What would be the advantage of using std::future for C++ if CartonAsyncNotifier would be more efficient? I am not yet familiar with either of those so maybe it is a dumb question haha

@VivekPanyam
Owner

> What would be the advantage of using std::future for C++ if CartonAsyncNotifier would be more efficient?

Convenience, primarily. The efficiency gain only really matters if you have a lot of outstanding async tasks. Instead of looping through and checking each one, you can just ask Carton to tell you when one is ready (and which one is ready).
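
The idea can be sketched with a shared completion queue. This is only an illustration of the pattern, not the actual `CartonAsyncNotifier` API: `ExampleNotifier` and its methods are hypothetical, and sleeping threads stand in for real async tasks.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a notifier: every task reports to one shared queue.
pub struct ExampleNotifier {
    tx: mpsc::Sender<(u64, i32)>,
    rx: mpsc::Receiver<(u64, i32)>,
}

impl ExampleNotifier {
    pub fn new() -> Self {
        let (tx, rx) = mpsc::channel();
        ExampleNotifier { tx, rx }
    }

    /// Starts a task; when it finishes, its id and result are pushed onto the queue.
    pub fn spawn(&self, task_id: u64, delay_ms: u64, value: i32) {
        let tx = self.tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(delay_ms)); // stand-in for real work
            let _ = tx.send((task_id, value));
        });
    }

    /// Blocks until any task is ready and says which one it was.
    /// Note: this waits forever if no task is outstanding.
    pub fn wait(&self) -> (u64, i32) {
        self.rx
            .recv()
            .expect("the notifier holds a sender, so the channel stays open")
    }
}
```

With N outstanding tasks, the caller makes one blocking `wait` call per completion instead of repeatedly checking N separate futures.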

@VivekPanyam added the "new language" label on Oct 4, 2023
@VivekPanyam
Owner

The C++ PR is up at #174. It's also unfortunately quite large, but the included README.md might be worth taking a look at. The C++ bindings are implemented on top of the C bindings with no additional Rust code. The readme goes into a bit more depth on why.

That said, cxx and cbindgen/csbindgen do different things so we can't directly take the rationale from my PR and apply it to the C# bindings. I just thought I'd share in case it helps you decide between csbindgen and building directly on top of the C API.

@bramhoven
Author

I still want to take a deeper look at the C and C++ code specifically, but I have been playing around with converting the C bindings to C# using https://github.com/bottlenoselabs/c2cs. I have to try it from my Linux machine to check if it is actually working.

It might be worth just trying to build it on top of C, C++ or Rust to see what works better for C#. I'll keep you posted. I do not have much time this weekend, but next week I will reserve some time for that.

@VivekPanyam
Owner

That makes sense. The goal is to end up with an interface that feels like it "belongs" in C#, not something that's clearly bindings for a library written in another language.

If you can do that in a completely or mostly generated way, that's great!

But we shouldn't compromise on the quality of the interface just so we can automatically generate bindings (not that you suggested that; just wanted to make priorities clear :) ).

Thank you for spending time on this!

@VivekPanyam
Owner

> It might be worth just trying to build it on top of C, C++ or Rust to see what works better for C#. I'll keep you posted. I do not have much time this weekend, but next week I will reserve some time for that.

Did you get a chance to explore this more? (No worries if not; just curious if you came to any interesting conclusions about the different approaches :) )

@bramhoven
Author

Quick update on the progress! Due to some personal things I haven't been able to spend as much time on this as I wanted, but it is moving forward.

I have been trying to get a full test flow working with just the bindings. I have tried generating them with C2CS. To do that I used the carton.h file which was generated by the carton-bindings-c project. Most of the bindings generated by C2CS do work, but for some reason the carton_infer method is not working yet. I am still trying to fix that.

I still wanna try csbindgen to generate the bindings to see if that works, and maybe try writing the bindings myself. One of them should at least work :)

When I have fully tested the bindings and can validate them consistently I wanna start on the code to actually make the package usable and look like idiomatic C# code :)

@bramhoven
Author

Finally got the bindings working! 🎉

I have tried a few options for generating the bindings, including writing them myself. Writing them myself did work, but I finally settled on c2cs since it also provides some extra helpers for handling strings, which is nice to have.

The automated generators I've tried are csbindgen and c2cs.

I first tried to generate the bindings from the same Rust source as carton-bindings-c, but for some reason csbindgen did not like lib.rs containing only mod declarations, so that did not generate any bindings.
Like I said, I settled on c2cs, generating the bindings from the C header file produced by carton-bindings-c, which worked like a charm.

I am now going to work on the rest of the C# project to make it idiomatic and easily usable.

@bramhoven mentioned this issue on Nov 26, 2023
@bramhoven
Author

I am struggling a bit with the different types of tensors, and with how to use them for inference and for reading the output from the result. I found the library https://github.com/SciSharp/Tensor.NET which has tensors for .NET, but I cannot combine multiple types of tensors into a single list. This would prevent using multiple tensor types with a single model, as is needed for e.g. https://carton.pub/stabilityai/sdxl

I could continue to extend the Tensors I have made so far to include more types and operations. That would take a while longer to implement since I currently only have a scalar string tensor.

Any ideas?

@VivekPanyam
Owner

Under the hood, the storage of all numeric tensors should be the same (a buffer, shape, strides, and datatype). This means you could have a single numeric tensor implementation. You could then have a subclass or a wrapper that uses generics to provide typed views of this data (a uint32 tensor or an int8 tensor, for example).

String tensors will probably have to be implemented separately, but all the numeric tensor types should work as described above.

Once you have these, you could create a Tensor base class or interface that both string tensors and numeric tensors implement or extend. This should let you store tensors together in collections regardless of type.

The base class could have a method that lets the bindings internally ask what the concrete tensor type is (string or numeric) and downcast so we can handle each one appropriately.
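
For what it's worth, here is the shape of that design as a small sketch. It is written in Rust, so an enum plays the role that a C# base class or interface would, and a generic `TypedView` plays the role of the generic wrapper. All names are hypothetical, and the layout is assumed to be contiguous, row-major, and little-endian.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DType { U32, I8 }

/// Element types that can be stored in the shared byte buffer.
pub trait Element: Copy {
    const DTYPE: DType;
    fn write_le(self, out: &mut Vec<u8>);
    fn read_le(bytes: &[u8]) -> Self;
}

impl Element for u32 {
    const DTYPE: DType = DType::U32;
    fn write_le(self, out: &mut Vec<u8>) { out.extend_from_slice(&self.to_le_bytes()); }
    fn read_le(b: &[u8]) -> Self { u32::from_le_bytes([b[0], b[1], b[2], b[3]]) }
}

impl Element for i8 {
    const DTYPE: DType = DType::I8;
    fn write_le(self, out: &mut Vec<u8>) { out.push(self as u8); }
    fn read_le(b: &[u8]) -> Self { b[0] as i8 }
}

/// One storage type for every numeric dtype: buffer, shape, strides, datatype.
pub struct NumericTensor {
    buffer: Vec<u8>,
    shape: Vec<usize>,
    strides: Vec<usize>, // counted in elements, row-major
    dtype: DType,
}

impl NumericTensor {
    /// Builds a contiguous tensor; returns None if `values` does not fill `shape`.
    pub fn from_values<T: Element>(values: &[T], shape: &[usize]) -> Option<Self> {
        if shape.iter().product::<usize>() != values.len() { return None; }
        let mut strides = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
        let mut buffer = Vec::with_capacity(values.len() * size_of::<T>());
        for &v in values { v.write_le(&mut buffer); }
        Some(NumericTensor { buffer, shape: shape.to_vec(), strides, dtype: T::DTYPE })
    }

    /// Typed view of the data; None if `T` does not match the stored datatype.
    pub fn view<'a, T: Element>(&'a self) -> Option<TypedView<'a, T>> {
        if self.dtype == T::DTYPE {
            Some(TypedView { tensor: self, _elem: PhantomData })
        } else {
            None
        }
    }
}

pub struct TypedView<'a, T: Element> {
    tensor: &'a NumericTensor,
    _elem: PhantomData<T>,
}

impl<'a, T: Element> TypedView<'a, T> {
    /// Reads one element; None if the index has the wrong rank or is out of bounds.
    pub fn get(&self, index: &[usize]) -> Option<T> {
        let t = self.tensor;
        if index.len() != t.shape.len() || index.iter().zip(&t.shape).any(|(i, dim)| i >= dim) {
            return None;
        }
        let offset: usize = index.iter().zip(&t.strides).map(|(i, s)| i * s).sum();
        let start = offset * size_of::<T>();
        Some(T::read_le(&t.buffer[start..start + size_of::<T>()]))
    }
}

/// String tensors are stored separately from numeric ones.
pub struct StringTensor { pub values: Vec<String>, pub shape: Vec<usize> }

/// Common type so that tensors of any kind can share one collection.
pub enum Tensor { Numeric(NumericTensor), String(StringTensor) }

impl Tensor {
    /// Lets binding code ask what the concrete kind is before handling it.
    pub fn kind(&self) -> &'static str {
        match self { Tensor::Numeric(_) => "numeric", Tensor::String(_) => "string" }
    }
}
```

A list of `Tensor` values can then hold a `u32` tensor, an `i8` tensor, and a string tensor side by side, and the bindings match on the variant where C# would downcast.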

How does that sound?

@bramhoven
Author

That sounds good! It's about the same as what I had in mind.

I will find some time this week to start implementing the tensors.

@bramhoven
Author

@VivekPanyam do you have an example of a model that does not require much VRAM and works with numeric tensors? I was trying to get bark and sdxl to work, but the GPU in my dev laptop only has 4 GB of VRAM, so that did not go so well. I could try it on my desktop, but that runs Windows at the moment, so it would require a bit more work.

@bramhoven
Author

Never mind, I think I can start with bert-base-uncased with the max_tokens input.
