Async API #142

Open
davidbrochart opened this issue Sep 22, 2023 · 7 comments

Comments

@davidbrochart
Collaborator

Since CRDTs can be CPU-intensive, I'm wondering if we could run all the Rust code in a separate thread, and have an async Python API that would not block while waiting for the CRDT operations to complete.
For instance:

async with doc.begin_transaction() as txn:
    await ytext.extend(txn, "foo")
    ...

delta = await encode_state_as_update(doc)
await apply_update(other_doc, delta)

I know that Ypy doesn't support multi-threading, but here all the Rust code would run in one and the same thread; it just wouldn't be the main Python thread.
I think this would be a nice performance gain on multi-core CPUs. On single-core CPUs the non-async API would be a better choice.
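
A minimal sketch of what such an async wrapper could look like, using only the standard library. `AsyncDocWorker` and `blocking_extend` are hypothetical names, and a plain list stands in for the document; nothing here is Ypy's actual API. A single-worker `ThreadPoolExecutor` keeps every call on the same non-main thread, as described above. The event loop only stays responsive during a call if the wrapped native code releases the GIL, which this sketch does not address.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncDocWorker:
    """Hypothetical wrapper: runs every blocking call on one dedicated thread."""

    def __init__(self):
        # max_workers=1 means all submitted calls run on the same thread, in order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    async def run(self, func, *args):
        # Hand the blocking call to the worker thread and await its result.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, func, *args)

    def close(self):
        self._executor.shutdown(wait=True)


def blocking_extend(buffer, chunk):
    # Stand-in for a blocking CRDT operation such as text.extend(txn, chunk).
    buffer.append(chunk)
    return threading.get_ident()  # report which thread did the work


async def demo():
    worker = AsyncDocWorker()
    buffer = []
    try:
        ids = [await worker.run(blocking_extend, buffer, c) for c in ("foo", "bar")]
    finally:
        worker.close()
    return buffer, ids
```

Calling `asyncio.run(demo())` returns the filled buffer together with the thread ids that handled each call; both ids refer to the same worker thread, not the thread running the event loop.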

@Horusiath

Can't the same be done with Python threads? Just put the Python doc in a background thread and submit requests to it via a queue.
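
A rough sketch of that pattern, assuming a hypothetical `DocWorker` helper and a plain list standing in for a real `Y.YDoc`: one background thread owns the document, and callers submit requests through a `queue.Queue`, getting a `Future` back for each result.

```python
import queue
import threading
from concurrent.futures import Future


class DocWorker(threading.Thread):
    """Hypothetical helper: owns `doc` and is the only thread that touches it."""

    def __init__(self, doc):
        super().__init__(daemon=True)
        self._doc = doc
        self._requests = queue.Queue()

    def submit(self, func, *args):
        # Enqueue func(doc, *args); the returned Future receives the result.
        future = Future()
        self._requests.put((func, args, future))
        return future

    def stop(self):
        # None is the shutdown sentinel; wait for the thread to drain and exit.
        self._requests.put(None)
        self.join()

    def run(self):
        while True:
            item = self._requests.get()
            if item is None:
                break
            func, args, future = item
            try:
                future.set_result(func(self._doc, *args))
            except Exception as exc:
                future.set_exception(exc)


# Stand-in document: a plain list instead of a real Y.YDoc.
worker = DocWorker(doc=[])
worker.start()
worker.submit(list.append, "foo")      # runs list.append(doc, "foo") on the worker
length = worker.submit(len).result()   # requests are processed in FIFO order
worker.stop()
```

Because the queue is processed in order by a single thread, the document is never accessed concurrently, which sidesteps the lack of multi-threading support.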

@davidbrochart
Collaborator Author

davidbrochart commented Sep 23, 2023

Except that Ypy holds the GIL, which prevents the thread from running in parallel.
For instance, the following code never uses more than 100% CPU (a single core), even though it runs two busy threads:

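from threading import Thread
import y_py as Y

def main():
    doc = Y.YDoc()
    text = doc.get_text("text")

    # Keep the CRDT busy: insert and then delete the same 3 characters, forever.
    while True:
        with doc.begin_transaction() as txn:
            text.extend(txn, "foo")

        with doc.begin_transaction() as txn:
            text.delete_range(txn, 0, 3)

# Run the Ypy workload in a background thread.
t = Thread(target=main)
t.start()

# Busy-loop in the main thread; with real parallelism this would push CPU usage towards 200%.
while True:
    pass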

@Horusiath

Horusiath commented Sep 27, 2023

Btw, I was thinking that this would be a pain to implement for every available method. However, if we limited ourselves to a subset of operations, i.e. sync messages passed over the network and registering for changes, then we could create something like an Archive that stores documents and performs operations on them directly using a Rust thread pool.

The purpose of such an archive is to serve as a hub on the server side, with its own multi-threaded document dispatch and update broadcast. It could even implement something like an LRU cache, loading docs from disk on demand when they are touched and unloading the least recently used ones, releasing memory, when resources are getting thin.

It could be implemented behind a feature flag in the y-sync crate and pulled from there. This crate already provides utility methods for managing update broadcasts.
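
The on-demand loading and eviction part of such an Archive could be sketched as follows. This is a single-threaded Python illustration of the cache policy only, not the Rust thread pool or anything from y-sync; the `Archive` class and its `load`/`unload` callbacks are hypothetical names.

```python
from collections import OrderedDict


class Archive:
    """Hypothetical document hub with least-recently-used eviction."""

    def __init__(self, capacity, load, unload):
        self._capacity = capacity
        self._load = load      # callable: name -> document (e.g. read from disk)
        self._unload = unload  # callable: (name, document) -> None (e.g. persist)
        self._docs = OrderedDict()  # ordered from least to most recently used

    def get(self, name):
        if name in self._docs:
            # Touching a loaded document marks it as most recently used.
            self._docs.move_to_end(name)
            return self._docs[name]
        # Not loaded yet: load on demand, then evict if over capacity.
        doc = self._load(name)
        self._docs[name] = doc
        if len(self._docs) > self._capacity:
            old_name, old_doc = self._docs.popitem(last=False)
            self._unload(old_name, old_doc)
        return doc

    def loaded(self):
        return list(self._docs)


evicted = []
archive = Archive(
    capacity=2,
    load=lambda name: {"name": name},               # stand-in for loading from disk
    unload=lambda name, doc: evicted.append(name),  # stand-in for persisting
)
archive.get("a")
archive.get("b")
archive.get("a")  # touch "a" so "b" becomes the least recently used
archive.get("c")  # exceeds capacity, so "b" is unloaded
```

A real implementation would also need locking or per-document dispatch to be safe under the multi-threaded access described above.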

@davidbrochart
Collaborator Author

For now I'd just like to achieve parallelism in Rust-only code, since Yrs should allow it, but I'm facing issues. I opened https://github.com/davidbrochart/pycrdt/pull/6 to illustrate the problem. It would be great if you could look at it.

@junoriosity

@davidbrochart It would be terrific to know whether there has been any progress on this. 🙂

@davidbrochart
Collaborator Author

I'm working on pycrdt now. I'm thinking about an async API, not for performance but to provide better integration with async frameworks.

@Horusiath

Regarding the issue mentioned in the pycrdt repo - I'll be fixing this part: we already talked about it with Lucio and Sebastian last week. The core issue is that y-types use wrappers around raw pointers underneath, which Rust considers unsafe in many contexts. I'm going to replace them with atomic ref-counted pointers, which don't have this issue.


3 participants