Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single, async-first code base #624

Open
davidbrochart opened this issue Mar 9, 2021 · 7 comments
Open

Single, async-first code base #624

davidbrochart opened this issue Mar 9, 2021 · 7 comments

Comments

@davidbrochart
Copy link
Member

davidbrochart commented Mar 9, 2021

In #623 (comment) we talked about using unasync to support sync/async in jupyter-client. The way it works is by having a single "source of truth", the async code, and automatically generate the sync code from it.
This would allow us to get rid of (manually) duplicated code, and it might be a better option than wrapping async code and run it until complete to get a blocking version, as we did in nbclient.
Any thought?

@davidbrochart davidbrochart changed the title Single, async-native code base Single, async-first code base Mar 9, 2021
@kevin-bates
Copy link
Member

Thanks for raising this David.

I totatlly agree with the sentiment of this and proposed similar in the Kernel Provisioning proposal:

Although orthogonal to provisioning, we should probably discuss deprecating the non-async classes and begin that transition. This will substantially reduce the support load and allow future development to be purely async.

In my very brief look at the unasync approach, it appears to impose a particular file structure and I'm not sure how this would impact class inheritance (IOLoopKernelManager and the restarters come to mind). I'm also a little skeptical that the sync-generated code is seamless.

Also, when we talk about deprecation and general evolution within the Jupyter ecosystem, the config system always seems to come into play, and by subclassing the sync classes with async versions, the async classes have been able to leverage the same configuration. As a result, I think we'd probably want to retain the class hierarchy for that purpose, yet essentially make the sync classes functionally abstract. (Not sure how formally that can be done without impacting traitlets.)

Another approach would be to bite the bullet and:

  1. duplicate the method implementations (perhaps introduce some mixin or functions - sorry, I'm not as familiar with the KernelClient methods)
  2. remove deferrals to the superclass from async classes - they become totally detached wrt implementation (not configuration)
  3. deprecate the sync class methods and
  4. be diligent that corresponding changes are made to both until the sync class methods can be removed. On occasion, we may find some changes only apply to one form of the classes.

Although this introduces a bit more support burden now, it leaves existing behaviors and custom subclasses functionally equivalent and is definitive (no magic happening or third-party behavior dependency).

Of course, I'm sure there are other ways to address this and I'm in favor of anything that gets us (ultimately) down to a single implementation. I just want to be sure we consider configuration and (to a lesser extent) custom subclasses as those directly impact our downstream clients.

@davidbrochart
Copy link
Member Author

Actually my proposal is not to get rid of the non-async classes, but to automatically generate them from the async classes. If configuration lives in e.g. KernelManager and AsyncKernelManager inherits from it, then yes configuration would have to move to AsyncKernelManager for KernelManager to have it.
This approach would work if the sync/async differences are well identified and the code is written in such a way that it minimizes these differences. I believe in our case these differences boil down to pyzmq's Context, which can be async or not (and the sleep function in time/asyncio). Some functions might have to be changed, e.g. the sync equivalent of async's ZMQSocketChannel.get_msg() is ZMQSocketChannel.get_msg(block=True), so we would need to align these APIs.
I'm not too worried about unasync imposing a particular file structure, this is actually configurable. And anything we need that wouldn't be supported, we can contribute upstream. One of them is generating the sync code on the fly (see python-trio/unasync#72).

@kevin-bates
Copy link
Member

Ok - thanks for the response. The goal for the configuration piece is that existing configuration file entries (e.g., c.KernelManager.transport = 'ipc') that are "visible" to AsyncKernelManager purely by virtue of inheritance, still continue to work. From what you describe, I think AKM and KM would each have their own copy of transport, thus requiring AKM users to update their configuration. Many folks viewed this kind of thing as a show-stopper when moving from notebook to jupyter_server. I'm not sure if the same sentiment holds for this user base given there are far less configurable traits.

This approach would work if the sync/async differences are well identified and the code is written in such a way that it minimizes these differences.

Again, I'm skeptical and I think we'll find that this restricts innovation by introducing a least-common-denominator approach. For example, I think it's unacceptable to ask kernel provisioner authors to have two implementations - just so the the sync version of kernel manager (whose clients are fewer each day) can use provisioning.

I just feel split/deprecate is the safer approach moving forward and gets us to a place of literally having one set of classes to support. But that's only my opinion.

@davidbrochart
Copy link
Member Author

Actually I found one strong reason to not go the unasync path, it is that even though a sync API looks like pure synchronous code externally, it can still take advantage of async internally. We have a good example of that in nbclient, where the logic is almost impossible to implement without async, and for sure an auto-generated sync code from this wouldn't make any sense.

@kevin-bates
Copy link
Member

Thanks for the reference David.

I just came back to add that I don't want to prohibit any kind of prototyping to occur since "touching" this kind of stuff is usually the best way to really see if it might help and better understand where the deficiencies might lie. I've just found that with this kind of tooling it's usually tough to address every need.

Another scenario I was curious about was - when some methods on the "sync" class are already async, Is there a mechanism for retaining the async decoration on the method (e.g., MappingKernelManager.start_kernel)?

Thanks again for starting this discussion - it's definitely worthwhile.

@minrk
Copy link
Member

minrk commented Mar 10, 2021

I think one hiccup for compatibility when using async -> sync via run_until_complete() is that the sync APIs are often used in async applications (most notably, the notebook server itself), where the simplest version of run_until_complete won't work because asyncio loops can't be nested. nest-asyncio may help here, or putting the run_until_complete in a background thread.

That's an implementation detail, though. I fully support the general approach.

@davidbrochart
Copy link
Member Author

davidbrochart commented Mar 10, 2021

Another scenario I was curious about was - when some methods on the "sync" class are already async, Is there a mechanism for retaining the async decoration on the method (e.g., MappingKernelManager.start_kernel)?

I guess it depends what we call "sync" or "async", in this case if the class has some async methods, is it still a sync class? You'll need an event loop to run this method, so this class is incompatible with a pure sync runtime, e.g. it couldn't run on python2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants