
[PROPOSAL] Kernel Provisioning #608

Closed · kevin-bates opened this issue Jan 18, 2021 · 16 comments · Fixed by #612

@kevin-bates (Member) commented Jan 18, 2021

This issue introduces a proposal named Kernel Provisioning. Its intent is to enable the ability for third-parties to provision the kernel's runtime environment within the current framework of jupyter_client's kernel discovery and lifecycle management.

Problem

The jupyter_client package currently provides a kernel manager class (KernelManager) to control the lifecycle of the kernel process. Lifecycle-action methods supported from a kernel manager include start_kernel, shutdown_kernel, interrupt_kernel, restart_kernel , and is_alive. All of these methods interact with the kernel process - which is a Popen subprocess - to monitor and control its lifecycle. For example,

  • start_kernel creates the Popen instance and stores that instance in the kernel manager's kernel attribute.
  • shutdown_kernel is implemented to leverage Popen's kill() and terminate() methods (depending on urgency).
  • interrupt_kernel calls Popen's send_signal() method (or sends a message if message-based interrupts are configured).
  • is_alive is based on Popen's poll() method.
  • For completeness, restart_kernel is a combination of shutdown_kernel and start_kernel.
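For illustration, the mapping above can be sketched as a minimal, hypothetical manager. This is a simplified stand-in for the pattern, not the actual jupyter_client source:

```python
import signal
import subprocess


class SketchKernelManager:
    """Hypothetical, simplified stand-in for jupyter_client's KernelManager,
    showing how each lifecycle method delegates to the Popen instance."""

    def start_kernel(self, cmd):
        # start_kernel creates the Popen instance and stores it in `kernel`
        self.kernel = subprocess.Popen(cmd)

    def is_alive(self):
        # is_alive is based on Popen's poll(): None means still running
        return self.kernel.poll() is None

    def interrupt_kernel(self):
        # interrupt_kernel sends a signal to the kernel process
        self.kernel.send_signal(signal.SIGINT)

    def shutdown_kernel(self, now=False):
        # shutdown_kernel leverages kill() or terminate() depending on urgency
        if now:
            self.kernel.kill()
        else:
            self.kernel.terminate()
        self.kernel.wait()

    def restart_kernel(self, cmd):
        # restart_kernel is a combination of shutdown_kernel and start_kernel
        self.shutdown_kernel()
        self.start_kernel(cmd)
```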

Today, applications that wish to launch kernels beyond a local Popen process (for example, into resource-managed clusters or container-based environments) must instead implement their own KernelManager subclass. This introduces a number of issues:

  1. KernelManager is an application-level class. That is, functionality related to the application - across all kernels - is implemented via the kernel manager. Applications such as Notebook extend this class to add activity-monitoring functionality, for example.
  2. Applications (e.g., Notebook, NBClient, etc.) enable the ability to "bring your own" kernel manager. Because KernelManager is an application-level class, such kernel manager implementations must be a subclass of KernelManager and are kernel-specification agnostic. That is, the same kernel manager class must manage the lifecycles of Python, R, and C++ kernels, as well as kernels launched into resource-managed clusters - which is not possible via a Popen subprocess instance. Support for the latter kinds of kernels requires interactions with more than just the kernel process. For example, kernel locations must be discovered within the resource-managed cluster using the resource manager's API, and kernels must be terminated the same way - allowing the resource manager to release resources, update scheduling, etc. (examples of such resource managers are Hadoop YARN and Kubernetes). As a result, a single kernel manager cannot address the needs of the various configurations in which users want their kernels to operate.
  3. Support for highly demanded features such as parameterized kernels cannot be sustainably implemented because
    a) a given kernel manager instance cannot know about what parameters apply to all kernels and
    b) a majority of kernel parameters affect the kernel's runtime environment and, therefore, must be applied prior to the kernel's actual launch.

In essence, what is needed is the ability to associate a kernel's lifecycle management to the kernel's specification, where its environment and parameters are defined, while leaving kernel manager implementations to be the responsibility of the application.

Proposed Enhancement

This proposal abstracts the kernel process layer within the existing KernelManager implementation thereby providing the ability to create custom kernel environments across all Jupyter applications that use jupyter_client today.

In today's implementation, the Popen instance is returned by the KernelManager's _launch_kernel() method. Upon return, the method sets the manager's kernel attribute to the Popen instance, after which all lifecycle-related methods will call through to interact with the kernel process.

Instead, this proposal will introduce a layer or wrapper around the Popen instantiation such that this class instance (let's call it PopenProvisioner for now) will contain the Popen instance and return itself from the _launch_kernel() method. Because the method signatures of PopenProvisioner will be identical to those of Popen, the kernel's process management will operate just as it does today. (Note that Jupyter Enterprise Gateway takes this approach with its process proxies, but that solution is limited to the EG application and is not generally available to the ecosystem.)

Of course, PopenProvisioner will derive from a base class that defines the various methods. These methods will look similar to the following:

from typing import Optional

from traitlets.config import LoggingConfigurable


class KernelProvisionerBase(LoggingConfigurable):
    """Base class defining methods for Kernel Provisioner classes.

       These methods model those of the Subprocess Popen class:
       https://docs.python.org/3/library/subprocess.html#popen-objects
    """
    def poll(self) -> Optional[int]:
        """Checks if the kernel process is still running.

        If running, None is returned; otherwise the process's integer-valued
        exit code is returned.
        """
        pass

    def wait(self, timeout: Optional[float] = None) -> Optional[int]:
        """Waits for the kernel process to terminate; callers should pass a
        value for timeout.

        If the kernel process does not terminate within timeout seconds, a
        TimeoutExpired exception will be raised - which can be caught and
        retried.  If the kernel process has terminated, its integer-valued
        exit code will be returned.
        """
        pass

    def send_signal(self, signum: int) -> None:
        """Sends the signal identified by signum to the kernel process."""
        pass

    def kill(self) -> None:
        """Kills the kernel process.  This is typically accomplished via a
        SIGKILL signal, which cannot be caught.
        """
        pass

    def terminate(self) -> None:
        """Terminates the kernel process.  This is typically accomplished via
        a SIGTERM signal, which can be caught, allowing the kernel process to
        perform possible cleanup of resources.
        """
        pass

The class will also define other methods for its initialization, launch, cleanup, etc. In addition, these methods will be created with planned support for parameterized kernel launches - since, realistically speaking, a majority of parameters affect the kernel process's environment.

We can decide whether the base class should be abstract (probably) or not along with which methods are abstract themselves as we near implementation.

jupyter_client will provide the default KernelProvisioner implementation (e.g., PopenProvisioner) such that all existing kernels that do not specify a kernel provisioner will utilize an instance of the default class. In addition, this default will be configurable in case a given installation wishes to use a different provisioner for all kernels in which one is not currently specified.
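A minimal sketch of such a default provisioner might look like the following. The names (LocalPopenProvisioner, launch) are illustrative placeholders, not the final API:

```python
import subprocess
from typing import List, Optional


class LocalPopenProvisioner:
    """Illustrative default provisioner: manages a kernel as a local
    subprocess.Popen instance and forwards Popen-like methods to it."""

    def __init__(self) -> None:
        self.process: Optional[subprocess.Popen] = None

    def launch(self, cmd: List[str], **kwargs) -> "LocalPopenProvisioner":
        # _launch_kernel() would return this instance instead of the raw Popen
        self.process = subprocess.Popen(cmd, **kwargs)
        return self

    def poll(self) -> Optional[int]:
        return self.process.poll()

    def wait(self, timeout: Optional[float] = None) -> Optional[int]:
        return self.process.wait(timeout=timeout)

    def send_signal(self, signum: int) -> None:
        self.process.send_signal(signum)

    def kill(self) -> None:
        self.process.kill()

    def terminate(self) -> None:
        self.process.terminate()
```

Because every method mirrors Popen, existing lifecycle code in the KernelManager continues to work unchanged against this wrapper.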

Discovery

As noted in the problem statement, we need the ability to associate a kernel's lifecycle management (i.e., its process abstraction instance) to the kernel's specification. It is not sufficient to rely on a single abstraction instance across all configured specifications. However, because this proposal should not affect existing installations using standard kernel specifications, this only becomes an issue when explicit abstractions (i.e., those not based on the default) are necessary.

To explicitly indicate a kernel environment provisioner, one would configure the corresponding kernel specification to include an environment_provisioner stanza within the metadata stanza, similar to the following...

  "metadata": {
    "environment_provisioner": {
      "class_name": "my.provisioner.SlurmProvisioner",
      "config": {
      }
    }
  },

The KernelManager instance, with access to the KernelSpecManager, will check for the existence of such a stanza and instantiate the class associated with that stanza's class_name entry. Should the stanza not exist, the default provisioner will be instantiated and used. Should the configured class name not be available, an exception will be raised, thereby failing the startup of the kernel. (I view this as better than deferring to the configured default provisioner, since the specification's configuration stanza probably won't apply to that provisioner, etc.)

The config stanza will be passed to the provisioner's initializer and consist of configuration settings pertaining to the provisioner and its subclasses. We should also leverage whatever config-related functionality traitlets provide (assuming provisioners are subclasses of LoggingConfigurable).
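As a sketch, the discovery step could look something like the following. The default class path and the local import_item helper are assumptions for illustration; an actual implementation might reuse traitlets' dotted-name importer and its config machinery instead:

```python
import importlib

# Illustrative default; the real dotted path is whatever jupyter_client ships.
DEFAULT_PROVISIONER = "jupyter_client.provisioning.PopenProvisioner"


def import_item(name: str):
    """Resolve a dotted 'package.module.ClassName' string to the class."""
    module_name, _, class_name = name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def create_provisioner(kernel_spec: dict):
    """Instantiate the provisioner named in the kernelspec's metadata stanza.

    Falls back to the default when no stanza is present; raises (failing the
    kernel's startup) when a configured class cannot be imported.
    """
    stanza = kernel_spec.get("metadata", {}).get("environment_provisioner", {})
    class_name = stanza.get("class_name", DEFAULT_PROVISIONER)
    config = stanza.get("config", {})
    provisioner_class = import_item(class_name)
    return provisioner_class(**config)
```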

Provisioner Responsibilities

Once launched, the kernel process's lifecycle-management will then be the responsibility of the instantiated provisioner. The provisioner will also be responsible for:

  • Definition and consumption of provisioner-specific parameters that apply to the kernel process's environment. This includes a chance to apply substitutions into the startup command string.
  • Provisioning of the kernel's connection. The provisioned connection information will be accessible to the KernelManager at which time it can be persisted for use in collaboration, etc.

Impact on existing implementations

If no environment provisioners are configured, there is no impact on existing implementations. They will continue to work, just like today. The difference will be that when the appropriate version of jupyter_client is installed, interaction with the kernel's process will go through an additional (nearly pass-thru) layer.

In addition, existing implementations will be able to leverage parameterized kernel launches, once available and, if kernel provisioners are configured, be able to leverage their offerings immediately.

When environment provisioners are configured, any kernel specifications they provide will be immediately available to applications.

No additional packages will be necessary - all functionality is baked into jupyter_client (plus whatever kernel-provisioner package has already been installed).

Existing KernelManager subclasses

By embracing jupyter_client and its KernelManager class, this proposal doesn't introduce any migration issues and most subclasses of KernelManager should continue to work. Note that some KernelManager subclasses that completely override lifecycle-action methods will not be able to leverage this functionality - but that's their intent in the first place.

What applications subclass KernelManager today? I know that Enterprise Gateway already provides its own process abstraction via a subclass of KernelManager, and will need to coordinate with appropriate jupyter_client releases once implemented (but I have an inside scoop on that repo 😄 ).

Should I post this question to the Jupyter Google Group, Discourse, anywhere else? I know that nb_conda_kernels subclasses KernelSpecManager - as well as others - but they still leverage jupyter_client's KernelManager directly - so they should not be an issue.

Naming

Here are a few naming suggestions, some of which are more appropriate as a topic (e.g., provisioning) than an implementation (e.g., provider or provisioner).

  • Kernel Process Provider
  • Kernel Environment Provider
  • Kernel Provisioning/Provisioner
  • Kernel Environment Provisioning/Provisioner
  • Kernel Process Proxy (adopt Enterprise Gateway's terminology)
  • ???

Because this abstraction is contained within the existing KernelManager implementation, the Kernel in the name could be dropped as it's implied.

I prefer Environment Provisioning as a topic and Environment Provisioner as an implementation name but really have no strong affinity to either and am open to suggestions. The acronym KEP could be used for abbreviations where necessary (where the 'K' for Kernel makes the inference explicit).

Alternate names for PopenProvisioner could be: JupyterClientProvisioner or GenericProvisioner. I suspect many custom provisioners will derive from this implementation.

I've gone ahead and cc'd folks with whom I've shared these ideas. Please feel free to add anyone else you think might be interested.

cc: @blink1073, @echarles, @lresende, @Zsailer

@echarles (Member)

Thx a lot @kevin-bates. This is a great proposal that takes backwards compatibility into account, which I think was missing from the previous attempts. I am planning some time to review, comment, and contribute on this. cc/ @goanpeca @Carreau

@willingc (Member)

@MSeal @captainsafia FYI

@Zsailer (Member) commented Jan 20, 2021

@minrk @ellisonbg @rgbkrk, @takluyver too!

@willingc (Member)

cc/ @trallard FYI re: your new role at Quansight Labs

@ellisonbg (Contributor)

Thanks for the ping, I will have a look.

@MSeal (Contributor) commented Jan 21, 2021

Great write-up @kevin-bates ! Thanks for putting that all together and incorporating feedback from prior kernel management change efforts here.

I'll drop a few comments here if it's helpful. Overall I think this is a solid proposal to move things forward and unite some efforts in a backwards compatible manner.

We can decide whether the base class should be abstract (probably) or not along with which methods are abstract themselves as we near implementation.

Yeah, I agree - it should probably be abstract.

Should the stanza not exist, the default provisioner will be instantiated and used.

Perhaps with a warning echo'd? I'd also be ok with erroring here unless an override is passed in to ignore the kernel defined provisioner.

If no environment provisioners are configured, there is no impact on existing implementations. They will continue to work, just like today. The difference will be that when the appropriate version of jupyter_client is installed, interaction with the kernel's process will go through an additional (nearly pass-thru) layer.

That's a really good way to introduce the change imo. Thanks for putting thought into this aspect of the change.

What applications subclass KernelManager today? I know that Enterprise Gateway already provides its own process abstraction via a subclass of KernelManager, and will need to coordinate with appropriate jupyter_client releases once implemented (but I have an inside scoop on that repo smile ).

I think most private Notebook products subclass KernelManager if they use jupyter_client. So while there's no public visibility into that impact, it might dictate that the change be associated with a major version bump, so consumers in those corners know to look at what changed / is new and how to adapt custom extensions to the new class pattern here.

Should I post this question to the Jupyter Google Group, Discourse, anywhere else? I know that nb_conda_kernels subclasses KernelSpecManager - as well as others - but they still leverage jupyter_client's KernelManager directly - so they should not be an issue.

Maybe also the gitter -- some folks still only ask / monitor there. I think moving most of this PR description into the docs as a new-to-version-x section / page would be a good idea as well.

Here are a few naming suggestions, some of which are more appropriate as a topic (e.g., provisioning) than an implementation (e.g., provider or provisioner).

Naming is hard ... how would ClientProvisioner sound, since it's responsible for the lifecycle of the clients that the KernelManager has requested? I think your suggestions are also perfectly viable.

@kevin-bates (Member, Author)

Thank you for your response and questions @MSeal - they are much appreciated. Here are some responses...

Should the stanza not exist, the default provisioner will be instantiated and used.

Perhaps with a warning echo'd? I'd also be ok with erroring here unless an override is passed in to ignore the kernel defined provisioner.

I was planning on logging an info message on each startup. Should the default provisioner be used because the kernelspec didn't specify a provisioner, I suppose we could add an indication of that. However, since that's the 90% use case because virtually every kernelspec will not specify a provisioner and there's nothing that should require folks to update their kernelspecs, I'm inclined to remain silent. I think it's important we maintain the status quo as much as we can.

Even when functionalities like parameterized kernels and kernel metric gathering are in place, folks could still get the benefits - albeit things like rich parameterization would likely require subclassing of the default provisioner.

Maybe also the gitter -- some folks still only ask / monitor there.

Good idea about gitter. I'll fire off a query or two shortly.

how would ClientProvisioner sound, since it's responsible for the lifecycle of the clients that the KernelManager has requested?

I like this name over both PopenProvisioner and JupyterClientProvisioner. It's shorter than the latter and doesn't imply an implementation like the former. This would become the de facto provisioner that is both instantiated in the absence of a stanza and commonly subclassed for customizations that still rely on today's (local) kernels but where rich parameterizations can be applied.

I hope to have a minimal DRAFT PR soon so things are a bit more tangible. Thanks again.

@diurnalist

👋🏻 I was notified about this via your message to the jupyter listserv, so great job on the wider communication :)

This is a wonderful proposal. I'm one of the people who did some work on custom subclasses of KernelManager for my own project -- what I was/am trying to build is what I refer to as a "hydra" kernel, a very wacky thing, which essentially is a proxy Kernel that can spawn additional kernels on-demand. My use case is I am writing/attempting to write some capability whereby each code cell can execute on a different remote host (our users are CS experimenters who are orchestrating research experiments across multiple provisioned hosts.)

From what I read here, this would be a welcome improvement as I'd have a more clear integration point when it comes to provisioning the kernel (I am targeting Ansible for this right now.)

👍🏻

@kevin-bates (Member, Author)

Thank you Jason! This is exactly the kind of information that is extremely helpful.

I see you're overriding _launch_kernel(), so I think you'd be fine with no changes - i.e., what you have should still work. Down the road, you could then look into moving your Ansible-equivalent code into a subclass of the default ClientProvisioner (or whatever we name it) and would no longer need the custom KernelManager or have to keep track of changes to its superclasses.

@SylvainCorlay (Member)

Hey, thanks for opening this.

I have reservations about extending the kernelspec with Python-specific information such as

    "environment_provisioner": {
      "class_name": "my.provisioner.SlurmProvisioner",
      "config": {
      }
    }

By Python-specific, I mean that it is specific to jupyter_client which is written in Python, and makes it unlikely for such kernelspecs to be consumed from another (client-side) implementation of the kernel protocol, and even less likely if that implementation is written in another programming language.

For example, @JohanMabille is actively working on a new C++ client and Jupyter server, with the motivation of faster handling of websocket and zeromq messages by the server. It is very unlikely that his implementation will be able to make sense of "class_name": "my.provisioner.SlurmProvisioner".

I think that it makes a lot of sense for jupyter_client to have more abstract base classes for handling the life cycle of the kernel, but I don't know if the kernelspec is the place where the implementation of the provider should be specified. Would it make sense to have some kind of proxy kernels instead, fully compliant with the current spec, and handling e.g. the slurm-based provisioning of the real kernel?

@kevin-bates (Member, Author)

Hi @SylvainCorlay - thanks for raising your concern. It will be good to try to hash this out.

I have reservations about extending the kernelspec with Python-specific information

I recall this coming up in the handshaking proposal but never found this documented anywhere - although I completely understand your point. What is documented is this regarding the metadata stanza (in which environment_provisioner resides):

metadata (optional): A dictionary of additional attributes about this kernel; used by clients to aid in kernel selection. Metadata added here should be namespaced for the tool reading and writing that metadata.

So, although applications are free to ignore anything in the metadata stanza, applications are also free to add application-specific items. As a result, one solution would be to simply rename the stanza environment_provisioner to jupyter_client.environment_provisioner, but that strikes me as a bit argumentative (and I apologize).

Another approach that gets the Python-specific class name out of the metadata.environment_provisioner stanza would be to replace it with a provisioner id (e.g., "provisioner_id": "slurm"). In this scenario, when jupyter_client consumes the environment_provisioner stanza, it can leverage Python entry points to "discover" available Python implementations of the provisioner, while a non-Python "jupyter_client" can use its own discovery mechanism to find the matching provisioner implementation in its language. With this approach, we could extend the KernelSpecManager such that a kernelspec referencing a provisioner id that cannot be loaded (via entry points or another language's approach) is not returned and made available for use. This avoids both a false sense that a provisioner-enabled kernelspec is available when in fact it isn't, and the deferred exception that would otherwise occur at kernel startup when the provisioner is found to not exist. Such kernelspecs read by language-incompatible applications (i.e., those in which the provisioner id cannot be found) would likewise be removed from the set of available kernelspecs (assuming those implementations of KernelSpecManager adhere to this convention).
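A rough sketch of what entry-point-based lookup could look like. The group name and function here are hypothetical, not part of any released API:

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group under which provisioner packages would register.
PROVISIONER_GROUP = "jupyter_client.kernel_provisioners"


def find_provisioner_class(provisioner_id: str):
    """Return the class registered for provisioner_id, or None if unavailable.

    A KernelSpecManager could call this while listing kernelspecs and simply
    hide any spec whose provisioner id cannot be resolved.
    """
    try:
        eps = entry_points(group=PROVISIONER_GROUP)  # Python 3.10+ selection API
    except TypeError:
        eps = entry_points().get(PROVISIONER_GROUP, [])  # older Pythons
    for ep in eps:
        if ep.name == provisioner_id:
            return ep.load()
    return None
```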

Proposal Update: Unless strong objection, I would like to adopt this change for this proposal - replacing class_name with provisioner_id along with the behavior described above.

Please remember that the vast majority of kernelspecs will continue to not have an environment_provisioner stanza (or even a metadata stanza for that matter). Only kernelspecs, essentially installed by provisioner-based applications, will contain such stanzas.

Looking forward, the kernel parameter schema will also be specific to the provisioner, not the kernel, since parameters need to span both, and provisioners are kernel-aware anyway. As such, I believe this same stanza should contain the parameter schema or a reference to it.

@SylvainCorlay (Member)

Hey @kevin-bates sorry for the late reply, I thought I had done this already.

Regarding your remark that parameterized kernels cannot be sustainably implemented, I would disagree with it.

  • We can certainly have a field in the kernel spec listing all parameters and their types as a JSON schema. This information can be consumed to build forms in Jupyter front-ends and/or command-line tools.
  • Values of parameters can then be used as jinja values in the kernelspec like in the command line, environment variables etc.

@kevin-bates (Member, Author)

Hi @SylvainCorlay - no worries.

  • We can certainly have a field in the kernel spec listing all parameters and their types as a JSON schema. This information can be consumed to build forms in Jupyter front-ends and/or command-line tools.

I completely agree. We'd definitely want a schema that describes the available parameters pertaining to that kernel specification and the kernel start-request body would contain appropriate name/value pairs. This schema would identify required values, specify the enumerated types (where applicable) and include default values - in particular, for any required properties.
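As a purely illustrative example of what such a parameter schema might look like (the property names here are made up, not part of the proposal):

```json
{
  "title": "Kernel launch parameters",
  "type": "object",
  "properties": {
    "cpus":   { "type": "integer", "minimum": 1, "default": 1 },
    "memory": { "type": "string",  "default": "2G" },
    "queue":  { "type": "string",  "enum": ["debug", "normal"], "default": "normal" }
  },
  "required": ["queue"]
}
```

A front-end could render this as a form, and the start-request body would then carry the chosen name/value pairs.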

  • Values of parameters can then be used as jinja values in the kernel spec like in the command line, environment variables etc.

I think this is too restrictive. Not all parameters can be plugged nicely into a jinja template. In fact, I think we'll find that the majority of parameters apply to the kernel's environment - parameters like which Docker/Singularity image to use, which Kubernetes namespace to create, or how many workers to allocate for your Spark-based kernel (driver). All of these are examples in which the desired parameter is not associated with the kernel itself but, instead, with the environment in which it will run. Without kernel environment provisioning, support for these kinds of parameters (which I would argue will be the majority) cannot be implemented in a sustainable manner.

When we get to parameterization, I suspect we may need separate schemas - one that pertains directly to the kernel itself, in which command-line jinja and env values make sense because they are consumed directly by the kernel. But another pertaining to the provisioned environment in which the provisioner is responsible for interpreting and provisioning the desired environment in which the kernel will run.

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/subclasses-of-jupyter-client-manager-kernelmanager/7793/1

@kevin-bates (Member, Author)

Hi,

I'm encountering difficulties with backward-compatibility (b/c) support for subclassed KernelManagers (although I suspect few are implemented). Given that, I think Kernel Environment Provisioners should target a major release boundary (preferably 7.0), rather than a minor release. The release would include the following:

  • Removal of the deprecated KernelManager.cleanup() method. This is unrelated to provisioning.
  • Removal of the deprecated KernelManager.kernel_cmd trait. This would help clean things up and the deprecation has existed for 7 years.
  • Provisioners only apply to Async Kernel Management classes. Applications such as nbclient, voila and Enterprise Gateway have already switched to AsyncKM and jupyter_server will be doing so shortly. The async classes could still derive from the non-async classes (which we need for config b/c) except those methods in which provisioners are involved. This would allow us to plumb provisioners as async-enabled where we'll want that finer-grained cooperation.
    • Although orthogonal to provisioning, we should probably discuss deprecating the non-async classes and begin that transition. This will substantially reduce the support load and allow future development to be purely async.
  • I think the incorporation of the Kernel Handshaking Pattern would be great in the default provisioner (ClientProvisioner). This way, other provisioners can derive from that or might require differing launch/connection-management behaviors and can roll their own.

As far as backward compatibility support for the kernelspecs themselves, we could add a check in KernelSpecManager that ignores any kernel.json files that include an environment_provisioner stanza. This way, hosts that have both old and new installations (and a potential mixture of kernel specs) won't "see" (explicit) provisioner-enabled kernels as options. If this proposal is accepted, I would recommend we add this form of "filtering" now. Then, upon KEP's release, we'd decide whether filtering continued or discontinued by default.
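The filtering check could be as simple as the following sketch (function names are illustrative; a real implementation would live in a KernelSpecManager subclass or hook):

```python
def has_provisioner_stanza(spec: dict) -> bool:
    """True when a kernel.json explicitly requests a provisioner."""
    return "environment_provisioner" in spec.get("metadata", {})


def filter_kernel_specs(all_specs: dict) -> dict:
    """Drop provisioner-enabled specs so older installations never see them."""
    return {name: spec for name, spec in all_specs.items()
            if not has_provisioner_stanza(spec)}
```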

(Note: We'd also want to add the same kind of check in the non-async KernelManager for those single-kernel applications that don't use a KernelSpecManager.)

I would like to proceed on #612 with these changes in mind unless there are objections (both relative to the above text and the proposal in general).

@kevin-bates kevin-bates changed the title [PROPOSAL] Kernel Environment Provisioning [PROPOSAL] Kernel Provisioning Feb 26, 2021
@kevin-bates (Member, Author)

Updates

Rather than update the original description with changes, I thought I'd list them here to preserve the original proposal. (If others feel the issue's description should also hold the truth, I'm happy to apply these edits there.)

Naming:

  • In discussing this with @echarles, I'm changing the name of the proposal to Kernel Provisioning (with a noun of KernelProvisioner). We felt that environment is implied by provisioning and it's shorter and easier. The title of this issue (as well as the first paragraph) has been edited to reflect this change to avoid confusion.
  • The default implementation provisioner, referred to in the description as PopenProvisioner and later referenced as ClientProvisioner has been renamed LocalProvisioner. The term 'client' in the jupyter_client package has always struck me as odd. The use of PopenProvisioner felt too specific, while LocalProvisioner just feels like a good default. Since I believe this implementation will be commonly subclassed and the majority of kernels are local, this seems like a better name.
  • The metadata stanza object environment_provisioner has been renamed kernel_provisioner.

Release and implementation changes discussed in the previous comment still hold.
