Conversation
@agoodm commented Sep 19, 2016

What is this PR for?

One feature built into Jupyter's ipykernel that is currently missing in Zeppelin is pre/post-execute "callbacks". This PR makes it possible to register code that the interpreter executes before the code entered into the paragraph (pre-execute) and after it (post-execute). In addition to saving users the effort of writing code that they want executed in every paragraph, this paves the way for more advanced features coming soon, such as automatically displaying matplotlib figures without an explicit call to a show() function. See the JIRA issue for more details.

The core implementation is primarily contained in a new class called InterpreterCallbackRegistry, which heavily mimics the Angular display system (e.g., AngularObjectRegistry). While this interface alone should suffice for interpreter maintainers on the Java/JVM side, I thought it might also be useful to expose the system directly to users in their notebooks. Hence, an implementation that works with the spark and pyspark interpreters (via the ZeppelinContext class) is also included here.
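To make the registry idea concrete, here is a minimal sketch of a per-event callback registry keyed by REPL name. This is illustrative only, not the actual Zeppelin API: the class and method names are assumptions, and the real `InterpreterCallbackRegistry` is Java and communicates over thrift.

```python
# Illustrative sketch only (not the actual Zeppelin code): a registry keyed
# by (replName, event) that stores code strings to run around paragraph code.

class CallbackRegistry:
    VALID_EVENTS = ("pre_exec", "post_exec")

    def __init__(self):
        self._registry = {}

    def register(self, repl_name, event, cmd):
        """Store a code string to be executed at the given event for a REPL."""
        if event not in self.VALID_EVENTS:
            raise ValueError("unknown event: %s" % event)
        self._registry[(repl_name, event)] = cmd

    def get(self, repl_name, event):
        """Return the registered code string, or None if nothing is set."""
        return self._registry.get((repl_name, event))

    def unregister(self, repl_name, event):
        """Remove a callback; a no-op if none was registered."""
        self._registry.pop((repl_name, event), None)
```

The `register`/`get`/`unregister` trio mirrors the three user-facing methods described later in this PR (`registerCallback()`, `getCallback()`, `unregisterCallback()`).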

What type of PR is it?

New Feature

Todos

- [ ] Add unit tests
- [ ] Add new documentation
- [ ] (Maybe) Find a better way to check for errors in pre-execute, as this can brick the REPL (see more info below)

What is the Jira issue?

ZEPPELIN-1423

How should this be tested?

In a new note, add the following lines of code to a paragraph:

```python
%pyspark
z.registerCallback("pre_exec", "print 'This code should be executed before the paragraph code!'")
z.registerCallback("post_exec", "print 'This code should be executed after the paragraph code!'")
```

Then run any other paragraph in the note containing some other code, eg

```python
%pyspark
print "This code should be entered into the paragraph by the user!"
```

The output should be:

```
This code should be executed before the paragraph code!
This code should be entered into the paragraph by the user!
This code should be executed after the paragraph code!
```

You should also test out the other two methods (getCallback() and unregisterCallback()) specified in ZeppelinContext.java.

One final caveat should be mentioned: if there are errors in the code you specify for a pre-execute event, the interpreter becomes useless, since the current implementation prepends the code specified in `pre_exec` directly to the paragraph code before calling `interpret()`. The current workaround is to either restart the interpreter group or call `unregisterCallback()` via a different REPL within the interpreter group (e.g., `z.unregisterCallback("pre_exec", "pyspark")` from the spark interpreter). I would appreciate it if anyone is willing to share a better approach here.
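The failure mode above follows directly from how the callbacks are applied: the registered strings are concatenated around the paragraph source before a single `interpret()` call, so a broken `pre_exec` snippet is re-run on every subsequent paragraph. A rough sketch, with hypothetical names (the real code path is inside the Java interpreter machinery):

```python
# Rough sketch (hypothetical function name) of why a broken pre_exec snippet
# bricks the REPL: the registered strings are simply concatenated around the
# paragraph source, so bad pre_exec code is re-prepended on every run.

def apply_callbacks(paragraph_code, registry, repl_name):
    pre = registry.get((repl_name, "pre_exec"))
    post = registry.get((repl_name, "post_exec"))
    parts = [p for p in (pre, paragraph_code, post) if p]
    # this combined string is what ultimately gets passed to interpret()
    return "\n".join(parts)
```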

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? Yes

```
* Autogenerated by Thrift Compiler (0.9.2)
* Autogenerated by Thrift Compiler (0.9.3)
```
@AhyoungRyu (Contributor)
Hi @agoodm :)
Could you regenerate these thrift-generated files with 0.9.2, as the other thrift files did?

@agoodm (Member Author)
@AhyoungRyu Done.

@agoodm commented Sep 20, 2016

Ok, TODOs are now done. @AhyoungRyu @Leemoonsoo @bzz please review. I am unsure whether some of the errors I am seeing in the Travis CI builds are related to these changes.

```python
checkedIterables = self.z.checkbox(name, defaultCheckedIterables, optionIterables)
return gateway.jvm.scala.collection.JavaConversions.asJavaCollection(checkedIterables)

def registerCallback(self, event, cmd, replName="pyspark"):
```
@felixcheung (Member)
Style? Space in `replName = "pyspark"`?
Also, I'm not sure we should hardcode 'pyspark' here; this is configurable.

@agoodm (Member Author)
@felixcheung The space is omitted in the optional argument in order to comply with PEP 8, see here.

The main purpose of defaulting this argument to "pyspark" is solely convenience, i.e., so you can call registerCallback() without specifying the REPL name manually every time.

@felixcheung

That's great! Would you like to update the rest of the file to make the style consistent?

I understand the default value; I'm only pointing out that there are many users and use cases out there that do not use "pyspark" as the name.

@agoodm commented Sep 23, 2016

@felixcheung Sure thing, I can do that in a bit.

As for the issue of the default value for replName, are you referring to the use case of different interpreters, or to variants of pyspark itself (e.g., %<something>.pyspark)? If it's the latter, then I agree there should be a more elegant way to pick the default value in case the spark interpreter group is not the default, so any good ideas on how to go about this are appreciated. If you are just talking about how this is implemented on the scala (%spark) side, I implemented a similar convenience method in the original ZeppelinContext class, which instead assumes "spark" as the default replName.

@felixcheung

Right, I was referring to when the interpreter group is not the default, or when there are multiple, etc.

@agoodm commented Sep 23, 2016

@felixcheung Ok, I have changed the default behavior to now automatically detect the REPL name from the paragraph input text. The Travis CI build isn't running after I rebased my commits to resolve a merge conflict.
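A minimal sketch of the automatic detection described here might read the `%magic` on the first line of the paragraph and fall back to a default. The function name, signature, and fallback value are assumptions for illustration, not the actual implementation:

```python
import re

# Hedged sketch: infer the target REPL from the %magic on the first line of
# the paragraph text, falling back to a default when no magic is present.
# Names and the fallback value are illustrative only.

def detect_repl_name(paragraph_text, default="spark"):
    stripped = paragraph_text.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    m = re.match(r"%([\w.]+)", first_line)
    return m.group(1) if m else default
```

This also naturally handles dotted variants such as `%spark.sql`, which relates to the multiple-interpreter use case @felixcheung raised.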

@agoodm agoodm closed this Sep 24, 2016
@agoodm agoodm reopened this Sep 24, 2016
@doanduyhai (Contributor)

Question: how can interpreters other than Spark leverage this new feature? I mean, you added a new registerCallback() function to the ZeppelinContext object, but this context is not exposed for the shell or angular interpreter, for example. How can people use the feature in this case?

@agoodm commented Sep 24, 2016

@doanduyhai To be precise, I designed the callback registry system with two different types of use cases in mind. The first is for interpreter maintainers to implement in their individual subclasses of Interpreter, for use cases specific to better integrating certain features/libraries of that interpreter with Zeppelin. Right now a prime example of this is matplotlib integration with python/pyspark, as I have discussed in more detail in the JIRA issue.

The second set of use cases are those in which the API is directly exposed to notebook users. Right now this is indeed only fully available in spark / pyspark via the ZeppelinContext class. The reason for this is that getting this sort of interface to work with every interpreter would require some sort of language specific API that can be used to expose the JVM (as py4j does with python), hence one reason I labeled it as experimental in the docs. It's a similar limitation to the angularBind() and angularWatch() API. Keep in mind though that the registry can still be applied for other interpreters if the replName is manually specified, eg:

```scala
%spark
z.registerCallback("post_exec", "ls -l", "sh")
```

Does this help clarify things?

@doanduyhai

Yes, it's very clear. Indeed, a bunch of internal features (Angular registry, callback registry, ...) are exposed to end users only via the Spark interpreter. I guess each interpreter implementor has to find a way to expose these features using their own DSL.

@Leemoonsoo commented Sep 25, 2016

@agoodm Thanks for the contribution.

While the documentation says "Interpreter Execution Callbacks" is experimental (which I agree with), I think the @Experimental annotation should be used for the API instead of @ZeppelinApi.

The implementation keeps the callback registry on the ZeppelinServer side. But since it is volatile and per-interpreter, I wonder if the callback registry could instead be placed inside the interpreter process. That might reduce complexity (no thrift communication) and increase flexibility (able to register an actual function, not only strings). What do you think? Could you elaborate on these two choices?

@agoodm commented Sep 25, 2016

@Leemoonsoo Thanks for the feedback.

Indeed I think your first point on the Experimental annotation made sense, so I have changed it.

As for your second point, I am not really sure how best to go about it myself. When I first created the JIRA issue, I came up with a quick implementation that kept separate registries for each Interpreter instance. It was certainly simpler, but the main problem I ran into was that I couldn't figure out how to make sure the registry changes entered by the user were broadcast to the ZeppelinServer side. I studied the implementation of AngularObjectRegistry to understand how this could be done in Zeppelin, and that is ultimately what led me to the current implementation, which uses thrift. I am still in the process of learning the core Zeppelin codebase, so I would appreciate any specific ideas.

Also, can you be more specific about how this change would make it possible to pass actual references to functions instead of code strings (hence why I put "callbacks" in quotes in the original PR)? I don't consider this too important, but am just curious.

Finally, can you provide some insight for some of the odd CI build failures I have been experiencing with more recent commits?

Thanks!

@Leemoonsoo

When the callback registry lives in the ZeppelinServer process and we want to let users register callback routines from the interpreter by exposing a user callback registry API, the callback routine needs to be serialized to be sent from the Interpreter to the ZeppelinServer. That's why, I think, you send a string representation of the code (the callback routine) through a thrift message.

If the callback registry lives inside the Interpreter instance, the Interpreter implementation can access the registry directly, so the callback routine does not need to be serialized. That means not only a string representation of code but also a Java object (e.g., a Runnable) can be registered. RemoteInterpreterServer can then run the callback routines before and after it invokes the Interpreter.interpret() method.

So ZeppelinServer doesn't really need to know about the callback registry, I think. Or are there reasons and use cases where changes to the callback registry need to be broadcast to ZeppelinServer?
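The in-process design @Leemoonsoo describes can be sketched as follows: because the registry lives inside the interpreter process, entries need no serialization, so they can be real function objects as well as code strings. All names here are illustrative, not Zeppelin's actual classes:

```python
# Sketch of an in-process registry: entries may be callables (invoked
# directly, no serialization or thrift round-trip) or code strings
# (handed to an interpret function). Names are illustrative only.

class InProcessHookRegistry:
    def __init__(self):
        self._hooks = {}

    def register(self, event, hook):
        self._hooks[event] = hook  # a callable or a code string

    def run(self, event, interpret):
        hook = self._hooks.get(event)
        if hook is None:
            return  # nothing registered for this event
        if callable(hook):
            hook()            # direct invocation of an actual function object
        else:
            interpret(hook)   # fall back to interpreting a code string
```

In Zeppelin the `interpret` argument would correspond to RemoteInterpreterServer calling `Interpreter.interpret()` around the paragraph execution.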

@agoodm commented Sep 26, 2016

Thank you very much for the detailed explanation. To be precise, I made the registered cmd argument a String solely because Interpreter.interpret() accepts a String representation of the code rather than actual callable objects. I think actually being able to do that (e.g., use an API like z.registerCallback("post_exec", callback_func) rather than z.registerCallback("post_exec", "callback_func()")) would depend on the language being interpreted and what libraries are available to connect it to the JVM (e.g., py4j for Python).

Nevertheless, I think you have made a very compelling point. The problem with my first implementation of the registry system using the Interpreter class was that I only considered implementing pre/post execute from a Paragraph execution perspective. That is why I thought I needed to broadcast changes in the registry to the ZeppelinServer: I processed the registry from within the Paragraph class rather than from the RemoteInterpreterServer class. Thanks to your explanation I understand everything much better now, so I'll go ahead and work on an implementation tomorrow that works exclusively through the interpreter process.

@agoodm commented Sep 29, 2016

Going to close this for now since I have made another implementation on a separate branch.

@agoodm agoodm closed this Sep 29, 2016
asfgit pushed a commit that referenced this pull request Oct 17, 2016
…erpreters

### What is this PR for?
See #1439. This is a second attempt based on prior feedback, particularly from Leemoonsoo who advised that it would be better to contain all of the code inside of the Interpreter Process. Aside from this, the major changes since the previous PR are:

- The terminology "hooks" is used instead of "callbacks". Since the information being passed to the interpreter isn't actually a "callback", strictly speaking, the more general term "hook" is probably a better choice. All variable / class names were updated correspondingly.
- The registry is now keyed by class name rather than the `replName` specified by the magic on top of the paragraph. The interface provided by `ZeppelinContext` allows for a `replName` to be converted to `className` through an internal mapping.
- Two new event codes, `PRE_EXEC_DEV` and `POST_EXEC_DEV` have been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses of `Interpreter`) and the notebook users (via the `ZeppelinContext` class), as otherwise user attempts at registering a hook would overwrite that set by the interpreter maintainer.
- Global scope for hook registration is supported for the developer use cases.
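The v2 scheme in the bullets above can be sketched as a registry keyed by interpreter class name, with separate `*_dev` event codes so user registrations cannot overwrite hooks set by interpreter maintainers. The mapping and names below are assumptions for illustration, not the actual Zeppelin code:

```python
# Hedged sketch of the v2 design: hooks keyed by interpreter class name,
# with distinct developer event codes (*_dev) kept alongside user events.
# REPL_TO_CLASS is an illustrative stand-in for ZeppelinContext's mapping.

class HookRegistryV2:
    EVENTS = ("pre_exec", "post_exec", "pre_exec_dev", "post_exec_dev")
    REPL_TO_CLASS = {"spark": "SparkInterpreter", "pyspark": "PySparkInterpreter"}

    def __init__(self):
        self._hooks = {}

    def register(self, class_name, event, cmd):
        if event not in self.EVENTS:
            raise ValueError("unknown event: %s" % event)
        self._hooks[(class_name, event)] = cmd

    def register_by_repl(self, repl_name, event, cmd):
        # the ZeppelinContext-style convenience path: replName -> className
        self.register(self.REPL_TO_CLASS[repl_name], event, cmd)

    def hooks_for(self, class_name, phase):
        # developer hooks run alongside user hooks instead of being clobbered
        keys = ((class_name, phase + "_dev"), (class_name, phase))
        return [self._hooks[k] for k in keys if k in self._hooks]
```

Keying by class name rather than paragraph magic means `%pyspark` and any aliased REPL name configured for the same interpreter class resolve to the same hooks.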

### What type of PR is it?
New Feature

### What is the Jira issue?
[ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423)

### How should this be tested?
In a new note, add the following lines of code to a paragraph:
```python
%pyspark
z.registerHook("pre_exec", "print 'This code should be executed before the paragraph code!'")
z.registerHook("post_exec", "print 'This code should be executed after the paragraph code!'")
```

Then run any other paragraph in the note containing some other code, eg
```python
%pyspark
print "This code should be entered into the paragraph by the user!"
```

The output should be:
```
This code should be executed before the paragraph code!
This code should be entered into the paragraph by the user!
This code should be executed after the paragraph code!
```

You should also test out the other two methods (`getHook()` and `unregisterHook()`) specified in `ZeppelinContext.java`.

One final caveat should be mentioned: if there are errors in the code you specify for a pre-execute event, the interpreter becomes useless, since the current implementation prepends the code specified in `pre_exec` directly to the paragraph code before calling `interpret()`. The current workaround is to either restart the interpreter group or call `unregisterHook()` via a different REPL within the interpreter group (e.g., `z.unregisterHook("pre_exec", "pyspark")` from the spark interpreter). I would appreciate it if anyone is willing to share a better approach here.

### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? Yes

Author: Alex Goodman <[email protected]>

Closes #1470 from agoodm/ZEPPELIN-1423v2 and squashes the following commits:

56ede60 [Alex Goodman] Automatically detect default interpreter for registerHook()
044a99d [Alex Goodman] Ensure that registered hooks are applied after call to open()
1331fe1 [Alex Goodman] Update interpreters.md
07cac65 [Alex Goodman] Implemented user-defined hook registry system for spark/pyspark interpreters
8fad936 [Alex Goodman] Added Interpreter Hooks to Interpreter Process
darionyaphet pushed a commit to darionyaphet/zeppelin that referenced this pull request Oct 27, 2016
pedrozatta pushed a commit to pedrozatta/zeppelin that referenced this pull request Oct 27, 2016