[ZEPPELIN-1423] Allow users to specify pre/post-execute callbacks for interpreters #1439
Conversation
```diff
  */
 /**
- * Autogenerated by Thrift Compiler (0.9.2)
+ * Autogenerated by Thrift Compiler (0.9.3)
```
Hi @agoodm :)
Could you regenerate these thrift-generated files with 0.9.2, as the other thrift files were?
@AhyoungRyu Done.
OK, the TODOs are now done. @AhyoungRyu @Leemoonsoo @bzz Please review. I'm not sure whether some of the errors I'm seeing in the Travis CI builds are related to these changes.
```python
checkedIterables = self.z.checkbox(name, defaultCheckedIterables, optionIterables)
return gateway.jvm.scala.collection.JavaConversions.asJavaCollection(checkedIterables)

def registerCallback(self, event, cmd, replName="pyspark"):
```
Style? Should there be spaces, as in `replName = "pyspark"`?
Also, I'm not sure we should hardcode 'pyspark' here; this is configurable.
@felixcheung The space is omitted for the optional argument in order to comply with PEP 8, see here.
The main purpose of defaulting this argument to "pyspark" is solely convenience, i.e. so you can easily call registerCallback() without specifying the REPL name manually every time it is called.
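For reference, the PEP 8 rule in question: no spaces around `=` when it marks a keyword argument or a default parameter value. A minimal illustration (the class below is a hypothetical stand-in, not Zeppelin's actual `ZeppelinContext`):

```python
class ZeppelinContextSketch:
    """Hypothetical stand-in used only to illustrate PEP 8 default-argument style."""

    def registerCallback(self, event, cmd, replName="pyspark"):  # PEP 8: no spaces around '='
        return (event, cmd, replName)

z = ZeppelinContextSketch()
# The default lets a user omit the REPL name on every call:
print(z.registerCallback("post_exec", "ls -l"))  # ('post_exec', 'ls -l', 'pyspark')
```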
That's great; would you like to update the rest of the file to make the style consistent? I understand the default value. I'm only pointing out that there are many users and use cases out there that do not use "pyspark" as the name.
@felixcheung Sure thing, I can do that in a bit. As for the issue of the default value for …
Force-pushed d579c5c to fc766a9
Right, I was referring to when the interpreter group is not the default, or there are multiple, etc.
Added support for pre/post-execute callback hooks to interpreters
Force-pushed 6329a6f to ffc02f7
@felixcheung OK, I have changed the default behavior to automatically detect the REPL name from the paragraph input text. The Travis CI build isn't running after I rebased my commits to resolve a merge conflict.
Question: how can interpreters other than Spark leverage this new feature? I mean, you added a new …
@doanduyhai To be precise, I designed the callback registry system with two different types of use cases in mind. The first one is for the different interpreter maintainers to implement in their individual subclasses of `Interpreter`. The second set of use cases are those in which the API is directly exposed to notebook users. Right now this is indeed only fully available in spark / pyspark via the `ZeppelinContext` class, eg:

```
%spark
z.registerCallback("post_exec", "ls -l", "sh")
```

Does this help clarify things?
Yes, it's very clear. Indeed, a bunch of internal features (Angular registry, callback registry, ...) are exposed to end users only via the Spark interpreter. I guess that each interpreter implementor has to find a way to expose these features using their own DSL.
@agoodm Thanks for the contribution. While the documentation says "Interpreter Execution Callbacks" is experimental (which I agree with), I think the interface could carry an `Experimental` annotation.

The implementation keeps the callback registry on the ZeppelinServer side. But since it's volatile and per-interpreter, I wonder if the callback registry could be placed inside the interpreter process; that might reduce complexity (no Thrift communication) and increase flexibility (able to register an actual function, not only strings). What do you think? Could you elaborate on these two choices?
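The flexibility point here can be sketched in a few lines: a registry that lives inside the interpreter process can hold real function objects, while one kept on the ZeppelinServer side is limited to code strings that survive Thrift serialization. A hypothetical Python sketch (none of these names are Zeppelin's actual API):

```python
# Hypothetical in-process hook registry: because it lives in the same process
# as the interpreter, it can hold actual callables, not just code strings.
class InProcessCallbackRegistry:
    def __init__(self):
        self._hooks = {}  # event name -> list of callables

    def register(self, event, fn):
        self._hooks.setdefault(event, []).append(fn)

    def run(self, event):
        # Invoke every callable registered for this event, in order.
        return [fn() for fn in self._hooks.get(event, [])]

registry = InProcessCallbackRegistry()
registry.register("pre_exec", lambda: "setup")      # a real function object
registry.register("post_exec", lambda: "teardown")

print(registry.run("pre_exec"))   # ['setup']
```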
Use Experimental annotation for callback registry interface
Force-pushed d46d293 to 7b41a5e
@Leemoonsoo Thanks for the feedback. Indeed, I think your first point on the Experimental annotation made sense, so I have changed it. As for your second point, I am not really sure how best to go about it myself. When I first created the JIRA issue, I came up with a quick implementation that kept separate registries for each …

Also, can you be specific about how this change would make it possible to pass actual references to functions instead of code strings (hence why I put "callbacks" in quotes in the original PR)? I don't consider this too important; I'm just curious.

Finally, can you provide some insight into some of the odd CI build failures I have been experiencing with more recent commits? Thanks!
When … If the callback registry is placed inside the interpreter process, …

So ZeppelinServer doesn't really need to know about the callback registry, I think. Or are there reasons and use cases where changes to the callback registry need to be broadcast to ZeppelinServer?
Thank you very much for the detailed explanation. To be precise, I made the …

Nevertheless, I think you have made a very compelling point. The source of my problem with my first implementation of the registry system using the …
Going to close this for now, since I have made another implementation on a separate branch.
…erpreters

### What is this PR for?
See #1439. This is a second attempt based on prior feedback, particularly from Leemoonsoo, who advised that it would be better to contain all of the code inside the interpreter process. Aside from this, the major changes since the previous PR are:

- The terminology "hooks" is used instead of "callbacks". Since the information being passed to the interpreter isn't actually a "callback", strictly speaking, the more general term "hook" is probably a better choice. All variable / class names were updated correspondingly.
- The registry is now keyed by class name rather than the `replName` specified by the magic on top of the paragraph. The interface provided by `ZeppelinContext` allows a `replName` to be converted to a `className` through an internal mapping.
- Two new event codes, `PRE_EXEC_DEV` and `POST_EXEC_DEV`, have been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses of `Interpreter`) and the notebook users (via the `ZeppelinContext` class), as otherwise user attempts at registering a hook would overwrite the one set by the interpreter maintainer.
- Global scope for hook registration is supported for the developer use cases.

### What type of PR is it?
New Feature

### What is the Jira issue?
[ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423)

### How should this be tested?
In a new note, add the following lines of code to a paragraph:

```python
%pyspark
z.registerHook("pre_exec", "print 'This code should be executed before the paragraph code!'")
z.registerHook("post_exec", "print 'This code should be executed after the paragraph code!'")
```

Then run any other paragraph in the note containing some other code, eg

```python
%pyspark
print "This code should be entered into the paragraph by the user!"
```

The output should be:

```
This code should be executed before the paragraph code!
This code should be entered into the paragraph by the user!
This code should be executed after the paragraph code!
```

You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`.

One final caveat that should be mentioned: if there are errors in the code you specify for a pre-execute event, it will render the interpreter useless, since the current implementation prepends the code specified in `pre_exec` directly to the paragraph-entered code before calling `interpret()`. The current workaround is to either restart the interpreter group or call `unregisterHook()` via a different REPL within the interpreter group (eg, `z.unregisterHook("pre_exec", "pyspark")` from the spark interpreter). I would appreciate it if anyone here would be willing to share any better approaches.

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: Alex Goodman <[email protected]>

Closes #1470 from agoodm/ZEPPELIN-1423v2 and squashes the following commits:

56ede60 [Alex Goodman] Automatically detect default interpreter for registerHook()
044a99d [Alex Goodman] Ensure that registered hooks are applied after call to open()
1331fe1 [Alex Goodman] Update interpreters.md
07cac65 [Alex Goodman] Implemented user-defined hook registry system for spark/pyspark interpreters
8fad936 [Alex Goodman] Added Interpreter Hooks to Interpreter Process
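The caveat about broken pre-execute code can be illustrated with a toy model: because the `pre_exec` code is prepended to every paragraph before `interpret()`, a single broken hook makes every subsequent paragraph fail until the hook is cleared. This is a hypothetical Python simulation, not Zeppelin's actual implementation:

```python
class BrickableInterpreter:
    """Toy model: pre_exec code is prepended to every paragraph, as in this PR."""

    def __init__(self):
        self.pre_exec = None  # registered pre-execute hook code, if any

    def interpret(self, code):
        # Prepend the pre-execute hook, mirroring the behavior described above.
        full = (self.pre_exec + "\n" + code) if self.pre_exec else code
        try:
            exec(full)
            return "SUCCESS"
        except Exception:
            return "ERROR"

interp = BrickableInterpreter()
interp.pre_exec = "this is not valid python"   # a broken pre-execute hook
print(interp.interpret("print('hello')"))      # ERROR: every paragraph now fails
interp.pre_exec = None                         # the unregister-hook workaround
print(interp.interpret("print('hello')"))      # SUCCESS
```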
What is this PR for?
One feature built into Jupyter's ipykernel that is currently missing in Zeppelin is pre/post-execute "callbacks". This PR allows for the possibility of entering code that is set to be executed by the interpreter before the code entered into the paragraph (pre-execute) and after (post-execute). In addition to saving some users the effort of writing code that they want executed in every paragraph, this will help pave the way for more advanced features coming soon, like automatically displaying matplotlib figures without an explicit call to a `show()` function. See the JIRA issue for more details on this.

The core implementation is primarily contained in a new class called `InterpreterCallbackRegistry`, which heavily mimics the Angular Display system (eg, `AngularObjectRegistry`). While this interface alone should suffice for interpreter maintainers on the Java/JVM side, I also thought it might be useful to expose the system directly to users in their notebooks. Hence, an implementation that works with the spark and pyspark interpreters (via the `ZeppelinContext` class) is also included here.

What type of PR is it?
New Feature
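For intuition, the register/get/unregister operations that `InterpreterCallbackRegistry` exposes (per the description above) might look roughly like this. This is a hypothetical Python sketch; the real class is Java, and its exact keys and signatures may differ:

```python
class InterpreterCallbackRegistrySketch:
    """Illustrative sketch of a callback registry keyed by (replName, event).

    Only the register/get/unregister operations discussed in this PR are
    mirrored here; the event names come from the thread.
    """

    EVENTS = ("pre_exec", "post_exec")

    def __init__(self):
        self._registry = {}  # (replName, event) -> code string

    def register(self, replName, event, cmd):
        if event not in self.EVENTS:
            raise ValueError("unknown event: %s" % event)
        self._registry[(replName, event)] = cmd

    def get(self, replName, event):
        return self._registry.get((replName, event))

    def unregister(self, replName, event):
        self._registry.pop((replName, event), None)
```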
Todos
- (Maybe) Find a better way to check for errors in pre-execute, as this can brick the REPL (see more info below)
What is the Jira issue?
ZEPPELIN-1423
How should this be tested?
In a new note, add the following lines of code to a paragraph:
Then run any other paragraph in the note containing some other code, eg
The output should be:
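The paragraph snippets and expected output did not survive extraction here. Rather than guess the exact original text, here is a runnable stand-alone simulation of the intended pre/post-execute behavior; the stub class and `run_paragraph` helper are hypothetical, with only the event names, `registerCallback`, and the hook messages taken from this thread:

```python
class ZeppelinContextStub:
    """Hypothetical stand-in for Zeppelin's z object; stores hook code strings."""

    def __init__(self):
        self.callbacks = {}  # event name -> code string

    def registerCallback(self, event, cmd):
        self.callbacks[event] = cmd

def run_paragraph(z, code):
    """Mimic the intended flow: pre_exec code, then the paragraph, then post_exec."""
    for snippet in (z.callbacks.get("pre_exec"), code, z.callbacks.get("post_exec")):
        if snippet:
            exec(snippet)

z = ZeppelinContextStub()
z.registerCallback("pre_exec", "print('This code should be executed before the paragraph code!')")
z.registerCallback("post_exec", "print('This code should be executed after the paragraph code!')")

# Prints the three lines in order: before-hook, user paragraph, after-hook.
run_paragraph(z, "print('This code should be entered into the paragraph by the user!')")
```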
You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`.

One final caveat that should be mentioned: if there are errors in the code you specify for a pre-execute event, it will render the interpreter useless, since the current implementation prepends the code specified in `pre_exec` directly to the paragraph-entered code before calling `interpret()`. The current workaround is to either restart the interpreter group or call `unregisterCallback()` via a different REPL within the interpreter group (eg, `z.unregisterCallback("pre_exec", "pyspark")` from the spark interpreter). I would appreciate it if anyone here would be willing to share any better approaches.

Questions: