-
Notifications
You must be signed in to change notification settings - Fork 2.8k
ZEPPELIN-1423 - Allow users to specify pre/post-execute hooks for interpreters #1470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@Leemoonsoo please review, thanks. |
Leemoonsoo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agoodm Thanks for the contribution. I've left some comments. Please take a look and let me know how you think.
| * @param cmd The code to be executed by the interpreter on given event | ||
| */ | ||
| @Experimental | ||
| public void registerHook(String noteId, String event, String cmd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InterpreterHookRegistry is directly passed to ZeppelinContext on it's constructor. And ZeppelinContext is exposed to user directly. So currently Interpreter.registerHook(),Interpreter.unregiserHook(),Interpreter.getHook() are accessed from only RemoteInterpreterServer. But RemoteInterpreterServer also can get directly access InterpreterHookRegistry from InterpreterGroup. In this way, i think the feature can be implemented without modification of Interpreter.java. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of see your point, moving the convenience methods from Interpreter.java to InterpreterGroup.java would save the need for defining wrappers in LazyOpenInterpreter.java. However this would also mean that the interpreter className must always be specified. This means for example that this line:
String cmdDev = interpreter.getHook(noteId, HookType.PRE_EXEC_DEV);Would be changed to
String className = interpreter.getClassName();
String cmdDev = interpreter.getInterpreterGroup().getHook(noteId, className, HookType.PRE_EXEC_DEV);Additionally while these methods are only currently called through RemoteInterpreterServer, I will have use cases for directly calling them in the Python Interpreter subclasses (eg, PySparkInterpreter and PythonInterpreter) for my upcoming matplotlib integration work. Given this, I felt that the current syntax would be clearer.
| return runners; | ||
| } | ||
|
|
||
| public String getRequiredReplName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Zeppelin allows use alias to select interpreter (see, https://issues.apache.org/jira/browse/ZEPPELIN-1012), this method will not only return "spark", "pyspark", etc but also any arbitrary alias. So, this value can be the key not exists in interpreterClassMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about this, this definitely sounds like it would be a problem. I was struggling to figure out ways to automatically detect the current interpreter class through ZeppelinContext since I thought that would be rather convenient for users, but I couldn't come up with better ideas at the time.
I think this problem can be partially addressed by passing in the interpreter class name through the current InterpreterContext instance as that could solve the problem for automatically detecting the current interpreter that's in use. However this still wouldn't allow users to manually specify aliases for another interpreter that isn't keyed into the internal mapping. Let me know what you think of this solution and if you have any other ideas.
|
@Leemoonsoo Thanks for the review, I have provided some responses. |
|
@Leemoonsoo I have addressed your second comment, let me know if this looks good to merge into master. |
|
Apologies for late response. Changes LGTM. Thanks @agoodm for making step forward to ZEPPELIN-1344! |
…erpreters ### What is this PR for? See apache#1439. This is a second attempt based on prior feedback, particularly from Leemoonsoo who advised that it would be better to contain all of the code inside of the Interpreter Process. Aside from this, the major changes since the previous PR are: - The terminology "hooks" is used instead of "callbacks". Since the information being passed to the interpreter isn't actually a "callback", strictly speaking, the more general term "hook" is probably a better choice. All variable / class names were updated correspondingly. - The registry is now keyed by class name rather than the `replName` specified by the magic on top of the paragraph. The interface provided by `ZeppelinContext` allows for a `replName` to be converted to `className` through an internal mapping. - Two new event codes, `PRE_EXEC_DEV` and `POST_EXEC_DEV` have been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses of `Interpreter`) and the notebook users (via the `ZeppelinContext` class), as otherwise user attempts at registering a hook would overwrite that set by the interpreter maintainer. - Global scope for hook registration is supported for the developer use cases. ### What type of PR is it? New Feature ### What is the Jira issue? [ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423) ### How should this be tested? In a new note, add the following lines of code to a paragraph: ```python %pyspark z.registerHook("post_exec", "print 'This code should be executed before the paragraph code!'") z.registerHook("pre_exec", "print 'This code should be executed after the paragraph code!'") ``` Then run any other paragraph in the note containing some other code, eg ```python %pyspark print "This code should be entered into the paragraph by the user!" ``` The output should be: ``` This code should be executed before the paragraph code! This code should be entered into the paragraph by the user! This code should be executed after the paragraph code! ``` You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`. One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in `pre_exec` directly to the paragraph entered code before calling `interpret()`. The current workaround for this would be to either restart the interpreter group or call `unregisterHook()` via a different REPL within the interpreter group (eg, `z.unregisterHook("pre_exec", "pyspark")` from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here. ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alex Goodman <[email protected]> Closes apache#1470 from agoodm/ZEPPELIN-1423v2 and squashes the following commits: 56ede60 [Alex Goodman] Automatically detect default interpreter for registerHook() 044a99d [Alex Goodman] Ensure that registered hooks are applied after call to open() 1331fe1 [Alex Goodman] Update interpreters.md 07cac65 [Alex Goodman] Implemented user-defined hook registry system for spark/pyspark interpreters 8fad936 [Alex Goodman] Added Interpreter Hooks to Interpreter Process
…erpreters ### What is this PR for? See apache#1439. This is a second attempt based on prior feedback, particularly from Leemoonsoo who advised that it would be better to contain all of the code inside of the Interpreter Process. Aside from this, the major changes since the previous PR are: - The terminology "hooks" is used instead of "callbacks". Since the information being passed to the interpreter isn't actually a "callback", strictly speaking, the more general term "hook" is probably a better choice. All variable / class names were updated correspondingly. - The registry is now keyed by class name rather than the `replName` specified by the magic on top of the paragraph. The interface provided by `ZeppelinContext` allows for a `replName` to be converted to `className` through an internal mapping. - Two new event codes, `PRE_EXEC_DEV` and `POST_EXEC_DEV` have been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses of `Interpreter`) and the notebook users (via the `ZeppelinContext` class), as otherwise user attempts at registering a hook would overwrite that set by the interpreter maintainer. - Global scope for hook registration is supported for the developer use cases. ### What type of PR is it? New Feature ### What is the Jira issue? [ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423) ### How should this be tested? In a new note, add the following lines of code to a paragraph: ```python %pyspark z.registerHook("post_exec", "print 'This code should be executed before the paragraph code!'") z.registerHook("pre_exec", "print 'This code should be executed after the paragraph code!'") ``` Then run any other paragraph in the note containing some other code, eg ```python %pyspark print "This code should be entered into the paragraph by the user!" ``` The output should be: ``` This code should be executed before the paragraph code! This code should be entered into the paragraph by the user! This code should be executed after the paragraph code! ``` You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`. One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in `pre_exec` directly to the paragraph entered code before calling `interpret()`. The current workaround for this would be to either restart the interpreter group or call `unregisterHook()` via a different REPL within the interpreter group (eg, `z.unregisterHook("pre_exec", "pyspark")` from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here. ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: Alex Goodman <[email protected]> Closes apache#1470 from agoodm/ZEPPELIN-1423v2 and squashes the following commits: 56ede60 [Alex Goodman] Automatically detect default interpreter for registerHook() 044a99d [Alex Goodman] Ensure that registered hooks are applied after call to open() 1331fe1 [Alex Goodman] Update interpreters.md 07cac65 [Alex Goodman] Implemented user-defined hook registry system for spark/pyspark interpreters 8fad936 [Alex Goodman] Added Interpreter Hooks to Interpreter Process
…ter overview page ### What is this PR for? After #1470 merged, "(Experimental) Interpreter Execution Hooks" is added under https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreters.html page. But i think we need to keep this page as simple as possible since it's explaining the basic concept of Zeppelin interpreters. So I separated "(Experimental) Interpreter Execution Hooks" section from `interpreters.md` and created new page `interpreterexechooks.md`. And also fixed some markdown rendering issues. ### What type of PR is it? Documentation ### What is the Jira issue? N/A ### How should this be tested? Please see below screenshots :) ### Screenshots (if appropriate) - Before (under [manual/interpreter.md](https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreters.html#experimental-interpreter-execution-hooks) <img width="437" alt="screen shot 2016-12-15 at 5 05 49 pm]" src="https://cloud.githubusercontent.com/assets/10060731/21216362/a3bb89f0-c2e9-11e6-9678-8e6d8749229b.png"> - After <img width="300" alt="dropdown" src="https://cloud.githubusercontent.com/assets/10060731/21216570/dcca63f0-c2ea-11e6-90a9-969d363b423a.png"> <img width="437" alt="screen shot 2016-12-15 at 5 05 58 pm" src="https://cloud.githubusercontent.com/assets/10060731/21216363/a6c2d82e-c2e9-11e6-920c-a603e25e1699.png"> ### Questions: * Does the licenses files need update? no * Is there breaking changes for older versions? no * Does this needs documentation? no Author: AhyoungRyu <[email protected]> Closes #1768 from AhyoungRyu/separate-eventhook-section and squashes the following commits: ce19491 [AhyoungRyu] Separate 'interpreter exec hooks' from intp overview
What is this PR for?
See #1439. This is a second attempt based on prior feedback, particularly from @Leemoonsoo who advised that it would be better to contain all of the code inside of the Interpreter Process. Aside from this, the major changes since the previous PR are:
replNamespecified by the magic on top of the paragraph. The interface provided byZeppelinContextallows for areplNameto be converted toclassNamethrough an internal mapping.PRE_EXEC_DEVandPOST_EXEC_DEVhave been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses ofInterpreter) and the notebook users (via theZeppelinContextclass), as otherwise user attempts at registering a hook would overwrite that set by the interpreter maintainer.What type of PR is it?
New Feature
What is the Jira issue?
ZEPPELIN-1423
How should this be tested?
In a new note, add the following lines of code to a paragraph:
Then run any other paragraph in the note containing some other code, eg
The output should be:
You should also test out the other two methods (
getCallback()andunregisterCallback()) specified inZeppelinContext.java.One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in
pre_execdirectly to the paragraph entered code before callinginterpret(). The current workaround for this would be to either restart the interpreter group or callunregisterHook()via a different REPL within the interpreter group (eg,z.unregisterHook("pre_exec", "pyspark")from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here.Questions: