Skip to content

Conversation

@agoodm
Copy link
Member

@agoodm agoodm commented Sep 29, 2016

What is this PR for?

See #1439. This is a second attempt based on prior feedback, particularly from @Leemoonsoo who advised that it would be better to contain all of the code inside of the Interpreter Process. Aside from this, the major changes since the previous PR are:

  • The terminology "hooks" is used instead of "callbacks". Since the information being passed to the interpreter isn't actually a "callback", strictly speaking, the more general term "hook" is probably a better choice. All variable / class names were updated correspondingly.
  • The registry is now keyed by class name rather than the replName specified by the magic on top of the paragraph. The interface provided by ZeppelinContext allows for a replName to be converted to className through an internal mapping.
  • Two new event codes, PRE_EXEC_DEV and POST_EXEC_DEV have been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses of Interpreter) and the notebook users (via the ZeppelinContext class), as otherwise user attempts at registering a hook would overwrite that set by the interpreter maintainer.
  • Global scope for hook registration is supported for the developer use cases.

What type of PR is it?

New Feature

What is the Jira issue?

ZEPPELIN-1423

How should this be tested?

In a new note, add the following lines of code to a paragraph:

%pyspark
z.registerHook("post_exec", "print 'This code should be executed before the paragraph code!'")
z.registerHook("pre_exec", "print 'This code should be executed after the paragraph code!'")

Then run any other paragraph in the note containing some other code, eg

%pyspark
print "This code should be entered into the paragraph by the user!"

The output should be:

This code should be executed before the paragraph code!
This code should be entered into the paragraph by the user!
This code should be executed after the paragraph code!

You should also test out the other two methods (getCallback() and unregisterCallback()) specified in ZeppelinContext.java.

One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in pre_exec directly to the paragraph entered code before calling interpret(). The current workaround for this would be to either restart the interpreter group or call unregisterHook() via a different REPL within the interpreter group (eg, z.unregisterHook("pre_exec", "pyspark") from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here.

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? Yes

@agoodm
Copy link
Member Author

agoodm commented Sep 29, 2016

@Leemoonsoo please review, thanks.

Copy link
Member

@Leemoonsoo Leemoonsoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agoodm Thanks for the contribution. I've left some comments. Please take a look and let me know how you think.

* @param cmd The code to be executed by the interpreter on given event
*/
@Experimental
public void registerHook(String noteId, String event, String cmd) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InterpreterHookRegistry is directly passed to ZeppelinContext on it's constructor. And ZeppelinContext is exposed to user directly. So currently Interpreter.registerHook(),Interpreter.unregiserHook(),Interpreter.getHook() are accessed from only RemoteInterpreterServer. But RemoteInterpreterServer also can get directly access InterpreterHookRegistry from InterpreterGroup. In this way, i think the feature can be implemented without modification of Interpreter.java. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of see your point, moving the convenience methods from Interpreter.java to InterpreterGroup.java would save the need for defining wrappers in LazyOpenInterpreter.java. However this would also mean that the interpreter className must always be specified. This means for example that this line:

String cmdDev = interpreter.getHook(noteId, HookType.PRE_EXEC_DEV);

Would be changed to

String className = interpreter.getClassName();
String cmdDev = interpreter.getInterpreterGroup().getHook(noteId, className, HookType.PRE_EXEC_DEV);

Additionally while these methods are only currently called through RemoteInterpreterServer, I will have use cases for directly calling them in the Python Interpreter subclasses (eg, PySparkInterpreter and PythonInterpreter) for my upcoming matplotlib integration work. Given this, I felt that the current syntax would be clearer.

return runners;
}

public String getRequiredReplName() {
Copy link
Member

@Leemoonsoo Leemoonsoo Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Zeppelin allows use alias to select interpreter (see, https://issues.apache.org/jira/browse/ZEPPELIN-1012), this method will not only return "spark", "pyspark", etc but also any arbitrary alias. So, this value can be the key not exists in interpreterClassMap.

Copy link
Member Author

@agoodm agoodm Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about this, this definitely sounds like it would be a problem. I was struggling to figure out ways to automatically detect the current interpreter class through ZeppelinContext since I thought that would be rather convenient for users, but I couldn't come up with better ideas at the time.

I think this problem can be partially addressed by passing in the interpreter class name through the current InterpreterContext instance as that could solve the problem for automatically detecting the current interpreter that's in use. However this still wouldn't allow users to manually specify aliases for another interpreter that isn't keyed into the internal mapping. Let me know what you think of this solution and if you have any other ideas.

@agoodm
Copy link
Member Author

agoodm commented Oct 3, 2016

@Leemoonsoo Thanks for the review, I have provided some responses.

@agoodm
Copy link
Member Author

agoodm commented Oct 6, 2016

@Leemoonsoo I have addressed your second comment, let me know if this looks good to merge into master.

@Leemoonsoo
Copy link
Member

Apologies for late response. Changes LGTM.
Last CI failure is irrelevant to this change.
Merge if there're no more discussion.

Thanks @agoodm for making step forward to ZEPPELIN-1344!

@asfgit asfgit closed this in c67bd6d Oct 17, 2016
@agoodm agoodm deleted the ZEPPELIN-1423v2 branch October 18, 2016 07:30
darionyaphet pushed a commit to darionyaphet/zeppelin that referenced this pull request Oct 27, 2016
…erpreters

### What is this PR for?
See apache#1439. This is a second attempt based on prior feedback, particularly from Leemoonsoo who advised that it would be better to contain all of the code inside of the Interpreter Process. Aside from this, the major changes since the previous PR are:

- The terminology "hooks" is used instead of "callbacks". Since the information being passed to the interpreter isn't actually a "callback", strictly speaking, the more general term "hook" is probably a better choice. All variable / class names were updated correspondingly.
- The registry is now keyed by class name rather than the `replName` specified by the magic on top of the paragraph. The interface provided by `ZeppelinContext` allows for a `replName` to be converted to `className` through an internal mapping.
- Two new event codes, `PRE_EXEC_DEV` and `POST_EXEC_DEV` have been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses of `Interpreter`) and the notebook users (via the `ZeppelinContext` class), as otherwise user attempts at registering a hook would overwrite that set by the interpreter maintainer.
- Global scope for hook registration is supported for the developer use cases.

### What type of PR is it?
New Feature

### What is the Jira issue?
[ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423)

### How should this be tested?
In a new note, add the following lines of code to a paragraph:
```python
%pyspark
z.registerHook("post_exec", "print 'This code should be executed before the paragraph code!'")
z.registerHook("pre_exec", "print 'This code should be executed after the paragraph code!'")
```

Then run any other paragraph in the note containing some other code, eg
```python
%pyspark
print "This code should be entered into the paragraph by the user!"
```

The output should be:
```
This code should be executed before the paragraph code!
This code should be entered into the paragraph by the user!
This code should be executed after the paragraph code!
```

You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`.

One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in `pre_exec` directly to the paragraph entered code before calling `interpret()`. The current workaround for this would be to either restart the interpreter group or call `unregisterHook()` via a different REPL within the interpreter group (eg, `z.unregisterHook("pre_exec", "pyspark")` from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here.

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: Alex Goodman <[email protected]>

Closes apache#1470 from agoodm/ZEPPELIN-1423v2 and squashes the following commits:

56ede60 [Alex Goodman] Automatically detect default interpreter for registerHook()
044a99d [Alex Goodman] Ensure that registered hooks are applied after call to open()
1331fe1 [Alex Goodman] Update interpreters.md
07cac65 [Alex Goodman] Implemented user-defined hook registry system for spark/pyspark interpreters
8fad936 [Alex Goodman] Added Interpreter Hooks to Interpreter Process
pedrozatta pushed a commit to pedrozatta/zeppelin that referenced this pull request Oct 27, 2016
…erpreters

### What is this PR for?
See apache#1439. This is a second attempt based on prior feedback, particularly from Leemoonsoo who advised that it would be better to contain all of the code inside of the Interpreter Process. Aside from this, the major changes since the previous PR are:

- The terminology "hooks" is used instead of "callbacks". Since the information being passed to the interpreter isn't actually a "callback", strictly speaking, the more general term "hook" is probably a better choice. All variable / class names were updated correspondingly.
- The registry is now keyed by class name rather than the `replName` specified by the magic on top of the paragraph. The interface provided by `ZeppelinContext` allows for a `replName` to be converted to `className` through an internal mapping.
- Two new event codes, `PRE_EXEC_DEV` and `POST_EXEC_DEV` have been added. This is primarily intended to separate use cases for the interpreter maintainers (via subclasses of `Interpreter`) and the notebook users (via the `ZeppelinContext` class), as otherwise user attempts at registering a hook would overwrite that set by the interpreter maintainer.
- Global scope for hook registration is supported for the developer use cases.

### What type of PR is it?
New Feature

### What is the Jira issue?
[ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423)

### How should this be tested?
In a new note, add the following lines of code to a paragraph:
```python
%pyspark
z.registerHook("post_exec", "print 'This code should be executed before the paragraph code!'")
z.registerHook("pre_exec", "print 'This code should be executed after the paragraph code!'")
```

Then run any other paragraph in the note containing some other code, eg
```python
%pyspark
print "This code should be entered into the paragraph by the user!"
```

The output should be:
```
This code should be executed before the paragraph code!
This code should be entered into the paragraph by the user!
This code should be executed after the paragraph code!
```

You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`.

One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in `pre_exec` directly to the paragraph entered code before calling `interpret()`. The current workaround for this would be to either restart the interpreter group or call `unregisterHook()` via a different REPL within the interpreter group (eg, `z.unregisterHook("pre_exec", "pyspark")` from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here.

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: Alex Goodman <[email protected]>

Closes apache#1470 from agoodm/ZEPPELIN-1423v2 and squashes the following commits:

56ede60 [Alex Goodman] Automatically detect default interpreter for registerHook()
044a99d [Alex Goodman] Ensure that registered hooks are applied after call to open()
1331fe1 [Alex Goodman] Update interpreters.md
07cac65 [Alex Goodman] Implemented user-defined hook registry system for spark/pyspark interpreters
8fad936 [Alex Goodman] Added Interpreter Hooks to Interpreter Process
asfgit pushed a commit that referenced this pull request Dec 17, 2016
…ter overview page

### What is this PR for?
After #1470 merged, "(Experimental) Interpreter Execution Hooks" is added under https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreters.html page. But i think we need to keep this page as simple as possible since it's explaining the basic concept of Zeppelin interpreters. So I separated  "(Experimental) Interpreter Execution Hooks" section from `interpreters.md` and created new page `interpreterexechooks.md`.
And also fixed some markdown rendering issues.

### What type of PR is it?
Documentation

### What is the Jira issue?
N/A

### How should this be tested?
Please see below screenshots :)

### Screenshots (if appropriate)
 - Before (under [manual/interpreter.md](https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreters.html#experimental-interpreter-execution-hooks)
<img width="437" alt="screen shot 2016-12-15 at 5 05 49 pm]" src="https://cloud.githubusercontent.com/assets/10060731/21216362/a3bb89f0-c2e9-11e6-9678-8e6d8749229b.png">

 - After
<img width="300" alt="dropdown" src="https://cloud.githubusercontent.com/assets/10060731/21216570/dcca63f0-c2ea-11e6-90a9-969d363b423a.png">

<img width="437" alt="screen shot 2016-12-15 at 5 05 58 pm" src="https://cloud.githubusercontent.com/assets/10060731/21216363/a6c2d82e-c2e9-11e6-920c-a603e25e1699.png">

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no

Author: AhyoungRyu <[email protected]>

Closes #1768 from AhyoungRyu/separate-eventhook-section and squashes the following commits:

ce19491 [AhyoungRyu] Separate 'interpreter exec hooks' from intp overview
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants