ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell #1534

agoodm · 2016-10-18T04:52:10Z

What is this PR for?

This PR is the first of two major steps needed to improve matplotlib integration in Zeppelin (ZEPPELIN-1344). The second, a plotting backend with fully interactive tools enabled, will follow in a separate PR. This PR is specifically for automatically displaying output from calls to matplotlib plotting functions inline with each paragraph. Thanks to the addition of post-execute hooks (ZEPPELIN-1423), there is no need to call any show() function to display an inline plot, just like in Jupyter.

What type of PR is it?

Improvement

Todos

The main code has been written and anyone who reads this is encouraged to test it, but there are a few minor todos:

- Add unit tests
- Add documentation
- Add screenshot showing iterative plotting with angular mode

What is the Jira issue?

ZEPPELIN-1345

How should this be tested?

In a pyspark or python paragraph, enter and run

import matplotlib.pyplot as plt
plt.plot([1, 2, 3])

The plot should be displayed automatically without calling any show() function whatsoever. A special method called configure_mpl() can also be used to modify the inline plotting behavior. For example,

z.configure_mpl(close=False, angular=True)
plt.plot([1, 2, 3])

allows for iterative updates to the plot provided you have Py4J installed for your python installation (which of course is always the case if you use pyspark). To clarify, this feature currently only works with pyspark (not python, as there are no angularBind() and angularUnbind() methods yet). Doing something like:

plt.plot([3, 2, 1])

will update the plot that was generated by the previous paragraph by leveraging Zeppelin's Angular Display System. However, with close=False, matplotlib no longer closes figures automatically, so it is up to the user to explicitly close each figure instance they create. There are quite a few more options for z.configure_mpl(), but I will save that discussion for the documentation.
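The trade-off behind close=False can be modeled without matplotlib. The following is a toy sketch, not the backend's actual code: `FigureRegistry`, `post_execute`, and `close_figure` are hypothetical names chosen for illustration. It only shows why figures persist across paragraphs when close=False and why the user then has to clean up.

```python
class FigureRegistry:
    """Toy stand-in for the set of open figures (hypothetical, not Zeppelin code)."""

    def __init__(self, close=True):
        self.close = close      # mirrors the close= option discussed above
        self.open_figures = {}  # figure id -> list of plotted lines
        self._next_id = 1

    def plot(self, data):
        # Reuse the current figure if one is open, otherwise create a new one.
        if not self.open_figures:
            self.open_figures[self._next_id] = []
            self._next_id += 1
        fig_id = max(self.open_figures)
        self.open_figures[fig_id].append(list(data))
        return fig_id

    def post_execute(self):
        # Runs after every paragraph: show each open figure, then optionally close it.
        shown = [f"figure {fid}: {lines}"
                 for fid, lines in sorted(self.open_figures.items())]
        if self.close:
            self.open_figures.clear()
        return shown

    def close_figure(self, fig_id):
        # With close=False the user is responsible for calling this.
        self.open_figures.pop(fig_id, None)


auto = FigureRegistry(close=True)
auto.plot([1, 2, 3])
auto.post_execute()
auto_left_open = len(auto.open_figures)   # nothing stays open

manual = FigureRegistry(close=False)
fig = manual.plot([1, 2, 3])
manual.post_execute()
manual.plot([3, 2, 1])                    # a later paragraph updates the same figure
second_output = manual.post_execute()
manual.close_figure(fig)                  # explicit cleanup by the user
```

In this model the second paragraph's output contains both lines because the figure was never closed in between, which is the behavior the iterative-plotting example relies on.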

Screenshots (if appropriate)

Questions:

Does the licenses files need update? No
Is there breaking changes for older versions? No
Does this needs documentation? Yes

agoodm · 2016-11-02T04:12:55Z

Ok, that should do it for now. @Leemoonsoo @bzz @felixcheung please feel free to review.

Leemoonsoo · 2016-11-02T21:26:14Z

Thanks @agoodm for great contribution.
I have tested this and it works really well on both %pyspark and %python.
Also verified /lib/python dir is packaged with -Pbuild-distr flag.

One thing is, with the tutorial note /notebook/2BQA35CJZ, Zeppelin fails to start with error

Caused by: java.lang.NullPointerException
    at org.apache.zeppelin.notebook.Notebook.loadNoteFromRepo(Notebook.java:447)
    at org.apache.zeppelin.notebook.Notebook.loadAllNotes(Notebook.java:469)
    at org.apache.zeppelin.notebook.Notebook.<init>(Notebook.java:123)
    at org.apache.zeppelin.server.ZeppelinServer.<init>(ZeppelinServer.java:107)
    ... 29 more

Without tutorial note notebook/2BQA35CJZ, Zeppelin starts without problem.
Do you experience the same problem with tutorial note?

agoodm · 2016-11-02T21:50:48Z

@Leemoonsoo
Yep, I also had trouble starting Zeppelin. I can run it on my main development directory, but couldn't when I cloned the current master branch to an empty directory and patched my commits manually. I confirmed this by deleting the note directory and then restarting Zeppelin, after which there were no problems.

agoodm · 2016-11-02T22:14:23Z

@Leemoonsoo Try it now, I think it should work. The main thing I changed was printing the last image as HTML instead of ANGULAR (for the purpose of saving the image in the notebook). I also imported the old note into a new one (hence the directory name changing) from the UI before making these changes, then deleted the old note in place of this new one.

Leemoonsoo · 2016-11-02T22:25:07Z

@agoodm Thanks for the quick update. It works well! Let's see if CI goes green.

felixcheung · 2016-11-02T22:42:44Z

very cool! could you add some documentation on how html or angular output can plug in to this?
would it be possible to add the same tests to Spark pyspark interpreter too?

agoodm · 2016-11-02T22:57:39Z

@felixcheung The main idea is that the output is set to HTML if angular=False (default). If angular=True, the figure data gets bound to the angular display system by figure id and the output is set to angular. This is essentially explained already in the tutorial notebook. Would you like to see some separate documentation somewhere else? If so, where do you think I should put it?
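As a rough illustration of the HTML path, the sketch below wraps image bytes in an `%html` output string. It assumes the image is embedded as a base64 data URI, which is not stated in this thread; `to_html_output` and its `width` parameter are hypothetical names, and the bytes are a placeholder rather than a real rendered figure.

```python
import base64


def to_html_output(png_bytes, width=400):
    # Encode the raw image so it can travel inside the paragraph's text output.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    # The leading %html magic tells Zeppelin to render the rest as HTML.
    return f"%html <img src='data:image/png;base64,{encoded}' width='{width}'>"


# Placeholder bytes standing in for a rendered figure (not a valid image).
fake_png = b"\x89PNG\r\n\x1a\n" + b"placeholder"
html_output = to_html_output(fake_png)
```

Because the image data lives in the output string itself, it is saved along with the paragraph result, which matches the motivation given earlier for printing the last image as HTML.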

As for unit tests for pyspark, I believe this would be easy enough to do but in that case I would need to exclude them from the main test suite as we do with python because matplotlib is required. We should probably have python, matplotlib, pandas, and pandasql automatically downloaded for our CI tests, most likely using conda since those packages are in binary form.

felixcheung · 2016-11-03T05:15:50Z

@agoodm how about a bit on the output in docs/interpreter/python.md? I do see your reference to the notebook, but it is a bit harder for a general user to access that way (and it is not readily readable on GitHub either)

great point on more packages for python and more tests 😄 perhaps you could help later?

felixcheung · 2016-11-03T05:30:25Z

about the /lib directory, would it make sense if it's under the interpreter directory (like interpreter/python/lib)? seems like that would be consistent with every interpreter being isolated

agoodm · 2016-11-03T22:10:04Z

@felixcheung
I updated the documentation and also added the example gif shown in this PR. In regards to getting python dependencies installed for our CI tests, I would be glad to help with that. With that in mind though I think I'll hold off on committing a unit test for pyspark until then since I'll probably get around to doing this anyway right after this PR gets merged, so not much point in going through the trouble of excluding the test in the pom.xml.

The reason I put the backend files in /lib and not in /interpreter/python/lib is because I didn't want to worry about editing multiple versions of the same file if these were to be used by another interpreter (eg, I would also want to have them in /interpreter/spark/lib). I suppose it may be possible to just have a top-level lib/python get copied directly to both directories through maven, but I didn't see the added complexity being worth it.

Also on a final note @Leemoonsoo , the previous CI failures have alerted me to another issue: With %pyspark, the output from the last statement in a paragraph no longer gets automatically printed, eg

%pyspark
x = 1
x

This is because the do-nothing displayhook() call is now always the last statement thanks to the hook registry. This is only a problem with %pyspark, but not in python. I don't know enough about the internals of the former that would explain the difference, but it might be something we should look into later.
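One plausible way to reproduce this effect in plain Python is sketched below. It assumes that only the final statement of a paragraph is evaluated in interactive ("single") mode, which is what makes a bare expression echo its value; that assumption about %pyspark is not confirmed in this thread, and `run_paragraph`, `capture`, and `noop_hook` are hypothetical names.

```python
import ast
import io
from contextlib import redirect_stdout


def run_paragraph(source, namespace):
    """Run all statements normally, but the last one in interactive mode."""
    tree = ast.parse(source)
    if not tree.body:
        return
    *head, last = tree.body
    # Everything except the last statement: plain exec, nothing is echoed.
    exec(compile(ast.Module(body=head, type_ignores=[]), "<paragraph>", "exec"), namespace)
    # Last statement: "single" mode passes expression values to sys.displayhook.
    exec(compile(ast.Interactive(body=[last]), "<paragraph>", "single"), namespace)


def capture(source):
    buffer = io.StringIO()
    namespace = {"noop_hook": lambda: None}  # stand-in for a do-nothing hook
    with redirect_stdout(buffer):
        run_paragraph(source, namespace)
    return buffer.getvalue()


plain = capture("x = 1\nx")                # the bare expression is last, so it is echoed
hooked = capture("x = 1\nx\nnoop_hook()")  # the hook is last, so x is no longer echoed
```

Under this model, appending any hook call after the user's code silently takes over the "last statement" slot, which is consistent with the symptom described above.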

felixcheung · 2016-11-04T02:53:56Z

Great.

I think it will be better to have interpreter/lib - it might be easier to set ACL on a common interpreter root in the event there are user-downloaded interpreters, for example.

agoodm · 2016-11-04T03:01:46Z

@felixcheung
In that case, I think that should be fine. Will do shortly.

@Leemoonsoo
The latest commit should fix the issue I just mentioned. Two other important things to mention:

- Any print statements made prior to printing the %html or %angular won't show up in the results. This means that when close=False and any plots are open, print statements won't display, since the output type magic will always come after user-entered print statements. This can be fixed for the matplotlib backend specifically by adding a pre-execute hook that prints the magic of the currently open plot, but it's not really ideal.
- Earlier we both had issues loading the tutorial notebook. As it turns out, the reason appears to be the angular display system. This happens whenever I stop Zeppelin while variables are still bound to the angular display system. If I make sure to unbind everything first, then I can safely start Zeppelin again. I am pretty sure this issue was introduced in more recent commits since I didn't have this problem when I first made the tutorial notebook. I could not reproduce the problem you initially discovered until after I rebased my branch. This is actually a serious issue, and in this case it would be bad to leave it unaddressed with this PR since that would mean anyone who runs the last example in the tutorial notebook without calling plt.close() at the end will brick their Zeppelin installation until removing the notebook. Seems like this was fixed as of [ZEPPELIN-1612] Fix NPE when initializing Notebook #1590.

How do you think we should proceed?

Leemoonsoo · 2016-11-04T18:57:46Z

@agoodm
I think the placement of the common python lib and the problem printing output when close=False can be handled in separate issues. Would you mind creating issues?

I have tested the last commit and found that if I run a %pyspark paragraph when the spark interpreter is not initialized (i.e. right after Zeppelin starts or the spark interpreter restarts), I see the following error.

ERROR [2016-11-04 11:32:09,138] ({pool-1-thread-2} PySparkInterpreter.java[open]:168) - Error
java.lang.NullPointerException
    at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:197)
    at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:166)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.open(RemoteInterpreterServer.java:250)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:1621)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:1606)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
ERROR [2016-11-04 11:32:09,140] ({pool-1-thread-2} TThreadPoolServer.java[run]:296) - Error occurred during processing of message.
org.apache.zeppelin.interpreter.InterpreterException: java.lang.NullPointerException
    at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:169)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.open(RemoteInterpreterServer.java:250)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:1621)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:1606)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:197)
    at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:166)
    ... 10 more

Do you see the same error?

agoodm · 2016-11-04T21:20:04Z

@Leemoonsoo Sure, I will do that after this is merged.

As for your error, I cannot reproduce it. I tested both the latest code from master as well as my fork after rebasing. How are you performing the maven build? I use:

mvn clean package -Pbuild-distr -Ppyspark -DskipTests

without modifying any of the configuration files.

Leemoonsoo · 2016-11-04T21:38:31Z

@agoodm My bad, I was testing it with #1596 merged.
I tested just the last commit in the branch again, and the error is gone.

agoodm · 2016-11-04T21:40:32Z

@Leemoonsoo Yep, I can confirm... right before you posted that comment, I manually patched in the changes from that PR to my branch and got the exact same stacktrace as you in my interpreter log file after running a %pyspark paragraph.

Leemoonsoo · 2016-11-05T04:38:25Z

@agoodm CI is now fixed on master. Could you rebase and see if the CI test goes green?

agoodm · 2016-11-05T06:02:33Z

@Leemoonsoo Tests are done, the only failure seems to be from a known flaky test (ZEPPELIN-1623). Are we good to merge?

Leemoonsoo · 2016-11-05T06:59:26Z

LGTM and merge to master if there's no further discussion.
Thanks @agoodm for the great contribution.

felixcheung · 2016-11-05T07:04:35Z

What do we think about this: #1534 (comment)

agoodm · 2016-11-06T06:43:01Z

@felixcheung done, I have tested with the tutorial notebook and it seems to work as well as before. Good to merge now?

felixcheung · 2016-11-06T18:20:43Z

LGTM.
Is there any known issue with SELENIUM tests? from a quick glance it seems to have failed a few times here.

agoodm · 2016-11-06T19:34:27Z

There is a JIRA issue: (ZEPPELIN-1623)

felixcheung · 2016-11-07T06:55:34Z

Great, thank you for digging it through.

Any more comment?

Leemoonsoo · 2016-11-07T16:17:10Z

Looks good to me!

Leemoonsoo · 2016-11-07T18:16:58Z

Merge to master if there's no further discussion.

bzz · 2016-11-07T23:06:32Z

Looks great to me, 👍 for extra tests. Let's merge

bzz · 2016-11-13T12:28:01Z

@agoodm sorry for digging this out, but I have just realized that this PR changes only Python PyZeppelinContext and Pyspark PyZeppelinContext but does not touch Python Py4jZeppelinContext, which is used for dynamic forms when py4j is installed on the system, for things like

z.select("Choose a letter",
    ([a,"a"], [b,"b"], [c,"c"]))

In the latter case the Python interpreter becomes unusable: any line results in

x = 1

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Py4jZeppelinContext' object has no attribute '_displayhook'

Update: logged it under https://issues.apache.org/jira/browse/ZEPPELIN-1655
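The AttributeError above is the kind of failure that appears when a subclass constructor skips the initialization its parent normally performs. The sketch below is a simplified model, not Zeppelin's code: the class names are hypothetical, and the inheritance relationship between the real context classes is an assumption made only for illustration.

```python
class BaseContextSketch:
    def __init__(self):
        # Do-nothing default hook, set once at construction time.
        self._displayhook = lambda *args: None

    def run_post_execute(self):
        # Called after every paragraph, so a missing hook breaks every line.
        self._displayhook()


class BrokenPy4jContextSketch(BaseContextSketch):
    def __init__(self, gateway):
        # Bug: the parent initializer is never called, so _displayhook is missing.
        self.gateway = gateway


class FixedPy4jContextSketch(BaseContextSketch):
    def __init__(self, gateway):
        super().__init__()  # ensures _displayhook exists
        self.gateway = gateway


try:
    BrokenPy4jContextSketch(gateway=None).run_post_execute()
    error = None
except AttributeError as exc:
    error = str(exc)

# The fixed variant runs its post-execute step without raising.
FixedPy4jContextSketch(gateway=None).run_post_execute()
```

Since the hook is invoked after every paragraph, a single missing attribute is enough to make even `x = 1` fail, as in the traceback above.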

### What is this PR for?
After #1534, Dynamic Forms were no longer working in the python interpreter. This is because the `Py4jZeppelinContext` constructor did not initialize the `_displayhook` which is always called on post-execute.

### What type of PR is it?
Bug Fix

### What is the Jira issue?
[ZEPPELIN-1655](https://issues.apache.org/jira/browse/ZEPPELIN-1655)

### How should this be tested?
Run the following `%python` paragraph, being sure that Py4j is installed:

```python
%python
a, b, c = (1, 2, 3)
z.select("Choose a letter", ([a,"a"], [b,"b"], [c,"c"]))
```

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Alex Goodman <[email protected]>

Closes #1626 from agoodm/ZEPPELIN-1655 and squashes the following commits:

2e4ee2d [Alex Goodman] Make sure _displayhook is initialized in Py4jZeppelinContext

### What is this PR for?
There have been reports of #1534 causing the python interpreter to always show an error because `z` is not being set. As it turns out this is a result of improperly handling the case when matplotlib isn't found when initializing the interpreter.

### What type of PR is it?
Bug Fix

### What is the Jira issue?
[ZEPPELIN-1656](https://issues.apache.org/jira/browse/ZEPPELIN-1656)

### How should this be tested?
Run any simple python paragraph.

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Alex Goodman <[email protected]>

Closes #1628 from agoodm/patch-1 and squashes the following commits:

67f2ad5 [Alex Goodman] python interpeter should work when matplotlib is not installed
0a7a9d7 [Alex Goodman] Fix indent in bootstrap.py
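A common way to keep startup working when an optional plotting library is absent is to guard the import so the context object is always created. The following is a minimal sketch under that assumption, not the actual contents of bootstrap.py: `ZeppelinContextSketch`, `bootstrap`, and `mpl_available` are hypothetical names.

```python
import importlib


class ZeppelinContextSketch:
    def __init__(self):
        self.mpl_available = False  # flipped only if the optional import succeeds


def bootstrap(plot_module_name):
    # Create the context first so it exists regardless of what happens next.
    z = ZeppelinContextSketch()
    try:
        importlib.import_module(plot_module_name)
        z.mpl_available = True
    except ImportError:
        # The optional dependency is missing: skip plotting setup, keep z usable.
        pass
    return z


# A module name that is assumed not to be installed anywhere.
z = bootstrap("plotting_library_that_is_not_installed")
```

The important ordering is that `z` is assigned before the optional import is attempted, so a missing library only disables plotting instead of leaving the interpreter without its context.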

ghost · 2017-05-07T16:43:51Z

I can't get the plot update to work from a different cell in a %python interpreter. I've described the problem in detail in ZEPPELIN-2511.

agoodm force-pushed the ZEPPELIN-1345 branch from c5bcbcc to fef5a9d Compare November 1, 2016 21:24

agoodm changed the title ~~[WIP]ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell~~ ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell Nov 2, 2016

agoodm force-pushed the ZEPPELIN-1345 branch from 89f8fce to cd3cc5c Compare November 3, 2016 21:50

Leemoonsoo mentioned this pull request Nov 4, 2016

ZEPPELIN-1616. Interpreter open happens in jobRun #1596

Closed

1 task

agoodm force-pushed the ZEPPELIN-1345 branch from 707f1d2 to 6218e35 Compare November 4, 2016 21:17

agoodm added 8 commits November 4, 2016 21:54

Add new matplotlib backend for python/pyspark interpreters

edf750a

Added support for Angular Display System

f9c9498

Removed unused variable

82135ad

Fix NullPointerExceptions in unit tests

8b9b973

Add unit tests

9792f97

Update matplotlib tutorial notebook

82350e3

Update python.md

a321d79

Fix legend in tutorial notebook

c37b00f

agoodm added 10 commits November 4, 2016 21:54

Fix tutorial notebook not loading

86b1c90

Exclude tests are excluded in python/pom.xml

f2d9e86

Update python/README.md

8029a05

Add iterative plotting example image

c9b65a5

Update python.md for new matplotlib integration

bcf0bf3

Update spark.md

c90d204

Fix CI test failure

d3d1aa0

Remove unused variable

22b6fe4

Make sure expressions are printed when no plots are shown

bdb584e

Catch potential NullPointerExceptions from hook registry

24f89c6

agoodm force-pushed the ZEPPELIN-1345 branch from 6218e35 to 24f89c6 Compare November 5, 2016 04:56

Move mpl backend files to /interpreter

9ef6ff7

asfgit closed this in 438dbca Nov 8, 2016

agoodm deleted the ZEPPELIN-1345 branch November 8, 2016 20:59

agoodm mentioned this pull request Nov 13, 2016

[ZEPPELIN-1655] Dynamic forms in Python interpreter do not work #1626

Closed

agoodm mentioned this pull request Nov 13, 2016

[HOTFIX][ZEPPELIN-1656] z.show in Python interpreter does not work #1628

Closed
