ZEPPELIN-1345 - Create a custom matplotlib backend that natively supports inline plotting in a python interpreter cell #1534
|
Ok, that should do it for now. @Leemoonsoo @bzz @felixcheung please feel free to review. |
|
Thanks @agoodm for great contribution. One thing is, with the tutorial note Without tutorial note |
|
@Leemoonsoo |
|
@Leemoonsoo Try it now, I think it should work. The main thing I changed was printing the last image as HTML instead of ANGULAR (for the purpose of saving the image in the notebook). I also imported the old note into a new one (hence the directory name changing) from the UI before making these changes, then deleted the old note in place of this new one. |
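For readers unfamiliar with the HTML display path mentioned above, here is a minimal sketch of how a rendered PNG could be embedded in a `%html` paragraph result so that the image is saved with the note. The helper name `png_to_zeppelin_html` and its `width` parameter are hypothetical and only for illustration; this is not the backend's actual code.

```python
import base64


def png_to_zeppelin_html(png_bytes, width=None):
    """Return a %html payload that embeds PNG bytes as a data URI.

    Hypothetical helper for illustration; not the backend's actual code.
    """
    encoded = base64.b64encode(png_bytes).decode("ascii")
    style = "" if width is None else " style='width:{}px'".format(width)
    return "%html <img src='data:image/png;base64,{}'{}>".format(encoded, style)


# The 8-byte PNG signature stands in for a real rendered figure.
png_stub = b"\x89PNG\r\n\x1a\n"
payload = png_to_zeppelin_html(png_stub)
print(payload)
```

Because the image data travels inside the paragraph output itself, it survives a notebook reload without any server-side state.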
|
@agoodm Thanks for the quick update. It works well! Let's see if CI goes green. |
|
very cool! could you add some documentation on how html or angular output can plug in to this? |
|
@felixcheung The main idea is that the output is set to HTML if As for unit tests for pyspark, I believe this would be easy enough to do but in that case I would need to exclude them from the main test suite as we do with python because matplotlib is required. We should probably have python, matplotlib, pandas, and pandasql automatically downloaded for our CI tests, most likely using conda since those packages are in binary form. |
|
@agoodm how about a bit on the output in docs/interpreter/python.md? I do see your reference to the notebook, but it is a bit harder for a general user to access that way (and it is not readily readable on GitHub either). Great point on more packages for python and more tests 😄 Perhaps you could help later? |
|
about the /lib directory, would it make sense if it's under the interpreter directory (like interpreter/python/lib)? seems like that would be consistent with everything interpreter is isolated |
|
@felixcheung The reason I put the backend files in Also on a final note @Leemoonsoo , the previous CI failures have alerted me to another issue: With
%pyspark
x = 1
x
This is because the do-nothing |
|
Great. I think it will be better to have interpreter/lib - it might be easier to set ACL on a common interpreter root in the event there are user downloaded interpreters for examples. |
|
@felixcheung @Leemoonsoo
How do you think we should proceed? |
|
@agoodm I have tested the last commit and found that if I run a %pyspark paragraph when the spark interpreter is not initialized (i.e. right after Zeppelin starts or the spark interpreter restarts), I'm seeing the following error. Do you see the same error? |
|
@Leemoonsoo Sure, I will do that after this is merged. As for your error, I cannot reproduce it. I tested both the latest code from master as well as my fork after rebasing. How are you performing the maven build? I use: without modifying any of the configuration files. |
|
@Leemoonsoo Yep, I can confirm... right before you posted that comment, I manually patched in the changes from that PR to my branch and got the exact same stacktrace as you in my interpreter log file after running a |
|
@agoodm CI is now fixed on master. Could you rebase and see if the CI test goes green? |
|
@Leemoonsoo Tests are done, the only failure seems to be from a known flaky test (ZEPPELIN-1623). Are we good to merge? |
|
LGTM and merge to master if there's no further discussion. |
|
How do we think about this #1534 (comment) |
|
@felixcheung done, I have tested with the tutorial notebook and it seems to work as well as before. Good to merge now? |
|
LGTM. |
|
There is a JIRA issue: (ZEPPELIN-1623) |
|
Great, thank you for digging it through. Any more comment? |
|
Looks good to me! |
|
Merge to master if there's no further discussion. |
|
Looks great to me, 👍 for extra tests. Let's merge |
|
@agoodm sorry for digging this out, but I have just realized that this PR changes only Python In the latter case the Python interpreter becomes unusable - any line results in Update: logged it under https://issues.apache.org/jira/browse/ZEPPELIN-1655 |
### What is this PR for?
After #1534, Dynamic Forms were no longer working in the python interpreter. This is because the `Py4jZeppelinContext` constructor did not initialize the `_displayhook` which is always called on post-execute.

### What type of PR is it?
Bug Fix

### What is the Jira issue?
[ZEPPELIN-1655](https://issues.apache.org/jira/browse/ZEPPELIN-1655)

### How should this be tested?
Run the following `%python` paragraph, being sure that Py4j is installed:
```python
%python
a, b, c = (1, 2, 3)
z.select("Choose a letter", ([a,"a"], [b,"b"], [c,"c"] ))
```

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Alex Goodman <[email protected]>

Closes #1626 from agoodm/ZEPPELIN-1655 and squashes the following commits:

2e4ee2d [Alex Goodman] Make sure _displayhook is initialized in Py4jZeppelinContext
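
To illustrate the failure mode described in that commit, here is a minimal sketch of why a hook that is always called on post-execute needs a default value. `ZeppelinContextSketch`, `set_displayhook`, and `run_paragraph` are hypothetical stand-ins, not the real `Py4jZeppelinContext` or the interpreter's bootstrap code; only the `_displayhook` attribute name comes from the commit message.

```python
class ZeppelinContextSketch(object):
    """Illustrative stand-in for a Zeppelin context; not the real class."""

    def __init__(self):
        # Initialize the hook to a do-nothing callable so post-execute
        # code can always call it, even if no plotting backend set one.
        self._displayhook = lambda *args: None

    def set_displayhook(self, hook):
        self._displayhook = hook


def run_paragraph(z, source, namespace):
    """Execute paragraph source, then always fire the post-execute hook."""
    exec(source, namespace)
    z._displayhook()


z = ZeppelinContextSketch()
run_paragraph(z, "a, b, c = (1, 2, 3)", {})  # works with the default hook

shown = []
z.set_displayhook(lambda: shown.append("figure"))
run_paragraph(z, "x = 1", {})
print(shown)
```

Without the assignment in `__init__`, the first `run_paragraph` call would raise `AttributeError`, which is the kind of error that would break every paragraph regardless of its content.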
### What is this PR for?
There have been reports of #1534 causing the python interpreter to always show an error because `z` is not being set. As it turns out this is a result of improperly handling the case when matplotlib isn't found when initializing the interpreter.

### What type of PR is it?
Bug Fix

### What is the Jira issue?
[ZEPPELIN-1656](https://issues.apache.org/jira/browse/ZEPPELIN-1656)

### How should this be tested?
Run any simple python paragraph.

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Alex Goodman <[email protected]>

Closes #1628 from agoodm/patch-1 and squashes the following commits:

67f2ad5 [Alex Goodman] python interpeter should work when matplotlib is not installed
0a7a9d7 [Alex Goodman] Fix indent in bootstrap.py
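
The general pattern behind that fix can be sketched as follows: create the context object first, and treat the plotting library as optional. This is an assumption-laden illustration, not the contents of `bootstrap.py`; `ContextSketch`, `bootstrap`, and the `plotting_enabled` flag are hypothetical names.

```python
import importlib


class ContextSketch(object):
    """Hypothetical placeholder for the object bound to `z`."""

    def __init__(self):
        self.plotting_enabled = False


def bootstrap(plot_module="matplotlib"):
    """Create `z` first, then enable plotting only if the import works."""
    z = ContextSketch()
    try:
        importlib.import_module(plot_module)
        z.plotting_enabled = True
    except ImportError:
        # Missing plotting library: keep `z` usable, just without plots.
        pass
    return z


# A module name that should not exist simulates a missing matplotlib.
z = bootstrap("a_module_that_is_not_installed_xyz")
print(z.plotting_enabled)
```

The key design choice is that `z` is assigned before the optional import is attempted, so a missing library can never leave `z` undefined.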
|
I can't get the plot update from a different cell to work in a |
What is this PR for?
This PR is the first of two major steps needed to improve matplotlib integration in Zeppelin (ZEPPELIN-1344). The latter, which is a plotting backend with fully interactive tools enabled, will be done afterwards in a separate PR. This PR is specifically for automatically displaying output from calls to matplotlib plotting functions inline with each paragraph. Thanks to the addition of post-execute hooks (ZEPPELIN-1423), there is no need to call any show() function to display an inline plot, just like in Jupyter.
What type of PR is it?
Improvement
Todos
The main code has been written and anyone who reads this is encouraged to test it, but there are a few minor todos:
What is the Jira issue?
ZEPPELIN-1345
How should this be tested?
In a pyspark or python paragraph, enter and run
The plot should be displayed automatically without calling any show() function whatsoever. A special method called configure_mpl() can also be used to modify the inline plotting behavior. For example,
allows for iterative updates to the plot provided you have Py4J installed for your python installation (which of course is always the case if you use pyspark). To clarify, this feature currently only works with pyspark (not python, as there are no angularBind() and angularUnbind() methods yet). Doing something like:
will update the plot that was generated by the previous paragraph by leveraging Zeppelin's Angular Display System. However, by setting close=False, matplotlib will no longer automatically close figures, so it is now up to the user to explicitly close each figure instance they create. There are quite a few more options for z.configure_mpl(), but I will save that discussion for the documentation.
Screenshots (if appropriate)
Questions: