-
Notifications
You must be signed in to change notification settings - Fork 2.8k
R Interpreter for Zeppelin #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the need for double { ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None - thanks for catching that. At one point I used IntelliJ to reformat the code for Apache styles and it left a bunch of those artifacts. Thought I caught them all.
On Aug 13, 2015, at 3:45 AM, Martin Grigorov [email protected] wrote:
In r/src/main/scala/org/apache/zeppelin/rinterpreter/KnitRInterpreter.scala:
"""opts_knit$set(out.format = 'html',|results='asis',|progress = FALSE,|self.contained = TRUE,|verbose = FALSE,|comment = NA,|tidy = FALSE)||.z.show.googleVis <- function(widget) print(widget, tag = 'chart')|.z.show.rCharts <- function(widget) widget$show('inline', include_assets=TRUE, cdn=TRUE)| """.stripMargin)- }
- logger.info("KnitR: Finished initial commands")
- }
- def interpret(st: String, context: InterpreterContext): InterpreterResult = try { {
What is the need for double { ?—
Reply to this email directly or view it on GitHub.
353eaa6 to
673b7bd
Compare
|
The check is failing because R is missing in Travis-this is ready for people to start playing with. |
|
Ok I'm temporarily giving up on Travis - these test failures seem to all be spurious. |
ffeecb5 to
c078252
Compare
|
All: This should now be functional. Tests pass (there may still be some spurious errors on Travis, but I'm not seeing them). And it should run out-of-box. People should definitely start looking at it. |
00433bb to
2e1fe2e
Compare
|
On the basic RInterpreter notebook, the first paragraph fails with:
What assumptions/prerequisites is it looking for in the installation? |
|
If you look in the log, it should show what packages are missing. Repr shouldn't be needed unless something else is going wrong. Can you send me your complete log? It sounds like what's happening is that initialization is failing if some packages are missing and that definitely has to be fixed.
|
|
@jabowles Pls give what I just pushed a try and see if it resolves your issue. I really need a log to be sure I've gotten to the bottom of it, but this may do it. |
|
@elbamos found it working great in all scenarios involving SparkR, KnitR and various plots/charts. Great work !!! |
|
Thanks for the contribution and 1. Scala->R invocationYou seem to use rscala and separate connection to make Using similar technique that PySparkInterpreter does for py4j in R Interpreter will help
2. KnitR InteprreterSeems like calling knit2html() is the role. 3. Author, Copyrights and Maintainer tagZeppelin discourages Author tag. as well as Copyright / Maintainer if it is not really necessary. |
|
Please let me know who from the PPMC I can work with to resolve the continuing problems with the travis build and zeppelin-spark architecture. Regarding your comments, as I've explained in our e-mail exchange: 1. Scala-R invocationI am not using rscala. It is not possible to use the SparkR connection in the way you describe. I did look into this early on. There are numerous packages for interfacing the jvm and R. None of them use two-way connections. The python-spark and zeppelin integrations you describe leverage an external dependency. There is no comparable package available for R that has a compatible license. 2. Knitr*As I understand, you don't use R, so it may seem strange to have a separate interpreter rather than a function. That's understandable. The distinction between the r-repl and knitr interpreters makes perfect sense for people who are coming from R. The repl and knitr handle code, and errors, and output, in fundamentally different ways. They have different capabilities. It is not possible, consistent with the zeppelin architecture, to put both capabilities into a single interpreter without making the use of that interpreter very unintuitive for someone coming from an R background. The knit2html() command is something no R user would ever use when making use of R. It is perhaps best thought of as part of the "R operating system." 3, The author tagThat's really fine, but in my view this is the lowest-priority possible item. The highest priority is the travis build problems. Travis consistently fails building parts other than rzeppelin. The other high priority is consistency with the Zeppelin-Spark interface, which has grown quite complex and difficult to use. My users have had a long stream of issues getting Spark to work through zeppelin. They get reported to me as rzeppelin issues, but have all turned out to be issues in the way zeppelin and spark interface, e.g., with conflicts between SPARK_HOME and spark.home. rzeppelin needs to be consistent with the rest of the Zeppelin architecture in that regard. This is not something I can fix because I don't own that code. |
Correction rscala -> forked-rscala. You don't need to find any other external package for two-way connection. You can make two-way PySpark implements one way connection I guess RInterpreter can do the same, with SparkR. It'll able to make
Yes i'm not familiar with R. So please convince me. What KnitR Interpreter doing is basically and basic usage i found from KnitR website is So, to me, it's hard to imagine why functions like By the way, KnitR is GPL license. I don't think Zeppelin can have a feature that depends on GPL licensed code.
License and Copyright problems are one of the highest priority item in Zeppelin project
Latest exception from your CI Build is Please try to increase PermGem memory option in the test.
And https://issues.apache.org/jira/browse/ZEPPELIN-421 will address removal of spark.home property, in Zeppelin setting window. But until that, you can simply not trying to set spark.home. And technically you don't own Zeppelin code but ASF does. but nothing stops you fix the problem. |
|
Another one, is location of this package. If it is implementing general R Interpreter, i think having separate submodule, like as-is make sense. Also, this approach is more consistent with package location of PySparkInterpreter, which is also part of Spark Interpreter group. |
|
I will be rebasing over the next few days. I the meantime I'm going to go through the issues you raised in sequence... 1. The R-Scala InterfaceI've explained several times that the proposal to use SparkR bi-directionally doesn't work. I don't feel that I have more to add about that. I will try to reduce the size of the code that originated in rscala. If this is not going to be acceptable to the PCCM, please tell me now. 2. KnitRWhat you're proposing is that users enter the same boilerplate, which they would have to figure out for themselves, every time they want to use knitr. Knitr and the repl are fundamentally two different ways for users to interact with R. They have very different behaviors in terms of error reporting and handling visualizations. If you don't want to trust me about this, then I suggest we ask some other R users what makes the most sense. If this is not going to be acceptable to the PCCM, please tell me now. 3. KnitR GPL LicenseBy the way, KnitR is GPL license. I don't think Zeppelin can have a feature that depends on GPL licensed code. KnitR is an optional external dependency. This is not a licensing problem. It is also not a licensing problem to interact with GPL code that isn't supplied with Zeppelin. For example, R itself is GPL code. So it Zeppelin cannot interact with external GPL code, then there cannot be an R interpreter in Zeppelin at all. Considering that Spark interacts with R, I think this issue is closed. 4. License any copyrightLicense and Copyright problems are one of the highest priority item in Zeppelin project Huh? What we were talking about is who gets identified in code as the author. That is obviously not a license/copyright issue, its an issue of credit. You said that it is discouraged to identify anybody as the author in Apache projects. However, the current code does identify authors, with you identified as the principal author. So, at this point I'm not sure what you're referring to? 5. Location of the PackageIts fine with me if it goes under /spark. The reason its in /r is that it simplifies testing and development. Someone will have to merge the two build scripts; I'm using scalatest for testing. 6. Travis BuildsActually the error in the travis logs begins with this:
That's not coming from rzeppelin. That's coming from the SparkInterpreter when rzeppelin asks it to initiate a spark backend. This is what I mean about issues in the spark-zeppelin interface. |
|
@Leemoonsoo @sourav-mazumder @Emaasit @felixcheung Actually, I just watched the youtube video in which, in September, Lee and Felix demo'd this code at the Seattle R users group. Lee, I will be e-mailing you my phone number. Call me. |
|
@elbamos Hi, You're misunderstanding the authorship of codes from Apache project including Zeppelin. What @Leemoonsoo said to you is we have tracked your contribution "with git commit logs" not being in codes. You should remove your developer tag in the @Leemoonsoo I think that we need to enrich our documentations. The title of this PR doesn't have any connected number of issue from JIRA. |
|
I have no idea what you were trying to say. It appears to me that your primary concern is the names identified in the pom files as the authors? Is that correct? (Many of the files you are referencing, under rzeppelin/, are part of the source that has to be there for an R package.) It appears to me that I have contributed in the correct manner, but there may be some people affiliated with this project who have other issues. |
|
@elbamos Could you please tell me the usage of |
|
@elbamos Do you mean that it's ok because of locating the files under the rzeppelin/, and do you mean rzeppelin is not a part of the contribution? |
|
@jongyoul What's excluded from rat are files that get generated automatically during the build process -- they are the R equivalent of a .class or .jar file. There are two rzeppelin folders. One contains R source code. The other contains the actual compiled R package. Console.scala is used to create a virtual interface to R. I believe its called by an implicit, but that is actually part from rscala that I haven't tinkered with. |
|
I believe that rebase to latest master must eliminate most of CI issues. Did one more round of reviews and updated #208 (comment) - a minor cleanup of licenses is needed. |
|
@bzz We are green! Good to go? |
pom.xml
Outdated
| <exclude>docs/Gemfile.lock</exclude> | ||
|
|
||
| <!-- compiled R packages (binaries) --> | ||
| <exclude>R/**</exclude> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not this exclude whole ./r/ maven sub-module (assuming cases-insensitive FS)?
AFAIK compiled R packages now live under ~/R and fine-grained excludes should be (and already are) managed by ./r/pom.xml
|
Looks awesome! Copied them here for convenience:
and we should be good to go! |
|
@bzz The push I just made should reflect all of the latest requests. Thank you for pointing out the effect of a case-insensitive FS. There are no substantive changes in this push, so if its going red, its just the random-CI stuff. |
Working on CI CI CI CI CI permissions CI Should be good Triggering CI squashme - force push squashme CI Removing unused dependency squashme squashme squashme squashme squashme squashme License changes requested by @bzz squashme
|
@bzz Are we good? |
### What is this PR for? Improvement of Virtual Machine Script to support R interpretor ### What type of PR is it? Improvement of Virtual Machine Script to support R interpretor ### Todos * [x] - Test with #208 * [x] - Test with #702 ### Is there a relevant Jira issue? Zeppelin-700 ### How should this be tested? Follow the steps in this Read Me to build a VM from scratch: https://github.com/apache/incubator-zeppelin/blob/master/scripts/vagrant/zeppelin-dev/README.md ### Screenshots (if appropriate) ### Questions: * Does the licenses files need update? Added to all headers * Is there breaking changes for older versions? No * Does this needs documentation? Yes, this will be a separate PR to update docs and README Author: Jeff Steinmetz <[email protected]> Closes #751 from jeffsteinmetz/ZEPPELIN-700 and squashes the following commits: e03dba5 [Jeff Steinmetz] update to support R interpreter sparkr build profile a9a2052 [Jeff Steinmetz] add base64enc, repr and htmltools bdcdf5f [Jeff Steinmetz] removed packages not required for pr702. repr not needed - base64encode not needed - htmltools not needed d497994 [Jeff Steinmetz] fix so that devtools will install 5643fc6 [Jeff Steinmetz] plotting in r interpreter requires the repr package. install devtools package first so repr can be installed. fb18a2b [Jeff Steinmetz] plotting in r interpreter requires the repr package. install devtools package first so repr can be installed. ef8f638 [Jeff Steinmetz] ZEPPELIN-700. Add Ansible R role to Virtual Machine to support dependencies for the R Interpreter b940bcd [Jeff Steinmetz] ZEPPELIN-700. Add Ansible R role to Virtual Machine to support dependencies for the R Interpreter
### What is this PR for? Improvement of Virtual Machine Script to support R interpretor ### What type of PR is it? Improvement of Virtual Machine Script to support R interpretor ### Todos * [x] - Test with apache#208 * [x] - Test with apache#702 ### Is there a relevant Jira issue? Zeppelin-700 ### How should this be tested? Follow the steps in this Read Me to build a VM from scratch: https://github.com/apache/incubator-zeppelin/blob/master/scripts/vagrant/zeppelin-dev/README.md ### Screenshots (if appropriate) ### Questions: * Does the licenses files need update? Added to all headers * Is there breaking changes for older versions? No * Does this needs documentation? Yes, this will be a separate PR to update docs and README Author: Jeff Steinmetz <[email protected]> Closes apache#751 from jeffsteinmetz/ZEPPELIN-700 and squashes the following commits: e03dba5 [Jeff Steinmetz] update to support R interpreter sparkr build profile a9a2052 [Jeff Steinmetz] add base64enc, repr and htmltools bdcdf5f [Jeff Steinmetz] removed packages not required for pr702. repr not needed - base64encode not needed - htmltools not needed d497994 [Jeff Steinmetz] fix so that devtools will install 5643fc6 [Jeff Steinmetz] plotting in r interpreter requires the repr package. install devtools package first so repr can be installed. fb18a2b [Jeff Steinmetz] plotting in r interpreter requires the repr package. install devtools package first so repr can be installed. ef8f638 [Jeff Steinmetz] ZEPPELIN-700. Add Ansible R role to Virtual Machine to support dependencies for the R Interpreter b940bcd [Jeff Steinmetz] ZEPPELIN-700. Add Ansible R role to Virtual Machine to support dependencies for the R Interpreter
|
@elbamos awesome, thank you! Please, let me do a final pass and I'll post back ASAP. Just FYI - I'm still working on fixing random CI failures under ZEPPELIN-783 (not directly related\blocking this work) |
|
@bzz thanks! Yeah, I was going to suggest, that getting CI to work reliably is probably something important to the project beyond the scope of this PR. One thing that might offer a quick improvement, is @jeffsteinmetz's suggestion that we stop supporting older versions of Spark (e.g., 1.1, 1.2). That should reduce the random test failures by at least 20%. |
|
@elbamos everything looks good, except one last thing - BSD license text need to be removed from Sorry for not pointing it out earlier, but reviewing whole 4000+LOC diff everytime at once is a bit uneasy. I think we should be good with merging it right after that. And yes, improving project CI is a priority and will be continued. |
|
@bzz So we should be good to merge now right? |
|
Looks good to me. thank you for the efforts. I think it is ready to be merged, and if there is no more discussion - I will merge it tomorrow. |
|
@bzz the push I just made fixed an issue that if -Pr wasn't specified, rat would complain about the licenses of files under the r/ subdirectory. There are no other changes. |
|
+1 |
### What is this PR for? Recently #208 and #702 are merged into master branch. They passed the CI test individually, but failing after both merged. * zeppelin-web build error ``` [ERROR] npm ERR! registry error parsing json [ERROR] npm http 200 https://registry.npmjs.org/bower/1.7.2 [ERROR] npm ERR! SyntaxError: Unexpected token � [ERROR] npm ERR! ������Y[o�6��+��6��u��lE��� [ERROR] npm ERR! �{h��C�ˑ�L�=RJ���o�9b�4����W4��["��wn�����E����2C�ϕn���U`���a� [ERROR] npm ERR! G�p^��$�e��Ley,��IU�"/K�,qr�[8�F���^���p�������Z�����=x?�}��{W�+���ܳЀ쵱���} ``` * 'SparkRInterpreter.java' and 'RRepl.java' uses the same interpreter name. 'spark.r'. That conflicts and make https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-server/src/test/java/org/apache/zeppelin/rest/ZeppelinSparkClusterTest.java#L87 fails. * R.md and r.md both exists under same directory. That confuses git client. ### What type of PR is it? Hot Fix ### Todos * [x] - Merge R.md and r.md * [x] - Fix zeppelin-web build error * [x] - Change interpreter listing order ### What is the Jira issue? ### How should this be tested? ### Screenshots (if appropriate) ### Questions: * Does the licenses files need update? no * Is there breaking changes for older versions? no * Does this needs documentation? no Author: Lee moon soo <[email protected]> This patch had conflicts when merged, resolved by Committer: Lee moon soo <[email protected]> Closes #815 from Leemoonsoo/r_hotfix and squashes the following commits: eeb411e [Lee moon soo] Change interpreter listing order 9baf57b [Lee moon soo] Change node and npm version 6854ac7 [Lee moon soo] R.md -> r.md
### What is this PR for? It forces travis CI to use container-based infra for everything, not only `master`. It will result in [many benefits](https://docs.travis-ci.com/user/migrating-from-legacy/#Why-migrate-to-container-based-infrastructure%3F), improving CI stability and including reducing [obscure CI failures \w old Spark versions](https://issues.apache.org/jira/browse/ZEPPELIN-776?focusedCommentId=15219388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15219388) ### What type of PR is it? Improvement, Hotfix for apache#208 ### What is the Jira issue? [ZEPPELIN-776](https://issues.apache.org/jira/browse/ZEPPELIN-776) ### How should this be tested? CI should be green, that is all. ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? No Author: Alexander Bezzubov <[email protected]> Closes apache#808 from bzz/ZEPPELIN-776-force-ci-to-container-infra and squashes the following commits: 0e550f6 [Alexander Bezzubov] Remove obsolete notifications hook d56ed0d [Alexander Bezzubov] ZEPPELIN-776: force CI to always use container-based infra
### What is this PR for? Improvement of Virtual Machine Script to support R interpretor ### What type of PR is it? Improvement of Virtual Machine Script to support R interpretor ### Todos * [x] - Test with apache#208 * [x] - Test with apache#702 ### Is there a relevant Jira issue? Zeppelin-700 ### How should this be tested? Follow the steps in this Read Me to build a VM from scratch: https://github.com/apache/incubator-zeppelin/blob/master/scripts/vagrant/zeppelin-dev/README.md ### Screenshots (if appropriate) ### Questions: * Does the licenses files need update? Added to all headers * Is there breaking changes for older versions? No * Does this needs documentation? Yes, this will be a separate PR to update docs and README Author: Jeff Steinmetz <[email protected]> Closes apache#751 from jeffsteinmetz/ZEPPELIN-700 and squashes the following commits: e03dba5 [Jeff Steinmetz] update to support R interpreter sparkr build profile a9a2052 [Jeff Steinmetz] add base64enc, repr and htmltools bdcdf5f [Jeff Steinmetz] removed packages not required for pr702. repr not needed - base64encode not needed - htmltools not needed d497994 [Jeff Steinmetz] fix so that devtools will install 5643fc6 [Jeff Steinmetz] plotting in r interpreter requires the repr package. install devtools package first so repr can be installed. fb18a2b [Jeff Steinmetz] plotting in r interpreter requires the repr package. install devtools package first so repr can be installed. ef8f638 [Jeff Steinmetz] ZEPPELIN-700. Add Ansible R role to Virtual Machine to support dependencies for the R Interpreter b940bcd [Jeff Steinmetz] ZEPPELIN-700. Add Ansible R role to Virtual Machine to support dependencies for the R Interpreter
This is the initial PR for an R Interpreter for Zeppelin. There's still some work to be done (e.g., tests), but its useable, it brings to Zeppelin features from R like its library of statistics and machine learning packages, as well as advanced interactive visualizations. So I'd like to open it up for others to comment and/or become involved. Summary: - There are two interpreters, one emulates a REPL, the other uses knitr to weave markdown and formatted R output. The two interpreters share a single execution environment. - Visualisations: Besides R's own graphics, this also supports interactive visualizations with googleVis and rCharts. I am working on htmlwidgets (almost done) with the author of that package, and a next-step project is to get Shiny/ggvis working. Sometimes, a visualization won't load until the page is reloaded. I'm not sure why this is. - Licensing: To talk to R, this integrates code forked from rScala. rScala was released with a BSD-license option, and the author's permission was obtained. - Spark: Getting R to share a single spark context with the Spark interpreter group is going to be a project. For right now, the R interpreters live in their own "r" interpreter group, and new spark contexts are created on startup. - Zeppelin Context: Not yet integrated, in significant part because there's no ZeppelinContext to talk to until it lives in the Spark interpreter group. - Documentation: A notebook is included that demonstrates what the interpreter does and how to use it. - Tests: Working on it... P.S.: This is my first PR on a project of this size; let me know what I messed up and I'll try to fix it ASAP. Author: Amos Elb <[email protected]> Author: Amos B. Elberg <[email protected]> Closes apache#208 from elbamos/rinterpreter and squashes the following commits: ffc1a25 [Amos Elb] Fix rat issue a08ec5b [Amos B. Elberg] R Interpreter
### What is this PR for? Recently apache#208 and apache#702 are merged into master branch. They passed the CI test individually, but failing after both merged. * zeppelin-web build error ``` [ERROR] npm ERR! registry error parsing json [ERROR] npm http 200 https://registry.npmjs.org/bower/1.7.2 [ERROR] npm ERR! SyntaxError: Unexpected token � [ERROR] npm ERR! ������Y[o�6��+��6��u��lE��� [ERROR] npm ERR! �{h��C�ˑ�L�=RJ���o�9b�4����W4��["��wn�����E����2C�ϕn���U`���a� [ERROR] npm ERR! G�p^��$�e��Ley,��IU�"/K�,qr�[8�F���^���p�������Z�����=x?�}��{W�+���ܳЀ쵱���} ``` * 'SparkRInterpreter.java' and 'RRepl.java' uses the same interpreter name. 'spark.r'. That conflicts and make https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-server/src/test/java/org/apache/zeppelin/rest/ZeppelinSparkClusterTest.java#L87 fails. * R.md and r.md both exists under same directory. That confuses git client. ### What type of PR is it? Hot Fix ### Todos * [x] - Merge R.md and r.md * [x] - Fix zeppelin-web build error * [x] - Change interpreter listing order ### What is the Jira issue? ### How should this be tested? ### Screenshots (if appropriate) ### Questions: * Does the licenses files need update? no * Is there breaking changes for older versions? no * Does this needs documentation? no Author: Lee moon soo <[email protected]> This patch had conflicts when merged, resolved by Committer: Lee moon soo <[email protected]> Closes apache#815 from Leemoonsoo/r_hotfix and squashes the following commits: eeb411e [Lee moon soo] Change interpreter listing order 9baf57b [Lee moon soo] Change node and npm version 6854ac7 [Lee moon soo] R.md -> r.md
### What is this PR for? This PR apply fix #769 again, which is reverted by #208. Also removing unnecessary code from interpreter.sh ### What type of PR is it? Bug Fix ### Todos * [x] - Apply #769 again * [x] - Remove unnecessary code ### What is the Jira issue? ### How should this be tested? ### Screenshots (if appropriate) ### Questions: * Does the licenses files need update? no * Is there breaking changes for older versions? no * Does this needs documentation? no Author: Lee moon soo <[email protected]> Closes #889 from Leemoonsoo/fix_interpreter_sh_classpath and squashes the following commits: 8468fd5 [Lee moon soo] Remove unnecessary construction of classpath ea8fee8 [Lee moon soo] Apply pr769 again, reverted by pr208
### What is this PR for? This PR apply fix apache#769 again, which is reverted by apache#208. Also removing unnecessary code from interpreter.sh ### What type of PR is it? Bug Fix ### Todos * [x] - Apply apache#769 again * [x] - Remove unnecessary code ### What is the Jira issue? ### How should this be tested? ### Screenshots (if appropriate) ### Questions: * Does the licenses files need update? no * Is there breaking changes for older versions? no * Does this needs documentation? no Author: Lee moon soo <[email protected]> Closes apache#889 from Leemoonsoo/fix_interpreter_sh_classpath and squashes the following commits: 8468fd5 [Lee moon soo] Remove unnecessary construction of classpath ea8fee8 [Lee moon soo] Apply pr769 again, reverted by pr208
This is the initial PR for an R Interpreter for Zeppelin. There's still some work to be done (e.g., tests), but its useable, it brings to Zeppelin features from R like its library of statistics and machine learning packages, as well as advanced interactive visualizations. So I'd like to open it up for others to comment and/or become involved.
Summary:
P.S.: This is my first PR on a project of this size; let me know what I messed up and I'll try to fix it ASAP.