[ZEPPELIN-840] Scalding interpreter that works in hdfs mode #917
Great improvement! Quick question: is there a reason you prefer to keep some classes in …? Both CI failures happen in tests, like …, that I have not seen before. They seem unrelated, but definitely deserve a JIRA with …
Now that #862 is merged, the CI should be fixed, so could you please rebase on the latest master and see if that helps?
I had originally created ZeppelinScaldingShell in the org.apache.zeppelin.scalding package. I moved it to com.twitter.scalding so that I could access "private[scalding] var storedHdfsMode". I am planning to create a PR in https://github.com/twitter/scalding to make that field protected. Once that happens, I will move ZeppelinScaldingShell back to org.apache.zeppelin.scalding. Does that make sense? I created https://issues.apache.org/jira/browse/ZEPPELIN-888 for the DummyNotebookRepo "class not found" issue. I merged #862 and the selenium test passed! But one of the jobs failed with the error below, so I am going to start another build. I really appreciate you looking into the flaky builds and tests!
@prasadwagle Nice! Also, do you plan to update any interpreter documentation as part of this PR, or is it ready to merge?
 * limitations under the License.
 */

package com.twitter.scalding
Should this be a Zeppelin package?
…ssion:hadoop-lzo:jar:0.4.19 dependency
@felixcheung Yes, it should be. I have moved the classes to org.apache.zeppelin.scalding, but we need this Scalding change. The scalding.version in scalding/pom.xml is 0.16.1-SNAPSHOT; I will change it when the Scalding team publishes new artifacts. cc: @rubanm
…pression:hadoop-lzo:jar:0.4.19 dependency
This seems to move the whole build to JDK 8 and to make some components support only Scala 2.11. Is that really the plan?
Hmm... I'm not sure we should move to JDK 8 completely by default; the last time we checked, some interpreters only ran with JDK 7. The same goes for Scala 2.11, but that's decided per interpreter.
Agreed, @felixcheung. At the very least, we should have a discussion on the dev list if we plan changes like this.
@felixcheung @lresende I agree. I wanted to see the build errors before starting a discussion on the dev list. In the previous build, one check didn't complete, many failed with "Failed to transfer file: http://archive.apache.org/dist/spark/spark-1.5.2/spark-1.5.2.tgz", and one succeeded. I am going to start the build again.
There were build failures with the Java 1.8 change that seemed to be caused by flaky tests. In the latest build, all checks succeeded except one, which failed due to a flaky test tracked in https://issues.apache.org/jira/browse/ZEPPELIN-862.
All checks passed in the last build. Can someone please review? There is interest in this interpreter in the Cascading and Scalding communities.
zeppelin-server/pom.xml
 <groupId>org.scala-lang</groupId>
 <artifactId>scala-library</artifactId>
-<version>2.10.4</version>
+<version>2.11.8</version>
Is this change to zeppelin-server/pom.xml required in order to change the Scala dependency version of the scalding library?
When I run Zeppelin on my laptop with bin/zeppelin.sh, the classpath of the scalding RemoteInterpreterServer process has zeppelin-server classes before the scalding ones. This causes the Scala 2.10.4 libraries to be used, resulting in errors when I run Scalding, which uses the Scala 2.11.8 libraries. Since Scala is used only for scalatest in zeppelin-server, and the tests run fine when I use scalatest_2.11, I thought I would keep this change and get your feedback. Let me know if you want me to revert it. The scalding interpreter runs fine in production, where we use the zeppelin-distribution created by "mvn package -Pbuild-distr".
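To check the ordering described above on a given machine, it helps to list the interpreter's classpath entries in the order the JVM searches them; for the application class loader, the first entry that provides a class wins. The following is a minimal sketch, not part of Zeppelin: `classpath_entries` and `classpath_position` are hypothetical helper names, and it assumes a `:`-separated `-cp` string as on Linux and macOS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Zeppelin) for inspecting a Java -cp string.

# Print each classpath entry on its own line, numbered in search order.
# Empty entries (from "::" or a leading ":") are skipped.
classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | awk 'NF { printf "%d\t%s\n", ++n, $0 }'
}

# Print the position of the first entry containing the fixed string $2,
# or nothing if no entry contains it.
classpath_position() {
  classpath_entries "$1" | awk -F'\t' -v pat="$2" 'index($2, pat) { print $1; exit }'
}

# Example: a shortened form of the interpreter classpath quoted in this thread.
interp_cp=':.:/usr/local/lib/antlr-4.0-complete.jar::/z/zeppelin-server/target/lib/*:/z/interpreter/scalding/*'
classpath_entries "$interp_cp"
classpath_position "$interp_cp" zeppelin-server/target/lib
classpath_position "$interp_cp" interpreter/scalding
```

If the zeppelin-server position is lower than the interpreter/scalding position, server jars are searched first. To feed it a running process, something like `classpath_entries "$(ps -ww -o args= -p "$PID" | sed -n 's/.* -cp \([^ ]*\) .*/\1/p')"` may work, where `$PID` is the interpreter's process id; check the `ps` flags on your platform.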
I just tried this branch and printed the classpath of the interpreter process (without -Pbuild-distr). My classpath is:
-cp ::/zeppelin/interpreter/scalding/*:/zeppelin/zeppelin-interpreter/target/lib/*::/zeppelin/conf:/zeppelin/conf:/zeppelin/zeppelin-interpreter/target/classes
Could you verify the classpath of the scalding interpreter process?
Here's the "ps aux" output I got when running zeppelin.sh. It looks like interpreter.sh is picking up ZEPPELIN_CLASSPATH from the server process. Am I doing something wrong?
tw-172-25-130-178 incubator-zeppelin-prasadwagle (ZEPPELIN-840) $ ps aux | grep zeppelin
pwagle 91376 0.1 1.0 5221060 159804 s001 S 11:21AM 0:21.61 /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/bin/java -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf/log4j.properties -Dzeppelin.log.file=/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/logs/zeppelin-pwagle-tw-172-25-130-178.office.twttr.net.log -cp :.:/usr/local/lib/antlr-4.0-complete.jar::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-server/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-zengine/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/*::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-zengine/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-server/target/classes org.apache.zeppelin.server.ZeppelinServer
pwagle 91390 0.1 2.8 5105608 465152 s001 S 11:21AM 0:21.49 /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf/log4j.properties -Dzeppelin.log.file=/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/logs/zeppelin-interpreter-scalding-pwagle-tw-172-25-130-178.office.twttr.net.log -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -cp :.:/usr/local/lib/antlr-4.0-complete.jar::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-server/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-zengine/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/*::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-zengine/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-server/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/interpreter/scalding/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/lib/*::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/classes org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 63581
pwagle 92022 0.1 0.6 5020016 96164 s001 S 12:18PM 0:01.18 /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf/log4j.properties -Dzeppelin.log.file=/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/logs/zeppelin-interpreter-md-pwagle-tw-172-25-130-178.office.twttr.net.log -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -cp :.:/usr/local/lib/antlr-4.0-complete.jar::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-server/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-zengine/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/lib/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/*::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-zengine/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-server/target/classes:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/interpreter/md/*:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/lib/*::/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/conf:/Users/pwagle/workspace/zeppelin/incubator-zeppelin-prasadwagle/zeppelin-interpreter/target/classes org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 63923
Following is the 'ps aux | grep zeppelin' result from my system, and it does not include zeppelin-server in the classpath.
Lees-MacBook:zeppelin-review moon$ ps aux | grep zeppelin
moon 7630 0.0 5.8 4262896 486108 s004 S 12:48PM 0:20.84 /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///Users/moon/Projects/zeppelin-review/conf/log4j.properties -Dzeppelin.log.file=/Users/moon/Projects/zeppelin-review/logs/zeppelin-interpreter-scalding-moon-Lees-MacBook.attlocal.net.log -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -cp ::/Users/moon/Projects/zeppelin-review/interpreter/scalding/*:/Users/moon/Projects/zeppelin-review/zeppelin-interpreter/target/lib/*::/Users/moon/Projects/zeppelin-review/conf:/Users/moon/Projects/zeppelin-review/conf:/Users/moon/Projects/zeppelin-review/zeppelin-interpreter/target/classes org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 59008
moon 7623 0.0 0.0 2454888 1264 s004 S 12:48PM 0:00.02 /bin/bash /Users/moon/Projects/zeppelin-review/bin/interpreter.sh -d /Users/moon/Projects/zeppelin-review/interpreter/scalding -p 59008 -l /Users/moon/Projects/zeppelin-review/local-repo/2BPXMN9Y8
moon 7619 0.0 5.0 4311156 419968 s004 S 12:47PM 0:08.45 /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/bin/java -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///Users/moon/Projects/zeppelin-review/conf/log4j.properties -Dzeppelin.log.file=/Users/moon/Projects/zeppelin-review/logs/zeppelin-moon-Lees-MacBook.attlocal.net.log -cp ::/Users/moon/Projects/zeppelin-review/zeppelin-server/target/lib/*:/Users/moon/Projects/zeppelin-review/zeppelin-zengine/target/lib/*:/Users/moon/Projects/zeppelin-review/zeppelin-interpreter/target/lib/*:/Users/moon/Projects/zeppelin-review/*::/Users/moon/Projects/zeppelin-review/conf:/Users/moon/Projects/zeppelin-review/zeppelin-interpreter/target/classes:/Users/moon/Projects/zeppelin-review/zeppelin-zengine/target/classes:/Users/moon/Projects/zeppelin-review/zeppelin-server/target/classes org.apache.zeppelin.server.ZeppelinServer
I built this branch, started Zeppelin without any configuration, and then started the scalding interpreter by running %scalding.
Could you suggest a way to reproduce the problem?
@Leemoonsoo Sorry for the trouble. I found that my .local.bash had "export CLASSPATH", which caused interpreter.sh to pick up the Zeppelin server classpath. I reverted the Scala version change in zeppelin-server/pom.xml. Thanks!
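The underlying shell behavior: an exported variable is copied into the environment of every child process, so a CLASSPATH exported from a shell startup file is visible to child scripts such as interpreter.sh. The sketch below demonstrates this with `bash -c` standing in for a child script; the value `/server/lib/*` is a made-up placeholder.

```shell
#!/usr/bin/env bash
# Start from a clean state so the demo does not depend on the caller's environment.
unset CLASSPATH

# A plain assignment stays local to this shell: the child does not see it.
CLASSPATH='/server/lib/*'
bash -c 'echo "child sees: ${CLASSPATH:-<unset>}"'

# After export, every child process inherits the value.
export CLASSPATH
bash -c 'echo "child sees: ${CLASSPATH:-<unset>}"'

# env -u removes the variable from the environment of a single command only
# (supported by both GNU and BSD env).
env -u CLASSPATH bash -c 'echo "child sees: ${CLASSPATH:-<unset>}"'
```

Besides deleting the export from the startup file, as was done here, `env -u CLASSPATH bin/zeppelin.sh` might be a way to start Zeppelin without the variable, though that is untested.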
@Leemoonsoo The selenium check is failing with the error below. Is this due to a known flaky test issue? cc @bzz
https://travis-ci.org/apache/incubator-zeppelin/jobs/135722600
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 75.38 sec <<< FAILURE! - in org.apache.zeppelin.integration.ParagraphActionsIT
testRemoveButton(org.apache.zeppelin.integration.ParagraphActionsIT) Time elapsed: 9.425 sec <<< FAILURE!
java.lang.AssertionError: After Remove : Number of paragraphs are
Expected: <2>
but: was <1>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.junit.Assert.assertThat(Assert.java:865)
at org.junit.rules.ErrorCollector$1.call(ErrorCollector.java:65)
at org.junit.rules.ErrorCollector.checkSucceeds(ErrorCollector.java:78)
at org.junit.rules.ErrorCollector.checkThat(ErrorCollector.java:63)
at org.apache.zeppelin.integration.ParagraphActionsIT.testRemoveButton(ParagraphActionsIT.java:156)
Results :
Failed tests:
ParagraphActionsIT.testRemoveButton:156 After Remove : Number of paragraphs are
Expected: <2>
but: was <1>
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0
<id>scalding</id>
<modules>
  <module>scalding</module>
</modules>
Removing the scalding profile and including scalding in the module list means the release script will create a binary package that includes the scalding interpreter. Therefore, we need to take care of a few more things for the binary package release.
First, Zeppelin wants to avoid building the release binary package with 3rd party repositories, but the scalding interpreter needs two of them (conjars.org/repo, maven.twttr.com).
Second, the LICENSE for the binary package must be updated, because the scalding interpreter brings new dependency libraries into the binary package. Running mvn -DskipTests -pl 'zeppelin-interpreter,scalding' package dependency:tree gives a detailed list of dependencies (including transitive dependencies).
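Because the tree output is long and, in a multi-module build, repeats dependencies shared by several modules, it can help to reduce it to one line per artifact before comparing it with LICENSE. A minimal sketch, assuming the default dependency:tree text format in which every dependency line ends in `groupId:artifactId:type[:classifier]:version:scope`; the function name `list_deps` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `mvn dependency:tree` output on stdin and print each
# dependency once as groupId:artifactId:version, sorted.
# Dependency lines end in groupId:artifactId:type[:classifier]:version:scope, so
# the version is always the second-to-last ':' field. Lines whose last word has
# fewer than five fields (log messages, the module's own coordinates) are ignored.
list_deps() {
  awk '$1 == "[INFO]" {
         n = split($NF, f, ":")
         if (n >= 5) print f[1] ":" f[2] ":" f[n-1]
       }' | sort -u
}

# Example input: lines from the tree quoted in this thread, one repeated
# to show de-duplication.
sample='[INFO] +- com.twitter:scalding-core_2.11:jar:0.16.1-RC1:compile
[INFO] | | +- com.esotericsoftware.reflectasm:reflectasm:jar:shaded:1.07:compile
[INFO] | +- cascading:cascading-local:jar:2.6.1:compile
[INFO] | +- cascading:cascading-local:jar:2.6.1:compile'
printf '%s\n' "$sample" | list_deps
```

Usage against a real build would be `mvn -DskipTests -pl 'zeppelin-interpreter,scalding' package dependency:tree | list_deps`.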
Here's a partial list of the scalding interpreter's dependencies:
[INFO] +- com.twitter:scalding-core_2.11:jar:0.16.1-RC1:compile
[INFO] | +- com.twitter:scalding-serialization_2.11:jar:0.16.1-RC1:compile
[INFO] | +- com.twitter:maple:jar:0.16.1-RC1:compile
[INFO] | +- cascading:cascading-core:jar:2.6.1:compile
[INFO] | | +- riffle:riffle:jar:0.1-dev:compile
[INFO] | | +- thirdparty:jgrapht-jdk1.6:jar:0.8.1:compile
[INFO] | | \- org.codehaus.janino:janino:jar:2.7.5:compile
[INFO] | | \- org.codehaus.janino:commons-compiler:jar:2.7.5:compile
[INFO] | +- cascading:cascading-hadoop:jar:2.6.1:compile
[INFO] | +- cascading:cascading-local:jar:2.6.1:compile
[INFO] | | \- com.google.guava:guava:jar:15.0:compile
[INFO] | +- com.twitter:chill-hadoop:jar:0.7.3:compile
[INFO] | | \- com.esotericsoftware.kryo:kryo:jar:2.21:compile
[INFO] | | +- com.esotericsoftware.reflectasm:reflectasm:jar:shaded:1.07:compile
[INFO] | | | \- org.ow2.asm:asm:jar:4.0:compile
[INFO] | | +- com.esotericsoftware.minlog:minlog:jar:1.2:compile
[INFO] | | \- org.objenesis:objenesis:jar:1.2:compile
[INFO] | +- com.twitter:chill-java:jar:0.7.3:compile
[INFO] | +- com.twitter:chill-bijection_2.11:jar:0.7.3:compile
[INFO] | +- com.twitter:algebird-core_2.11:jar:0.12.0:compile
[INFO] | | \- com.googlecode.javaewah:JavaEWAH:jar:0.6.6:compile
[INFO] | +- com.twitter:bijection-core_2.11:jar:0.9.1:compile
[INFO] | +- com.twitter:bijection-macros_2.11:jar:0.9.1:compile
[INFO] | +- com.twitter:chill_2.11:jar:0.7.3:compile
[INFO] | \- com.twitter:chill-algebird_2.11:jar:0.7.3:compile
[INFO] +- com.twitter:scalding-args_2.11:jar:0.16.1-RC1:compile
[INFO] +- com.twitter:scalding-date_2.11:jar:0.16.1-RC1:compile
[INFO] +- com.twitter:scalding-commons_2.11:jar:0.16.1-RC1:compile
[INFO] | +- com.google.protobuf:protobuf-java:jar:2.4.1:compile
[INFO] | +- com.twitter.elephantbird:elephant-bird-cascading2:jar:4.8:compile
[INFO] | +- com.twitter.elephantbird:elephant-bird-core:jar:4.8:compile
[INFO] | | +- com.twitter.elephantbird:elephant-bird-hadoop-compat:jar:4.8:compile
[INFO] | | \- com.googlecode.json-simple:json-simple:jar:1.1:compile
[INFO] | \- com.hadoop.gplcompression:hadoop-lzo:jar:0.4.19:compile
[INFO] | \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- com.twitter:scalding-avro_2.11:jar:0.16.1-RC1:compile
[INFO] | +- cascading.avro:avro-scheme:jar:2.1.2:compile
[INFO] | | +- org.apache.avro:avro-mapred:jar:1.7.4:compile
[INFO] | | | +- org.apache.avro:avro-ipc:jar:1.7.4:compile
[INFO] | | | | +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] | | | | +- org.apache.velocity:velocity:jar:1.7:compile
[INFO] | | | | \- org.mortbay.jetty:servlet-api:jar:2.5-20081211:compile
[INFO] | | | \- org.apache.avro:avro-ipc:jar:tests:1.7.4:compile
[INFO] | | \- cascading:cascading-xml:jar:2.1.6:compile
[INFO] | | \- org.ccil.cowan.tagsoup:tagsoup:jar:1.2:compile
[INFO] | \- org.apache.avro:avro:jar:1.7.4:compile
[INFO] | +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] | +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] | +- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile
[INFO] | \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] | \- org.tukaani:xz:jar:1.0:compile
[INFO] +- com.twitter:scalding-parquet_2.11:jar:0.16.1-RC1:compile
[INFO] | +- org.apache.parquet:parquet-column:jar:1.8.1:compile
[INFO] | | +- org.apache.parquet:parquet-common:jar:1.8.1:compile
[INFO] | | +- org.apache.parquet:parquet-encoding:jar:1.8.1:compile
[INFO] | | \- commons-codec:commons-codec:jar:1.5:compile
[INFO] | +- org.apache.parquet:parquet-hadoop:jar:1.8.1:compile
[INFO] | | +- org.apache.parquet:parquet-format:jar:2.3.0-incubating:compile
[INFO] | | \- org.apache.parquet:parquet-jackson:jar:1.8.1:compile
[INFO] | \- org.apache.parquet:parquet-thrift:jar:1.8.1:compile
[INFO] +- com.twitter:scalding-repl_2.11:jar:0.16.1-RC1:compile
[INFO] | \- jline:jline:jar:2.11:compile
[INFO] +- org.scala-lang:scala-library:jar:2.11.8:compile
[INFO] +- org.scala-lang:scala-compiler:jar:2.11.8:compile
[INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.4:compile
[INFO] | \- org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.4:compile
[INFO] +- org.scala-lang:scala-reflect:jar:2.11.8:compile
They need to be addressed in the zeppelin-distribution/src/bin_license/LICENSE file.
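A rough first pass at that LICENSE update is to flag coordinates whose artifactId does not appear anywhere in the file. This is only a heuristic sketch with a hypothetical helper name, `missing_from_license`: a short artifactId such as `xz` can match unrelated text and hide a missing entry, and a Scala suffix such as `_2.11` can cause a listed library to be reported, so the result still needs manual review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read groupId:artifactId:version lines on stdin and print
# the ones whose artifactId does not occur anywhere in the LICENSE file given as $1.
missing_from_license() {
  local license=$1 coord artifact
  while IFS= read -r coord; do
    [ -n "$coord" ] || continue
    artifact=${coord#*:}          # drop groupId
    artifact=${artifact%%:*}      # drop version
    grep -qF -- "$artifact" "$license" || printf '%s\n' "$coord"
  done
}

# Example with a throwaway file that mentions only one of the two artifacts.
license_file=$(mktemp)
printf '%s\n' 'com.twitter:scalding-core_2.11' > "$license_file"
printf '%s\n' com.twitter:scalding-core_2.11:0.16.1-RC1 cascading:cascading-core:2.6.1 \
  | missing_from_license "$license_file"
rm -f "$license_file"
```

From the repository root, a real check might look like `printf '%s\n' com.twitter:maple:0.16.1-RC1 | missing_from_license zeppelin-distribution/src/bin_license/LICENSE`.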
I recommend restoring the scalding profile and creating a separate issue for removing it. We can take care of the 3rd party repositories and the binary package license in that new issue.
@Leemoonsoo I understand and have made the changes you recommended. I also created https://issues.apache.org/jira/browse/ZEPPELIN-972.
@prasadwagle Thanks! Looks good to me. Merging if there is no further discussion.
What is this PR for?
Scalding interpreter that works in hdfs mode
What type of PR is it?
Improvement
Todos
What is the Jira issue?
ZEPPELIN-840
How should this be tested?
Screenshots (if appropriate)
Questions: