[ZEPPELIN-682] New interpreter for Apache Beam (incubating)/DataFlow #1334

mfelgamal · 2016-08-16T11:04:38Z

What is this PR for?

The PR is a interpreter for Apache Beam which is an open source unified platform for data processing pipelines. A pipeline can be build using one of the Beam SDKs.
The execution of the pipeline is done by different Runners . Currently, Beam supports Apache Flink Runner, Apache Spark Runner, and Google Dataflow Runner.

What type of PR is it?

Feature

Todos

Test case
Review Comments
Documentation

What is the Jira issue?

[ZEPPELIN-682]

How should this be tested?

Start the Zeppelin server
The prefix of interpreter is %beam and then write your code with required imports and the runner

Screenshots (if appropriate)

Questions:

Does the licenses files need update? no
Is there breaking changes for older versions? no
Does this needs documentation? yes

update from original

merge master

bzz · 2016-08-16T13:52:09Z

Thank you for contributing @mfelgamal ! Apache Beam interpreter is a valuable contribution many people are looking for.

There are few things that need to be done before we can merge it though:

this branch need to be rebased on top of the latest master (git rebase master) and conflicts must be resolved
establish the authorship of this work: this PR includes commits by you and @fouadma
make sure that every file in this PR, as any other file in ASF project, contain Apache2.0 license header
make sure that every file is formatted according to project's code conventions: pom.xml, .jave, etc.
From the quick glance at the Java code:
- please make sure you follow the conventions that are used in Zeppelin projects: brackets, sl4j logging. Best way is to look at other interpreters code
- classes should have comments explaining their purpose and extra empty lines as well as @author tags need to be removed
- zeppelin-distribution/src/bin_license must be updated to include ALL dependencies, AND their transitive dependencies that were added in this PR. You can see all of them in mvn dependency:tree.

Please, feel free to ping me after those issues are addressed and will be happy to look more into it and help you getting this merged!

merge master

mfelgamal · 2016-08-16T22:26:00Z

@bzz The required changes is done, in the recent commit.

bzz · 2016-08-17T05:15:31Z

Thank you for addressing the feedback promptly! Please let me take another pass on it and get back to you here.

bzz · 2016-08-17T05:27:00Z

Before proceed with review - I have noticed that CI is failing.
@mfelgamal could you please take a look and make sure that there are no Beam-related failures? Thanks!

So far there is

[INFO] Zeppelin: Beam interpreter ......................... FAILURE [  6.632 s]
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:46 min
[INFO] Finished at: 2016-08-16T23:45:31+00:00
[INFO] Final Memory: 97M/414M
[INFO] ------------------------------------------------------------------------
....
[ERROR] Failed to execute goal on project zeppelin-beam: Could not resolve dependencies for project org.apache.zeppelin:zeppelin-beam:jar:0.7.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.beam:beam-runners-flink_2.11:jar:0.1.0-incubating, org.apache.beam:beam-runners-flink_2.11-examples:jar:0.1.0-incubating: Could not find artifact org.apache.beam:beam-runners-flink_2.11:jar:0.1.0-incubating in central (http://repo.maven.apache.org/maven2) -> [Help 1]

Which includes dependencies that I can not find mentioned in ./zeppelin-distribution/src/bin-license/LICENSE.

zeppelin-distribution/src/bin_license must be updated to include ALL dependencies, AND their transitive dependencies that were added in this PR

Could you take a look into it one more time please?

AhyoungRyu · 2016-08-17T06:41:29Z

conf/interpreter-list

 elasticsearch   org.apache.zeppelin:zeppelin-elasticsearch:0.6.1        Elasticsearch interpreter
 file            org.apache.zeppelin:zeppelin-file:0.6.1                 HDFS file interpreter
 flink           org.apache.zeppelin:zeppelin-flink_2.11:0.6.1           Flink interpreter built with Scala 2.11
+beam            org.apache.zeppelin:zeppelin-beam:0.6.1                 Beam interpreter


@mfelgamal Can we put beam in alphabetical order?

@AhyoungRyu done.

mfelgamal · 2016-08-17T10:42:44Z

@bzz Kindly be noted that there is no maven Scala 2.11 build for beam runner Flink. The only available build is beam-runners-flink_2.10 .
https://mvnrepository.com/artifact/org.apache.beam/beam-runners-flink_2.10/0.1.0-incubating
Our pom file :

<artifactId>beam-runners-flink_2.10</artifactId>
<version>${beam.beam.version}</version>

That is why we hardcode it.
While the failure logs claims that maven can not find beam-runners-flink_2.11
It seems that your builder enforces the usage of Scala 2.11

iemejia · 2016-08-17T11:55:26Z

Good work guys, probably this is the one other approach I missed in the JIRA, to have a static repl to compile the full class and then run it, this is nice because this can be reused for any full file Java case.

Just two comments:
I think that it is probably a good idea to move StaticRepl and BeamInterpreter to a more appropriate java package because this thing is a generic Java Repl/Interpreter (restricted to full class definitions). And I don't know if this is the case if it is worth dividing this into two PRs, one for the generic Java thing and other for the Beam specific part (notice that you can easily imagine reusing the java part e.g. to run a Spark or Flink note written in pure Java against a java ambient for example).

However I don't know the details of how (if) this reuse is possible in zeppelin, any hints @bzz ?

iemejia · 2016-08-17T11:58:48Z

Oups I forgot the second comment, I don't know if it is also worth to separate every runner as a different interpreter (leaving probably the DirectRunner as the default Beam one). And having the others: beam-spark, beam-flink, beam-cloud-dataflow, as different ones.

Notice that this will reduce the size of the dependencies for casual use cases (that mostly can be run in local), but let open the option to run those in cluster mode extrictly when needed.

felixcheung · 2016-08-17T14:01:38Z

Let's make sure the new doc is added to the list in _navigation.html?
https://zeppelin.apache.org/docs/0.6.1/development/writingzeppelininterpreter.html#contributing-a-new-interpreter-to-zeppelin-releases

felixcheung · 2016-08-17T14:04:51Z

beam/pom.xml

+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+	<modelVersion>4.0.0</modelVersion>
+
+


extra empty newline?

mfelgamal · 2016-08-17T20:18:16Z

@felixcheung the required changes is done.

bzz · 2016-08-19T01:44:20Z

It seems that your builder enforces the usage of Scala 2.11

@mfelgamal you are right, .travis.yml contains Scala 2.11 as well as scala 2.10 profiles.
In this case, could you please explicitly exclude -pl "\!beam" from all such CI build profiles and put a comment\commit message like there is no maven Scala 2.11 build for Beam runner on Flink

AhyoungRyu · 2016-08-21T04:20:15Z

beam/pom.xml

+	<modelVersion>4.0.0</modelVersion>
+
+	<parent>
+		<artifactId>zeppelin</artifactId>


@mfelgamal Maybe it's a nitpick, could you make this pom.xml with 2 spaces indentation like all the other pom.xml in Zeppelin?

mfelgamal · 2016-09-23T12:00:23Z

@bzz rebased.

bzz · 2016-09-26T18:43:00Z

Thank you @mfelgamal !

CI fails on single profile, which seems not relevant to the changes

[INFO] Zeppelin: Apache Cassandra interpreter ............. FAILURE [01:11 min]

Tests in error: 
  CassandraInterpreterTest.org.apache.zeppelin.cassandra.CassandraInterpreterTest Â» ExceptionInInitializer
  CassandraInterpreterTest.org.apache.zeppelin.cassandra.CassandraInterpreterTest Â» NoClassDefFound

java.lang.ExceptionInInitializerError: null
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
    at info.archinnov.achilles.script.ScriptExecutor.executeScript(ScriptExecutor.java:61)
    at info.archinnov.achilles.embedded.AchillesInitializer.executeStartupScripts(AchillesInitializer.java:204)
    at info.archinnov.achilles.embedded.AchillesInitializer.initialize(AchillesInitializer.java:98)
    at info.archinnov.achilles.embedded.AchillesInitializer.initializeFromParameters(AchillesInitializer.java:61)
    at info.archinnov.achilles.embedded.CassandraEmbeddedServer.<init>(CassandraEmbeddedServer.java:76)
    at info.archinnov.achilles.embedded.CassandraEmbeddedServerBuilder.buildNativeSessionOnly(CassandraEmbeddedServerBuilder.java:436)
    at org.apache.zeppelin.cassandra.CassandraInterpreterTest.<clinit>(CassandraInterpreterTest.java:59)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9455 (com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1] Timed out waiting for server response))
    at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:211)
    at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:43)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:277)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:400)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

org.apache.zeppelin.cassandra.CassandraInterpreterTest  Time elapsed: 39.92 sec  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.zeppelin.cassandra.CassandraInterpreterTest

Merging to master, if there is no further discussion.

mfelgamal · 2016-09-26T18:56:59Z

@bzz For me, no further discussion. looking forward to seeing the PR merged.

bzz · 2016-09-27T03:53:31Z

Thank you so much @mfelgamal !

iemejia · 2016-09-27T07:14:55Z

Congratulations guys, excellent work !

iemejia · 2016-09-27T07:16:20Z

@mfelgamal or @bzz you must announce this milestone in the beam mailing list too.

kennknowles · 2016-09-27T15:41:57Z

👍!

babupe · 2016-09-30T19:50:11Z

Great work guys!

## What is this PR for? The PR is a interpreter for [Apache Beam](http://beam.incubator.apache.org) which is an open source unified platform for data processing pipelines. A pipeline can be build using one of the Beam SDKs. The execution of the pipeline is done by different Runners . Currently, Beam supports Apache Flink Runner, Apache Spark Runner, and Google Dataflow Runner. ### What type of PR is it? - Feature ### Todos * Test case * Review Comments * Documentation ### What is the Jira issue? * [ZEPPELIN-682] ### How should this be tested? - Start the Zeppelin server - The prefix of interpreter is `%beam` and then write your code with required imports and the runner ### Screenshots (if appropriate) ![](https://s9.postimg.org/s6eiwrbxb/beam_interpreter.png) ![](https://s9.postimg.org/eq3h8wsrz/visualisation_with_table.png) ### Questions: * Does the licenses files need update? no * Is there breaking changes for older versions? no * Does this needs documentation? yes Author: mahmoudelgamal <[email protected]> Author: mfelgamal <[email protected]> Author: Fouad <[email protected]> Closes apache#1334 from mfelgamal/beam-interpreter-static-repl-7 and squashes the following commits: da66c27 [mahmoudelgamal] Modify condition of checking static modifier 55c1322 [mahmoudelgamal] set spark version to 1.6.2 and throw original exception 27d7690 [mahmoudelgamal] set spark version to 1.6.1 and some modifications 750041c [mahmoudelgamal] Add readme file and modify pom file and travis.yml ca88f94 [mahmoudelgamal] edit pom file and .travis.yml 3d65427 [mahmoudelgamal] update .travis.yml file f19f98d [mahmoudelgamal] Make easy example with imports ands some modifications 74c14ca [mahmoudelgamal] Update the licenses acc7afb [mahmoudelgamal] Change beam to version 0.2.0 e821614 [mahmoudelgamal] Removing hadoop-core and print stack trace to failure 5cb7c7b [mahmoudelgamal] Add some changes to doc and pom file 75fc4f7 [mahmoudelgamal] add interpreter to navigation.html and remove extra spaces and lines 9b1b385 [mahmoudelgamal] put beam in alphabetical order 9c1e25d [mahmoudelgamal] Adding changes like logging and conventions and license 2aa6d65 [mahmoudelgamal] changing class name to StaticRepl and adding some modifications 7cf25fb [mahmoudelgamal] Adding some tests 3c5038f [mahmoudelgamal] Modifying the documentation 5695077 [mahmoudelgamal] Modifying pom file and Making documentation 26fc59b [mahmoudelgamal] Refactoring of the code 3a2bd85 [mahmoudelgamal] Adding the beam to zeppelin 7 ab7ee2d [mahmoudelgamal] beam interpreter 85957ff [mfelgamal] Merge pull request apache#10 from apache/master 852c3d3 [mfelgamal] Merge pull request apache#9 from apache/master a4bcc0d [mfelgamal] Merge pull request apache#8 from apache/master 858f1e1 [mfelgamal] Merge pull request apache#7 from apache/master 03a1e80 [mfelgamal] Merge pull request apache#4 from apache/master 2586651 [Fouad] Merge pull request apache#2 from apache/master

Fouad and others added 2 commits May 29, 2016 17:39

Merge pull request #2 from apache/master

2586651

update from original

Merge pull request #4 from apache/master

03a1e80

merge master

Merge pull request #7 from apache/master

858f1e1

merge master

mfelgamal force-pushed the beam-interpreter-static-repl-7 branch from b88ff75 to 25b5c18 Compare August 16, 2016 22:13

mfelgamal force-pushed the beam-interpreter-static-repl-7 branch 3 times, most recently from effade8 to 897ee3e Compare August 16, 2016 23:27

AhyoungRyu reviewed Aug 17, 2016
View reviewed changes

felixcheung reviewed Aug 17, 2016
View reviewed changes

AhyoungRyu reviewed Aug 21, 2016
View reviewed changes

mfelgamal force-pushed the beam-interpreter-static-repl-7 branch 3 times, most recently from 9894b86 to 0a87cd2 Compare August 21, 2016 18:57

mfelgamal added 19 commits September 23, 2016 13:51

Refactoring of the code

26fc59b

Modifying pom file and Making documentation

5695077

Modifying the documentation

3c5038f

Adding some tests

7cf25fb

changing class name to StaticRepl and adding some modifications

2aa6d65

Adding changes like logging and conventions and license

9c1e25d

put beam in alphabetical order

9b1b385

add interpreter to navigation.html and remove extra spaces and lines

75fc4f7

Add some changes to doc and pom file

5cb7c7b

Removing hadoop-core and print stack trace to failure

e821614

Change beam to version 0.2.0

acc7afb

Update the licenses

74c14ca

Make easy example with imports ands some modifications

f19f98d

update .travis.yml file

3d65427

edit pom file and .travis.yml

ca88f94

Add readme file and modify pom file and travis.yml

750041c

set spark version to 1.6.1 and some modifications

27d7690

set spark version to 1.6.2 and throw original exception

55c1322

Modify condition of checking static modifier

da66c27

mfelgamal force-pushed the beam-interpreter-static-repl-7 branch from 68f94fe to da66c27 Compare September 23, 2016 11:58

asfgit closed this in 403c8c4 Sep 27, 2016

		xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
		<modelVersion>4.0.0</modelVersion>

[ZEPPELIN-682] New interpreter for Apache Beam (incubating)/DataFlow #1334

[ZEPPELIN-682] New interpreter for Apache Beam (incubating)/DataFlow #1334

Uh oh!

Conversation

mfelgamal commented Aug 16, 2016

What is this PR for?

What type of PR is it?

Todos

What is the Jira issue?

How should this be tested?

Screenshots (if appropriate)

Questions:

Uh oh!

bzz commented Aug 16, 2016

Uh oh!

mfelgamal commented Aug 16, 2016

Uh oh!

bzz commented Aug 17, 2016

Uh oh!

bzz commented Aug 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AhyoungRyu Aug 17, 2016

Choose a reason for hiding this comment

Uh oh!

mfelgamal Aug 17, 2016

Choose a reason for hiding this comment

Uh oh!

mfelgamal commented Aug 17, 2016

Uh oh!

iemejia commented Aug 17, 2016

Uh oh!

iemejia commented Aug 17, 2016

Uh oh!

felixcheung commented Aug 17, 2016

Uh oh!

felixcheung Aug 17, 2016

Choose a reason for hiding this comment

Uh oh!

mfelgamal commented Aug 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bzz commented Aug 19, 2016

Uh oh!

AhyoungRyu Aug 21, 2016

Choose a reason for hiding this comment

Uh oh!

mfelgamal commented Sep 23, 2016

Uh oh!

bzz commented Sep 26, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mfelgamal commented Sep 26, 2016

Uh oh!

bzz commented Sep 27, 2016

Uh oh!

iemejia commented Sep 27, 2016

Uh oh!

iemejia commented Sep 27, 2016

Uh oh!

kennknowles commented Sep 27, 2016

Uh oh!

babupe commented Sep 30, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

bzz commented Aug 17, 2016 •

edited

Loading

mfelgamal commented Aug 17, 2016 •

edited

Loading

bzz commented Sep 26, 2016 •

edited

Loading