Skip to content

Conversation

@mfelgamal
Copy link
Contributor

What is this PR for?

The PR is a interpreter for Apache Beam which is an open source unified platform for data processing pipelines. A pipeline can be build using one of the Beam SDKs.
The execution of the pipeline is done by different Runners . Currently, Beam supports Apache Flink Runner, Apache Spark Runner, and Google Dataflow Runner.

What type of PR is it?

  • Feature

Todos

  • Test case
  • Review Comments
  • Documentation

What is the Jira issue?

How should this be tested?

  • Start the Zeppelin server
  • The prefix of interpreter is %beam and then write your code with required imports and the runner

Screenshots (if appropriate)


Questions:

  • Does the licenses files need update? no
  • Is there breaking changes for older versions? no
  • Does this needs documentation? yes

Fouad and others added 2 commits May 29, 2016 17:39
@bzz
Copy link
Member

bzz commented Aug 16, 2016

Thank you for contributing @mfelgamal ! Apache Beam interpreter is a valuable contribution many people are looking for.

There are few things that need to be done before we can merge it though:

  • this branch need to be rebased on top of the latest master (git rebase master) and conflicts must be resolved
  • establish the authorship of this work: this PR includes commits by you and @fouadma
  • make sure that every file in this PR, as any other file in ASF project, contain Apache2.0 license header
  • make sure that every file is formatted according to project's code conventions: pom.xml, .jave, etc.
    From the quick glance at the Java code:
    • please make sure you follow the conventions that are used in Zeppelin projects: brackets, sl4j logging. Best way is to look at other interpreters code
    • classes should have comments explaining their purpose and extra empty lines as well as @author tags need to be removed
    • zeppelin-distribution/src/bin_license must be updated to include ALL dependencies, AND their transitive dependencies that were added in this PR. You can see all of them in mvn dependency:tree.

Please, feel free to ping me after those issues are addressed and will be happy to look more into it and help you getting this merged!

@mfelgamal mfelgamal force-pushed the beam-interpreter-static-repl-7 branch from b88ff75 to 25b5c18 Compare August 16, 2016 22:13
@mfelgamal
Copy link
Contributor Author

@bzz The required changes is done, in the recent commit.

@mfelgamal mfelgamal force-pushed the beam-interpreter-static-repl-7 branch 3 times, most recently from effade8 to 897ee3e Compare August 16, 2016 23:27
@bzz
Copy link
Member

bzz commented Aug 17, 2016

Thank you for addressing the feedback promptly! Please let me take another pass on it and get back to you here.

@bzz
Copy link
Member

bzz commented Aug 17, 2016

Before proceed with review - I have noticed that CI is failing.
@mfelgamal could you please take a look and make sure that there are no Beam-related failures? Thanks!

So far there is

[INFO] Zeppelin: Beam interpreter ......................... FAILURE [  6.632 s]
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:46 min
[INFO] Finished at: 2016-08-16T23:45:31+00:00
[INFO] Final Memory: 97M/414M
[INFO] ------------------------------------------------------------------------
....
[ERROR] Failed to execute goal on project zeppelin-beam: Could not resolve dependencies for project org.apache.zeppelin:zeppelin-beam:jar:0.7.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.beam:beam-runners-flink_2.11:jar:0.1.0-incubating, org.apache.beam:beam-runners-flink_2.11-examples:jar:0.1.0-incubating: Could not find artifact org.apache.beam:beam-runners-flink_2.11:jar:0.1.0-incubating in central (http://repo.maven.apache.org/maven2) -> [Help 1]

Which includes dependencies that I can not find mentioned in ./zeppelin-distribution/src/bin-license/LICENSE.

zeppelin-distribution/src/bin_license must be updated to include ALL dependencies, AND their transitive dependencies that were added in this PR

Could you take a look into it one more time please?

elasticsearch org.apache.zeppelin:zeppelin-elasticsearch:0.6.1 Elasticsearch interpreter
file org.apache.zeppelin:zeppelin-file:0.6.1 HDFS file interpreter
flink org.apache.zeppelin:zeppelin-flink_2.11:0.6.1 Flink interpreter built with Scala 2.11
beam org.apache.zeppelin:zeppelin-beam:0.6.1 Beam interpreter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfelgamal Can we put beam in alphabetical order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AhyoungRyu done.

@mfelgamal
Copy link
Contributor Author

@bzz Kindly be noted that there is no maven Scala 2.11 build for beam runner Flink. The only available build is beam-runners-flink_2.10 .
https://mvnrepository.com/artifact/org.apache.beam/beam-runners-flink_2.10/0.1.0-incubating
Our pom file :

<artifactId>beam-runners-flink_2.10</artifactId>
<version>${beam.beam.version}</version>

That is why we hardcode it.
While the failure logs claims that maven can not find beam-runners-flink_2.11
It seems that your builder enforces the usage of Scala 2.11

@iemejia
Copy link
Member

iemejia commented Aug 17, 2016

Good work guys, probably this is the one other approach I missed in the JIRA, to have a static repl to compile the full class and then run it, this is nice because this can be reused for any full file Java case.

Just two comments:
I think that it is probably a good idea to move StaticRepl and BeamInterpreter to a more appropriate java package because this thing is a generic Java Repl/Interpreter (restricted to full class definitions). And I don't know if this is the case if it is worth dividing this into two PRs, one for the generic Java thing and other for the Beam specific part (notice that you can easily imagine reusing the java part e.g. to run a Spark or Flink note written in pure Java against a java ambient for example).

However I don't know the details of how (if) this reuse is possible in zeppelin, any hints @bzz ?

@iemejia
Copy link
Member

iemejia commented Aug 17, 2016

Oups I forgot the second comment, I don't know if it is also worth to separate every runner as a different interpreter (leaving probably the DirectRunner as the default Beam one). And having the others: beam-spark, beam-flink, beam-cloud-dataflow, as different ones.

Notice that this will reduce the size of the dependencies for casual use cases (that mostly can be run in local), but let open the option to run those in cluster mode extrictly when needed.

@felixcheung
Copy link
Member

Let's make sure the new doc is added to the list in _navigation.html?
https://zeppelin.apache.org/docs/0.6.1/development/writingzeppelininterpreter.html#contributing-a-new-interpreter-to-zeppelin-releases

beam/pom.xml Outdated
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra empty newline?

@mfelgamal
Copy link
Contributor Author

mfelgamal commented Aug 17, 2016

@felixcheung the required changes is done.

@bzz
Copy link
Member

bzz commented Aug 19, 2016

It seems that your builder enforces the usage of Scala 2.11

@mfelgamal you are right, .travis.yml contains Scala 2.11 as well as scala 2.10 profiles.
In this case, could you please explicitly exclude -pl "\!beam" from all such CI build profiles and put a comment\commit message like there is no maven Scala 2.11 build for Beam runner on Flink

beam/pom.xml Outdated
<modelVersion>4.0.0</modelVersion>

<parent>
<artifactId>zeppelin</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfelgamal Maybe it's a nitpick, could you make this pom.xml with 2 spaces indentation like all the other pom.xml in Zeppelin?

@mfelgamal mfelgamal force-pushed the beam-interpreter-static-repl-7 branch 3 times, most recently from 9894b86 to 0a87cd2 Compare August 21, 2016 18:57
@mfelgamal mfelgamal force-pushed the beam-interpreter-static-repl-7 branch from 68f94fe to da66c27 Compare September 23, 2016 11:58
@mfelgamal
Copy link
Contributor Author

@bzz rebased.

@bzz
Copy link
Member

bzz commented Sep 26, 2016

Thank you @mfelgamal !

CI fails on single profile, which seems not relevant to the changes

[INFO] Zeppelin: Apache Cassandra interpreter ............. FAILURE [01:11 min]

Tests in error: 
  CassandraInterpreterTest.org.apache.zeppelin.cassandra.CassandraInterpreterTest » ExceptionInInitializer
  CassandraInterpreterTest.org.apache.zeppelin.cassandra.CassandraInterpreterTest » NoClassDefFound

java.lang.ExceptionInInitializerError: null
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
    at info.archinnov.achilles.script.ScriptExecutor.executeScript(ScriptExecutor.java:61)
    at info.archinnov.achilles.embedded.AchillesInitializer.executeStartupScripts(AchillesInitializer.java:204)
    at info.archinnov.achilles.embedded.AchillesInitializer.initialize(AchillesInitializer.java:98)
    at info.archinnov.achilles.embedded.AchillesInitializer.initializeFromParameters(AchillesInitializer.java:61)
    at info.archinnov.achilles.embedded.CassandraEmbeddedServer.<init>(CassandraEmbeddedServer.java:76)
    at info.archinnov.achilles.embedded.CassandraEmbeddedServerBuilder.buildNativeSessionOnly(CassandraEmbeddedServerBuilder.java:436)
    at org.apache.zeppelin.cassandra.CassandraInterpreterTest.<clinit>(CassandraInterpreterTest.java:59)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9455 (com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1] Timed out waiting for server response))
    at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:211)
    at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:43)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:277)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:400)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

org.apache.zeppelin.cassandra.CassandraInterpreterTest  Time elapsed: 39.92 sec  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.zeppelin.cassandra.CassandraInterpreterTest

Merging to master, if there is no further discussion.

@mfelgamal
Copy link
Contributor Author

@bzz For me, no further discussion. looking forward to seeing the PR merged.

@asfgit asfgit closed this in 403c8c4 Sep 27, 2016
@bzz
Copy link
Member

bzz commented Sep 27, 2016

Thank you so much @mfelgamal !

@iemejia
Copy link
Member

iemejia commented Sep 27, 2016

Congratulations guys, excellent work !

@iemejia
Copy link
Member

iemejia commented Sep 27, 2016

@mfelgamal or @bzz you must announce this milestone in the beam mailing list too.

@kennknowles
Copy link
Member

👍!

@babupe
Copy link
Contributor

babupe commented Sep 30, 2016

Great work guys!

pedrozatta pushed a commit to pedrozatta/zeppelin that referenced this pull request Oct 27, 2016
## What is this PR for?
The PR is a interpreter for [Apache Beam](http://beam.incubator.apache.org) which is an open source unified platform for data processing pipelines. A pipeline can be build using one of the Beam SDKs.
The execution of the pipeline is done by different Runners . Currently, Beam supports Apache Flink Runner, Apache Spark Runner, and Google Dataflow Runner.

### What type of PR is it?
- Feature

### Todos
* Test case
* Review Comments
* Documentation

### What is the Jira issue?
* [ZEPPELIN-682]

### How should this be tested?
- Start the Zeppelin server
- The prefix of interpreter is `%beam` and then write your code with required imports and the runner

### Screenshots (if appropriate)
![](https://s9.postimg.org/s6eiwrbxb/beam_interpreter.png)
![](https://s9.postimg.org/eq3h8wsrz/visualisation_with_table.png)

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? yes

Author: mahmoudelgamal <[email protected]>
Author: mfelgamal <[email protected]>
Author: Fouad <[email protected]>

Closes apache#1334 from mfelgamal/beam-interpreter-static-repl-7 and squashes the following commits:

da66c27 [mahmoudelgamal] Modify condition of checking static modifier
55c1322 [mahmoudelgamal] set spark version to 1.6.2 and throw original exception
27d7690 [mahmoudelgamal] set spark version to 1.6.1 and some modifications
750041c [mahmoudelgamal] Add readme file and modify pom file and travis.yml
ca88f94 [mahmoudelgamal] edit pom file and .travis.yml
3d65427 [mahmoudelgamal] update .travis.yml file
f19f98d [mahmoudelgamal] Make easy example with imports ands some modifications
74c14ca [mahmoudelgamal] Update the licenses
acc7afb [mahmoudelgamal] Change beam to version 0.2.0
e821614 [mahmoudelgamal] Removing hadoop-core and print stack trace to failure
5cb7c7b [mahmoudelgamal] Add some changes to doc and pom file
75fc4f7 [mahmoudelgamal] add interpreter to navigation.html and remove extra spaces and lines
9b1b385 [mahmoudelgamal] put beam in alphabetical order
9c1e25d [mahmoudelgamal] Adding changes like logging and conventions and license
2aa6d65 [mahmoudelgamal] changing class name to StaticRepl and adding some modifications
7cf25fb [mahmoudelgamal] Adding some tests
3c5038f [mahmoudelgamal] Modifying the documentation
5695077 [mahmoudelgamal] Modifying pom file and Making documentation
26fc59b [mahmoudelgamal] Refactoring of the code
3a2bd85 [mahmoudelgamal] Adding the beam to zeppelin 7
ab7ee2d [mahmoudelgamal] beam interpreter
85957ff [mfelgamal] Merge pull request apache#10 from apache/master
852c3d3 [mfelgamal] Merge pull request apache#9 from apache/master
a4bcc0d [mfelgamal] Merge pull request apache#8 from apache/master
858f1e1 [mfelgamal] Merge pull request apache#7 from apache/master
03a1e80 [mfelgamal] Merge pull request apache#4 from apache/master
2586651 [Fouad] Merge pull request apache#2 from apache/master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants