[ZEPPELIN-682] New interpreter for Apache Beam (incubating)/DataFlow #1334
update from original
merge master
|
Thank you for contributing, @mfelgamal! The Apache Beam interpreter is a valuable contribution that many people are looking for. There are a few things that need to be done before we can merge it, though:
Please feel free to ping me after those issues are addressed; I will be happy to look into it further and help you get this merged! |
merge master
Force-pushed from b88ff75 to 25b5c18
|
@bzz The required changes are done in the most recent commit. |
Force-pushed from effade8 to 897ee3e
|
Thank you for addressing the feedback promptly! Please let me take another pass on it and get back to you here. |
|
Before proceeding with the review: I have noticed that CI is failing. So far there is Which includes dependencies that I cannot find mentioned in
Could you take a look at it one more time, please? |
conf/interpreter-list
elasticsearch   org.apache.zeppelin:zeppelin-elasticsearch:0.6.1   Elasticsearch interpreter
file            org.apache.zeppelin:zeppelin-file:0.6.1            HDFS file interpreter
flink           org.apache.zeppelin:zeppelin-flink_2.11:0.6.1      Flink interpreter built with Scala 2.11
beam            org.apache.zeppelin:zeppelin-beam:0.6.1            Beam interpreter
@mfelgamal Can we put beam in alphabetical order?
@AhyoungRyu done.
|
@bzz Please note that there is no Maven build of the Beam Flink runner for Scala 2.11. The only available build is beam-runners-flink_2.10, which is why we hardcode it. |
|
Good work, guys. This is probably the one approach I missed in the JIRA: having a static REPL that compiles the full class and then runs it. This is nice because it can be reused for any full-file Java case. Just two comments: However, I don't know the details of how (or if) this reuse is possible in Zeppelin. Any hints, @bzz? |
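For readers unfamiliar with the "compile the full class, then run it" idea mentioned above, here is a minimal sketch using only the JDK's `javax.tools` compiler API. It is not the PR's actual implementation: the class name `StaticReplSketch`, the method `execute`, and the choice to write the source to a temporary directory are all assumptions made for illustration, and it requires a JDK (not a JRE) at runtime.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Zeppelin or of this PR.
public class StaticReplSketch {

    /**
     * Compiles a complete class source, invokes its static main method,
     * and returns whatever the class printed to standard output.
     */
    public static String execute(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; run on a JDK");
        }

        // Write the source to <tmp>/<className>.java so javac can pick it up.
        Path dir = Files.createTempDirectory("static-repl");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));

        // Compile into the same directory; compiler diagnostics go to 'errors'.
        ByteArrayOutputStream errors = new ByteArrayOutputStream();
        int status = compiler.run(null, null, errors, "-d", dir.toString(), file.toString());
        if (status != 0) {
            throw new IllegalArgumentException("Compilation failed:\n" + errors.toString("UTF-8"));
        }

        // Load the compiled class and run main while capturing System.out.
        PrintStream originalOut = System.out;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
            System.setOut(new PrintStream(captured, true, "UTF-8"));
            Class<?> cls = loader.loadClass(className);
            Method main = cls.getMethod("main", String[].class);
            main.invoke(null, (Object) new String[0]);
        } finally {
            System.out.flush();
            System.setOut(originalOut);
        }
        return captured.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String source =
            "public class Hello {\n"
          + "  public static void main(String[] args) {\n"
          + "    System.out.println(\"hello from compiled code\");\n"
          + "  }\n"
          + "}\n";
        System.out.print(execute("Hello", source));
    }
}
```

Because the unit of execution is a whole class with a `main` method, nothing in the sketch is specific to Beam, which is the reuse point raised in the comment. A real interpreter would also need to discover the class name from the source and put the required dependencies on the compile and runtime classpaths.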
|
Oops, I forgot the second comment. I don't know if it is also worth separating every runner into a different interpreter, probably leaving the DirectRunner as the default Beam one and having the others (beam-spark, beam-flink, beam-cloud-dataflow) as separate ones. Note that this would reduce the size of the dependencies for casual use cases (which can mostly be run locally), while leaving open the option to run in cluster mode strictly when needed. |
|
Let's make sure the new doc is added to the list in _navigation.html? |
beam/pom.xml
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

extra empty newline?
|
@felixcheung The required changes are done. |
@mfelgamal you are right, |
beam/pom.xml
<modelVersion>4.0.0</modelVersion>

<parent>
<artifactId>zeppelin</artifactId>
@mfelgamal Maybe it's a nitpick, but could you use 2-space indentation in this pom.xml, like all the other pom.xml files in Zeppelin?
Force-pushed from 9894b86 to 0a87cd2
Force-pushed from 68f94fe to da66c27
|
@bzz rebased. |
|
Thank you @mfelgamal! CI fails on a single profile, which does not seem related to the changes. Merging to master if there is no further discussion. |
|
@bzz For me, no further discussion. Looking forward to seeing the PR merged. |
|
Thank you so much @mfelgamal ! |
|
Congratulations guys, excellent work ! |
|
@mfelgamal or @bzz, you must announce this milestone on the Beam mailing list too. |
|
👍! |
|
Great work guys! |
## What is this PR for?
The PR is an interpreter for [Apache Beam](http://beam.incubator.apache.org), which is an open source unified platform for data processing pipelines. A pipeline can be built using one of the Beam SDKs.
The execution of the pipeline is done by different Runners. Currently, Beam supports Apache Flink Runner, Apache Spark Runner, and Google Dataflow Runner.

### What type of PR is it?
- Feature

### Todos
* Test case
* Review Comments
* Documentation

### What is the Jira issue?
* [ZEPPELIN-682]

### How should this be tested?
- Start the Zeppelin server
- Use the interpreter prefix `%beam`, then write your code with the required imports and the runner

### Screenshots (if appropriate)

### Questions:
* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation? yes

Author: mahmoudelgamal <[email protected]>
Author: mfelgamal <[email protected]>
Author: Fouad <[email protected]>

Closes apache#1334 from mfelgamal/beam-interpreter-static-repl-7 and squashes the following commits:

da66c27 [mahmoudelgamal] Modify condition of checking static modifier
55c1322 [mahmoudelgamal] set spark version to 1.6.2 and throw original exception
27d7690 [mahmoudelgamal] set spark version to 1.6.1 and some modifications
750041c [mahmoudelgamal] Add readme file and modify pom file and travis.yml
ca88f94 [mahmoudelgamal] edit pom file and .travis.yml
3d65427 [mahmoudelgamal] update .travis.yml file
f19f98d [mahmoudelgamal] Make easy example with imports ands some modifications
74c14ca [mahmoudelgamal] Update the licenses
acc7afb [mahmoudelgamal] Change beam to version 0.2.0
e821614 [mahmoudelgamal] Removing hadoop-core and print stack trace to failure
5cb7c7b [mahmoudelgamal] Add some changes to doc and pom file
75fc4f7 [mahmoudelgamal] add interpreter to navigation.html and remove extra spaces and lines
9b1b385 [mahmoudelgamal] put beam in alphabetical order
9c1e25d [mahmoudelgamal] Adding changes like logging and conventions and license
2aa6d65 [mahmoudelgamal] changing class name to StaticRepl and adding some modifications
7cf25fb [mahmoudelgamal] Adding some tests
3c5038f [mahmoudelgamal] Modifying the documentation
5695077 [mahmoudelgamal] Modifying pom file and Making documentation
26fc59b [mahmoudelgamal] Refactoring of the code
3a2bd85 [mahmoudelgamal] Adding the beam to zeppelin 7
ab7ee2d [mahmoudelgamal] beam interpreter
85957ff [mfelgamal] Merge pull request apache#10 from apache/master
852c3d3 [mfelgamal] Merge pull request apache#9 from apache/master
a4bcc0d [mfelgamal] Merge pull request apache#8 from apache/master
858f1e1 [mfelgamal] Merge pull request apache#7 from apache/master
03a1e80 [mfelgamal] Merge pull request apache#4 from apache/master
2586651 [Fouad] Merge pull request apache#2 from apache/master