-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Closes [ZEPPELIN-1505] Add Scio interpreter #1471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * LogOutputStream of apache commons exec has one issue that method flush doesn't throw IOException, | ||
| * so that SparkOutputStream can not extend it correctly. | ||
| */ | ||
| public abstract class LogOutputStream extends OutputStream { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems you are doing the same thing as me in (https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/LogOutputStream.java). Maybe we can put it in module interpreter so that every interpreter implementation can use it. But could do it in a follow up ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| assertEquals(InterpreterResult.Code.SUCCESS, | ||
| repl.interpret("sc.parallelize(1 to 10).closeAndCollect().toList", context).code()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about scio, but the test seems not sufficient, as least you miss one case of InterpreterResult.Code.ERROR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Added test for ERROR. Plus will add more as part of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to add more scenarios:
- throw exception
- multiple line code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, added:
- multiline
- throw exception
| override def getProgress(context: InterpreterContext): Int = { | ||
| // not implemented | ||
| 42 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 42 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no notion of overall progress in scio/beam, but would prefer it to be different than 0 just to see the different between execution vs interpreter startup. Is this fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention is 0 if progress is not implemented IIRC. If the execution is started, the state would transite from PENDING to RUNNING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| "args": { | ||
| "envName": null, | ||
| "propertyName": null, | ||
| "defaultValue": "--runner=InProcessPipelineRunner", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
propertyName is null ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - same as in Spark interpreter - what would you recommend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now changed it to fully named property.
| postgresql org.apache.zeppelin:zeppelin-postgresql:0.6.1 Postgresql interpreter | ||
| python org.apache.zeppelin:zeppelin-python:0.6.1 Python interpreter | ||
| shell org.apache.zeppelin:zeppelin-shell:0.6.1 Shell command | ||
| scio org.apache.zeppelin:zeppelin-scio:0.6.1 Scio interpreter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ravwojdyla can we keep maintaining alphabetic order in here? need to change the position of scio with shell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| Scio interpreter for Apache Zeppelin | ||
| ==================================== | ||
|
|
||
| ## Raison d'être: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean Purpose? French word is awesome 👍 , but it would be better to change to English for everyone :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to change, that said Raison d'être is commonly used in English. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't know that. Good to Know :D
| repl.open(); | ||
| } | ||
|
|
||
| context = new InterpreterContext("note", "id", "title", "text", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ravwojdyla Zeppelin's java code block indentation is "2" space :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file seems to be indented with 2 spaces. Which part exactly is the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ravwojdyla Ah that's my bad. It's just a wrapping. Please ignore my comment for this :)
|
@ravwojdyla Thanks for your contribution and welcome! |
|
@AhyoungRyu @zjffdu thanks for prompt review. I have some questions in the comments, plus fixes coming in. I have also changed the topic of the PR. |
8457886 to
931f9b0
Compare
|
@AhyoungRyu @zjffdu could you please review when you have time. all the todos are ✔️ |
| public void testBasicSyntaxError() { | ||
| assertEquals(InterpreterResult.Code.ERROR, | ||
| repl.interpret("val a:Int = 'ds'", context).code()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to verify the message as well so that user can see the right error message in frontend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| val content = it.take(maxResults) | ||
| .map(r => fieldNames.map(r.get).mkString(tab)) | ||
| .mkString(newline) | ||
| println(s"$table $header$newline$firstStr$newline$content") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of println ? Is it for logging or display ? If it is for display, it is weird for me to use println, and does it support table format of zeppelin ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the display system and the %table format specifically.
9b02792 to
6241153
Compare
|
@AhyoungRyu @zjffdu are there any more comment, how do you want to proceed? the error on travis does not seems to be related? |
7886159 to
035738e
Compare
|
ping @AhyoungRyu @zjffdu |
zeppelin-zengine/pom.xml
Outdated
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-vfs2</artifactId> | ||
| <version>2.0</version> | ||
| <exclusions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an issue that came up while dev - fixed via 45decf5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not related to this PR directly, Can you revert it and make another minor PR instead? I think it would be better to track the changes :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
zeppelin-zengine/pom.xml
Outdated
| <groupId>org.apache.httpcomponents</groupId> | ||
| <artifactId>httpclient</artifactId> | ||
| </exclusion> | ||
| <exclusion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an issue that came up while dev - fixed via 45decf5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
|
||
| Scio interpreter comes with display helpers to ease working with Zeppelin notebooks. Simply use `closeAndDisplay()` on `SCollection` to close context and display the results. The number of results is limited by `zeppelin.scio.maxResult` (by default 1000). | ||
|
|
||
| Supported `SCollection` types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ravwojdyla could you add a blank newline after this line? or it'll be rendered like this

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
To support Dataflow service execution, which requires REPL state dump in jar, we add custom ContextAndArgs.
Flink dependency makes it impossible to compile scio for scala 2.11 given that there are not 2.11 artifacts for flink runner available yet.
020a653 to
acdf00a
Compare
|
Looks great to me. Merge to master if no further comments. |
### What is this PR for? Closes #ZEPPELIN-1505. Adds Scio interpreter. Scio is a Scala DSL on top of Dataflow/Beam. ### What type of PR is it? Improvement - ZEPPELIN-1505 ### Todos - [x] - test integration with zeppelin context (zeppelin context is too tightly coupled withs spark) - [x] - what to do about code completion? - [x] - add more tests? - [x] - add helpers to display data - [x] - add doc in `docs/interpreter/scio` ### What is the Jira issue? https://issues.apache.org/jira/browse/ZEPPELIN-1505 ### How should this be tested? ``` mvn -pl scio,zeppelin-display,zeppelin-interpreter -Dtest='org.apache.zeppelin.scio.*' -DfailIfNoTests=false test ``` ### Screenshots  ### Questions: - Does the licenses files need update? ~~no~~ yes - Is there breaking changes for older versions? no - Does this needs documentation? yes (included in the PR) Author: Rafal Wojdyla <rav@spotify.com> Closes apache#1471 from ravwojdyla/scio and squashes the following commits: d6fbc4e [Rafal Wojdyla] Add runner doc links 7e6fdec [Rafal Wojdyla] Fix indentation d8de7c8 [Rafal Wojdyla] Remove optional flink deps from Beam 508705f [Rafal Wojdyla] Remove duplicates 49cf0eb [Rafal Wojdyla] Add scio to beam group cd79fc8 [Rafal Wojdyla] Add .bigquery cache to gitignore b961791 [Rafal Wojdyla] Check the message content 1e30f76 [Rafal Wojdyla] Simplify SCollection implicits 3c519f1 [Rafal Wojdyla] Fix doc style e9579d8 [Rafal Wojdyla] Clarify Context sharing + add docs about display helpers 327273e [Rafal Wojdyla] Add Zeppelin custom ContextAndArgs 0920fdd [Rafal Wojdyla] Remove obsolete deps 8f25f71 [Rafal Wojdyla] Upgrade scio to 0.2.4 3275185 [Rafal Wojdyla] Add license bd4df5e [Rafal Wojdyla] Fix documentation style dcbb197 [Rafal Wojdyla] Add documentation link e635674 [Rafal Wojdyla] Add tests for DisplayHelpers c0f8ccf [Rafal Wojdyla] Fix style and number of records for take 9dcc8ce [Rafal Wojdyla] Add display helpers for Tap[T] and Future[Tap[T]] 4014c81 [Rafal Wojdyla] Parse params generic params 0305a3c [Rafal Wojdyla] Style fix a92494b [Rafal Wojdyla] Style + use `split` to support both scala 2.{10,11} b884b72 [Rafal Wojdyla] Fix tests - add exception and check messages 99a7daa [Rafal Wojdyla] Progress should be 0 7b54e49 [Rafal Wojdyla] No need to override delegation for completion 12f0096 [Rafal Wojdyla] Clean up tests 93233a8 [Rafal Wojdyla] Rename argz param, fix style 0a3b49a [Rafal Wojdyla] Add Scio doc 61850d7 [Rafal Wojdyla] Add TableRow display helper + style be252f8 [Rafal Wojdyla] Add avro display 89a2811 [Rafal Wojdyla] Add limit of display 9a21aa0 [Rafal Wojdyla] Add display helpers 6ff4e95 [Rafal Wojdyla] Inject argz to the Scio interpreter 570cfaa [Rafal Wojdyla] Add Scio interpreter tests 38abaf9 [Rafal Wojdyla] Add Scio interpreter 7b596ea [Rafal Wojdyla] Generalize SparkOutputStream
What is this PR for?
Closes #ZEPPELIN-1505. Adds Scio interpreter. Scio is a Scala DSL on top of Dataflow/Beam.
What type of PR is it?
Improvement - ZEPPELIN-1505
Todos
docs/interpreter/scioWhat is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1505
How should this be tested?
Screenshots
Questions:
noyes