Skip to content

Conversation

@ravwojdyla
Copy link
Contributor

@ravwojdyla ravwojdyla commented Sep 29, 2016

What is this PR for?

Closes #ZEPPELIN-1505. Adds Scio interpreter. Scio is a Scala DSL on top of Dataflow/Beam.

What type of PR is it?

Improvement - ZEPPELIN-1505

Todos

  • - test integration with zeppelin context (zeppelin context is too tightly coupled withs spark)
  • - what to do about code completion?
  • - add more tests?
  • - add helpers to display data
  • - add doc in docs/interpreter/scio

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1505

How should this be tested?

mvn -pl scio,zeppelin-display,zeppelin-interpreter -Dtest='org.apache.zeppelin.scio.*' -DfailIfNoTests=false test

Screenshots

GitHub Logo

Questions:

  • Does the licenses files need update? no yes
  • Is there breaking changes for older versions? no
  • Does this needs documentation? yes (included in the PR)

* LogOutputStream of apache commons exec has one issue that method flush doesn't throw IOException,
* so that SparkOutputStream can not extend it correctly.
*/
public abstract class LogOutputStream extends OutputStream {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you are doing the same thing as me in (https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/LogOutputStream.java). Maybe we can put it in module interpreter so that every interpreter implementation can use it. But could do it in a follow up ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

assertEquals(InterpreterResult.Code.SUCCESS,
repl.interpret("sc.parallelize(1 to 10).closeAndCollect().toList", context).code());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about scio, but the test seems not sufficient, as least you miss one case of InterpreterResult.Code.ERROR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added test for ERROR. Plus will add more as part of the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to add more scenarios:

  • throw exception
  • multiple line code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, added:

  • multiline
  • throw exception

override def getProgress(context: InterpreterContext): Int = {
// not implemented
42
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 42 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no notion of overall progress in scio/beam, but would prefer it to be different than 0 just to see the different between execution vs interpreter startup. Is this fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention is 0 if progress is not implemented IIRC. If the execution is started, the state would transite from PENDING to RUNNING

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

"args": {
"envName": null,
"propertyName": null,
"defaultValue": "--runner=InProcessPipelineRunner",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

propertyName is null ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - same as in Spark interpreter - what would you recommend?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now changed it to fully named property.

postgresql org.apache.zeppelin:zeppelin-postgresql:0.6.1 Postgresql interpreter
python org.apache.zeppelin:zeppelin-python:0.6.1 Python interpreter
shell org.apache.zeppelin:zeppelin-shell:0.6.1 Shell command
scio org.apache.zeppelin:zeppelin-scio:0.6.1 Scio interpreter
Copy link
Contributor

@AhyoungRyu AhyoungRyu Sep 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravwojdyla can we keep maintaining alphabetic order in here? need to change the position of scio with shell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Scio interpreter for Apache Zeppelin
====================================

## Raison d'être:
Copy link
Contributor

@AhyoungRyu AhyoungRyu Sep 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean Purpose? French word is awesome 👍 , but it would be better to change to English for everyone :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to change, that said Raison d'être is commonly used in English. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't know that. Good to Know :D

repl.open();
}

context = new InterpreterContext("note", "id", "title", "text",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravwojdyla Zeppelin's java code block indentation is "2" space :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file seems to be indented with 2 spaces. Which part exactly is the problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravwojdyla Ah that's my bad. It's just a wrapping. Please ignore my comment for this :)

@AhyoungRyu
Copy link
Contributor

@ravwojdyla Thanks for your contribution and welcome!
I assume this PR is not finished but Work In Process since there are some unchecked TODOs. So how about changing the title of this PR from Closes ZEPPELIN-1505: Add Scio interpreter to [WIP][ZEPPELIN-1505] Add Scio interpreter or sth? It would be more helpful to anyone who want to understand the purpose of this PR at a glance and start review :)

@ravwojdyla ravwojdyla changed the title Closes ZEPPELIN-1505: Add Scio interpreter [WIP][ZEPPELIN-1505] Add Scio interpreter Sep 29, 2016
@ravwojdyla
Copy link
Contributor Author

@AhyoungRyu @zjffdu thanks for prompt review. I have some questions in the comments, plus fixes coming in. I have also changed the topic of the PR.

@ravwojdyla ravwojdyla force-pushed the scio branch 4 times, most recently from 8457886 to 931f9b0 Compare September 30, 2016 01:19
@ravwojdyla ravwojdyla changed the title [WIP][ZEPPELIN-1505] Add Scio interpreter Closes [ZEPPELIN-1505] Add Scio interpreter Sep 30, 2016
@ravwojdyla
Copy link
Contributor Author

ravwojdyla commented Sep 30, 2016

@AhyoungRyu @zjffdu could you please review when you have time. all the todos are ✔️

public void testBasicSyntaxError() {
assertEquals(InterpreterResult.Code.ERROR,
repl.interpret("val a:Int = 'ds'", context).code());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to verify the message as well so that user can see the right error message in frontend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

val content = it.take(maxResults)
.map(r => fieldNames.map(r.get).mkString(tab))
.mkString(newline)
println(s"$table $header$newline$firstStr$newline$content")
Copy link
Contributor

@zjffdu zjffdu Sep 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of println ? Is it for logging or display ? If it is for display, it is weird for me to use println, and does it support table format of zeppelin ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the display system and the %table format specifically.

@ravwojdyla ravwojdyla force-pushed the scio branch 4 times, most recently from 9b02792 to 6241153 Compare October 1, 2016 01:42
@ravwojdyla
Copy link
Contributor Author

@AhyoungRyu @zjffdu are there any more comment, how do you want to proceed? the error on travis does not seems to be related?

@ravwojdyla ravwojdyla force-pushed the scio branch 3 times, most recently from 7886159 to 035738e Compare October 4, 2016 19:33
@ravwojdyla
Copy link
Contributor Author

ping @AhyoungRyu @zjffdu

<groupId>org.apache.commons</groupId>
<artifactId>commons-vfs2</artifactId>
<version>2.0</version>
<exclusions>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an issue that came up while dev - fixed via 45decf5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not related to this PR directly, Can you revert it and make another minor PR instead? I think it would be better to track the changes :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</exclusion>
<exclusion>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an issue that came up while dev - fixed via 45decf5

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


Scio interpreter comes with display helpers to ease working with Zeppelin notebooks. Simply use `closeAndDisplay()` on `SCollection` to close context and display the results. The number of results is limited by `zeppelin.scio.maxResult` (by default 1000).

Supported `SCollection` types:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravwojdyla could you add a blank newline after this line? or it'll be rendered like this
screen shot 2016-10-05 at 11 39 59 am

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ravwojdyla ravwojdyla force-pushed the scio branch 4 times, most recently from 020a653 to acdf00a Compare November 12, 2016 16:29
@Leemoonsoo
Copy link
Member

Looks great to me.
CI failure is unrelated (known flaky test https://issues.apache.org/jira/browse/ZEPPELIN-1623)

Merge to master if no further comments.

@asfgit asfgit closed this in f127237 Nov 14, 2016
@ravwojdyla ravwojdyla deleted the scio branch November 14, 2016 19:02
tae-jun pushed a commit to tae-jun/zeppelin that referenced this pull request Nov 23, 2016
### What is this PR for?

Closes #ZEPPELIN-1505. Adds Scio interpreter. Scio is a Scala DSL on top of Dataflow/Beam.
### What type of PR is it?

Improvement - ZEPPELIN-1505
### Todos
- [x] - test integration with zeppelin context (zeppelin context is too tightly coupled withs spark)
- [x] - what to do about code completion?
- [x] - add more tests?
- [x] - add helpers to display data
- [x] - add doc in `docs/interpreter/scio`
### What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1505
### How should this be tested?

```
mvn -pl scio,zeppelin-display,zeppelin-interpreter -Dtest='org.apache.zeppelin.scio.*' -DfailIfNoTests=false test
```
### Screenshots

![GitHub Logo](http://i.imgur.com/w6Cp5kp.png)
### Questions:
- Does the licenses files need update? ~~no~~ yes
- Is there breaking changes for older versions? no
- Does this needs documentation? yes (included in the PR)

Author: Rafal Wojdyla <rav@spotify.com>

Closes apache#1471 from ravwojdyla/scio and squashes the following commits:

d6fbc4e [Rafal Wojdyla] Add runner doc links
7e6fdec [Rafal Wojdyla] Fix indentation
d8de7c8 [Rafal Wojdyla] Remove optional flink deps from Beam
508705f [Rafal Wojdyla] Remove duplicates
49cf0eb [Rafal Wojdyla] Add scio to beam group
cd79fc8 [Rafal Wojdyla] Add .bigquery cache to gitignore
b961791 [Rafal Wojdyla] Check the message content
1e30f76 [Rafal Wojdyla] Simplify SCollection implicits
3c519f1 [Rafal Wojdyla] Fix doc style
e9579d8 [Rafal Wojdyla] Clarify Context sharing + add docs about display helpers
327273e [Rafal Wojdyla] Add Zeppelin custom ContextAndArgs
0920fdd [Rafal Wojdyla] Remove obsolete deps
8f25f71 [Rafal Wojdyla] Upgrade scio to 0.2.4
3275185 [Rafal Wojdyla] Add license
bd4df5e [Rafal Wojdyla] Fix documentation style
dcbb197 [Rafal Wojdyla] Add documentation link
e635674 [Rafal Wojdyla] Add tests for DisplayHelpers
c0f8ccf [Rafal Wojdyla] Fix style and number of records for take
9dcc8ce [Rafal Wojdyla] Add display helpers for Tap[T] and Future[Tap[T]]
4014c81 [Rafal Wojdyla] Parse params generic params
0305a3c [Rafal Wojdyla] Style fix
a92494b [Rafal Wojdyla] Style + use `split` to support both scala 2.{10,11}
b884b72 [Rafal Wojdyla] Fix tests - add exception and check messages
99a7daa [Rafal Wojdyla] Progress should be 0
7b54e49 [Rafal Wojdyla] No need to override delegation for completion
12f0096 [Rafal Wojdyla] Clean up tests
93233a8 [Rafal Wojdyla] Rename argz param, fix style
0a3b49a [Rafal Wojdyla] Add Scio doc
61850d7 [Rafal Wojdyla] Add TableRow display helper + style
be252f8 [Rafal Wojdyla] Add avro display
89a2811 [Rafal Wojdyla] Add limit of display
9a21aa0 [Rafal Wojdyla] Add display helpers
6ff4e95 [Rafal Wojdyla] Inject argz to the Scio interpreter
570cfaa [Rafal Wojdyla] Add Scio interpreter tests
38abaf9 [Rafal Wojdyla] Add Scio interpreter
7b596ea [Rafal Wojdyla] Generalize SparkOutputStream
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants