Skip to content

Conversation

@fireboy1919
Copy link
Contributor

Nondestructive addition of a unified API for handling separate result sets and an implementation of it for Hive and the Virtual File system notebook repo.

* Unlike a regular interpreter result, a JdbcInterpreter Result caches its
* output so that it can be used later and persisted independently of the regular result.
* It also has a standard return for all tabular SQL data.
* @author Rusty Phillips {@literal: <[email protected]>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be willing to remove @author tag from JavaDoc here and below please?
Although not clearly documented we, as many other ASF projects, do not use them to keep track of contributions, but use git logs and JIRA issues instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely!

@bzz
Copy link
Member

bzz commented Feb 24, 2016

@fireboy1919 thank you for contribution!

The approach you take looks interesting, please help me to understand how much would that affect other existing notebook storages\interpreters implementations?

As review takes some time and effort, there are few things you can do to seed things us, by simplifying life of the reviewer a lot (in order of simplicity)

  • us the template for PR description from the docs
  • minimize the re-formatting changes like this or this, etc, as it makes hard to distinguish important changes
  • make sure that ALL your code style is consistent with the project's style guide's code conventions and the other code that you have changed. This looks like a nit, but it speeds the process up, removing the need to scatter reviewers attention between important changes and just minor style one's
  • keep your branch rebased on latest master
  • make sure the CI is green
  • ping somebody (using Gihutb mentions in this PR) as soon as you feel that it is ready for a review\merge

@Leemoonsoo
Copy link
Member

Thank you @bzz for organizing the check list very well.

Overall concept and approach of this PR make sense a lot.
How about think little bit more before we move on?

Implementation side, I'd like to generalize little bit more and align more with other components that already exists or planned in the future.

I think it make sense to introduce some common data type (like RDD in Spark) for the Zeppelin, the common data type abstracts underlaying physical data. In addition to current InterpreterResult.

And this common data type can be exported to ResourcePool from any interpreter.

And then we can create proxy object of this common data type in Zeppelin server side, so Zeppelin server can bring data from interpreter process.

This data can be provided by REST API or any other API.
So basically approach is the same but little bit more generalized and use existing component.

@fireboy1919
Copy link
Contributor Author

In order to minimize changes, the code that I wrote applies when the data is serialized rather than being written into the interpreters. So it's probably enough different to warrant scrapping the request, even if I would take some of the ideas from it.

In lieu of making another pull request, and in light of Leemoonsoo's comments, might I suggest splitting this into the issues that he mentioned:

  1. Create a serializable common tabular data type (e.g., TabularResultType or JDBCResultType).
  2. Make the base interpreter capable of sending its results to data pools.
  3. Make the front-end capable of using this data (for that part, I'd probably refactor quite a lot of what I've written in this pull request).
  4. Make a REST API that can download tabular data that comes from the resource pool (this looks to have a JIRA ticket for it already: ZEPPELIN-201)

I can't find any tickets for any of the first three things, but I'm happy to adapt the code that I've written to cover things one at a time. The second item could potentially have more thought.

Does that cover what you're thinking, @Leemoonsoo? And if so, are there any tickets or existing work being done that I am unaware of that relates to these issues, or should I write some?

@Leemoonsoo
Copy link
Member

@fireboy1919 It sounds like a good plan. 1) and 2) are also required for ZEPPELIN-533. To do 2), current data pool needs some upgrade. it'll need capability of removing data automatically on paragraph/notebook deletion.
I have created issue for it https://issues.apache.org/jira/browse/ZEPPELIN-713

@Leemoonsoo
Copy link
Member

Regarding 3), how i tried to handle it in https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal,

is that, depends on the result data available in ResourcePool, Zeppelin suggest different type of pluggable visualization. That's maybe too long way to get 'TabularResult' visualized, but i believe it's more extensible and generalized way. So it'll be a framework for not only 'TabularResult' data, but any other type of resource can be exposed to ResourcePool and get displayed in front-end.

@asfgit asfgit closed this in c38a0a0 May 9, 2018
asfgit pushed a commit that referenced this pull request May 9, 2018
close #83
close #86
close #125
close #133
close #139
close #146
close #193
close #203
close #246
close #262
close #264
close #273
close #291
close #299
close #320
close #347
close #389
close #413
close #423
close #543
close #560
close #658
close #670
close #728
close #765
close #777
close #782
close #783
close #812
close #822
close #841
close #843
close #878
close #884
close #918
close #989
close #1076
close #1135
close #1187
close #1231
close #1304
close #1316
close #1361
close #1385
close #1390
close #1414
close #1422
close #1425
close #1447
close #1458
close #1466
close #1485
close #1492
close #1495
close #1497
close #1536
close #1545
close #1561
close #1577
close #1600
close #1603
close #1678
close #1695
close #1739
close #1748
close #1765
close #1767
close #1776
close #1783
close #1799
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants