[ZEPPELIN-1824] Add MetaData exploration to JDBC Interpreter #1776

pmccaffrey6 · 2016-12-16T09:31:41Z

What is this PR for?

Zeppelin currently has little functionality for data source exploration. This PR exists to build a small feature for the JDBC interpreter that would allow users to explore metadata for databases and database objects.

With this PR, the JDBC interpreter accepts the "explore" keyword. Run on its own, it fetches metadata about the database as a whole (tables, views, etc.). Followed by the name of a table or view, it fetches metadata about that object (column names, data types, etc.).
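As a rough illustration of the two modes, the sketch below uses only the standard `java.sql.DatabaseMetaData` API. `getTables` and `getColumns` are real JDBC methods, but whether the PR's `getMetaData()` calls them in exactly this way is an assumption. The class name `MetadataExplorer` and the tab-separated rendering are hypothetical.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper for illustration; not the PR's actual getMetaData() code.
final class MetadataExplorer {

  // Standard JDBC table-type names accepted by DatabaseMetaData.getTables.
  static final String[] TABLE_TYPES = {
      "TABLE", "VIEW", "SYSTEM TABLE", "GLOBAL TEMPORARY",
      "LOCAL TEMPORARY", "ALIAS", "SYNONYM"
  };

  // With no object name, list database objects; otherwise list that object's columns.
  static String explore(DatabaseMetaData meta, String objectName) throws SQLException {
    if (objectName == null) {
      try (ResultSet rs = meta.getTables(null, null, "%", TABLE_TYPES)) {
        return render(rs);
      }
    }
    try (ResultSet rs = meta.getColumns(null, null, objectName, "%")) {
      return render(rs);
    }
  }

  // Render any ResultSet as tab-separated text: a header line, then one line per row.
  static String render(ResultSet rs) throws SQLException {
    ResultSetMetaData md = rs.getMetaData();
    int n = md.getColumnCount();
    StringBuilder out = new StringBuilder();
    for (int i = 1; i <= n; i++) {
      out.append(i > 1 ? "\t" : "").append(md.getColumnLabel(i));
    }
    out.append('\n');
    while (rs.next()) {
      for (int i = 1; i <= n; i++) {
        out.append(i > 1 ? "\t" : "").append(rs.getString(i));
      }
      out.append('\n');
    }
    return out.toString();
  }
}
```

Note that the third argument of `getColumns` is a name pattern, so characters such as `_` in a table name act as wildcards, and identifier case handling varies by database.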

A video of this feature in action can be found here (https://s3.amazonaws.com/screenshots-mockups/embedvid.html).

What type of PR is it?

Improvement | Feature

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1824

How should this be tested?

Run explore in a %jdbc paragraph to get a list of tables, views, system tables, global and local temporary tables, aliases and synonyms.
Run explore followed by the name of a database table or view to get a list of column names, data types, etc.

Additionally, this PR adds two new unit tests to JDBCInterpreterTest which test fetching of database metadata as well as table and view metadata.

Screenshots (if appropriate)

https://s3.amazonaws.com/screenshots-mockups/embedvid.html

Questions:

Do the license files need an update? No
Are there breaking changes for older versions? No
Does this need documentation? Yes. This would benefit from a small addition to the JDBC Interpreter documentation.

FireArrow · 2016-12-16T11:23:43Z

LGTM. Test failure seems unrelated.

DrIgor · 2016-12-16T13:36:27Z

jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java

+      ByteArrayOutputStream baos = new ByteArrayOutputStream();
+      PrintStream ps = new PrintStream(baos);
+      e.printStackTrace(ps);
+      String errorMsg = new String(baos.toByteArray(), StandardCharsets.UTF_8);


As I remember, org.apache.commons.lang3.exception.ExceptionUtils#getStackTrace does the same thing
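For comparison, here is a JDK-only sketch of converting a stack trace to a `String` with `StringWriter` and `PrintWriter`, which avoids the byte array and charset round trip in the snippet above. The class and method names (`StackTraces.asString`) are hypothetical. If commons-lang3 is on the classpath (an assumption about the module's dependencies), a single call to `ExceptionUtils.getStackTrace(e)` should serve the same purpose.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical utility for illustration only.
final class StackTraces {

  // Returns the full stack trace of t as text, as printStackTrace would print it.
  static String asString(Throwable t) {
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw, true);
    t.printStackTrace(pw);
    return sw.toString();
  }
}
```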

DrIgor · 2016-12-16T13:43:54Z

jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java

+    String user = interpreterContext.getAuthenticationInfo().getUser();
+    DatabaseMetaData dataBaseMetaData;
+    ResultSet resultSet = null;
+    ResultSetMetaData resultSetMetaData = null;


Unused variable

DrIgor · 2016-12-16T13:45:00Z

jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java

+      try {
+        closeDBPool(user, propertyKey);
+      } catch (SQLException e1) {
+        e1.printStackTrace();


It's better to use logger, not to write to System.err

Huge 👍 for using Logger here, as in the rest of the project
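A minimal sketch of the suggested change follows. To keep it self-contained it uses `java.util.logging` as a stand-in for whatever logging facade the project uses, so the logger type here is an assumption. `PoolCloser`, `Closer` and `closeQuietly` are hypothetical names standing in for the `closeDBPool(user, propertyKey)` call.

```java
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper; java.util.logging stands in for the project's logger.
final class PoolCloser {

  static final Logger LOGGER = Logger.getLogger(PoolCloser.class.getName());

  // Stand-in for an operation such as closeDBPool(user, propertyKey).
  interface Closer {
    void close() throws SQLException;
  }

  // Runs the close operation; on failure, logs the exception instead of
  // printing it to System.err. Returns true when closing succeeded.
  static boolean closeQuietly(Closer closer) {
    try {
      closer.close();
      return true;
    } catch (SQLException e) {
      LOGGER.log(Level.SEVERE, "Failed to close connection pool", e);
      return false;
    }
  }
}
```

Passing the exception object to the logger keeps the stack trace while letting the logging configuration decide where it goes.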

DrIgor · 2016-12-16T13:54:26Z

jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java

+            resultSet.close();
+          } catch (SQLException e) { /*ignored*/ }
+        }
+        if (connection != null) {


connection can't be null here because we return at line 533 in case of null

pmccaffrey6 · 2016-12-16T16:54:40Z

Hey guys,
Thanks so much for the review! I removed the unused variable (great catch).

As for the other issues, I think they make a ton of sense. I wrote the getMetaData() method to pair closely with the executeSql() method, so these issues are present in both.

Perhaps a good option would be to open a second PR to refactor the JDBC interpreter and make these changes in both getMetaData() and executeSql() methods? What do you think?

pmccaffrey6 · 2016-12-17T04:28:46Z

The failed test looks to be from testSparkSQLInterpreter from LivyInterpreterIT.java. More specifically, it comes from line 227 (and 226) in that file:

InterpreterResult result = sqlInterpreter.interpret("show tables", context);
assertEquals(InterpreterResult.Code.SUCCESS, result.code());

sqlInterpreter being an instance of LivySparkSQLInterpreter, it seems unrelated.

pmccaffrey6 · 2016-12-19T02:35:27Z

I went ahead and changed parsing of the "explore" keyword to use split(" +") instead of substring, so that it isn't tied to any particular word length, and the METADATA_KEYWORD can be set to any single case-insensitive word.
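A minimal sketch of this kind of split-based parsing is shown below. `METADATA_KEYWORD` and `split(" +")` come from the comment above; the class name `ExploreCommand` and the methods `isExplore` and `target` are hypothetical and not taken from the PR.

```java
// Hypothetical parser for illustration; not the PR's actual code.
final class ExploreCommand {

  static final String METADATA_KEYWORD = "explore";

  // True when the first word of the command is the keyword, ignoring case.
  static boolean isExplore(String cmd) {
    String[] parts = cmd.trim().split(" +");
    return parts[0].equalsIgnoreCase(METADATA_KEYWORD);
  }

  // The object name following the keyword, or null when the keyword
  // stands alone (whole-database exploration).
  static String target(String cmd) {
    String[] parts = cmd.trim().split(" +");
    return parts.length > 1 ? parts[1] : null;
  }
}
```

For example, `ExploreCommand.target("  EXPLORE   my_table ")` yields `my_table`. Splitting on `" +"` only handles spaces; a pattern such as `\\s+` would also cover tabs and newlines, if that matters for paragraph input.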

As an additional detail, if you look up a table that doesn't exist you'll just get an empty table in the paragraph as opposed to an explicit error message. I experimented a bit with testing for an empty result set using resultSet.next() and resultSet.first() so that I could output an error message like "Database object not found".

However, since getting metadata doesn't go through Statement, this kind of check causes problems: resetting the cursor on the result set with methods like resultSet.beforeFirst() isn't supported for all data sources. I don't want to use Statement, because I want this abstracted away from SQL and built only on the JDBC API for metadata. So, to keep the feature as general as possible, and lacking a sufficiently vendor-agnostic way to test a result set for emptiness, there is currently no check for empty result sets; they are simply returned as empty tables.

I don't imagine this is an issue really but if anyone has a suggestion of a good way to do this, I'd be very interested to hear it!
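One possible approach, offered as a sketch rather than a tested fix for this PR, is to call `next()` exactly once and then consume rows with a do-while loop. The cursor is never moved backwards, so this should work on forward-only result sets. `EmptyAwareRenderer` and `firstColumnOrMessage` are hypothetical names, and printing only the first column is a simplification.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for illustration; not part of the PR.
final class EmptyAwareRenderer {

  static final String NOT_FOUND = "Database object not found";

  // Returns NOT_FOUND for an empty result set; otherwise returns the first
  // column of every row, one per line, without ever resetting the cursor.
  static String firstColumnOrMessage(ResultSet rs) throws SQLException {
    if (!rs.next()) {
      return NOT_FOUND;
    }
    StringBuilder out = new StringBuilder();
    do {
      out.append(rs.getString(1)).append('\n');
    } while (rs.next());
    return out.toString();
  }
}
```

The first `next()` both tests for emptiness and positions the cursor on the first row, which is why the loop body runs before the next advance.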

bzz · 2016-12-20T01:42:27Z

jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java

 */
 package org.apache.zeppelin.jdbc;

+import static javax.swing.plaf.basic.BasicHTML.propertyKey;


Is javax.swing really required here?

bzz · 2016-12-20T01:48:47Z

Thank you @pmccaffrey6, it looks great to me, modulo a few things noted above.

Could you please double check that all feedback from reviews was addressed?
Also, a small documentation update might be in order, to highlight this feature for users.

pmccaffrey6 · 2016-12-20T21:41:14Z

Hey @bzz,
Great catch! Sorry about that. I removed the unnecessary import. The error seems unrelated. It comes from:

1470 21:03:56,259 ERROR org.apache.zeppelin.AbstractZeppelinIT:136 - Exception in ZeppelinIT while testSparkInterpreterDependencyLoading 
1471 org.openqa.selenium.TimeoutException: Timed out after 60 seconds waiting for org.apache.zeppelin.AbstractZeppelinIT$1@7918cba6
... Stack Trace ...
1507 Caused by: org.openqa.selenium.NoSuchElementException: Unable to locate element: {"method":"xpath","selector":"(//div[@ng-controller=\"ParagraphCtrl\"])[1]//div[contains(@class, 'control')]//span[1][contains(.,'FINISHED')]"}

As for the other excellent points that @DrIgor noted (including the great point about using logger), those issues are present both in getMetaData() as well as executeSql() methods. Would you prefer that I make those changes to the getMetaData() method in this PR or refactor both getMetaData() and executeSql() in a separate PR?

As always, thanks for your review!

close #83 close #86 close #125 close #133 close #139 close #146 close #193 close #203 close #246 close #262 close #264 close #273 close #291 close #299 close #320 close #347 close #389 close #413 close #423 close #543 close #560 close #658 close #670 close #728 close #765 close #777 close #782 close #783 close #812 close #822 close #841 close #843 close #878 close #884 close #918 close #989 close #1076 close #1135 close #1187 close #1231 close #1304 close #1316 close #1361 close #1385 close #1390 close #1414 close #1422 close #1425 close #1447 close #1458 close #1466 close #1485 close #1492 close #1495 close #1497 close #1536 close #1545 close #1561 close #1577 close #1600 close #1603 close #1678 close #1695 close #1739 close #1748 close #1765 close #1767 close #1776 close #1783 close #1799

DrIgor suggested changes Dec 16, 2016


pmccaffrey6 force-pushed the jdbc-metadata branch from ac3eb21 to 85124f5 Compare December 16, 2016 16:47

pmccaffrey6 force-pushed the jdbc-metadata branch from 097f06e to b973d5f Compare December 18, 2016 23:35

bzz reviewed Dec 20, 2016


Peter McCaffrey added 21 commits December 20, 2016 15:45

add getmetadata method

92bd1d4

added more table types

3df75f5

checkstyle

c9d319c

checkstyle

b561519

removed comments

c94a7a1

added tests

7e5f9a2

test

a046041

check cmd length

fde7b40

test table content

76113c2

test table content

aa5d5e1

test table content

04dfeda

test table content

40378f7

test table content

4b6e750

test table content

5519e2f

test table content

85bd918

test table content

8d3ce29

test table content

320c73f

test table content

01d35a3

test table content

25484fa

add comment

38de569

dont optimize imports

891aba8

Peter McCaffrey added 15 commits December 20, 2016 15:45

dont optimize imports

ac3bbef

organize

382e339

remove unused variable

527c383

improve tableName comparison

59989d9

improve tableName comparison

a1800ae

fix casing

81ca4d4

better error message

6cbde87

better error message

76443ae

fixed resultSet cursor

fb1bd89

fix scrolling

c2a77b5

fix scrolling

be10925

fix scrolling

319c93a

fix string split

897c2a9

fix tests

257ab5a

remove unnecessary import

6a6c7ad

pmccaffrey6 force-pushed the jdbc-metadata branch from b973d5f to 6a6c7ad Compare December 20, 2016 20:45

asfgit closed this in c38a0a0 May 9, 2018


4 participants
