Skip to content

Conversation

@pmccaffrey6
Copy link

What is this PR for?

Zeppelin currently has little functionality for data source exploration. This PR exists to build a small feature for the JDBC interpreter that would allow users to explore metadata for databases and database objects.

With this PR, the JDBC interpreter accepts the "explore" keyword. When run in isolation, this fetches metadata about the database as a whole (tables, views etc...). When the explore keyword is followed by the name of a table or view, this fetches metadata about that table or view (column names, data types etc...).

A video of this feature in action can be found here (https://s3.amazonaws.com/screenshots-mockups/embedvid.html).

What type of PR is it?

Improvement | Feature

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1824

How should this be tested?

  1. Run explore in a %jdbc paragraph to get a list of tables, views, system tables, global and local temporary tables, aliases and synonyms.

  2. Run explore followed by the name of a database table or view in order to get a list of column names, data types etc...

Additionally, this PR adds two new unit tests to JDBCInterpreterTest which test fetching of database metadata as well as table and view metadata.

Screenshots (if appropriate)

https://s3.amazonaws.com/screenshots-mockups/embedvid.html

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? Yes. This would benefit from a small addition to the JDBC Interpreter documentation.

@FireArrow
Copy link
Contributor

LGTM. Test failure seems unrelated.

ByteArrayOutputStream baos = new ByteArrayOutputStream();
PrintStream ps = new PrintStream(baos);
e.printStackTrace(ps);
String errorMsg = new String(baos.toByteArray(), StandardCharsets.UTF_8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I remember, org.apache.commons.lang3.exception.ExceptionUtils#getStackTrace does the same thing

String user = interpreterContext.getAuthenticationInfo().getUser();
DatabaseMetaData dataBaseMetaData;
ResultSet resultSet = null;
ResultSetMetaData resultSetMetaData = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable

try {
closeDBPool(user, propertyKey);
} catch (SQLException e1) {
e1.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use logger, not to write to System.err

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge 👍 for using Logger here, as in the rest of the project

resultSet.close();
} catch (SQLException e) { /*ignored*/ }
}
if (connection != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connection can't be null here because we return at line 533 in case of null

@pmccaffrey6
Copy link
Author

Hey guys,
Thanks so much for the review! I removed the unused variable (great catch).

As for the other issues, I think they make a ton of sense. I wrote the getMetaData() method so that it pairs closely with the executeSql() method and so these issues are present in both.

Perhaps a good option would be to open a second PR to refactor the JDBC interpreter and make these changes in both getMetaData() and executeSql() methods? What do you think?

@pmccaffrey6
Copy link
Author

The failed test looks to be from testSparkSQLInterpreter from LivyInterpreterIT.java. More specifically, it comes from line 227 (and 226) in that file:

InterpreterResult result = sqlInterpreter.interpret("show tables", context);
assertEquals(InterpreterResult.Code.SUCCESS, result.code());

sqlInterpreter being an instance of LivySparkSQLInterpreter, it seems unrelated.

@pmccaffrey6
Copy link
Author

I went ahead and changed parsing of the "explore" keyword to use split(" +") instead of substring so that it wasn't tied to any particular word length etc.. and the METADATA_KEYWORD can be set to any single case-insensitive word.

As an additional detail, if you look up a table that doesn't exist you'll just get an empty table in the paragraph as opposed to an explicit error message. I experimented a bit with testing for an empty result set using resultSet.next() and resultSet.first() so that I could output an error message like "Database object not found".

However, it seems that when getting metadata, since you're not using Statement, this kind of thing causes problems because re-setting cursors on the resultset using methods like resultSet.beforeFirst() isn't supported for all data sources. I don't want to use Statement because I want to have this abstracted away from SQL and simply use the JDBC api to get metadata. So, in the interest of making this as general of a feature as possible, and therefore not using Statement, as well as not having a good way to test for resultSet emptiness that is adequately vendor-agnostic, there currently isn't testing for empty result sets, they just return as empty tables.

I don't imagine this is an issue really but if anyone has a suggestion of a good way to do this, I'd be very interested to hear it!

*/
package org.apache.zeppelin.jdbc;

import static javax.swing.plaf.basic.BasicHTML.propertyKey;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is javax.swing really required here?

@bzz
Copy link
Member

bzz commented Dec 20, 2016

Thank you @pmccaffrey6 , it looks great to me, modulo few things noted above.

Could you please double check that all feedback from reviews was addressed?
Also small documentation update might be in order there, to highlight this feature for the users.

@pmccaffrey6
Copy link
Author

Hey @bzz,
Great catch! Sorry about that. I removed the unnecessary import. The error seems unrelated. It comes from:

1470 21:03:56,259 ERROR org.apache.zeppelin.AbstractZeppelinIT:136 - Exception in ZeppelinIT while testSparkInterpreterDependencyLoading 
1471 org.openqa.selenium.TimeoutException: Timed out after 60 seconds waiting for org.apache.zeppelin.AbstractZeppelinIT$1@7918cba6
... Stack Trace ...
1507 Caused by: org.openqa.selenium.NoSuchElementException: Unable to locate element: {"method":"xpath","selector":"(//div[@ng-controller=\"ParagraphCtrl\"])[1]//div[contains(@class, 'control')]//span[1][contains(.,'FINISHED')]"}

As for the other excellent points that @DrIgor noted (including the great point about using logger), those issues are present both in getMetaData() as well as executeSql() methods. Would you prefer that I make those changes to the getMetaData() method in this PR or refactor both getMetaData() and executeSql() in a separate PR?

As always, thanks for your review!

@asfgit asfgit closed this in c38a0a0 May 9, 2018
asfgit pushed a commit that referenced this pull request May 9, 2018
close #83
close #86
close #125
close #133
close #139
close #146
close #193
close #203
close #246
close #262
close #264
close #273
close #291
close #299
close #320
close #347
close #389
close #413
close #423
close #543
close #560
close #658
close #670
close #728
close #765
close #777
close #782
close #783
close #812
close #822
close #841
close #843
close #878
close #884
close #918
close #989
close #1076
close #1135
close #1187
close #1231
close #1304
close #1316
close #1361
close #1385
close #1390
close #1414
close #1422
close #1425
close #1447
close #1458
close #1466
close #1485
close #1492
close #1495
close #1497
close #1536
close #1545
close #1561
close #1577
close #1600
close #1603
close #1678
close #1695
close #1739
close #1748
close #1765
close #1767
close #1776
close #1783
close #1799
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants