
Support for ODBC driver connection type #116

Merged: kwigley merged 10 commits into master from odbc-driver-support on Nov 6, 2020

Conversation
Conversation

@kwigley kwigley commented Oct 27, 2020

Resolves #104

Description

Add support for ODBC driver connections to Databricks, via either a cluster-specific path or a SQL endpoint path.

  • add pyodbc for connecting over the ODBC driver
  • add driver and endpoint to the profile config (cluster and endpoint are mutually exclusive; they determine how to connect to Databricks)
  • add integration and unit tests
    • because this connection method uses a driver, I created a new image with the driver installed for the integration tests to use. TBD where this Dockerfile lives!
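For illustration, a target using the new method might look like this in profiles.yml. The exact keys and driver path are assumptions based on this PR's discussion, not a verbatim copy of the final schema; check the README for the authoritative version:

```yaml
my_profile:
  target: databricks
  outputs:
    databricks:
      type: spark
      method: odbc
      # path to the installed Simba Spark ODBC driver (illustrative)
      driver: /opt/simba/spark/lib/64/libsparkodbc_sb64.so
      host: my-workspace.cloud.databricks.com
      port: 443
      token: <databricks-token>
      organization: "1234567890"
      cluster: my-cluster-id   # mutually exclusive with the key below
      # endpoint: my-endpoint-id
      schema: analytics
```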

Note

At the time of writing, the new SQL Endpoint does not support `create temporary view`. Also, `extended` is prohibited by the ODBC driver for some operations: dbt-labs/dbt-adapter-tests#8


@jtcohen6 jtcohen6 left a comment


This is working for me locally, which is very exciting. Tiny comment for now around the nomenclature of the new connection endpoint.

Comment on lines 38 to 40
class SparkClusterType(StrEnum):
ALL_PURPOSE = "all-purpose"
VIRTUAL = "virtual"

@jtcohen6 jtcohen6 Oct 28, 2020


Databricks isn't using the name "virtual clusters" anymore; I believe they're just calling them "endpoints."

Rather than a combination of cluster and cluster_type, I think we should make the distinction between cluster (old style, all-purpose/interactive) and endpoint (new style). Users should specify either a cluster or an endpoint when connecting via odbc.
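The cluster-versus-endpoint distinction can be sketched in Python. The function name and path formats below are assumptions for illustration, not this PR's actual code; what matters is that the two keys are mutually exclusive and each implies a different ODBC HTTP path:

```python
def build_http_path(organization: str, cluster: str = None,
                    endpoint: str = None) -> str:
    """Derive an ODBC HTTPPath from a profile that names either a
    cluster (old style, all-purpose/interactive) or an endpoint (new style).

    Hypothetical helper; path formats assumed from Databricks ODBC conventions.
    """
    if (cluster is None) == (endpoint is None):
        # both set, or neither set: the profile is ambiguous
        raise ValueError("specify exactly one of `cluster` or `endpoint`")
    if cluster is not None:
        # all-purpose cluster path
        return f"sql/protocolv1/o/{organization}/{cluster}"
    # SQL endpoint path
    return f"/sql/1.0/endpoints/{endpoint}"
```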

@kwigley kwigley force-pushed the odbc-driver-support branch 3 times, most recently from 1e84db3 to cfd804b Compare October 30, 2020 19:24
f"{self.method} connection method requires "
"additional dependencies. \n"
"Install the additional required dependencies with "
"`pip install dbt-spark[ODBC]`"

@kwigley kwigley Nov 2, 2020


I landed on `pip install dbt-spark[ODBC]` because pyodbc is the only Python dep that I imagine will require OS dependencies. Also, I think this should line up with connection methods (thrift, http, odbc) rather than connection locations (Databricks, etc.)?
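The error message quoted above implies a deferred-import guard around the optional dependency. A minimal sketch of that pattern (the function names here are hypothetical, not the PR's code):

```python
try:
    import pyodbc  # optional: only needed for the odbc connection method
except ImportError:
    pyodbc = None


def missing_dependency_error(method: str) -> str:
    # mirrors the message quoted from the diff above
    return (
        f"{method} connection method requires additional dependencies. \n"
        "Install the additional required dependencies with "
        "`pip install dbt-spark[ODBC]`"
    )


def ensure_odbc_available(method: str = "odbc") -> None:
    # called when opening a connection, before touching pyodbc
    if pyodbc is None:
        raise RuntimeError(missing_dependency_error(method))
```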

Contributor

I buy it!


@jtcohen6 jtcohen6 Nov 5, 2020


Eventually, I think we might want to try moving PyHive[hive] to an extra instead of a principal requirement, since it's only necessary for the http connection method.

Justification: we see some installation errors (e.g. #114) resulting from less-maintained dependencies exclusive to PyHive

Not something we need to do right now!
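The suggested packaging split could look something like this in setup.py. The exact version pins are illustrative assumptions, not what dbt-spark ships:

```python
# hypothetical setup.py excerpt: PyHive[hive] moved from install_requires
# to an extra, alongside the ODBC extra added in this PR
extras_require = {
    "ODBC": ["pyodbc>=4.0.30"],
    "PyHive": ["PyHive[hive]>=0.6.0,<0.7.0"],
}

install_requires = [
    # PyHive[hive] would no longer live here; users opt in per
    # connection method, e.g. `pip install dbt-spark[PyHive]`
]
```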

@kwigley kwigley marked this pull request as ready for review November 2, 2020 17:28
@kwigley kwigley self-assigned this Nov 2, 2020
@kwigley kwigley added the enhancement New feature or request label Nov 2, 2020

@jtcohen6 jtcohen6 left a comment


This looks great! Thanks for the excellent, wide-reaching work to get this running with automated tests.



@jtcohen6 jtcohen6 left a comment


A few tiny notes after rechecking readme

@drewbanin drewbanin removed their request for review November 5, 2020 20:25
@drewbanin

FYI removing myself from review, but this looks chefs-kiss.jpeg

@kwigley kwigley merged commit 1bbe718 into master Nov 6, 2020
@kwigley kwigley deleted the odbc-driver-support branch November 6, 2020 14:32
Merging this pull request may close the issue: Support Databricks connections via latest JDBC or ODBC driver