Add DuckDB connector #23419
|
Thanks for the contribution, sir. When can this PR be merged and released? 🧐 |
Some databases may map varchar(5) to different types.
plugin/trino-duckdb/src/test/java/io/trino/plugin/duckdb/TestDuckDbPlugin.java
plugin/trino-duckdb/src/test/java/io/trino/plugin/duckdb/TestDuckDbConnectorTest.java
plugin/trino-duckdb/src/test/java/io/trino/plugin/duckdb/TestingDuckDb.java
wendigo left a comment
Overall LGTM. Since this is a new connector we can ship it in the initial form and improve it from here.
|
Whoa! Amazing .. two new connectors in 470 at this stage .. |
plugin/trino-duckdb/src/main/java/io/trino/plugin/duckdb/DuckDbClient.java
|
@ebyhr how does the connector isolate duckdb/sandbox from the env? This page lays out several scenarios where you can use duckdb to access the underlying os: https://duckdb.org/docs/operations_manual/securing_duckdb/overview.html |
@StephenOTT the Trino DuckDB connector just connects to an external DuckDB instance via a JDBC driver. How that DuckDB is deployed, managed, secured, and sandboxed is completely outside of the scope of the connector. |
|
@mosabua based on that position, it would seem that the connector would expose passwords stored locally on the Trino server: https://duckdb.org/docs/operations_manual/securing_duckdb/overview.html#disabling-file-access Theoretically, someone would then be able to access file-based catalog credentials. The issue is not the DuckDB instance storage, but rather DuckDB's ability to query locations such as the local file system that the DuckDB instance is running on (per the securing DuckDB URL above). |
|
No .. why would it expose passwords stored on the Trino server? DuckDB should be isolated and run separately from Trino on an entirely different server. And Trino connects to it via JDBC (network..). Data sources in Trino are never run on the same server. And yes, you probably need to secure that data source properly, and how that is done varies a lot for each data source. |
|
DuckDB is an embedded in-process database engine similar to sqlite. There is no such thing as a server to connect to. https://duckdb.org/why_duckdb#simple |
|
@StephenOTT you need to explicitly configure this connector in order to do anything with it, by default there is no such catalog created. This applies to all connectors. |
|
Understood and agreed that it is an explicit configuration. However, without sandboxing, it would seem to warrant a very explicit warning in the connector docs for DuckDB. Are there other existing connectors that provide the ability to read any file-based credential on the server just by activating the catalog? (Remember that in this scenario it is not about where the DB file is stored; merely activating the DB instance allows you to then load a local file from the OS.) |
|
@StephenOTT python functions are on by default and cannot be disabled. That's the major difference here. I'm not aware of any such connector like DuckDB. |
|
The easy mistake made here is someone is going to activate duckdb (through file catalog or dynamic catalogs) and assume it is "safe" in the same sense of safe as the other connectors: it provides a corridor to other systems/sources. The configurations make it appear that way as well with the duckdb db file path. But with duckdb we end up with a very specific vector that supports local OS file reading and thus by creating a duckdb instance we open the ability to expose the credentials on the os. I want to use duckdb connector. It is a great addition. I am only raising a very real security vector that admins would be very uncomfortable with. |
|
Fair enough .. the fact remains that you should NOT run DuckDB on the same instance / server as Trino. We consider all data sources as external systems and Trino does not attempt to manage them. From what I understand you can't just "activate DuckDB" by creating a catalog that uses the connector .. you still have to install and run DuckDB on your own somewhere .. and that should not be the Trino cluster nodes. I agree with your suggestion to add some sort of warning to the documentation though. Can you send a PR, @StephenOTT? We could add that info to the Requirements section of the connector docs. Maybe we need to add details about the path in the JDBC URL pointing to shared storage, and add a link to the security info and concerns. |
|
Can you clarify? Because the jdbc driver for duckdb is the "engine": https://duckdb.org/docs/clients/java.html When you provide a jdbc path, you are just providing a location for the file(s)/storage of the "database". But the processing power is still at the driver level. In DuckDbClientModule.class the DuckDbDriver is instantiated. There is no "duckdb compute database" you are connecting to. |
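To make the distinction concrete, here is a minimal sketch (not taken from the PR) of how one might tell an embedded-engine JDBC URL apart from a network one. The class `JdbcUrlKind` and its methods are hypothetical and are not part of Trino or the DuckDB driver. The only facts assumed are that DuckDB URLs take the form `jdbc:duckdb:` followed by an optional file path, with an empty path meaning an in-memory database, while a server-based driver such as PostgreSQL's uses a `//host:port/db` form.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Trino or the DuckDB driver. */
public final class JdbcUrlKind {
    private static final String DUCKDB_PREFIX = "jdbc:duckdb:";

    private JdbcUrlKind() {}

    /** True when the URL targets DuckDB, whose engine runs inside the calling JVM. */
    public static boolean isEmbeddedDuckDb(String url) {
        return url.startsWith(DUCKDB_PREFIX);
    }

    /**
     * Returns the local database file path named by a DuckDB URL, or empty when
     * the URL has no path (in-memory database) or is not a DuckDB URL at all.
     */
    public static Optional<String> localDatabasePath(String url) {
        if (!isEmbeddedDuckDb(url)) {
            return Optional.empty();
        }
        String path = url.substring(DUCKDB_PREFIX.length());
        return path.isEmpty() ? Optional.empty() : Optional.of(path);
    }

    public static void main(String[] args) {
        String[] urls = {
            "jdbc:duckdb:/data/example.duckdb",          // file on the local file system
            "jdbc:duckdb:",                              // in-memory database
            "jdbc:postgresql://db.example.com:5432/app"  // remote server over the network
        };
        for (String url : urls) {
            System.out.println(url + " -> embedded=" + isEmbeddedDuckDb(url)
                    + ", path=" + localDatabasePath(url));
        }
    }
}
```

Note that a DuckDB URL carries no host or port at all: whatever process loads the driver is the process that executes the queries and reads the files.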
|
Oh .. I did not realize the JDBC driver for DuckDB is actually not just a driver.. sigh. So we definitely need to update the docs .. also because I wonder how this even works with a multi cluster node .. the path probably needs to be to a shared storage that is mounted on the same location .. but then I wonder if that works with multiple workers accessing DuckDB.. Please send a PR |
|
Sent PR now #25146 .. let's discuss more there @StephenOTT
Description
Add DuckDB connector. This connector will be helpful for small local use cases like #23344
Fixes #18031
Release notes