-
Notifications
You must be signed in to change notification settings - Fork 334
Initial work on DuckDB #14114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial work on DuckDB #14114
Conversation
|
distribution/lib/Standard/Microsoft/0.0.0-dev/docs/api/SQLServer_Connection.md
Show resolved
Hide resolved
At this point in testing it, DuckDB is expected to be very core to deliver some of the base functionality, like spatial and some reading and writing of files. If however it doesn't pan out to be useful, I will isolate to a separate library once have gone further along the path. |
The fact that DuckDB is believed to be essential from product point of view doesn't mean we need to compromise the architecture. It'd be better if none of the core Table&Database concepts (like spatial) were tight to particular implementation. |
|
|
At least how I envision using DuckDB this will be in the NI space for a period of time. I will put it as its own library and add to the standard list of import - assuming can do quickly as need to move on this. |
31f1fa4 to
e8660f8
Compare
distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Parquet_Format.enso
Outdated
Show resolved
Hide resolved
Akirathan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose adding DuckDB to NI is OK. I am curious if you will be able to pass the NI size check though.
| .listFiles("*.jar") | ||
| .map(_.getAbsolutePath()) ++ | ||
| `std-tableau-polyglot-root` | ||
| .listFiles("*.jar") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I would really prefer if we could go without these three lines
- it is an excellent opportunity to make dual JVM mode work smoothly
- I promise to address any inquiries ASAP
- btw. try the experience execute:
sbt:enso> runEngineDistribution --jvm
--run test/DuckDB_Tests
--vm.D=polyglot.enso.classLoading=Standard.DuckDB:guest,enso_dev.DuckDB_Test:guest,hosted
4 tests succeeded.
0 tests failed.
0 tests skipped.
0 groups skipped.
- does it work? Yes, then let's try it. No, tell me what's failing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test code:
from Standard.Base import all
from Standard.Table import all
from Standard.Database import all
from Standard.DuckDB import all
main =
connection5 = Database.connect DuckDB.In_Memory
table4 = connection5.execute_query 'SELECT * FROM "/mnt/c/Users/jdunk/Downloads/flights-1m.parquet"' ..All_Rows write_operation=False
any10 = table4.aggregate columns=[..Count, ..Maximum 'ARR_DELAY']
any10.display
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso
Show resolved
Hide resolved
| "libduckdb_java.so_osx_universal" -> PolyglotLib(MacOSAMD64), | ||
| "libduckdb_java.so_windows_amd64" -> PolyglotLib(WindowsAMD64), | ||
| "META-INF/**" -> CopyToOutputJar, | ||
| "org/**/*.class" -> CopyToOutputJar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am getting:
sbt:enso> runEngineDistribution --jvm --run test.enso
Execution finished with an error: Can't load library: /enso/build/debug/libduckdb_java.so_linux_amd64
at <java> java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2249)
at <java> java.base/java.lang.Runtime.load0(Runtime.java:767)
at <java> java.base/java.lang.System.load(System.java:1624)
at <java> org.duckdb.DuckDBNative.<clinit>(DuckDBNative.java:52)
at <java> org.duckdb.DuckDBConnection.newConnection(DuckDBConnection.java:62)
at <java> org.duckdb.DuckDBDriver.connect(DuckDBDriver.java:114)
at <java> java.sql/java.sql.DriverManager.getConnection(DriverManager.java:613)
at <java> java.sql/java.sql.DriverManager.getConnection(DriverManager.java:160)
at <java> org.enso.base.enso_cloud.EnsoSecretHelper.getJDBCConnection(EnsoSecretHelper.java:51)
where the test.enso. We need to patch DuckDBNative.java (upstream as well, please) to use System.loadLibrary...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, CI reports the same (it is masked by ClassNotFoundException)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related duckdb/duckdb-java#180
| *.pdb | ||
| *.so | ||
| *.jar | ||
| *.so_linux_amd64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is wrong idea.
- We need
.soextension otherwiseSystem.loadLibrary("duckdb")will not find it - but we need to modify DuckDBNative
- to call
System.loadLibraryfirst before trying the equilibristic with extracting toFile.createTemp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic in DuckDBNative appears to have been added to support local dev. I agree that logic is rather flawed if one wants to use the jar as a 3rd party dependency.
| jdbc_connection = JDBC_Connection.from_java sql_exception=SQLException java_connection | ||
|
|
||
| ## For initial work using DuckDB, we will use the Postgres dialect. | ||
| DuckDB_Connection.Value (Connection.new jdbc_connection Dialect.postgres DuckDB_Entity_Naming_Properties.new) make_new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- using
Dialect.postgresdoesn't allow us to use the same trick asStandard.Microsoftdoes - there is SQLServer_Type_Mapping.column_fetcher_factory which loads a class from other JVM
- if we did the same for
DuckDB, then following worked:
sbt:enso> runEngineDistribution --jvm --run test.enso --vm.D=polyglot.enso.classLoading=Standard.DuckDB:guest,hosted
Execution finished with an error: Cannot convert '2006-01-01'(language: Enso Polyglot Bridge, type: java.sql.Date) to Java type 'java.sql.Date': Unsupported target type.
at <java> com.oracle.truffle.polyglot.PolyglotObjectProxyHandler.invoke(PolyglotObjectProxyHandler.java:112)
at <java> jdk.proxy3/jdk.proxy3.$Proxy46.getDate(Unknown Source)
at <java> org.enso.database.JDBCUtils.getLocalDate(JDBCUtils.java:35)
at <java> org.enso.database.fetchers.ColumnFetcherFactory$DefaultColumnFetcherFactory$5.getValue(ColumnFetcherFactory.java:72)
at <java> org.enso.database.fetchers.BaseColumnFetcher.append(BaseColumnFetcher.java:51)
at <java> org.enso.database.fetchers.ColumnFetcher.readResultSet(ColumnFetcher.java:51)
at <enso> Result_Set.result_set_to_table.aggregator<arg-2>(Result_Set.enso:38:13-126)
at <enso> Result_Set.result_set_to_table.aggregator(Result_Set.enso:37-38)
at <enso> Java_Problems.with_problem_aggregator(Java_Problems.enso:116:14-25)
at <enso> Result_Set.result_set_to_table(Result_Set.enso:35-39)
at <enso> case_branch.statement_setter(Connection.enso:404:25-143)
- where test.enso is exactly the snippet being executed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be working on a new dialect for DuckDB next
Co-authored-by: Pavel Marek <[email protected]>
2452d0e to
f90b56e
Compare
DuckDBNative needed to be patched so that it can be loaded as a shared library by name in Native Image. Additionally, it looks like `duckdb_java` didn't follow any standards when it comes to naming the library name: - always prefixed with `lib` (not acceptable on Windows) - suffix includes OS and arch (should be one of `.so`, `.dll` or `.dylib`)
| static { | ||
| var libName = "duckdb_java"; | ||
| try { | ||
| System.loadLibrary(libName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd try to push this System.loadLibrary("duckdb_java") upstream to see how open the project is to 3rd party contributions.
Pull Request Description
Next PR will add a dialect and bring into unit tests.
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
or the Snowflake database integration, a run of the Extra Tests has been scheduled.