Add support for spatial GEOMETRY types + add extra_stats column
#412
It seems that the tests are never executed in CI since the geo library is not linked and Could we resolve this?
Not sure, but it gets run in the nightly build when we test all extensions, right?
@pdet do you know if the config fail is expected? I haven't changed any storage stuff so I don't think it's related to this PR.
This CI basically runs all DuckDB tests wrapped on ducklake. They might be failing due to changes in DuckDB or because they are new tests; either way, you can add them to the skip file for now. That said, I'm not a big fan of having tests here that are only executed in core DuckDB: our CI can get broken here, and a failure would only be detected after a manual bump of the library in core. Maybe you can look at running these tests in the CI by linking the geo extension, and adding it to the
What is The issue with linking spatial is that we can't build extensions (from other extensions) if they require vcpkg. In the main DuckDB repo there is a "make extension_configuration" step that will merge all the
The solution to running CI here is to download spatial when running against a release. @carlopi had this idea some time back. Essentially, when running against a release target (which we often do for extensions), we can just download additional dependencies using This requires some upstream fixes though - effectively making
I think when a duckdb build exists remotely, something along the lines of: should work already (the ducklake-specific flags would still need to be added). This likely needs a clean-up, but it's already somewhat usable (when adding "Autoloading Error" to the regex of skipped errors). Happy to have a look, at least at having this tested locally.
Mytherin
left a comment
Thanks! LGTM - some minor comments
The repo isn't pinned to a release yet, so an install won't work for now. I think this is something we can best pick up post-release.
FWIW, this is the first open-source implementation of geometry in an open table format, congratulations!
This PR adds support for storing `GEOMETRY` types from the `spatial` extension, as a follow-up to duckdb/duckdb#18832.

The geometry itself is just stored like native (geo)parquet geometry types, so there is really nothing `spatial` specific about the geometry support in the ducklake format itself.

Geometries don't use the typical "min/max" statistics but instead store a "bounding box" and a list of geometry sub-types present in the column (e.g. POINT, LINESTRING). In order to facilitate this in ducklake, the `ducklake_table_column_stats` and `ducklake_file_column_statistics` tables now have an extra `extra_stats` varchar column where semi-structured type-specific statistics can be stored. In the case of `GEOMETRY`, the `extra_stats` column is populated by a JSON object containing the bounding box and geometry subtype list, but this column could also be used by other extension types in the future.

There's a lot of code added to identify and cast geometries to WKB and back. This is because of the way the parquet extension interacts with spatial to handle geometries. It's a bit hacky, but we should be able to remove this in the future when we move the geometry type to core.
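For readers unfamiliar with WKB, here is a minimal Python sketch of what "casting to WKB and back" means for the simplest case, a 2D point. It follows the standard WKB layout (byte-order flag, geometry type code, coordinates) and is not the code added in this PR; the helper names `point_to_wkb` and `wkb_to_point` are made up for illustration.

```python
import struct

def point_to_wkb(x: float, y: float) -> bytes:
    # Hypothetical helper: encode a 2D point as little-endian WKB.
    # Layout: 1 byte order flag (1 = little-endian), uint32 type (1 = Point),
    # then x and y as 64-bit floats.
    return struct.pack("<BIdd", 1, 1, x, y)

def wkb_to_point(blob: bytes) -> tuple:
    # Hypothetical helper: decode a 2D WKB point, honouring the byte order flag.
    endian = "<" if blob[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", blob[1:21])
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D points")
    return (x, y)

blob = point_to_wkb(1.0, 2.0)
print(blob.hex())
print(wkb_to_point(blob))
```

Real geometries (linestrings, polygons, multi-geometries, Z/M dimensions) use the same framing with more type codes and nested coordinate lists, which is part of why the conversion code in the PR is larger than this.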
Filter pushdown for geometries is currently not implemented, but can be added to the extension later.
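To make the `extra_stats` idea concrete, here is a rough Python sketch of how per-file geometry statistics could be serialized as JSON, merged into table-level statistics, and later used for bounding-box pruning. The JSON key names (`bbox`, `xmin`, `types`, and so on) and all function names are assumptions for illustration only; the actual layout written by ducklake may differ.

```python
import json

def geometry_extra_stats(geoms):
    # geoms: non-empty list of (subtype_name, [(x, y), ...]) pairs.
    # Key names below are hypothetical, not the actual ducklake layout.
    xs = [x for _, coords in geoms for x, _ in coords]
    ys = [y for _, coords in geoms for _, y in coords]
    return json.dumps({
        "bbox": {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)},
        "types": sorted({name for name, _ in geoms}),
    })

def merge_extra_stats(a, b):
    # Combine two per-file stats objects into one table-level object:
    # union of the bounding boxes and union of the subtype lists.
    sa, sb = json.loads(a), json.loads(b)
    ba, bb = sa["bbox"], sb["bbox"]
    return json.dumps({
        "bbox": {
            "xmin": min(ba["xmin"], bb["xmin"]),
            "ymin": min(ba["ymin"], bb["ymin"]),
            "xmax": max(ba["xmax"], bb["xmax"]),
            "ymax": max(ba["ymax"], bb["ymax"]),
        },
        "types": sorted(set(sa["types"]) | set(sb["types"])),
    })

def may_intersect(stats, query):
    # A file can be skipped when its bounding box cannot overlap the query box.
    box = json.loads(stats)["bbox"]
    return not (box["xmax"] < query["xmin"] or box["xmin"] > query["xmax"]
                or box["ymax"] < query["ymin"] or box["ymin"] > query["ymax"])

file_a = geometry_extra_stats([("POINT", [(1.0, 2.0)])])
file_b = geometry_extra_stats([("LINESTRING", [(0.0, 0.0), (3.0, 5.0)])])
print(merge_extra_stats(file_a, file_b))
```

The last helper shows the kind of check that a future filter pushdown could run against the stored bounding box: files whose box cannot overlap the query region would never need to be read.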