Skip to content

Conversation

@Maxxen
Copy link
Member

@Maxxen Maxxen commented Sep 4, 2025

This PR adds support for storing GEOMETRY types from the spatial extension, as a follow-up to duckdb/duckdb#18832

The geometry itself is just stored like native (geo)parquet geometry types, so there is really nothing spatial specific about the geometry support in the ducklake format itself.

Geometries don't use the typical "min/max" statistics but instead store a "bounding box" and a list of geometry sub-types present in the column (e.g. POINT, LINESTRING). In order to facilitate this in ducklake, the ducklake_table_column_stats and ducklake_file_column_statistics tables now have an extra extra_stats varchar column where semi-structured type-specific statistics can be stored. In the case of GEOMETRY, the extra_stats column is populated by a JSON object containing the bounding box and geometry subtype list, but this column could also be used by other extension types in the future.

There's a lot of code added to identify and cast geometries to WKB and back. This is because of the way the parquet extension interacts with spatial to handle geometries. It's a bit hacky, but we should be able to remove this in the future when we move the geometry type to core.

Filter pushdown for geometries is currently not implemented, but can be added to the extension later.

@pdet
Copy link
Collaborator

pdet commented Sep 5, 2025

It seems that the tests are never executed in CI since the geo library is not linked and require spatial is there.

Could we resolve this?

@Maxxen
Copy link
Member Author

Maxxen commented Sep 5, 2025

It seems that the tests are never executed in CI since the geo library is not linked and require spatial is there.

Could we resolve this?

Nor sure, but it gets run in the nightly build when we test all extensions right?

@Maxxen
Copy link
Member Author

Maxxen commented Sep 5, 2025

@pdet do you know if the config fail is expected? I haven't changed any storage stuff so I don't think its related to this PR.

@pdet
Copy link
Collaborator

pdet commented Sep 5, 2025

This CI basically runs all duckdb tests wrapped on ducklake, they might be failing due to changes in DuckDB or because they are new tests, in any case, you can add them to the skip file for now.

In any case, I'm not a big fan of having tests here that are only executed in core DuckDB. The reason for that is that our CI can get broken here and be dependent on a manual bump of the library in core to detect a failure. Maybe you can look at running these tests in the CI by linking the geo extension, and adding it to the --statically-loaded-extensions?

@Maxxen
Copy link
Member Author

Maxxen commented Sep 5, 2025

What is --statically-loaded-extensions?

The issue with linking spatial is that we can't build extensions (from other extensions) if they require vcpkg. In the main DuckDB repo there is a "make extension_configuration" step that will merge all the vcpkg.json:s from all loaded extensions into one so that the dependencies get propagated in the combined build, but AFAIK there is no such mechanism in extension-ci-tools.

@Mytherin
Copy link
Contributor

Mytherin commented Sep 8, 2025

The solution to running CI here is to download spatial when running against a release. @carlopi had this idea some time back. Essentially when running against a release target (which we often do for extensions) we can just download additional dependencies using INSTALL spatial. I think that's a better idea long-term than trying to get Spatial building and linking into the DuckLake repo. When running against a non-release target we can then just skip the test.

This requires some upstream fixes though - effectively making require XXX attempt an install, maybe with an additional flag. I would say let's just get this in for now and think about the CI situation later.

@carlopi
Copy link
Contributor

carlopi commented Sep 8, 2025

I think when a duckdb build exist remotely, something along the line of:

LOCAL_EXTENSION_REPO="http://extensions.duckdb.org" ./build/release/test/unittest --autoloading all --test-dir duckdb

should work already (then to be added the ducklake specific flags).

This likely needs a clean up, but it's already somewhat usable (when adding "Autoloading Error" to the regex of skipped errors).

Happy to have a look, at least at having this tested locally.

Copy link
Contributor

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM - some minor comments

@Mytherin
Copy link
Contributor

Mytherin commented Sep 8, 2025

I think when a duckdb build exist remotely, something along the line of:

LOCAL_EXTENSION_REPO="http://extensions.duckdb.org" ./build/release/test/unittest --autoloading all --test-dir duckdb

should work already (then to be added the ducklake specific flags).

This likely needs a clean up, but it's already somewhat usable (when adding "Autoloading Error" to the regex of skipped errors).

Happy to have a look, at least at having this tested locally.

The repo isn't pinned to a release yet so an install won't work yet. I think this is something we can best pick up post release.

@pdet
Copy link
Collaborator

pdet commented Sep 9, 2025

@Mytherin @Maxxen is this ready to go?

@Mytherin Mytherin merged commit 4e3c5fb into duckdb:main Sep 9, 2025
25 checks passed
@djouallah
Copy link

Fwiw, it is the first open source implementation of geometry in open table format, congratulations

J-Meyers added a commit to J-Meyers/ducklake that referenced this pull request Sep 9, 2025
J-Meyers added a commit to J-Meyers/ducklake that referenced this pull request Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants