Revert to pyarrow v20 for compatibility with stale Kaggle geopandas #4589
zaneselvans merged 1 commit into main on Sep 2, 2025
Almost immediately after adding GeoParquet outputs to PUDL, we updated to pyarrow 21.0, which now provides native support for the GEOMETRY and GEOGRAPHY data types. That's great, since it means the geoparquet / geoarrow extensions that supported the (previously) non-standard data types are no longer necessary. See:

* apache/arrow#45459
* apache/arrow#45522

Unfortunately, Kaggle is stuck on geopandas 0.14.1 (released in April of 2024) due to what was, at least at some point, an incompatibility with the scikit-learn package. I created an issue asking them to update to modern geopandas, or at least to check whether the incompatibility still exists: Kaggle/docker-python#1491

For the moment, I think the easiest way back to working notebooks is to downgrade our pyarrow to v20.0.0.

It might also be the case that we no longer need to add the bespoke `b"geo"` metadata in our IO manager with pyarrow v21.0.0 and native GeoParquet support, but that would require more investigation.

I tried recreating the GeoParquet outputs locally with pyarrow v20 and then reading them with the stale versions of geopandas from Kaggle, and it worked. Those same stale versions couldn't read the local outputs produced with pyarrow v21.
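
For anyone picking up the `b"geo"` investigation, here is a minimal sketch of how the file-level metadata could be inspected. It is not the code in our IO manager. `parse_geo_metadata` and the `example` dictionary are hypothetical names made up for illustration, and the example values only mimic the shape of the GeoParquet `geo` entry.

```python
import json


def parse_geo_metadata(schema_metadata):
    """Return the decoded GeoParquet ``geo`` metadata, or None if it is absent.

    ``schema_metadata`` is the bytes-to-bytes mapping exposed by
    ``pyarrow.parquet.read_schema(path).metadata`` (which may be None).
    """
    if not schema_metadata or b"geo" not in schema_metadata:
        return None
    return json.loads(schema_metadata[b"geo"].decode("utf-8"))


# Hypothetical metadata, shaped like the file-level "geo" entry that
# GeoParquet readers look for. Real files will carry more detail.
example = {
    b"geo": json.dumps(
        {
            "version": "1.0.0",
            "primary_column": "geometry",
            "columns": {"geometry": {"encoding": "WKB", "geometry_types": []}},
        }
    ).encode("utf-8")
}

geo = parse_geo_metadata(example)
print(geo["primary_column"], sorted(geo["columns"]))
```

With pyarrow installed, the same function should accept `pyarrow.parquet.read_schema(path).metadata` for a real output file. Comparing the result for files written under v20 and v21 might help show whether the bespoke key is still needed, though that is untested here.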
jdangerx approved these changes on Sep 2, 2025