-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-53760][Geo][SQL] Introduce GeometryType and GeographyType #52491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
uros-db
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkaravel Please review.
szehon-ho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @uros-db for kicking it off! left some comments/questions as i read it
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/CrsMappings.java
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/GeographyType.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/GeometryType.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/GeographyType.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/GeometryType.scala
Outdated
Show resolved
Hide resolved
|
Hi folks, please re-review. @szehon-ho @mkaravel |
mkaravel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. Please take a look at the comments I posted.
Thank you for this PR!
sql/api/src/main/scala/org/apache/spark/sql/types/SpatialReferenceMapper.java
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/SpatialReferenceMapper.java
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/SpatialReferenceMapper.java
Outdated
Show resolved
Hide resolved
mkaravel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uros-db Thank you for addressing the comments!
LGTM.
|
@cloud-fan @szehon-ho Please review. Also cc @srielau (we have some new error classes here). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, be nice to document the expectation of the GeographyType and GeometryType argument (what is expected, not expected and only internal), as its a user facing API: ref: #52491 (comment)
|
Great work. The Apache Sedona team was not aware of or informed by this PR. We are now looking into this. |
|
We will review and comment on this PR. This is to ensure it does not create incompatibility with Apache Sedona. |
|
Thank you @jiayuasu. We will hold the merge until you have enough time to review. Please feel free to let me know if you have any questions or concerns. |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Just a few notes from the GeoArrow/Parquet compatibility angle where most CRSes are written by spatial tools with a PROJJSON or authority:code CRS.
sql/api/src/main/scala/org/apache/spark/sql/types/GeographyType.scala
Outdated
Show resolved
Hide resolved
|
Hi folks @paleolimbot @jiayuasu, do you have any other concerns about this PR? cc @cloud-fan |
|
Thank you, @uros-db , @cloud-fan and all. |
…patial types ### What changes were proposed in this pull request? - Fix the `ST_INVALID_SRID_VALUE` error message. - Fix the `SpatialType` doc comment. ### Why are the changes needed? Followup PR to clean up original issues introduced in: #52491. ### Does this PR introduce _any_ user-facing change? Yes, the `ST_INVALID_SRID_VALUE` error message is updated. ### How was this patch tested? Existing tests are sufficient. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52632 from uros-db/geo-logical-types-FOLLOWUP. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…o PySpark API ### What changes were proposed in this pull request? Introduce two new geospatial data types to PySpark API: - `GeographyType` - `GeometryType` This PR also adds appropriate JSON serialization logic for the new types in PySpark. Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: #52491. ### Why are the changes needed? Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs. ### Does this PR introduce _any_ user-facing change? Yes, two new data types are now available to users of the PySpark API. ### How was this patch tested? Added new tests to: - `test_geographytype.py` - `test_geometrytype.py` Also, added appropriate test cases to: - `test_types.py` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52627 from uros-db/geo-python-types. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request? Introduce two new physical types to Spark: - `PhysicalGeographyType` - `PhysicalGeometryType` This PR also adds appropriate mapping from the logical geospatial types (introduced in: #52491) to the new physical types. ### Why are the changes needed? Extending the implementation of GEOMETRY and GEOGRAPHY types in Spark, laying the groundwork for full geospatial data type support. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added new tests to: - `GeographyValSuite` - `GeometryValSuite` Also, added appropriate test cases to: - `GeographyTypeSuite` - `GeographyTypeSuite` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52629 from uros-db/geo-physical-types. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ava API ### What changes were proposed in this pull request? Introduce two new geospatial data types to Java API: - `GeographyType` - `GeometryType` Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: #52491. ### Why are the changes needed? Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs. ### Does this PR introduce _any_ user-facing change? Yes, two new data types are now available to users of the Java API. ### How was this patch tested? Added new tests to: - `JavaGeographyTypeSuite` - `JavaGeometryTypeSuite` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52623 from uros-db/geo-java-types. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ReferenceSystemMapper ### What changes were proposed in this pull request? Extend the `SpatialReferenceSystemMapper` class to offer SRID <-> CRS mappings for both `GeographyType` and `GeometryType`. The `SpatialReferenceSystemMapper` class was introduced originally as part of: #52491. ### Why are the changes needed? Avoid manual checks for GEOGRAPHY type, and use the centralized SRS mapping logic. Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added new tests to: - `SpatialReferenceSystemMapperSuite` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52667 from uros-db/geo-srsMapper. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…n PySpark ### What changes were proposed in this pull request? Implement the Spatial Reference System Mapper classes to offer SRID <-> CRS mappings in PySpark, for both `GeographyType` and `GeometryType`. On JVM / Scala side, the `SpatialReferenceSystemMapper` class was introduced originally as part of: #52491, and then subsequently extended in: #52667. ### Why are the changes needed? Use centralized SRS mapping logic in PySpark, same as we currently do in Scala types. Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated tests in: - `test_geographytype` - `test_geometrytype` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52799 from uros-db/geo-srsMapper-python. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
…er data types ### What changes were proposed in this pull request? This PR disallows casting `GEOGRAPHY` and `GEOMETRY` to/from other data types in the system (such as `BOOLEAN`, `BINARY`, `STRING`, and others). Note that these types were recently introduced as part of: #52491. ### Why are the changes needed? `GeographyType` and `GeometryType` are not interoperable with any other non-geospatial data types, so we need to forbid such casts explicitly in order to avoid unexpected failures. ### Does this PR introduce _any_ user-facing change? Yes, casts are disallowed for newly introduced geospatial types. ### How was this patch tested? Added new unit tests to verify the expected behaviour: - `CastWithAnsiOffSuite` - `CastWithAnsiOnSuite` Added new appropriate end-to-end SQL tests: - `st-functions` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52806 from uros-db/geo-cast-disallow. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? Introduce two new logical types to Spark: - `GeographyType` - `GeometryType` This PR also adds appropriate SQL parsing and JSON serialization logic for the new types. ### Why are the changes needed? Kicking off https://issues.apache.org/jira/browse/SPARK-51658 by adding GEOMETRY and GEOGRAPHY types to Spark. ### Does this PR introduce _any_ user-facing change? Yes, two new logical data types are added as `Experimental`. ### How was this patch tested? Added new tests to: - `GeographyTypeSuite` - `GeometryTypeSuite` Also, added appropriate test cases to: - `DataTypeSuite` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52491 from uros-db/geo-logical-types. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…patial types ### What changes were proposed in this pull request? - Fix the `ST_INVALID_SRID_VALUE` error message. - Fix the `SpatialType` doc comment. ### Why are the changes needed? Followup PR to clean up original issues introduced in: apache#52491. ### Does this PR introduce _any_ user-facing change? Yes, the `ST_INVALID_SRID_VALUE` error message is updated. ### How was this patch tested? Existing tests are sufficient. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52632 from uros-db/geo-logical-types-FOLLOWUP. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…o PySpark API ### What changes were proposed in this pull request? Introduce two new geospatial data types to PySpark API: - `GeographyType` - `GeometryType` This PR also adds appropriate JSON serialization logic for the new types in PySpark. Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: apache#52491. ### Why are the changes needed? Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs. ### Does this PR introduce _any_ user-facing change? Yes, two new data types are now available to users of the PySpark API. ### How was this patch tested? Added new tests to: - `test_geographytype.py` - `test_geometrytype.py` Also, added appropriate test cases to: - `test_types.py` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52627 from uros-db/geo-python-types. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request? Introduce two new physical types to Spark: - `PhysicalGeographyType` - `PhysicalGeometryType` This PR also adds appropriate mapping from the logical geospatial types (introduced in: apache#52491) to the new physical types. ### Why are the changes needed? Extending the implementation of GEOMETRY and GEOGRAPHY types in Spark, laying the groundwork for full geospatial data type support. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added new tests to: - `GeographyValSuite` - `GeometryValSuite` Also, added appropriate test cases to: - `GeographyTypeSuite` - `GeographyTypeSuite` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52629 from uros-db/geo-physical-types. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ava API ### What changes were proposed in this pull request? Introduce two new geospatial data types to Java API: - `GeographyType` - `GeometryType` Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: apache#52491. ### Why are the changes needed? Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs. ### Does this PR introduce _any_ user-facing change? Yes, two new data types are now available to users of the Java API. ### How was this patch tested? Added new tests to: - `JavaGeographyTypeSuite` - `JavaGeometryTypeSuite` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52623 from uros-db/geo-java-types. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ReferenceSystemMapper ### What changes were proposed in this pull request? Extend the `SpatialReferenceSystemMapper` class to offer SRID <-> CRS mappings for both `GeographyType` and `GeometryType`. The `SpatialReferenceSystemMapper` class was introduced originally as part of: apache#52491. ### Why are the changes needed? Avoid manual checks for GEOGRAPHY type, and use the centralized SRS mapping logic. Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added new tests to: - `SpatialReferenceSystemMapperSuite` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52667 from uros-db/geo-srsMapper. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…n PySpark ### What changes were proposed in this pull request? Implement the Spatial Reference System Mapper classes to offer SRID <-> CRS mappings in PySpark, for both `GeographyType` and `GeometryType`. On JVM / Scala side, the `SpatialReferenceSystemMapper` class was introduced originally as part of: apache#52491, and then subsequently extended in: apache#52667. ### Why are the changes needed? Use centralized SRS mapping logic in PySpark, same as we currently do in Scala types. Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated tests in: - `test_geographytype` - `test_geometrytype` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52799 from uros-db/geo-srsMapper-python. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
…er data types ### What changes were proposed in this pull request? This PR disallows casting `GEOGRAPHY` and `GEOMETRY` to/from other data types in the system (such as `BOOLEAN`, `BINARY`, `STRING`, and others). Note that these types were recently introduced as part of: apache#52491. ### Why are the changes needed? `GeographyType` and `GeometryType` are not interoperable with any other non-geospatial data types, so we need to forbid such casts explicitly in order to avoid unexpected failures. ### Does this PR introduce _any_ user-facing change? Yes, casts are disallowed for newly introduced geospatial types. ### How was this patch tested? Added new unit tests to verify the expected behaviour: - `CastWithAnsiOffSuite` - `CastWithAnsiOnSuite` Added new appropriate end-to-end SQL tests: - `st-functions` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52806 from uros-db/geo-cast-disallow. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Introduce two new logical types to Spark:
GeographyTypeGeometryTypeThis PR also adds appropriate SQL parsing and JSON serialization logic for the new types.
Why are the changes needed?
Kicking off https://issues.apache.org/jira/browse/SPARK-51658 by adding GEOMETRY and GEOGRAPHY types to Spark.
Does this PR introduce any user-facing change?
Yes, two new logical data types are added as
Experimental.How was this patch tested?
Added new tests to:
GeographyTypeSuiteGeometryTypeSuiteAlso, added appropriate test cases to:
DataTypeSuiteWas this patch authored or co-authored using generative AI tooling?
No.