Skip to content

Conversation

@uros-db
Copy link
Contributor

@uros-db uros-db commented Sep 30, 2025

What changes were proposed in this pull request?

Introduce two new logical types to Spark:

  • GeographyType
  • GeometryType

This PR also adds appropriate SQL parsing and JSON serialization logic for the new types.

Why are the changes needed?

Kicking off https://issues.apache.org/jira/browse/SPARK-51658 by adding GEOMETRY and GEOGRAPHY types to Spark.

Does this PR introduce any user-facing change?

Yes, two new logical data types are added as Experimental.

How was this patch tested?

Added new tests to:

  • GeographyTypeSuite
  • GeometryTypeSuite

Also, added appropriate test cases to:

  • DataTypeSuite

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Sep 30, 2025
Copy link
Contributor Author

@uros-db uros-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkaravel Please review.

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @uros-db for kicking it off! left some comments/questions as i read it

@uros-db uros-db requested a review from szehon-ho October 6, 2025 11:08
@github-actions github-actions bot added the DOCS label Oct 6, 2025
@uros-db
Copy link
Contributor Author

uros-db commented Oct 7, 2025

Hi folks, please re-review. @szehon-ho @mkaravel

Copy link
Contributor

@mkaravel mkaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Please take a look at the comments I posted.
Thank you for this PR!

@uros-db uros-db requested a review from mkaravel October 9, 2025 17:51
Copy link
Contributor

@mkaravel mkaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uros-db Thank you for addressing the comments!

LGTM.

@uros-db
Copy link
Contributor Author

uros-db commented Oct 9, 2025

@cloud-fan @szehon-ho Please review. Also cc @srielau (we have some new error classes here).

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, be nice to document the expectation of the GeographyType and GeometryType argument (what is expected, not expected and only internal), as its a user facing API: ref: #52491 (comment)

@jiayuasu
Copy link
Member

jiayuasu commented Oct 10, 2025

Great work. The Apache Sedona team was not aware of or informed by this PR. We are now looking into this.

@jiayuasu
Copy link
Member

We will review and comment on this PR. This is to ensure it does not create incompatibility with Apache Sedona.

@uros-db
Copy link
Contributor Author

uros-db commented Oct 10, 2025

Thank you @jiayuasu. We will hold the merge until you have enough time to review. Please feel free to let me know if you have any questions or concerns.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Just a few notes from the GeoArrow/Parquet compatibility angle where most CRSes are written by spatial tools with a PROJJSON or authority:code CRS.

@uros-db uros-db requested a review from paleolimbot October 10, 2025 17:21
@uros-db
Copy link
Contributor Author

uros-db commented Oct 13, 2025

Hi folks @paleolimbot @jiayuasu, do you have any other concerns about this PR? cc @cloud-fan

@cloud-fan cloud-fan closed this in 8ff9feb Oct 15, 2025
@dongjoon-hyun
Copy link
Member

Thank you, @uros-db , @cloud-fan and all.

cloud-fan pushed a commit that referenced this pull request Oct 18, 2025
…patial types

### What changes were proposed in this pull request?

- Fix the `ST_INVALID_SRID_VALUE` error message.
- Fix the `SpatialType` doc comment.

### Why are the changes needed?
Followup PR to clean up original issues introduced in: #52491.

### Does this PR introduce _any_ user-facing change?
Yes, the `ST_INVALID_SRID_VALUE` error message is updated.

### How was this patch tested?
Existing tests are sufficient.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52632 from uros-db/geo-logical-types-FOLLOWUP.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
zhengruifeng pushed a commit that referenced this pull request Oct 22, 2025
…o PySpark API

### What changes were proposed in this pull request?
Introduce two new geospatial data types to PySpark API:
- `GeographyType`
- `GeometryType`

This PR also adds appropriate JSON serialization logic for the new types in PySpark.

Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: #52491.

### Why are the changes needed?
Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs.

### Does this PR introduce _any_ user-facing change?
Yes, two new data types are now available to users of the PySpark API.

### How was this patch tested?
Added new tests to:
- `test_geographytype.py`
- `test_geometrytype.py`

Also, added appropriate test cases to:
- `test_types.py`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52627 from uros-db/geo-python-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
cloud-fan pushed a commit that referenced this pull request Oct 23, 2025
### What changes were proposed in this pull request?
Introduce two new physical types to Spark:
- `PhysicalGeographyType`
- `PhysicalGeometryType`

This PR also adds appropriate mapping from the logical geospatial types (introduced in: #52491) to the new physical types.

### Why are the changes needed?
Extending the implementation of GEOMETRY and GEOGRAPHY types in Spark, laying the groundwork for full geospatial data type support.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added new tests to:
- `GeographyValSuite`
- `GeometryValSuite`

Also, added appropriate test cases to:
- `GeographyTypeSuite`
- `GeographyTypeSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52629 from uros-db/geo-physical-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Oct 23, 2025
…ava API

### What changes were proposed in this pull request?
Introduce two new geospatial data types to Java API:
- `GeographyType`
- `GeometryType`

Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: #52491.

### Why are the changes needed?
Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs.

### Does this PR introduce _any_ user-facing change?
Yes, two new data types are now available to users of the Java API.

### How was this patch tested?
Added new tests to:
- `JavaGeographyTypeSuite`
- `JavaGeometryTypeSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52623 from uros-db/geo-java-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Oct 29, 2025
…ReferenceSystemMapper

### What changes were proposed in this pull request?
Extend the `SpatialReferenceSystemMapper` class to offer SRID <-> CRS mappings for both `GeographyType` and `GeometryType`.

The `SpatialReferenceSystemMapper` class was introduced originally as part of: #52491.

### Why are the changes needed?
Avoid manual checks for GEOGRAPHY type, and use the centralized SRS mapping logic.

Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added new tests to:
- `SpatialReferenceSystemMapperSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52667 from uros-db/geo-srsMapper.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
zhengruifeng pushed a commit that referenced this pull request Oct 31, 2025
…n PySpark

### What changes were proposed in this pull request?
Implement the Spatial Reference System Mapper classes to offer SRID <-> CRS mappings in PySpark, for both `GeographyType` and `GeometryType`.

On JVM / Scala side, the `SpatialReferenceSystemMapper` class was introduced originally as part of: #52491, and then subsequently extended in: #52667.

### Why are the changes needed?
Use centralized SRS mapping logic in PySpark, same as we currently do in Scala types.

Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Updated tests in:
- `test_geographytype`
- `test_geometrytype`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52799 from uros-db/geo-srsMapper-python.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
cloud-fan pushed a commit that referenced this pull request Oct 31, 2025
…er data types

### What changes were proposed in this pull request?
This PR disallows casting `GEOGRAPHY` and `GEOMETRY` to/from other data types in the system (such as `BOOLEAN`, `BINARY`, `STRING`, and others). Note that these types were recently introduced as part of: #52491.

### Why are the changes needed?
`GeographyType` and `GeometryType` are not interoperable with any other non-geospatial data types, so we need to forbid such casts explicitly in order to avoid unexpected failures.

### Does this PR introduce _any_ user-facing change?
Yes, casts are disallowed for newly introduced geospatial types.

### How was this patch tested?
Added new unit tests to verify the expected behaviour:
- `CastWithAnsiOffSuite`
- `CastWithAnsiOnSuite`

Added new appropriate end-to-end SQL tests:
- `st-functions`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52806 from uros-db/geo-cast-disallow.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
### What changes were proposed in this pull request?
Introduce two new logical types to Spark:
- `GeographyType`
- `GeometryType`

This PR also adds appropriate SQL parsing and JSON serialization logic for the new types.

### Why are the changes needed?
Kicking off https://issues.apache.org/jira/browse/SPARK-51658 by adding GEOMETRY and GEOGRAPHY types to Spark.

### Does this PR introduce _any_ user-facing change?
Yes, two new logical data types are added as `Experimental`.

### How was this patch tested?
Added new tests to:
- `GeographyTypeSuite`
- `GeometryTypeSuite`

Also, added appropriate test cases to:
- `DataTypeSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52491 from uros-db/geo-logical-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…patial types

### What changes were proposed in this pull request?

- Fix the `ST_INVALID_SRID_VALUE` error message.
- Fix the `SpatialType` doc comment.

### Why are the changes needed?
Followup PR to clean up original issues introduced in: apache#52491.

### Does this PR introduce _any_ user-facing change?
Yes, the `ST_INVALID_SRID_VALUE` error message is updated.

### How was this patch tested?
Existing tests are sufficient.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52632 from uros-db/geo-logical-types-FOLLOWUP.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…o PySpark API

### What changes were proposed in this pull request?
Introduce two new geospatial data types to PySpark API:
- `GeographyType`
- `GeometryType`

This PR also adds appropriate JSON serialization logic for the new types in PySpark.

Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: apache#52491.

### Why are the changes needed?
Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs.

### Does this PR introduce _any_ user-facing change?
Yes, two new data types are now available to users of the PySpark API.

### How was this patch tested?
Added new tests to:
- `test_geographytype.py`
- `test_geometrytype.py`

Also, added appropriate test cases to:
- `test_types.py`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52627 from uros-db/geo-python-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
### What changes were proposed in this pull request?
Introduce two new physical types to Spark:
- `PhysicalGeographyType`
- `PhysicalGeometryType`

This PR also adds appropriate mapping from the logical geospatial types (introduced in: apache#52491) to the new physical types.

### Why are the changes needed?
Extending the implementation of GEOMETRY and GEOGRAPHY types in Spark, laying the groundwork for full geospatial data type support.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added new tests to:
- `GeographyValSuite`
- `GeometryValSuite`

Also, added appropriate test cases to:
- `GeographyTypeSuite`
- `GeographyTypeSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52629 from uros-db/geo-physical-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…ava API

### What changes were proposed in this pull request?
Introduce two new geospatial data types to Java API:
- `GeographyType`
- `GeometryType`

Note that the GEOMETRY and GEOGRAPHY logical types were recently included to Spark SQL as part of: apache#52491.

### Why are the changes needed?
Expanding on GEOMETRY and GEOGRAPHY type support across all of the supported APIs.

### Does this PR introduce _any_ user-facing change?
Yes, two new data types are now available to users of the Java API.

### How was this patch tested?
Added new tests to:
- `JavaGeographyTypeSuite`
- `JavaGeometryTypeSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52623 from uros-db/geo-java-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…ReferenceSystemMapper

### What changes were proposed in this pull request?
Extend the `SpatialReferenceSystemMapper` class to offer SRID <-> CRS mappings for both `GeographyType` and `GeometryType`.

The `SpatialReferenceSystemMapper` class was introduced originally as part of: apache#52491.

### Why are the changes needed?
Avoid manual checks for GEOGRAPHY type, and use the centralized SRS mapping logic.

Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added new tests to:
- `SpatialReferenceSystemMapperSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52667 from uros-db/geo-srsMapper.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…n PySpark

### What changes were proposed in this pull request?
Implement the Spatial Reference System Mapper classes to offer SRID <-> CRS mappings in PySpark, for both `GeographyType` and `GeometryType`.

On JVM / Scala side, the `SpatialReferenceSystemMapper` class was introduced originally as part of: apache#52491, and then subsequently extended in: apache#52667.

### Why are the changes needed?
Use centralized SRS mapping logic in PySpark, same as we currently do in Scala types.

Also, this PR will make it easier to support additional SRID/CRS values for spatial types soon.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Updated tests in:
- `test_geographytype`
- `test_geometrytype`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52799 from uros-db/geo-srsMapper-python.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…er data types

### What changes were proposed in this pull request?
This PR disallows casting `GEOGRAPHY` and `GEOMETRY` to/from other data types in the system (such as `BOOLEAN`, `BINARY`, `STRING`, and others). Note that these types were recently introduced as part of: apache#52491.

### Why are the changes needed?
`GeographyType` and `GeometryType` are not interoperable with any other non-geospatial data types, so we need to forbid such casts explicitly in order to avoid unexpected failures.

### Does this PR introduce _any_ user-facing change?
Yes, casts are disallowed for newly introduced geospatial types.

### How was this patch tested?
Added new unit tests to verify the expected behaviour:
- `CastWithAnsiOffSuite`
- `CastWithAnsiOnSuite`

Added new appropriate end-to-end SQL tests:
- `st-functions`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52806 from uros-db/geo-cast-disallow.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants