@uros-db (Contributor) commented Oct 15, 2025

What changes were proposed in this pull request?

Introduce two new geospatial data types to PySpark API:

  • GeographyType
  • GeometryType

This PR also adds appropriate JSON serialization logic for the new types in PySpark.

Note that the GEOMETRY and GEOGRAPHY logical types were recently added to Spark SQL in #52491.
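For readers unfamiliar with how a parameterized data type can round-trip through JSON, here is a minimal, self-contained sketch. `ToyGeometryType`, its `geometry(<srid>)` string form and the `"ANY"` sentinel are hypothetical and are not taken from this PR; the sketch only mirrors the general pattern PySpark already uses for parameterized types such as `DecimalType`, whose JSON form is a plain string like `"decimal(10,2)"`.

```python
import json
import re
from typing import Union


class ToyGeometryType:
    """Hypothetical parameterized type (NOT pyspark.sql.types.GeometryType)."""

    # Accepts e.g. "geometry(4326)" or "geometry(ANY)"; this format is an assumption.
    _PATTERN = re.compile(r"^geometry\((\d+|ANY)\)$")

    def __init__(self, srid: Union[int, str]) -> None:
        self.srid = srid

    def simpleString(self) -> str:
        return f"geometry({self.srid})"

    def jsonValue(self) -> str:
        # Like DecimalType, the JSON value is just the type's string form.
        return self.simpleString()

    def json(self) -> str:
        return json.dumps(self.jsonValue())

    @classmethod
    def fromJson(cls, text: str) -> "ToyGeometryType":
        value = json.loads(text)
        match = cls._PATTERN.match(value) if isinstance(value, str) else None
        if match is None:
            raise ValueError(f"Cannot parse a geometry type from {text!r}")
        raw = match.group(1)
        return cls(raw if raw == "ANY" else int(raw))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ToyGeometryType) and self.srid == other.srid

    def __repr__(self) -> str:
        return f"ToyGeometryType({self.srid!r})"


# Round trip: type -> JSON string -> type.
t = ToyGeometryType(4326)
print(t.json())                                 # "geometry(4326)"
print(ToyGeometryType.fromJson(t.json()) == t)  # True
```

Keeping the JSON value as a single string (rather than a nested object) means the parser only needs one regular expression per type.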

Why are the changes needed?

To extend GEOMETRY and GEOGRAPHY type support across all supported APIs.

Does this PR introduce any user-facing change?

Yes, two new data types are now available to users of the PySpark API.

How was this patch tested?

Added new tests to:

  • test_geographytype.py
  • test_geometrytype.py

Also, added appropriate test cases to:

  • test_types.py

Was this patch authored or co-authored using generative AI tooling?

No.

@uros-db uros-db changed the title Initial commit [SPARK-53921][Geo][PYTHON] Introduce GeometryType and GeographyType to PySpark API Oct 15, 2025
@uros-db uros-db changed the title [SPARK-53921][Geo][PYTHON] Introduce GeometryType and GeographyType to PySpark API [WIP][SPARK-53921][Geo][PYTHON] Introduce GeometryType and GeographyType to PySpark API Oct 16, 2025
@uros-db uros-db changed the title [WIP][SPARK-53921][Geo][PYTHON] Introduce GeometryType and GeographyType to PySpark API [SPARK-53921][Geo][PYTHON] Introduce GeometryType and GeographyType to PySpark API Oct 20, 2025
@uros-db (Contributor Author) left a comment

@zhengruifeng Thank you for your comments; I have addressed them all. Could you please re-review?

@uros-db uros-db requested a review from zhengruifeng October 20, 2025 12:21
@zhengruifeng (Contributor)

Please also fix the linter:

starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/types.py:2231: error: Missing return statement  [return]
Found 1 error in 1 file (checked 1162 source files)
1
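mypy's `[return]` error means that a function whose annotated return type does not allow `None` has at least one path that falls off the end and implicitly returns `None`. The sketch below reproduces the pattern with a hypothetical helper, `parse_srid`, which is not the code at `types.py:2231`; ending the function with an explicit `raise` (or a final `return`) satisfies the checker.

```python
from typing import Union


def parse_srid(value: str) -> Union[int, str]:
    """Hypothetical helper: parse an SRID parameter such as "4326" or "ANY"."""
    if value == "ANY":
        return value
    elif value.isdigit():
        return int(value)
    # Without the raise below, mypy reports "Missing return statement  [return]":
    # control could reach the end of the function and implicitly return None.
    raise ValueError(f"Invalid SRID: {value!r}")


print(parse_srid("4326"))  # 4326
print(parse_srid("ANY"))   # ANY
```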

@uros-db uros-db requested a review from zhengruifeng October 21, 2025 16:33
@zhengruifeng (Contributor)

Thanks, merged to master.

zhengruifeng pushed a commit that referenced this pull request Oct 29, 2025
…Geography JSON parsing

### What changes were proposed in this pull request?
This PR follows up on #52627, and fixes an issue with `GeographyType` JSON parsing in PySpark. Also, this PR adds appropriate tests for JSON parsing, both for `GeographyType` and `GeometryType`.

### Why are the changes needed?
To fix the wrong error class thrown by the JSON parsing method for `GeographyType`.
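One way to keep per-type error reporting from drifting, as in the bug fixed here, is to have the parser report an error class tied to the type it was actually parsing, and to assert on that class in tests. This is a minimal sketch only: `ToyTypeParseError`, `parse_geo_type`, the `geography(<srid>)`/`geometry(<srid>)` strings and the error-class names are all hypothetical and do not reproduce PySpark's actual error classes.

```python
import re
from typing import Tuple, Union


class ToyTypeParseError(ValueError):
    """Hypothetical error that carries an error-class string."""

    def __init__(self, error_class: str, type_string: str) -> None:
        super().__init__(f"[{error_class}] cannot parse {type_string!r}")
        self.error_class = error_class


# One pattern per type, keyed by the prefix that identifies the type.
_PATTERNS = {
    "geography": re.compile(r"^geography\((\d+|ANY)\)$"),
    "geometry": re.compile(r"^geometry\((\d+|ANY)\)$"),
}


def parse_geo_type(type_string: str) -> Tuple[str, Union[int, str]]:
    for kind, pattern in _PATTERNS.items():
        if type_string.startswith(kind + "("):
            match = pattern.match(type_string)
            if match is None:
                # Malformed parameter: report it against the type being parsed,
                # not against another type's error class.
                raise ToyTypeParseError(f"INVALID_{kind.upper()}_TYPE", type_string)
            raw = match.group(1)
            return kind, (raw if raw == "ANY" else int(raw))
    raise ToyTypeParseError("UNSUPPORTED_TYPE", type_string)


try:
    parse_geo_type("geography(abc)")
except ToyTypeParseError as e:
    print(e.error_class)  # INVALID_GEOGRAPHY_TYPE
```

A test that checks `error_class` (not just that some exception was raised) is what catches a copy-pasted error class in one of the two code paths.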

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added new test cases to:
- `pyspark.sql.tests.test_geographytype`
- `pyspark.sql.tests.test_geometrytype`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52772 from uros-db/geo-python-types-tests.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
zhengruifeng pushed a commit that referenced this pull request Oct 30, 2025
…pe to `__all__` in types

### What changes were proposed in this pull request?
This PR follows up on #52627 and addresses a gap: `GeographyType` and `GeometryType` should be included in `__all__`.

### Why are the changes needed?
To include the geospatial types in `__all__` in `types.py`, so that they are exported by `from pyspark.sql.types import *`.
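`__all__` controls which public names `from module import *` binds. A minimal sketch with a hypothetical in-memory module, `toy_types` (not `pyspark.sql.types`), shows that a class omitted from `__all__` is silently skipped by a star-import, which is the kind of gap this follow-up closes.

```python
import sys
import types

# Hypothetical in-memory module standing in for a types module (not pyspark.sql.types).
toy_types = types.ModuleType("toy_types")
exec(
    "__all__ = ['DataType', 'GeographyType', 'GeometryType']\n"
    "class DataType: pass\n"
    "class GeographyType(DataType): pass\n"
    "class GeometryType(DataType): pass\n"
    "class UnlistedType(DataType): pass\n",
    toy_types.__dict__,
)
sys.modules["toy_types"] = toy_types

# Run a star-import in a fresh namespace and see which names it binds.
namespace: dict = {}
exec("from toy_types import *", namespace)

print(sorted(k for k in namespace if not k.startswith("__")))
# ['DataType', 'GeographyType', 'GeometryType']
```

`UnlistedType` is still reachable as `toy_types.UnlistedType`; only the star-import skips it.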

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests suffice.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52790 from uros-db/geo-python-types-all.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
cloud-fan pushed a commit that referenced this pull request Nov 5, 2025
…s to PySpark Connect

### What changes were proposed in this pull request?
Introduce `GeographyType` and `GeometryType` to PySpark Connect. Note that the geospatial data types have already been introduced in PySpark as part of: #52627.

Also, introduce classes to represent `Geography` and `Geometry` values in Python. Note that the corresponding classes have already been introduced on the Scala side as part of #52804.
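A Python-side value class for these types needs to carry the encoded shape together with its SRID. The sketch below is a hypothetical `ToyGeometry` dataclass, not the `Geometry` class added in this PR; it assumes a value is stored as OGC Well-Known Binary (WKB) bytes plus an integer SRID, and it encodes only a little-endian 2D point.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# OGC WKB point layout: byte order (1 = little-endian), uint32 type (1 = Point), x, y as float64.
_WKB_POINT_LE = struct.Struct("<BIdd")


@dataclass(frozen=True)
class ToyGeometry:
    """Hypothetical value class: raw WKB bytes plus an SRID (not PySpark's Geometry)."""

    wkb: bytes
    srid: int

    @classmethod
    def point(cls, x: float, y: float, srid: int = 0) -> "ToyGeometry":
        return cls(_WKB_POINT_LE.pack(1, 1, x, y), srid)

    def as_point(self) -> Tuple[float, float]:
        if len(self.wkb) != _WKB_POINT_LE.size:
            raise ValueError("not a 2D WKB point")
        order, geom_type, x, y = _WKB_POINT_LE.unpack(self.wkb)
        if order != 1 or geom_type != 1:
            raise ValueError("only little-endian 2D points are handled in this sketch")
        return x, y


p = ToyGeometry.point(1.5, -2.0, srid=4326)
print(len(p.wkb))    # 21
print(p.as_point())  # (1.5, -2.0)
print(p == ToyGeometry.point(1.5, -2.0, srid=4326))  # True
```

Making the dataclass frozen gives value semantics: two instances with the same bytes and SRID compare equal.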

### Why are the changes needed?
Enabling geospatial types in Spark Connect.

### Does this PR introduce _any_ user-facing change?
Yes, `GeographyType` and `GeometryType` are now available in PySpark Connect.

### How was this patch tested?
Added new Python Connect tests:
- `test_parity_geographytype`
- `test_parity_geometrytype`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52871 from uros-db/geo-spark-connect.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Nov 5, 2025
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…o PySpark API

### What changes were proposed in this pull request?
Introduce two new geospatial data types to PySpark API:
- `GeographyType`
- `GeometryType`

This PR also adds appropriate JSON serialization logic for the new types in PySpark.

Note that the GEOMETRY and GEOGRAPHY logical types were recently added to Spark SQL in apache#52491.

### Why are the changes needed?
To extend GEOMETRY and GEOGRAPHY type support across all supported APIs.

### Does this PR introduce _any_ user-facing change?
Yes, two new data types are now available to users of the PySpark API.

### How was this patch tested?
Added new tests to:
- `test_geographytype.py`
- `test_geometrytype.py`

Also, added appropriate test cases to:
- `test_types.py`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#52627 from uros-db/geo-python-types.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025