paleolimbot
Member

This PR ensures that batches with zero rows are dropped from the output RecordBatchReader instead of being passed through to consumers. Most tools can handle empty batches, but not all have been tested, and at least lonboard fails in this case.
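For readers unfamiliar with the idea, here is a minimal sketch of skipping zero-row batches while streaming. It is not the code in this PR: `drop_empty_batches` is a hypothetical helper, and plain Python lists stand in for record batches so the example runs without any Arrow library installed.

```python
from typing import Iterable, Iterator, Sized, TypeVar

B = TypeVar("B", bound=Sized)


def drop_empty_batches(batches: Iterable[B]) -> Iterator[B]:
    """Lazily yield only the batches that contain at least one row.

    Hypothetical helper for illustration; any object supporting len()
    can play the role of a batch here.
    """
    for batch in batches:
        if len(batch) > 0:
            yield batch


# Stand-in stream (made-up data): each list plays the role of one batch.
stream = [[], [1, 2, 3], [], [4], []]

non_empty = list(drop_empty_batches(stream))
chunk_lengths = ", ".join(str(len(chunk)) for chunk in non_empty)
print(chunk_lengths)
```

Because the helper is a generator, it filters batches as they arrive rather than collecting the whole result first, which is presumably the behaviour one would want from a streaming reader.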

Reprex from original issue:

from arro3.core import Table
import sedona.db

sd = sedona.db.connect()
sd.read_parquet("https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-poly_geo.parquet").to_view("lakes")
sd.read_parquet("https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-line_geo.parquet").to_view("rivers")
sd.sql("""SELECT geometry AS lake FROM lakes WHERE "OBJECTID" = 1976""").to_view("east_lake")

inlets_and_outlets = sd.sql("""
    SELECT "OBJECTID", "FEAT_CODE", geometry
    FROM rivers
    JOIN east_lake ON ST_Intersects(east_lake.lake, rivers.geometry)
    """)

table_orig = Table.from_arrow(inlets_and_outlets)
col = table_orig.column("geometry")
", ".join(str(len(chunk)) for chunk in col.chunks)
#> '31'
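The check above could also be turned into a small regression test. The sketch below assumes only that each chunk supports `len()`; `empty_chunk_indices` and `check_no_empty_chunks` are hypothetical names, and the `bad` input is made-up data, not the actual output from before this change.

```python
from typing import Iterable, List, Sized


def empty_chunk_indices(chunks: Iterable[Sized]) -> List[int]:
    """Return the positions of all zero-length chunks."""
    return [i for i, chunk in enumerate(chunks) if len(chunk) == 0]


def check_no_empty_chunks(chunks: Iterable[Sized]) -> None:
    """Raise ValueError if any chunk has zero rows."""
    empty = empty_chunk_indices(chunks)
    if empty:
        raise ValueError(f"found {len(empty)} empty chunk(s) at positions {empty}")


# Stand-in data: lists play the role of geometry column chunks.
good = [list(range(31))]          # a single chunk of 31 rows
bad = [[], list(range(31)), []]   # made-up example with empty chunks

check_no_empty_chunks(good)       # passes silently
try:
    check_no_empty_chunks(bad)
except ValueError as err:
    print(err)
```

With a real result, something like `check_no_empty_chunks(col.chunks)` should work as long as the chunk objects support `len()`, as they do in the reprex above.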

I checked R as well:

library(sedonadb)

sd_read_parquet("/Users/dewey/gh/sedona-db/submodules/geoarrow-data/ns-water/files/ns-water_water-point_geo.parquet") |> 
  sd_to_view("point", overwrite = TRUE)

sd_read_parquet("/Users/dewey/gh/sedona-db/submodules/geoarrow-data/ns-water/files/ns-water_water-junc_geo.parquet") |> 
  sd_to_view("junc", overwrite = TRUE)

sd_sql('SELECT geometry FROM junc WHERE "OBJECTID" = 1814') |> 
  sd_to_view("junc_filter", overwrite = TRUE)

joined <- sd_sql('
  SELECT "OBJECTID", "FEAT_CODE", point.geometry
  FROM point
  JOIN junc_filter ON ST_DWithin(junc_filter.geometry, point.geometry, 10000)
')

joined |> 
  nanoarrow::as_nanoarrow_array_stream() |> 
  nanoarrow::collect_array_stream() |> 
  lapply("[[", "length")
#> [[1]]
#> [1] 24

Closes #156.

@paleolimbot paleolimbot marked this pull request as ready for review October 10, 2025 19:22
@jiayuasu jiayuasu merged commit df21442 into apache:main Oct 13, 2025
12 checks passed

Development

Successfully merging this pull request may close these issues.

Result reader generates empty chunks
