Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c:postgres: support Postgres Arrays in parameter binding #2191

Open
judahrand opened this issue Sep 26, 2024 · 6 comments
Open

c:postgres: support Postgres Arrays in parameter binding #2191

judahrand opened this issue Sep 26, 2024 · 6 comments
Labels
Type: enhancement New feature or request

Comments

@judahrand
Copy link
Contributor

judahrand commented Sep 26, 2024

What feature or improvement would you like to see?

The simplest case of this is something like this called from Python:

cursor.execute('SELECT $1', ([1, 2, 3],))
cursor.fetch_arrow_table()

Currently, this results in

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<REDACTED>/.venv/lib/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 698, in execute
    handle, self._rowcount = _blocking_call(
                             ^^^^^^^^^^^^^^^
  File "adbc_driver_manager/_lib.pyx", line 1569, in adbc_driver_manager._lib._blocking_call_impl
  File "adbc_driver_manager/_lib.pyx", line 1562, in adbc_driver_manager._lib._blocking_call_impl
  File "adbc_driver_manager/_lib.pyx", line 1213, in adbc_driver_manager._lib.AdbcStatement.execute_query
  File "adbc_driver_manager/_lib.pyx", line 260, in adbc_driver_manager._lib.check_error
adbc_driver_manager.NotSupportedError: NOT_IMPLEMENTED: [libpq] Field #1 ('0') has unsupported parameter type list

Along the same lines it would be nice if PyArrow arrays could be bound similarly like:

cursor.execute('SELECT $1', (pyarrow.array([1, 2, 3]),))
cursor.fetch_arrow_table()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/<REDACTED>/.venv/lib/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 696, in execute
    self._prepare_execute(operation, parameters)
  File "<REDACTED>/.venv/lib/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 675, in _prepare_execute
    rb = pyarrow.record_batch(
         ^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 5828, in pyarrow.lib.record_batch
  File "pyarrow/table.pxi", line 3429, in pyarrow.lib.RecordBatch.from_arrays
  File "pyarrow/table.pxi", line 1547, in pyarrow.lib._sanitize_arrays
  File "pyarrow/table.pxi", line 1528, in pyarrow.lib._schema_from_arrays
  File "pyarrow/array.pxi", line 368, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert <pyarrow.lib.Int64Array object at 0x103d6c880>
[
  1
] with type pyarrow.lib.Int64Array: did not recognize Python value type when inferring an Arrow data type

My use case for this is to be able to have a parameterized query which looks something like:

WITH data (col1, col2) AS (
    SELECT UNNEST($1), UNNEST($2)
)
SELECT
    col1,
    col2,
    other_table.some_data
FROM data
JOIN  other_table
    ON data.col1 = other_table.col1 AND data.col2 = other_table.col2

Other simpler but more common query patterns which might use this would be

SELECT *
FROM table
WHERE table.column = ANY($1)
@judahrand judahrand added the Type: enhancement New feature or request label Sep 26, 2024
@lidavidm
Copy link
Member

Along the same lines it would be nice if PyArrow arrays could be bound similarly like:

Oof, we should definitely make that work at least.

@paleolimbot
Copy link
Member

I believe that lists can be bound after #2157 (although I don't know if this extends to the dbapi wrapper!).

@judahrand
Copy link
Contributor Author

Aha! Perhaps way ahead of me! Just need to wait for a new release in that case.

@judahrand
Copy link
Contributor Author

Although handling binding Arrow arrays to Postgres Arrays might also be nice or would the 'proper' way for a client to do this be to convert the array to a list scalar?

@paleolimbot
Copy link
Member

Should have done this a moment ago, but I did just check on main:

>>> from adbc_driver_postgresql import dbapi
>>> postgres = dbapi.connect("postgresql://localhost:5432/postgres?user=postgres&password=password")
>>> with postgres.cursor() as cur:
...     cur.execute('SELECT $1', ([1, 2, 3],))
...     cur.fetch_arrow_table()
... 
pyarrow.Table
?column?: list<item: int64>
  child 0, item: int64
----
?column?: [[[1,2,3]]]

@judahrand
Copy link
Contributor Author

judahrand commented Sep 26, 2024

Should have done this a moment ago, but I did just check on main:

>>> from adbc_driver_postgresql import dbapi
>>> postgres = dbapi.connect("postgresql://localhost:5432/postgres?user=postgres&password=password")
>>> with postgres.cursor() as cur:
...     cur.execute('SELECT $1', ([1, 2, 3],))
...     cur.fetch_arrow_table()
... 
pyarrow.Table
?column?: list<item: int64>
  child 0, item: int64
----
?column?: [[[1,2,3]]]

This should almost sort out my usecase and is precisely the expected behavior! 🥳 Perhaps worth a separate issue for

>>> from adbc_driver_postgresql import dbapi
>>> postgres = dbapi.connect("postgresql://localhost:5432/postgres?user=postgres&password=password")
>>> with postgres.cursor() as cur:
...     cur.execute('SELECT $1', (pyarrow.array([1, 2, 3]),))
...     cur.fetch_arrow_table()
... 
pyarrow.Table
?column?: list<item: int64>
  child 0, item: int64
----
?column?: [[[1,2,3]]]

working because I can see an argument for that being incorrect behavior since a RecordBatch or Table is arguably just a sequence of Arrays so maybe odd to have two different behaviors for passing table and tuple(table.columns)? 🤷‍♂️ But in my case it would save me doing a potentially expensive tuple(table.to_pydict().values()).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants