Invalid polars.lit causes panic, ungraceful termination or hang #14776

Open
2 tasks done
mesner opened this issue Feb 29, 2024 · 5 comments
Labels
A-dtype-object Area: object data type A-panic Area: code that results in panic exceptions bug Something isn't working P-low Priority: low python Related to Python Polars

mesner commented Feb 29, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import numpy as np

def getdf():
    s = [np.array(['MY STRING CONSTANT'])]
    df = pl.DataFrame(dict(x=[1,2,3])).with_columns(
        s=pl.lit(s)
    )
    return df

if __name__ == '__main__':
    try:
        df = getdf()
    except Exception as ex:
        print(f"error: {ex}")
    finally:
        print("getdf1 complete")

The program terminates without printing 'error'.
When run in a multiprocessing pool (in a child process), the program hangs.

Log output

(base) C:\Users\ig2n\Documents\projects\hctsa-nicu\modeling\202402>python pl_bug_lit_list.py
thread '<unnamed>' panicked at crates\polars-core\src\chunked_array\ops\full.rs:101:81:
called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("`list_builder` operation not supported for dtype `object`"))
stack backtrace:
   0:     0x7ffca6ac3907 - ffi_select_with_compiled_path
   1:     0x7ffca3e4403b - BrotliDecoderVersion
   2:     0x7ffca6aa7231 - ffi_select_with_compiled_path
   3:     0x7ffca6ac55da - ffi_select_with_compiled_path
   4:     0x7ffca6ac524b - ffi_select_with_compiled_path
   5:     0x7ffca6ac6197 - ffi_select_with_compiled_path
   6:     0x7ffca6ac5cd9 - ffi_select_with_compiled_path
   7:     0x7ffca6ac5c19 - ffi_select_with_compiled_path
   8:     0x7ffca6ac5c06 - ffi_select_with_compiled_path
   9:     0x7ffca6bf1ae7 - ffi_select_with_compiled_path
  10:     0x7ffca6bf1f93 - ffi_select_with_compiled_path
  11:     0x7ffca4a56abc - ffi_select_with_compiled_path
  12:     0x7ffca4a5c3b6 - ffi_select_with_compiled_path
  13:     0x7ffca4a54e54 - ffi_select_with_compiled_path
  14:     0x7ffca4c1cfcb - ffi_select_with_compiled_path
  15:     0x7ffca52b7684 - ffi_select_with_compiled_path
  16:     0x7ffca52b63fc - ffi_select_with_compiled_path
  17:     0x7ffca52a32ff - ffi_select_with_compiled_path
  18:     0x7ffca3b9f4a0 - BroccoliDestroyInstance
  19:     0x7ffca34f55eb - BroccoliDestroyInstance
  20:     0x7ffcecfb9bfb - PyComplex_AsCComplex
  21:     0x7ffcecfae659 - PyBytes_Repeat
  22:     0x7ffcecfaebd1 - PyObject_Vectorcall
  23:     0x7ffced0a6c5a - PyEval_EvalFrameDefault
  24:     0x7ffced0aaa4e - PyEval_EvalFrameDefault
  25:     0x7ffced0a2180 - PyEval_EvalCode
  26:     0x7ffced121e1e - PyRun_FileExFlags
  27:     0x7ffced121ef8 - PyRun_FileExFlags
  28:     0x7ffced121a28 - PyRun_StringFlags
  29:     0x7ffced11f5f5 - PyRun_SimpleFileObject
  30:     0x7ffced11e864 - PyRun_AnyFileObject
  31:     0x7ffcecf30abc - Py_gitidentifier
  32:     0x7ffcecf31493 - Py_gitidentifier
  33:     0x7ffcecf31830 - Py_Main
  34:     0x7ff679141494 - OPENSSL_Applink
  35:     0x7ffd206e7344 - BaseThreadInitThunk
  36:     0x7ffd20ea26b1 - RtlUserThreadStart
getdf1 complete
Traceback (most recent call last):
  File "C:\Users\ig2n\Documents\projects\hctsa-nicu\modeling\202402\pl_bug_lit_list.py", line 12, in <module>
    df = getdf()
         ^^^^^^^
  File "C:\Users\ig2n\Documents\projects\hctsa-nicu\modeling\202402\pl_bug_lit_list.py", line 6, in getdf
    df = pl.DataFrame(dict(x=[1,2,3])).with_columns(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ig2n\AppData\Local\miniconda3\Lib\site-packages\polars\dataframe\frame.py", line 8301, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ig2n\AppData\Local\miniconda3\Lib\site-packages\polars\lazyframe\frame.py", line 1939, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("`list_builder` operation not supported for dtype `object`"))

Issue description

I'm not expecting Polars to be able to accept or parse a nested list / ndarray as a literal. My input data comes from a .mat file and is sometimes nested.
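Because the nested values come from a .mat file, one possible workaround is to convert any NumPy arrays to plain Python lists before calling `pl.lit`, so that no element has to be inferred as `Object`. This is untested here and assumes Polars can build a literal from nested plain lists. The helper `to_plain` is hypothetical; it relies only on the `tolist()` method that NumPy arrays provide, and it recurses into lists and tuples.

```python
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively replace array-like objects with plain Python lists.

    Hypothetical helper: any object with a ``tolist()`` method (such as a NumPy
    ndarray) is converted, and lists and tuples are rebuilt as lists.
    """
    if hasattr(value, "tolist"):
        # ndarray.tolist() returns nested Python lists/scalars; recurse in case
        # an object-dtype array still holds further arrays.
        return to_plain(value.tolist())
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value
```

For example, `to_plain([np.array(['MY STRING CONSTANT'])])` gives `[['MY STRING CONSTANT']]`, a list of lists of Python strings rather than a list holding an ndarray.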

Expected behavior

I expect invalid arguments to raise a catchable error, rather than a panic, so that control returns gracefully to the calling function. I believe this would let the multiprocessing pool finish gracefully instead of hanging.

Installed versions

--------Version info---------
Polars:               0.20.13
Index type:           UInt32
Platform:             Windows-10-10.0.19045-SP0
Python:               3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  0.8.0
cloudpickle:          3.0.0
connectorx:           0.3.2
deltalake:            0.15.3
fsspec:               2023.10.0
gevent:               23.9.1
hvplot:               0.9.2
matplotlib:           3.8.3
numpy:                1.24.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              14.0.1
pydantic:             2.5.2
pyiceberg:            0.5.1
pyxlsb:               <not installed>
sqlalchemy:           2.0.17
xlsx2csv:             0.8.1
xlsxwriter:           3.1.9
mesner added the bug, needs triage and python labels Feb 29, 2024

mesner (Author) commented Feb 29, 2024

Ok, I think this is related to #5606.
Adding a handler for the Polars panic error gives the behavior I expect: re-raising it as an ordinary exception lets the multiprocessing pool finish gracefully.

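try:
    df = getdf()
except pl.PolarsPanicError as ex:
    print(f"PolarsPanicError error: {ex}")
    # Re-raise as an Exception subclass so generic `except Exception` handlers see it
    raise ValueError(f"PolarsPanicError error: {ex}") from ex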

This seems to be a trap. If end users are never supposed to see panics, then catching them all and re-raising them as ordinary exceptions seems appropriate (perhaps only in production builds, or gated behind an environment variable). If users are expected to see panics, then having to catch a custom error is inconvenient and error-prone. Additionally, although I haven't confirmed it, the multiprocessing library's assumption that errors derive from Exception may be shared by other libraries, which would degrade the compatibility of Polars (or any PyO3-based library?) with them.
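The catch-all-and-rethrow idea can be sketched in plain Python as a decorator for functions handed to a multiprocessing pool. `reraise_as_exception` is a hypothetical helper, not part of Polars or the standard library: it converts any `BaseException` that is not an `Exception` into a `RuntimeError`, while letting `KeyboardInterrupt`, `SystemExit` and `GeneratorExit` through unchanged. The sketch assumes the Polars panic exception derives from `BaseException` only, as discussed in this thread.

```python
import functools
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Exceptions that keep their special meaning and are never converted.
_PASSTHROUGH = (KeyboardInterrupt, SystemExit, GeneratorExit)


def reraise_as_exception(func: F) -> F:
    """Wrap ``func`` so errors outside the Exception hierarchy become RuntimeError."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception:
            # Ordinary exceptions are already visible to `except Exception`.
            raise
        except _PASSTHROUGH:
            raise
        except BaseException as ex:
            # For example a PyO3 panic, which derives from BaseException only.
            raise RuntimeError(f"{type(ex).__name__}: {ex}") from ex

    return wrapper  # type: ignore[return-value]
```

Decorating `getdf` with `@reraise_as_exception` should then let the `except Exception` handler in the original reproducer catch the panic, with the original exception kept as `__cause__`.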

I admit that this could be argued as expected behavior and not a bug, but I still contend that it is not ideal.

I'll leave it open for comments. Many thanks for a great library.

stinodego (Contributor) commented Mar 1, 2024

Simpler reproducer:

import polars as pl

try:
    raise pl.PolarsPanicError()
except Exception:
    print("hello")
# Not caught; the script fails with pyo3_runtime.PanicException

Indeed, I think this should be a proper Exception. The exception does subclass BaseException, so the following code will work:

import polars as pl

try:
    raise pl.PolarsPanicError()
except BaseException:
    print("hello")
# hello

The Python docs say that new exceptions should generally subclass Exception rather than BaseException:

"programmers are encouraged to derive new exceptions from the Exception class or one of its subclasses, and not from BaseException."
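The difference between the two `except` clauses comes down to the class hierarchy. The stand-alone sketch below needs no Polars: it uses a hypothetical `FakePanic` class that, like the PyO3 panic exception described above, derives from `BaseException` but not from `Exception`, and reports which of two nested handlers catches each error.

```python
class FakePanic(BaseException):
    """Hypothetical stand-in for pyo3_runtime.PanicException."""


def which_handler(exc: BaseException) -> str:
    """Raise ``exc`` and report which of two nested handlers catches it."""
    try:
        try:
            raise exc
        except Exception:
            return "except Exception"
    except BaseException:
        return "except BaseException"


print(issubclass(FakePanic, Exception))      # False
print(issubclass(FakePanic, BaseException))  # True
print(which_handler(ValueError("bad")))      # except Exception
print(which_handler(FakePanic("boom")))      # except BaseException
```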

Perhaps this is a bug in PyO3. I'll check with them to see if there is anything we can do. I don't think we can really solve this in Polars.

stinodego added the P-low and A-exceptions labels and removed the needs triage label Mar 1, 2024
github-project-automation bot moved this to Ready in Backlog Mar 1, 2024
stinodego assigned and then unassigned stinodego Mar 1, 2024
stinodego added the blocked label (cannot be worked on due to external dependencies, or significant new internal features needed first) Mar 1, 2024
stinodego (Contributor) commented

I reported this to PyO3: PyO3/pyo3#3918

I will close this for now as I don't think there's anything Polars can do about this.

stinodego closed this as not planned Mar 1, 2024
github-project-automation bot moved this from Ready to Done in Backlog Mar 1, 2024
mesner (Author) commented Mar 1, 2024

@stinodego Any response to my interpretation in PyO3/pyo3#3918 (comment)?

Edit: Thanks for the response and for looking into this!

stinodego (Contributor) commented

Apparently the panic exception intentionally does not subclass Exception. Even so, Polars should not panic here. The culprit is broadcasting a literal of type List(Object):

import polars as pl

df = pl.DataFrame({"x": [1, 2]}).with_columns(pl.lit([object()]))

stinodego reopened this Mar 1, 2024
stinodego added the A-dtype-object label and removed the A-exceptions and blocked labels Mar 1, 2024
stinodego moved this from Done to Ready in Backlog Mar 1, 2024
stinodego added the A-panic label (code that results in panic exceptions) Jun 17, 2024
Projects
Status: Ready
Development

No branches or pull requests

2 participants