Boolean numpy-backed type fails when pyarrow is installed in env #3205

@pavlomuts

Description

I am using Altair with a pandas DataFrame that has numpy-backed dtypes, and I am using Streamlit to visualize it. Streamlit has pyarrow as a dependency, and it turns out that data type inference using pyarrow fails for the pandas nullable boolean dtype. A small (unrealistic) example reproduces the error:

import altair as alt
import pandas as pd

data = pd.DataFrame(
    {
        "x": pd.Series([1, 3, 5, 1, 3, 5]),
        "y": pd.Series([2, 4, 6, 2, 4, 6]),
        "flag": pd.Series([True, False, True, False, True, None], dtype="boolean"),
    }
)

chart = alt.Chart(data).mark_circle().encode(x="x", y="y", color="flag")
chart.to_dict()  # serializing the chart (as chart.save does in the traceback) raises the error

Traceback:

Traceback (most recent call last):
  File "C:\Users\ad\AppData\Local\Programs\Python\Python311\Lib\runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\Programs\Python\Python311\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "test.py", line 18, in <module>
    chart.save(file, format="html")
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\api.py", line 1066, in save
    result = save(**kwds)
             ^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\save.py", line 189, in save
    perform_save()
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\save.py", line 127, in perform_save
    spec = chart.to_dict(context={"pre_transform": False})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\api.py", line 2695, in to_dict
    return super().to_dict(
           ^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\api.py", line 903, in to_dict
    vegalite_spec = super(TopLevelMixin, copy).to_dict(  # type: ignore[misc]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 965, in to_dict
    result = _todict(
             ^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in _todict
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in <dictcomp>
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 473, in _todict
    return obj.to_dict(validate=False, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 965, in to_dict
    result = _todict(
             ^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in _todict
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in <dictcomp>
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 473, in _todict
    return obj.to_dict(validate=False, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\schema\channels.py", line 34, in to_dict
    parsed = parse_shorthand(shorthand, data=context.get('data', None))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\core.py", line 590, in parse_shorthand
    attrs["type"] = infer_vegalite_type_for_dfi_column(column)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\core.py", line 639, in infer_vegalite_type_for_dfi_column
    kind = column.dtype[0]
           ^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\pandas\core\interchange\column.py", line 128, in dtype
    return self._dtype_from_pandasdtype(dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\pandas\core\interchange\column.py", line 147, in _dtype_from_pandasdtype
    byteorder = dtype.byteorder
                ^^^^^^^^^^^^^^^
AttributeError: 'BooleanDtype' object has no attribute 'byteorder'
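
The last frame of the traceback can be checked in isolation, without Altair or pyarrow. The following is a minimal sketch, assuming only pandas and NumPy are installed. It shows that a NumPy dtype exposes `byteorder` while the nullable `BooleanDtype` does not. The helper `to_object_bools` is a hypothetical name introduced here for illustration, and casting the column to plain Python objects is only one possible workaround, not a fix confirmed by the maintainers.

```python
import numpy as np
import pandas as pd

# Same nullable boolean column as in the reproduction above.
flag = pd.Series([True, False, True, False, True, None], dtype="boolean")

# NumPy dtypes carry a `byteorder` attribute; the pandas nullable
# BooleanDtype does not, which matches the AttributeError in the traceback.
numpy_has_byteorder = hasattr(np.dtype(bool), "byteorder")
nullable_has_byteorder = hasattr(flag.dtype, "byteorder")


def to_object_bools(series: pd.Series) -> pd.Series:
    """Hypothetical helper: rebuild a nullable boolean column as an
    object-dtype column holding True, False, or None."""
    values = [None if pd.isna(v) else bool(v) for v in series]
    return pd.Series(values, index=series.index, dtype=object)


converted = to_object_bools(flag)
print(numpy_has_byteorder, nullable_has_byteorder, converted.dtype)
```

Applying such a cast before passing the DataFrame to `alt.Chart` gives the column a NumPy `object` dtype, which does have `byteorder`. Whether the chart then renders as intended has not been verified here.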

And my environment:

altair                    5.1.1        Vega-Altair: A declarative statistical visualization library for Python.
astroid                   2.15.8       An abstract syntax tree for Python with inference support.
flake8                    6.1.0        the modular source code checker: pep8 pyflakes and co
packaging                 23.1         Core utilities for Python packages
pandas                    2.1.1        Powerful data structures for data analysis, time series, and statistics
pathspec                  0.11.2       Utility library for gitignore style pattern matching of file paths.
pillow                    9.5.0        Python Imaging Library (Fork)
platformdirs              3.10.0       A small Python package for determining appropriate platform-specific dirs, e.g. a "user data dir".
pluggy                    1.3.0        plugin and hook calling mechanisms for python
protobuf                  4.24.3
pyarrow                   13.0.0       Python library for Apache Arrow
requests                  2.31.0       Python HTTP for Humans.
rich                      13.5.3       Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal
rpds-py                   0.10.3       Python bindings to Rust's persistent data structures (rpds)
ruff                      0.0.291      An extremely fast Python linter, written in Rust.
scipy                     1.11.2       Fundamental algorithms for scientific computing in Python
six                       1.16.0       Python 2 and 3 compatibility utilities
smmap                     5.0.1        A pure Python implementation of a sliding window memory map manager
snakeviz                  2.2.0        A web-based viewer for Python profiler output
sqlalchemy                2.0.21       Database Abstraction Library
streamlit                 1.27.0       A faster way to build and share data apps
tabulate                  0.9.0        Pretty-print tabular data

yamllint                  1.32.0       A linter for YAML files.
zipp                      3.17.0       Backport of pathlib-compatible object wrapper for zip files

Thank you for taking a look and for making such a great tool!
