
Limit pyarrow version to be <20.0.0 #1060

Merged
dreadatour merged 1 commit into main from pyarrow-version-limit on Apr 28, 2025



@dreadatour (Contributor) commented Apr 28, 2025

With the latest pyarrow 20.0.0 release, our tests are now broken.

pytest run output
$ pytest tests/unit/lib/test_hf.py -k test_hf_array
Test session starts (platform: darwin, Python 3.13.2, pytest 8.3.5, pytest-sugar 1.0.0)
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/vlad/work/iterative/datachain
configfile: pyproject.toml
plugins: servers-0.5.10, cov-6.0.0, hypothesis-6.129.4, sugar-1.0.0, benchmark-5.1.0, mock-3.14.0, xdist-3.6.1, requests-mock-1.12.1
collected 6 items / 5 deselected / 1 selected


―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― test_hf_array ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

    def test_hf_array():
        ds = Dataset.from_dict({"arr": [[[0, 1], [2, 3]]]})
        new_features = ds.features.copy()
        new_features["arr"] = Array2D(shape=(2, 2), dtype="int32")
        ds = ds.cast(new_features)
        schema = get_output_schema(ds.features)
        assert schema["arr"] == list[list[int]]

        gen = HFGenerator(ds, dict_to_data_model("", schema))
        gen.setup()
>       row = next(iter(gen.process()))

/Users/vlad/work/iterative/datachain/tests/unit/lib/test_hf.py:88:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Users/vlad/work/iterative/datachain/src/datachain/lib/hf.py:99: in process
    for row in ds:
/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/datasets/arrow_dataset.py:2387: in __iter__
    formatted_output = format_table(
/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/datasets/formatting/formatting.py:666: in format_table
    formatted_output = formatter(pa_table_to_format, query_type=query_type)
/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/datasets/formatting/formatting.py:411: in __call__
    return self.format_row(pa_table)
/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/datasets/formatting/formatting.py:459: in format_row
    row = self.python_arrow_extractor().extract_row(pa_table)
/Users/vlad/.virtualenvs/datachain/lib/python3.13/site-packages/datasets/formatting/formatting.py:145: in extract_row
    return _unnest(pa_table.to_pydict())
pyarrow/table.pxi:2308: in pyarrow.lib._Tabular.to_pydict
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: ArrayExtensionArray.to_pylist() got an unexpected keyword argument 'maps_as_pydicts'

pyarrow/table.pxi:1380: TypeError
-------------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------------
Casting the dataset: 100%|██████████| 1/1 [00:00<00:00, 181.55 examples/s]


 tests/unit/lib/test_hf.py ⨯                                                                                                                                            100% ██████████
=============================================================================== short test summary info ===============================================================================
FAILED tests/unit/lib/test_hf.py::test_hf_array - TypeError: ArrayExtensionArray.to_pylist() got an unexpected keyword argument 'maps_as_pydicts'

Results (0.39s):
       1 failed
         - tests/unit/lib/test_hf.py:78 test_hf_array
       5 deselected
~/w/i/datachain
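Judging from the traceback (not from pyarrow's changelog), pyarrow 20's `to_pydict()` path now ends up calling `to_pylist(maps_as_pydicts=...)` on each array, while `datasets`' `ArrayExtensionArray` overrides `to_pylist()` with a signature that does not accept that keyword. The pure-Python sketch below reproduces this failure mode with hypothetical stand-in classes (`BaseArray`, `OldStyleArray`, `TolerantArray` and `to_pydict` are illustrative names, not the real pyarrow or datasets API) and shows one conventional way an override can tolerate new keywords.

```python
# Hypothetical stand-ins that mimic the call pattern in the traceback;
# these are NOT the real pyarrow or datasets classes.


class BaseArray:
    """Plays the role of a library base class with a newer signature."""

    def to_pylist(self, *, maps_as_pydicts=None):
        return [0, 1]


class OldStyleArray(BaseArray):
    """Subclass override written against the older, keyword-free signature."""

    def to_pylist(self):
        return [[0, 1], [2, 3]]


class TolerantArray(BaseArray):
    """Subclass override that accepts and ignores keywords it does not use."""

    def to_pylist(self, **kwargs):
        return [[0, 1], [2, 3]]


def to_pydict(columns, maps_as_pydicts=None):
    """Mimics a caller that always passes the new keyword to every column."""
    return {
        name: column.to_pylist(maps_as_pydicts=maps_as_pydicts)
        for name, column in columns.items()
    }


if __name__ == "__main__":
    try:
        to_pydict({"arr": OldStyleArray()})
    except TypeError as exc:
        # The message names the unexpected 'maps_as_pydicts' keyword.
        print(exc)
    print(to_pydict({"arr": TolerantArray()}))  # {'arr': [[0, 1], [2, 3]]}
```

Accepting `**kwargs` trades strictness for tolerance; a stricter alternative is to add the new keyword to the override's signature explicitly.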

The same failure also shows up in CI, for example here.

This looks a bit odd, and I could not find a fix in the short time available. We definitely need a proper fix so that we can remove the version limit; here is an issue for that: #1061
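The workaround itself is only an upper bound on the dependency. As a rough, stdlib-only illustration of which versions `<20.0.0` admits, the hypothetical helpers below (`parse_release`, `satisfies_upper_bound`) compare numeric release segments only. They ignore most PEP 440 details (epochs, local versions, pre-release rules), so a real check would normally use `packaging.specifiers.SpecifierSet("<20.0.0")` instead.

```python
import re


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. "20.0.0" -> (20, 0, 0).

    Anything after it (such as "rc1") is ignored in this simplified sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies_upper_bound(version: str, bound: str = "20.0.0") -> bool:
    """Return True if `version` sorts strictly below `bound` (release segments only)."""
    v = parse_release(version)
    b = parse_release(bound)
    width = max(len(v), len(b))
    # Zero-pad so that (20,) and (20, 0, 0) compare as equal.
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    return v < b
```

For example, `satisfies_upper_bound(importlib.metadata.version("pyarrow"))` would report whether the installed pyarrow falls under the bound.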

@dreadatour requested review from a team, Copilot and skshetry on April 28, 2025 02:49
@dreadatour self-assigned this on Apr 28, 2025
Copilot AI left a comment

Pull Request Overview

This PR temporarily limits the version of pyarrow to versions below 20 to work around failing tests with the new release.

  • Updated the pyarrow dependency constraint in pyproject.toml
  • Aimed to ensure compatibility until a proper fix is implemented

codecov bot commented Apr 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.19%. Comparing base (1550a18) to head (363ecbb).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1060      +/-   ##
==========================================
- Coverage   88.21%   88.19%   -0.02%     
==========================================
  Files         146      146              
  Lines       12465    12465              
  Branches     1736     1736              
==========================================
- Hits        10996    10994       -2     
- Misses       1048     1050       +2     
  Partials      421      421              
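The percentages in this diff can be reproduced from the line counts, assuming coverage is hits / (hits + misses + partials) rounded down to two decimals (assumed here to match Codecov's default `round: down`, `precision: 2` settings). The round-down assumption matters: half-up rounding of main's 88.2150…% would display 88.22%. The sketch uses integer arithmetic to avoid float rounding; `coverage_percent` is a hypothetical helper, not part of Codecov.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> str:
    """Coverage as hits / (hits + misses + partials), truncated to 2 decimals."""
    lines = hits + misses + partials
    # Hundredths of a percent, rounded down via integer floor division.
    basis_points = hits * 10_000 // lines
    return f"{basis_points // 100}.{basis_points % 100:02d}"


if __name__ == "__main__":
    base = coverage_percent(10996, 1048, 421)  # main (1550a18)
    head = coverage_percent(10994, 1050, 421)  # PR head (363ecbb)
    print(base, head)  # 88.21 88.19
```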
Flag Coverage Δ
datachain 88.12% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@dreadatour merged commit 959bddf into main on Apr 28, 2025
35 checks passed
@dreadatour deleted the pyarrow-version-limit branch on April 28, 2025 08:54


3 participants