BUG: Series stealing references from CategoricalIndex is invalid for read-only arrays #63306

@vyasr

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Not necessary for pandas 3.0, but I validated this in both the 3.0 rc and 2.3.3
# and this setting makes the MRE work for both.
pd.set_option("mode.copy_on_write", True)

# We must use an int8 array here or pandas will make a (writeable) copy of the array in
# https://github.com/pandas-dev/pandas/blob/499c5d4dd52a8645bf96c39bad60613097e84c06/pandas/core/dtypes/cast.py#L878
# We also must convert the codes to a numpy array since that produces a read-only array,
# whereas pandas has more internal logic to handle an Index correctly in CoW mode.
codes = pd.Index([0, 1, 2, 3], dtype="int8").to_numpy()
cats = pd.Index(["a", "b", "c", "d"])
data = pd.Categorical.from_codes(codes, cats)

# We can't create a series directly from the Categorical data because the
# implementation details in pandas prevent copies in this case. When we construct a
# Series from a CategoricalIndex pandas tries to steal references to optimize
# copying in CoW mode, which is necessary to observe the error.
s = pd.Series(pd.Index(data))
s[[False, False, True, True]] = cats[2:4]

Issue Description

The above example fails with the following error:

  File "${SITE}/pandas/core/arrays/_mixins.py", line 269, in __setitem__
    self._ndarray[key] = value
    ~~~~~~~~~~~~~^^^^^
ValueError: assignment destination is read-only

The issue arises under the following circumstances:

  1. mode.copy_on_write is enabled
  2. A Series (or column in a DataFrame) is constructed from a read-only array
  3. The Series is constructed from an input that pandas determines has no other outstanding references, so CoW does not force a copy the first time a write occurs.

Under these circumstances, pandas does not currently detect that the input data is read-only and attempts to modify it in place, resulting in the above error. This example is a fairly specific case, but I suspect there are other similar cases where a read-only array can end up inside a pandas object in CoW mode. The challenge is that such cases are easily obscured by any references floating around, which makes point 3 above particularly delicate. Distilling my original example into a minimal one took a lot of work because the reference counting logic in pandas makes copies at various points in CoW mode whenever foreign references exist, and I found it quite easy to end up with such references preserved in orphaned reference cycles or hidden in other variables. In those cases pandas defensively makes copies, which covers up any issue with a read-only input array.
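For reference, the underlying failure can be reproduced with NumPy alone. This is only a sketch of the low-level behavior, not of the pandas code path: the `codes` array below is a hypothetical stand-in for the array backing the Categorical, and `setflags(write=False)` is used to mimic the read-only array that `to_numpy()` hands back in CoW mode.

```python
import numpy as np

# Hypothetical stand-in for the int8 codes backing the Categorical.
codes = np.array([0, 1, 2, 3], dtype="int8")
# Mimic the read-only array produced by to_numpy() in CoW mode.
codes.setflags(write=False)

mask = np.array([False, False, True, True])
try:
    # Same shape of operation as the masked assignment in the MRE.
    codes[mask] = np.array([2, 3], dtype="int8")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # assignment destination is read-only
```

Any in-place write that reaches a read-only buffer like this one raises, which is why the missing copy matters.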

Expected Behavior

In CoW mode, pandas should check whether blocks were constructed from a read-only input and, if so, make a copy before writing. That check could probably be inserted around here in setitem, but I don't know if that is the best place for it.
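A rough sketch of the kind of check I have in mind, written against plain NumPy arrays rather than pandas internals. The helper name `setitem_with_readonly_guard` is hypothetical and does not exist in pandas; the real fix would presumably need to live wherever pandas tracks block references.

```python
import numpy as np


def setitem_with_readonly_guard(values, key, new_values):
    """Hypothetical helper: copy `values` if it is read-only, then assign."""
    if not values.flags.writeable:
        # A fresh copy owns its memory and is always writeable.
        values = values.copy()
    values[key] = new_values
    return values


# Read-only input, as in the scenario described above.
readonly_codes = np.array([0, 1, 2, 3], dtype="int8")
readonly_codes.setflags(write=False)

mask = np.array([False, False, True, True])
result = setitem_with_readonly_guard(readonly_codes, mask, [1, 0])

print(result.tolist())          # [0, 1, 1, 0]
print(readonly_codes.tolist())  # [0, 1, 2, 3]
```

The write lands on the copy and the read-only input is left untouched, which matches what CoW would do if it had seen an outstanding reference.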

Installed Versions

❯ python
Python 3.13.11 | packaged by conda-forge | (main, Dec  6 2025, 11:24:03) [GCC 14.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit                : 1a3230dc5be4c87b8356765ea3b6568d37cb82fd
python                : 3.13.11
python-bits           : 64
OS                    : Linux
OS-release            : 5.4.0-208-generic
Version               : #228-Ubuntu SMP Fri Feb 7 19:41:33 UTC 2025
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 3.0.0rc0
numpy                 : 2.4.0rc1
dateutil              : 2.9.0.post0
pip                   : 25.3
Cython                : None
sphinx                : None
IPython               : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyiceberg             : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pytz                  : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
qtpy                  : None
pyqt5                 : None
