Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]}).set_index("a")
df.columns = pd.MultiIndex.from_tuples([("b",)])
# Works
df.groupby("a")[("b",)].sum()
df.columns = pd.MultiIndex.from_tuples([("b", 1)])
# Fails
df.groupby("a")[("b", 1)].sum()
# "ValueError: Cannot subset columns with a tuple with more than one element. Use a list instead."
Issue Description
Prior to 1.0.0, passing a multi-element tuple to DataFrameGroupBy was treated as passing a list of the tuple elements (e.g., df_gb[("a", "b")] === df_gb[["a", "b"]] === df_gb["a", "b"]). The ability to pass multi-element tuples was deprecated with a FutureWarning in 1.0.0, and removed in 2.0.0 (see #30546).
A related behavior is that passing a tuple to a non-MultiIndexed DataFrame is allowed (see #36302).
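Until the two behaviors are aligned, the same column can probably be reached in other ways. The sketch below is not part of the original report; it assumes that wrapping the tuple in a list (as the error message suggests) or selecting the column before grouping is acceptable for the use case. The names `as_frame` and `as_series` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]}).set_index("a")
df.columns = pd.MultiIndex.from_tuples([("b", 1)])

# Option 1 (assumed workaround): wrap the tuple in a list.
# This returns a DataFrame rather than a Series.
as_frame = df.groupby("a")[[("b", 1)]].sum()

# Option 2 (assumed workaround): select the column first, then group
# by the index level. This returns a Series.
as_series = df[("b", 1)].groupby(level="a").sum()

print(as_frame)
print(as_series)
```

Neither option restores the `SeriesGroupBy` selection that the report asks for; they only sidestep the tuple check.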
Expected Behavior
There should be no difference between the two examples above. DataFrameGroupBy.__getitem__(tuple) should match DataFrame.__getitem__(tuple):
- If len(tuple) < df.columns.nlevels, return a DataFrameGroupBy selecting the columns that match the first len(tuple) levels (and reduce the column level depth by len(tuple)).
- If len(tuple) == df.columns.nlevels, return a SeriesGroupBy.
- If len(tuple) > df.columns.nlevels, raise an error.
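For reference, the three cases above can be sketched against plain DataFrame.__getitem__, which is the behavior the groupby selection is expected to mirror. This is a minimal illustration; the two-level frame and the variable names (`partial`, `full`, `too_long_raised`) are made up for the example and do not come from the report.

```python
import pandas as pd

# Hypothetical frame with two column levels.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_tuples([("b", 1), ("b", 2), ("c", 1)]),
)

# len(key) < nlevels: partial match, DataFrame with the matched level dropped.
partial = df[("b",)]

# len(key) == nlevels: exact match, a single Series.
full = df[("b", 1)]

# len(key) > nlevels: the lookup fails with a KeyError.
try:
    df[("b", 1, 0)]
    too_long_raised = False
except KeyError:
    too_long_raised = True

print(type(partial).__name__, list(partial.columns))
print(type(full).__name__, full.name)
print(too_long_raised)
```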
Installed Versions
INSTALLED VERSIONS
commit                : b48abb2
python                : 3.12.2.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 5.15.146.1-microsoft-standard-WSL2
Version               : #1 SMP Thu Jan 11 04:09:03 UTC 2024
machine               : x86_64
processor             :
byteorder             : little
LC_ALL                : None
LANG                  : C.UTF-8
LOCALE                : C.UTF-8
pandas                : 3.0.0.dev0+631.gb48abb26a9.dirty
numpy                 : 1.26.4
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 69.2.0
pip                   : 24.0
Cython                : 3.0.9
pytest                : 8.1.1
hypothesis            : 6.99.13
sphinx                : 7.2.6
blosc                 : None
feather               : None
xlsxwriter            : 3.2.0
lxml.etree            : 5.1.0
html5lib              : 1.1
pymysql               : 1.4.6
psycopg2              : 2.9.9
jinja2                : 3.1.3
IPython               : 8.22.2
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
bottleneck            : 1.3.8
fastparquet           : 2024.2.0
fsspec                : 2024.3.1
gcsfs                 : 2024.3.1
matplotlib            : 3.8.3
numba                 : 0.59.1
numexpr               : 2.9.0
odfpy                 : None
openpyxl              : 3.1.2
pyarrow               : 15.0.2
pyreadstat            : 1.2.7
python-calamine       : None
pyxlsb                : 1.0.10
s3fs                  : 2024.3.1
scipy                 : 1.12.0
sqlalchemy            : 2.0.29
tables                : 3.9.2
tabulate              : 0.9.0
xarray                : 2024.2.0
xlrd                  : 2.0.1
zstandard             : 0.22.0
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None