CSVDataset behaves unexpectedly if src is a DataFrame with a non-default index #8201

ashgillman · 2024-11-13T02:50:43Z

Describe the bug
CSVDataset accepts pandas DataFrames as input for src, but it assumes that the index labels match the row positions (0, 1, 2, ...).

This is because convert_tables_to_dicts uses .loc instead of .iloc. It generates ordinal (positional) indices to subset on, but treats them as index labels.

MONAI/monai/data/utils.py

Line 1494 in 0bb20a8

data_ = df.loc[rows] if col_names is None else df.loc[rows, col_names]

To Reproduce

import numpy
import pandas
import monai

df = pandas.DataFrame(numpy.random.random((50, 3)))
df_subset = df.iloc[numpy.arange(0, 50, 5)]
print(df_subset.shape)  # (10, 3)

ds = monai.data.CSVDataset(df_subset)
print(len(ds))  # 3

Expected behavior
print(len(ds)) should return 10.
It returns 3 because the lookup is df.loc[slice(10)], which is label-based and end-inclusive, so it matches only the labels 0, 5 and 10 from the subset.
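
The same count can be reproduced with pandas alone, without going through MONAI. This is a minimal sketch that mimics the lookup described above; the variable names by_label and by_position are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((50, 3)))
df_subset = df.iloc[np.arange(0, 50, 5)]  # index labels: 0, 5, 10, ..., 45

# The slice is built from the row count, so it is meant as positions 0..9.
rows = slice(df_subset.shape[0])

by_label = df_subset.loc[rows]      # label-based, end-inclusive: labels <= 10
by_position = df_subset.iloc[rows]  # position-based: the first 10 rows

print(by_label.index.tolist())  # [0, 5, 10]
print(len(by_position))         # 10
```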

Environment
Shouldn't be relevant?

Additional context
Simple fix:

MONAI/monai/data/utils.py

Line 1494 in 0bb20a8

data_ = df.loc[rows] if col_names is None else df.loc[rows, col_names]

The first .loc should be .iloc, and the second should be .iloc[rows][col_names], since .iloc is purely positional and does not accept column names.
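
A rough sketch of that change, written as a standalone function so it can be tried outside MONAI. The helper name select_rows is hypothetical and is not part of monai.data.utils; it only isolates the one line under discussion.

```python
import pandas as pd


def select_rows(df, rows, col_names=None):
    """Hypothetical helper: pick rows by position, then columns by name."""
    data_ = df.iloc[rows]  # positional, so the index labels are irrelevant
    return data_ if col_names is None else data_[list(col_names)]


# A frame whose index labels are 0, 5, 10, ..., 45
df = pd.DataFrame({"a": range(50), "b": range(50, 100)}).iloc[::5]

print(len(select_rows(df, slice(df.shape[0]))))       # 10
print(select_rows(df, [0, 2], ["b"])["b"].tolist())   # [50, 60]
```

Selecting rows first and columns second keeps the row lookup positional while still letting col_names be given as labels.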

ashgillman · 2024-11-13T02:55:03Z

A workaround is to always call .reset_index() on src DataFrames.
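
For illustration, a small sketch of the workaround using pandas only. It assumes the old index labels are not needed as data, hence drop=True; without it, reset_index() keeps them as an extra "index" column. The final .loc call only mimics the lookup described in the issue and does not call MONAI.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((50, 3)))
df_subset = df.iloc[np.arange(0, 50, 5)]  # index labels: 0, 5, ..., 45

# Replace the labels with 0..n-1 so that labels and positions agree.
df_clean = df_subset.reset_index(drop=True)

n = df_clean.shape[0]
print(df_clean.index.tolist() == list(range(n)))  # True
print(len(df_clean.loc[slice(n)]))                # 10

# df_clean can then be passed as src, e.g. monai.data.CSVDataset(df_clean)
```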
