Closed
Perhaps I am misunderstanding how pandas handles indices for related objects behind the scenes, but I found the following behavior very unintuitive:
import pandas as pd
import numpy as np

np.random.seed(0)
idx = pd.MultiIndex.from_product([['John', 'Josh', 'Alex'], list('abcde')],
                                 names=['Person', 'Letter'])
large = pd.DataFrame(data=np.random.randn(15, 2),
                     index=idx,
                     columns=['one', 'two'])
# Keep only the rows whose 'Person' label starts with 'Jo'
small = large.loc[[d[:2] == 'Jo' for d in large.index.get_level_values('Person')]]
# Single-argument print() behaves the same on Python 2 and Python 3
print(small.index.levels[0])
print(large.index.levels[0])
This returns:
Index([u'Alex', u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')
rather than the expected
Index([u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')
I could get the results I expected by running
small.index.get_level_values('Person').unique()
large.index.get_level_values('Person').unique()
but could someone explain why the behavior I'm seeing above with DataFrame.index.levels is an appropriate result rather than a bug?
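For anyone reproducing this, here is a minimal sketch that puts the three views of the `Person` level side by side. It assumes a pandas version that provides `MultiIndex.remove_unused_levels()` (added in 0.20, so it may not be available in the version used for the report above). The names `mask`, `stale_levels`, `present`, and `trimmed` are illustrative only and do not come from the original snippet.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
idx = pd.MultiIndex.from_product(
    [['John', 'Josh', 'Alex'], list('abcde')],
    names=['Person', 'Letter'],
)
large = pd.DataFrame(np.random.randn(15, 2), index=idx, columns=['one', 'two'])

# Boolean mask selecting rows whose 'Person' label starts with 'Jo'.
mask = large.index.get_level_values('Person').str.startswith('Jo')
small = large.loc[mask]

# .levels on the subset still lists every label of the parent index,
# including 'Alex', even though no row of `small` uses it.
stale_levels = list(small.index.levels[0])

# Labels that actually occur in the subset, in order of appearance.
present = list(small.index.get_level_values('Person').unique())

# Assumption: pandas >= 0.20. Builds a new MultiIndex whose levels
# contain only the labels that are in use.
trimmed = small.index.remove_unused_levels()
trimmed_levels = list(trimmed.levels[0])

print(stale_levels)    # ['Alex', 'John', 'Josh']
print(present)         # ['John', 'Josh']
print(trimmed_levels)  # ['John', 'Josh']
```

If the trimmed levels are wanted on the frame itself, assigning `small.index = small.index.remove_unused_levels()` should work, since the trimmed index compares equal to the original one and only its internal level storage changes.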