Closed
Description
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df1 = pd.DataFrame(np.random.randint(0,10,(4,3)), columns=['a','b','c'])
In [4]: df1
Out[4]:
   a  b  c
0  9  3  2
1  9  0  2
2  7  7  6
3  3  4  2
In [5]: df2 = df1.reindex(columns=['c','a','d'])
In [6]: df2
Out[6]:
   c  a   d
0  2  9 NaN
1  2  9 NaN
2  6  7 NaN
3  2  3 NaN
In [7]: df2.columns
Out[7]: Index([u'c', u'a', u'd'], dtype='object')
In [8]: ci = pd.MultiIndex.from_product([['x','y'],['a','b','c']])
In [10]: df3 = pd.DataFrame(np.random.randint(0,10,(4,6)), columns=ci)
In [11]: df3
Out[11]:
   x        y
   a  b  c  a  b  c
0  3  1  5  3  8  7
1  2  7  6  8  7  4
2  8  3  5  7  1  1
3  7  5  8  7  8  7
In [12]: df4 = df3.reindex(columns=['c','a','d'], level=1)
In [13]: df4
Out[13]:
   x     y
   c  a  c  a
0  5  3  7  3
1  6  2  4  8
2  5  8  1  7
3  8  7  7  7
In [14]: df4.columns
Out[14]:
MultiIndex(levels=[[u'x', u'y'], [u'c', u'a', u'd']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
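The repr above lists 'd' among the second-level entries even though no label code points at it. A minimal sketch of how that mismatch could be detected and cleaned up follows. It assumes a newer pandas than the 0.19.1 reported here: the `labels` argument is spelled `codes` in later releases, and `remove_unused_levels` was added after 0.19. The names `stored`, `present`, `unused` and `cleaned` are illustrative only.

```python
import pandas as pd

# Rebuild the index shown in Out[14] by hand.
# Assumption: a pandas version where the `labels` argument is named `codes`.
cols = pd.MultiIndex(
    levels=[["x", "y"], ["c", "a", "d"]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
)

# Entries stored on level 1 of the index (this includes 'd').
stored = list(cols.levels[1])

# Labels that actually occur in the column tuples.
present = list(cols.get_level_values(1).unique())

# Level entries that no column refers to.
unused = [label for label in stored if label not in present]

# Drop the unreferenced entries; the column tuples themselves are unchanged.
cleaned = cols.remove_unused_levels()
```

Comparing `levels` with `get_level_values` is one way to see what the frame really contains, since `levels` is only the lookup table behind the codes.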
When passing a nonexistent column name to reindex on a dataframe without multiindex columns, the result is:
- a NaN column with the "new" column name
- the columns attribute matches the columns in the dataframe
The same action on a multiindex dataframe produces different results:
- there are no NaN columns (this may not be a problem)
- the columns attribute of the resulting dataframe does not match the dataframe column names (this appears to be a bug)
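If the NaN columns are wanted in the multiindex case as well, one possible workaround is to build the full target column index explicitly and reindex against that instead of passing `level=1`. This is only a sketch under assumptions: it uses deterministic data in place of the random integers above, and `target` and `df_full` are hypothetical names.

```python
import numpy as np
import pandas as pd

ci = pd.MultiIndex.from_product([["x", "y"], ["a", "b", "c"]])
# Deterministic stand-in for the random data used in the report.
df3 = pd.DataFrame(np.arange(24).reshape(4, 6), columns=ci)

# Pair every outer label with the wanted inner labels, including the missing 'd'.
target = pd.MultiIndex.from_product([df3.columns.levels[0], ["c", "a", "d"]])

# Reindexing against complete tuples fills the missing (outer, 'd') columns with NaN.
df_full = df3.reindex(columns=target)
```

With this approach the resulting `columns` attribute and the displayed columns agree, matching the flat-index behaviour described above.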
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None