Description
Pandas cannot rename a label of a pandas.Index when the new label is a tuple. Passing a tuple as new_name in pandas.DataFrame.rename({old_name: new_name}, axis="index") returns a DataFrame whose index is a pandas.MultiIndex, and wrapping new_name in a singleton tuple gives an undesirable result as well. See the code below (a workaround is at the bottom):
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.arange(5), index=[(x, x) for x in range(5)], columns=["Value"])
print(df) # Note that df.index is a pd.Index object of 2-tuples
# Rename one axis label, but keep the same style (a flat Index of tuples)
df2 = df.rename({(1, 1): (1, 5)}, axis="index")
print(df2) # Whoa! df2.index is now a MultiIndex
print(df2.index) # ... and here's proof
# Maybe I can get around this by passing the new label as a singleton tuple...
df3 = df.rename({(1, 1): ((1, 5),)}, axis="index")
print(df3) # ... apparently not
This produces the following output:
        Value
(0, 0)      0
(1, 1)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4

     Value
0 0      0
1 5      1
2 2      2
3 3      3
4 4      4

MultiIndex(levels=[[0, 1, 2, 3, 4], [0, 2, 3, 4, 5]],
           labels=[[0, 1, 2, 3, 4], [0, 4, 1, 2, 3]])

           Value
(0, 0)         0
((1, 5),)      1
(2, 2)         2
(3, 3)         3
(4, 4)         4
Desired/expected output:
        Value
(0, 0)      0
(1, 5)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4
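To check whether a given pandas release still behaves this way, a small self-contained check could look like the sketch below. `rename_keeps_flat_index` is a hypothetical helper, and the starting index is built explicitly with `pd.Index(..., tupleize_cols=False)` (an assumption made so the starting point is a flat Index of tuples regardless of how the DataFrame constructor treats a list of tuples). Going by the output above, pandas 0.22.0 would report `False`.

```python
import pandas as pd


def rename_keeps_flat_index():
    """Hypothetical helper: True if rename() leaves a single-level index of tuples."""
    # Build a flat Index whose labels are 2-tuples; tupleize_cols=False
    # prevents pandas from turning the tuples into a MultiIndex.
    idx = pd.Index([(x, x) for x in range(5)], tupleize_cols=False)
    df = pd.DataFrame({"Value": range(5)}, index=idx)

    renamed = df.rename({(1, 1): (1, 5)}, axis="index")

    # Desired result: still one level, with (1, 5) in place of (1, 1).
    return renamed.index.nlevels == 1 and renamed.index[1] == (1, 5)


print(pd.__version__, rename_keeps_flat_index())
```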
Problem description
The current behaviour is a problem for two reasons:
- It is unintuitive: I can't see why a user would expect renaming an index label to change the index's type.
- There is no way to rename labels of an Index to tuples.
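The MultiIndex most likely comes from how the `pd.Index` constructor treats tuples rather than from `rename` itself; that link is an assumption, not something this report establishes. By default (`tupleize_cols=True`) a list whose elements are all tuples becomes a MultiIndex, while `tupleize_cols=False` keeps a single level whose labels are the tuples:

```python
import pandas as pd

labels = [(0, 0), (1, 5), (2, 2)]

# Default: a list made entirely of tuples is turned into a MultiIndex.
inferred = pd.Index(labels)

# tupleize_cols=False keeps one level whose labels are the tuples themselves.
flat = pd.Index(labels, tupleize_cols=False)

print(type(inferred).__name__, inferred.nlevels)  # MultiIndex 2
print(type(flat).__name__, flat.nlevels)          # Index 1
```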
I have checked for similar issues by searching for the word "rename", and at the time of writing pandas 0.22.0 is the latest released version.
Output of pd.show_versions()
pandas: 0.22.0
pytest: 3.0.3
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 2.4.9
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.8.0
bs4: 4.5.1
html5lib: 1.0b10
sqlalchemy: 1.1.3
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Workaround
The workaround below uses the Index.set_value function, which the documentation advises against using unless you really know what you're doing:
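# Overwrite the (1, 1) label in place in the index's underlying values array
df.index.set_value(df.index.get_values(), (1, 1), (1, 5))
# Move the modified labels into a column and back, so the index is rebuilt from them
df.reset_index(inplace=True)
df.set_index("index", inplace=True)
df.index.name = None # Arguably not necessary...
print(df)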
This produces the output:
        Value
(0, 0)      0
(1, 5)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4
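An alternative that avoids `set_value` and `get_values` (both deprecated in later pandas releases) is to compute the new labels in plain Python and assign them as a flat index. This is a sketch rather than a drop-in replacement for `rename`: `rename_tuple_labels` is a hypothetical helper that only handles a dict mapping on the row index, and the sample index is built with `tupleize_cols=False` as an assumption so it starts out flat.

```python
import numpy as np
import pandas as pd


def rename_tuple_labels(df, mapping):
    """Hypothetical helper: return a copy of df with row labels replaced via mapping.

    Labels missing from mapping are kept unchanged, and the result keeps a
    single-level index even when the new labels are tuples.
    """
    new_labels = [mapping.get(label, label) for label in df.index]
    out = df.copy()
    # tupleize_cols=False stops pandas from turning the tuples into a MultiIndex.
    out.index = pd.Index(new_labels, name=df.index.name, tupleize_cols=False)
    return out


idx = pd.Index([(x, x) for x in range(5)], tupleize_cols=False)
df = pd.DataFrame(data=np.arange(5), index=idx, columns=["Value"])
df2 = rename_tuple_labels(df, {(1, 1): (1, 5)})
print(df2)
```

With the sample frame from the report, the result has the labels (0, 0), (1, 5), (2, 2), (3, 3), (4, 4) on a single-level index, and the original frame is left untouched because the helper works on a copy.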