Description
Code example in which setting an index with drop=False produces an ambiguous DataFrame:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'value': ['A', 'B', 'C', 'D']})
df = df.set_index(['id'], drop=False)
df = df.sort_values(by=['value', 'id'], ascending=[True, False])
# ValueError: 'id' is both an index level and a column label, which is ambiguous.
Problem description
Version 0.23.0 changed sort_values so that the by argument can refer to index level names as well as column labels. That is a valid feature. The problem is that set_index still lets you keep the original column (drop=False), and in that case the new index level has the same name as the column. By design, these two features contradict each other: pandas should not let you build a DataFrame that raises an exception when you use some of its own functionality. As things stand, every time you set an index without dropping the column, you have to rename the index, for example:
df = df.set_index(['id'], drop=False)
df.index.names = ['id_index']
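A self-contained version of the reproduction and the renaming workaround. It renames the index with DataFrame.rename_axis, which should have the same effect as assigning to df.index.names; the raised flag and the result variable are only there to make the outcome easy to inspect.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'value': ['A', 'B', 'C', 'D']})
df = df.set_index(['id'], drop=False)

# 'id' is now both the index level name and a column label, so sorting raises.
raised = False
try:
    df.sort_values(by=['value', 'id'], ascending=[True, False])
except ValueError as exc:
    raised = True
    print(exc)

# Workaround: give the index level a distinct name so that 'id' refers
# unambiguously to the column.
renamed = df.rename_axis('id_index')
result = renamed.sort_values(by=['value', 'id'], ascending=[True, False])
print(result)
```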
Expected Output
Either
sort_values should resolve names in a priority order, looking up index level names first and column labels second (only if no index level matches), and stop raising an exception on ambiguous labels.
Or
set_index should require an index name when drop=False, and that name should differ from every existing column label in the DataFrame.
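A minimal sketch of the two proposals, written as hypothetical helpers: sort_values_prefer_index and set_index_keep_column are illustrative names, not pandas APIs, and the "_index" suffix is an arbitrary choice. The first resolves each sort key to an index level before falling back to a column; the second keeps the column but renames any index level whose name collides with it.

```python
import pandas as pd


def sort_values_prefer_index(df, by, ascending=True):
    """Option 1 (hypothetical helper, not a pandas API): resolve each label
    in `by` to an index level first, and to a column only if no level has
    that name, instead of raising on ambiguous labels."""
    if not isinstance(by, list):
        by = [by]
    if isinstance(ascending, bool):
        ascending = [ascending] * len(by)

    # One plain array per sort key, stored under positional names so that
    # index level names and column labels can no longer clash.
    keys = {}
    for i, label in enumerate(by):
        if label in df.index.names:
            keys[f"key{i}"] = df.index.get_level_values(label).to_numpy()
        else:
            keys[f"key{i}"] = df[label].to_numpy()

    # The key frame has a default RangeIndex, so its sorted index gives the
    # row positions to take from the original frame.
    key_frame = pd.DataFrame(keys)
    order = key_frame.sort_values(by=list(keys), ascending=ascending).index
    return df.iloc[order.to_numpy()]


def set_index_keep_column(df, keys, suffix="_index"):
    """Option 2 (hypothetical helper, not a pandas API): like
    df.set_index(keys, drop=False), but renames every index level whose
    name collides with a column label or with another level."""
    if not isinstance(keys, list):
        keys = [keys]
    out = df.set_index(keys, drop=False)
    new_names = []
    for name in out.index.names:
        candidate = name
        # Append the suffix until the name is unique among columns and levels.
        while candidate is not None and (
            candidate in out.columns or candidate in new_names
        ):
            candidate = f"{candidate}{suffix}"
        new_names.append(candidate)
    out.index = out.index.set_names(new_names)
    return out


df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'value': ['A', 'B', 'C', 'D']})

# Option 1: 'id' resolves to the index level, so no ValueError is raised.
sorted_1 = sort_values_prefer_index(
    df.set_index(['id'], drop=False), by=['value', 'id'], ascending=[True, False]
)

# Option 2: the index level becomes 'id_index', so 'id' is unambiguous.
indexed = set_index_keep_column(df, 'id')
sorted_2 = indexed.sort_values(by=['value', 'id'], ascending=[True, False])
print(sorted_1)
print(sorted_2)
```

Option 2 renames the level silently, which could surprise code that later looks the level up as 'id'; option 1 keeps the names intact but fixes which values an ambiguous label refers to.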
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.1
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None