Description
Script to reproduce
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(6))
df['index_1'] = [3, 3, 2, 2, 1, 1]
df['index_2'] = [2, 1, 2, 1, 2, 1]
df = df.set_index(['index_1', 'index_2'])
print(df)
print(df.sort_index(level='index_1', sort_remaining=False))
print(df.sort_index(level='index_1', ascending=False, sort_remaining=False))
IPython example output
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.rand(6))
In [4]: df['index_1'] = [3, 3, 2, 2, 1, 1]
In [5]: df['index_2'] = [2, 1, 2, 1, 2, 1]
In [6]: df = df.set_index(['index_1', 'index_2'])
In [7]: df
Out[7]:
                        0
index_1 index_2
3       2        0.558019
        1        0.096064
2       2        0.353176
        1        0.153776
1       2        0.812181
        1        0.313342
In [8]: df.sort_index(level='index_1', sort_remaining=False)
Out[8]:
                        0
index_1 index_2
1       2        0.812181
        1        0.313342
2       2        0.353176
        1        0.153776
3       2        0.558019
        1        0.096064
In [9]: df.sort_index(level='index_1', ascending=False, sort_remaining=False)
Out[9]:
                        0
index_1 index_2
3       1        0.096064
        2        0.558019
2       1        0.153776
        2        0.353176
1       1        0.313342
        2        0.812181
Problem description
Documentation for sort_index(): https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html
The sort_index() method sorts the dataframe by its index values. If the dataframe has a MultiIndex, it sorts by the index levels in order. However, if you only want to sort by a subset of the levels and leave the others in their current order, you can use the level keyword argument to specify a level (or list of levels) to sort on, and set sort_remaining=False to ignore the other levels. The sort_remaining keyword seems to work correctly when ascending=True (the default), but when passing ascending=False together with sort_remaining=False, pandas still sorts the other levels (and in ascending order, at that).
Note that this happens regardless of which supported format is used to specify the level, viz.
df.sort_index(level=0, ascending=False, sort_remaining=False)
df.sort_index(level=[0], ascending=False, sort_remaining=False)
df.sort_index(level='index_1', ascending=False, sort_remaining=False)
df.sort_index(level=['index_1'], ascending=False, sort_remaining=False)
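To compare the four calls above without eyeballing the printed frames, a small check along these lines might help. The helper name remaining_order_preserved is hypothetical (it is not part of pandas); it only tests whether rows that share a value on the sorted level keep their original relative order. The result printed for each call depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def remaining_order_preserved(before, after, level):
    """Hypothetical helper: True if rows sharing a value on `level`
    appear in the same relative order in `before` and `after`."""
    before_keys = before.index.get_level_values(level)
    after_keys = after.index.get_level_values(level)
    for key in before_keys.unique():
        # Compare the full index tuples of each group, in row order.
        if before.index[before_keys == key].tolist() != after.index[after_keys == key].tolist():
            return False
    return True


# Same frame as in the script to reproduce.
df = pd.DataFrame(np.random.rand(6))
df['index_1'] = [3, 3, 2, 2, 1, 1]
df['index_2'] = [2, 1, 2, 1, 2, 1]
df = df.set_index(['index_1', 'index_2'])

# The four supported ways of naming the level, as listed above.
for level in (0, [0], 'index_1', ['index_1']):
    result = df.sort_index(level=level, ascending=False, sort_remaining=False)
    print(repr(level), remaining_order_preserved(df, result, 'index_1'))
```

If the second column prints False for a call, that call reordered index_2 even though sort_remaining=False was passed.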
There have been a few issues/PRs on the sort_index() method in the past year, so I'm not sure whether one of those PRs broke this behavior, whether it has been around longer, or whether it is unrelated.
Expected Output
In [9]: df.sort_index(level='index_1', ascending=False, sort_remaining=False)
Out[9]:
                        0
index_1 index_2
3       2        0.558019
        1        0.096064
2       2        0.353176
        1        0.153776
1       2        0.812181
        1        0.313342
Note that the expected outcome could be achieved in this example with
df.reset_index('index_2').sort_index(ascending=False).reset_index().set_index(['index_1', 'index_2'])
or similar setting/resetting of indexes. Well, or you could just do nothing since the dataframe is already in that order, but you get the point.
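Another possible workaround, which avoids resetting and re-setting the index, is to compute the row order yourself with a stable sort. The sketch below is only an illustration: sort_by_level_keep_rest is a hypothetical helper name, not a pandas API, and it assumes the chosen level contains no missing values. It relies on pd.factorize(..., sort=True) to turn the level values into ordered integer codes and on np.argsort(..., kind="stable") so that ties keep their current order.

```python
import numpy as np
import pandas as pd


def sort_by_level_keep_rest(df, level, ascending=True):
    """Hypothetical helper: sort rows by a single index level and leave
    rows that tie on that level in their existing relative order.

    Assumes the level has no missing values.
    """
    values = df.index.get_level_values(level)
    # Integer codes that follow the sorted order of the level's values.
    codes, _ = pd.factorize(values, sort=True)
    if not ascending:
        # Negating the codes reverses the order while keeping ties stable.
        codes = -codes
    order = np.argsort(codes, kind="stable")
    return df.iloc[order]


df = pd.DataFrame(np.random.rand(6))
df['index_1'] = [3, 3, 2, 2, 1, 1]
df['index_2'] = [2, 1, 2, 1, 2, 1]
df = df.set_index(['index_1', 'index_2'])

# index_1 is already descending here, so the row order is left unchanged.
print(sort_by_level_keep_rest(df, 'index_1', ascending=False))
# Ascending on index_1, with index_2 kept as 2, 1 within each group.
print(sort_by_level_keep_rest(df, 'index_1', ascending=True))
```

Because the reordering goes through df.iloc, it does not depend on how sort_index() treats sort_remaining.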
Output of pd.show_versions()
pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.7.4
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None