BUG: pd.DataFrame.sort_values() does not sort percentage values correctly. #43680

Closed
@likilyn

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

data1 = {"TIMES": [3, 5, 2, 100, 20]}
df1 = pd.DataFrame(data=data1)
print("===The original dataframe===")
print(df1)

df1["RATIO"] = df1["TIMES"]/df1["TIMES"].sum()
df1["PERCENT"] = df1["RATIO"].apply(lambda x: format(x, ".2%"))
df1 = df1.sort_values(by="RATIO", ascending=True)
print("===The dataframe sort by RATIO===")
print(df1)

df1 = df1.sort_values(by="PERCENT", ascending=True)
print("===The dataframe sort by PERCENT===")
print(df1)

Issue Description

This is the output of the code. When the dataframe is sorted by RATIO, everything is OK. But when it is sorted by PERCENT, the result is wrong: the sorted column is not in the correct order, as you can see below.

===The original dataframe===
   TIMES
0      3
1      5
2      2
3    100
4     20
===The dataframe sort by RATIO===
   TIMES     RATIO PERCENT
2      2  0.015385   1.54%
0      3  0.023077   2.31%
1      5  0.038462   3.85%
4     20  0.153846  15.38%
3    100  0.769231  76.92%
===The dataframe sort by PERCENT===
   TIMES     RATIO PERCENT
2      2  0.015385   1.54%
4     20  0.153846  15.38%
0      3  0.023077   2.31%
1      5  0.038462   3.85%
3    100  0.769231  76.92%
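
For reference, the PERCENT column holds strings produced by `format(x, ".2%")`, and strings compare character by character rather than by numeric value. A minimal pure-Python sketch (no pandas involved; the list simply copies the PERCENT values from the output above, and the variable names are illustrative) reproduces the same ordering:

```python
# PERCENT values copied from the output above; each one is a str, not a number.
percent = ["2.31%", "3.85%", "1.54%", "76.92%", "15.38%"]

# Plain string sorting compares character by character, so "15.38%" sorts
# before "2.31%" because "1" < "2".
as_strings = sorted(percent)
print(as_strings)
# ['1.54%', '15.38%', '2.31%', '3.85%', '76.92%']

# Converting each string back to a number before comparing gives numeric order.
as_numbers = sorted(percent, key=lambda s: float(s.rstrip("%")))
print(as_numbers)
# ['1.54%', '2.31%', '3.85%', '15.38%', '76.92%']
```

The first result matches the PERCENT ordering shown above, which suggests the column is being ordered as text.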

Expected Behavior

I expect that when the dataframe is sorted by PERCENT, it looks the same as when it is sorted by RATIO, like this:

===The original dataframe===
   TIMES
0      3
1      5
2      2
3    100
4     20
===The dataframe sort by RATIO===
   TIMES     RATIO PERCENT
2      2  0.015385   1.54%
0      3  0.023077   2.31%
1      5  0.038462   3.85%
4     20  0.153846  15.38%
3    100  0.769231  76.92%
===The dataframe sort by PERCENT===
   TIMES     RATIO PERCENT
2      2  0.015385   1.54%
0      3  0.023077   2.31%
1      5  0.038462   3.85%
4     20  0.153846  15.38%
3    100  0.769231  76.92%
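
One possible way to get this ordering while keeping PERCENT as formatted text is the `key` argument of `sort_values` (available since pandas 1.1.0). The sketch below assumes every PERCENT value ends in `%` and parses as a float; the name `by_percent` is only illustrative. Sorting by the numeric RATIO column, as in the example above, is the simpler alternative.

```python
import pandas as pd

df1 = pd.DataFrame({"TIMES": [3, 5, 2, 100, 20]})
df1["RATIO"] = df1["TIMES"] / df1["TIMES"].sum()
df1["PERCENT"] = df1["RATIO"].apply(lambda x: format(x, ".2%"))

# key= receives the PERCENT Series; strip the trailing "%" and cast to float
# so rows are compared numerically instead of as text.
by_percent = df1.sort_values(
    by="PERCENT",
    key=lambda s: s.str.rstrip("%").astype(float),
)
print(by_percent)
```

The PERCENT column itself is left unchanged; only the comparison used for ordering is numeric.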

Installed Versions

INSTALLED VERSIONS
------------------
commit : 73c6825
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-19041-Microsoft
Version : #1237-Microsoft Sat Sep 11 14:32:00 PST 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.3
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : None
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
