Closed
Description
Code Sample, a copy-pastable example if possible
import pandas as pd
df=pd.DataFrame({'variable': ['a','a','b','b','c','c'],
'value' : [1000,2000,10,20,100,200]},
index=[1,2]*3)
df.nlargest(3,'value')
Problem description
The result should only have 3 rows. If the index has unique values, it is correct. But when the index has duplicate values, the incorrect result is produced.
The example produces the following output:
value variable
2 2000 a
2 2000 a
1 1000 a
2 200 c
2 200 c
1 100 c
2 20 b
2 20 b
1 10 b
Expected Output
value variable
2 2000 a
1 1000 a
2 200 c
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Metadata
Metadata
Assignees
Labels
No labels