BUG: duplicated() on an empty DataFrame or a DataFrame with an empty subset of columns with a non-empty index #12869

Open
@sebov

Description

When investigating different subsets of a data frame's columns, calling the 'duplicated' method on a data frame sliced to an empty subset of columns raises a ValueError instead of returning a result.

ValueError                                Traceback (most recent call last)
<ipython-input-672-9c619ea6d0ef> in <module>()
     14 print data_frame[cols].sum()
     15 print "---"
---> 16 print data_frame[cols].duplicated()
     17 
     18 

.../local/lib/python2.7/site-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

.../local/lib/python2.7/site-packages/pandas/core/frame.pyc in duplicated(self, subset, keep)
   3100 
   3101         vals = (self[col].values for col in subset)
-> 3102         labels, shape = map(list, zip(*map(f, vals)))
   3103 
   3104         ids = get_group_index(labels, shape, sort=False, xnull=False)

ValueError: need more than 0 values to unpack
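The failing line in the traceback can be reproduced without pandas. The sketch below is only an illustration of the unpacking pattern: `factorize_stub` and `split_labels_and_shape` are hypothetical names standing in for the helper `f` and the surrounding code in `frame.py`, not the actual pandas implementation. With an empty list of columns, `zip(*...)` yields nothing, so there is nothing to unpack into `labels, shape`.

```python
def factorize_stub(values):
    # Hypothetical stand-in for the per-column helper `f` in the traceback:
    # maps each distinct value to an integer label and reports how many
    # distinct values the column has.
    uniques = {}
    labels = [uniques.setdefault(v, len(uniques)) for v in values]
    return labels, len(uniques)


def split_labels_and_shape(columns):
    # Same unpacking pattern as the line flagged in the traceback.
    labels, shape = map(list, zip(*map(factorize_stub, columns)))
    return labels, shape


# One column: the unpacking works.
print(split_labels_and_shape([[1, 1, 1, 1, 1]]))

# No columns: zip() produces nothing, so the two-name unpacking fails.
try:
    split_labels_and_shape([])
except ValueError as exc:
    print("ValueError:", exc)
```

The exact wording of the error message depends on the Python version, but the exception type is ValueError in both Python 2 and Python 3.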

Code Sample, a copy-pastable example if possible

import pandas as pd
data_frame = pd.DataFrame({'a': [1]*5})
cols = ['a']
print data_frame[cols]
print "---"
print data_frame[cols].sum()
print "---"
print data_frame[cols].duplicated()
print "---"
cols = []
print data_frame[cols]
print "---"
print data_frame[cols].sum()
print "---"
print data_frame[cols].duplicated()

Expected Output

   a
0  1
1  1
2  1
3  1
4  1

---
a    5
dtype: int64

---
0    False
1     True
2     True
3     True
4     True
dtype: bool

---
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]

---
Series([], dtype: float64)

---
0    False
1     True
2     True
3     True
4     True
dtype: bool
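One way to read the expected output for the empty column subset: if each row is treated as the tuple of its selected values, then with no columns every row is the empty tuple, so every row after the first repeats an earlier one. The pure-Python sketch below illustrates that reading only. `duplicated_flags` is a hypothetical helper and makes no claim about how pandas computes or should compute the result.

```python
def duplicated_flags(rows):
    """Flag each row whose values already appeared in an earlier row."""
    seen = set()
    flags = []
    for row in rows:
        key = tuple(row)
        flags.append(key in seen)  # True only if an identical row came before
        seen.add(key)
    return flags


one_column = [(1,)] * 5   # five rows, column 'a' always 1
no_columns = [()] * 5     # five rows, no columns selected

print(duplicated_flags(one_column))  # [False, True, True, True, True]
print(duplicated_flags(no_columns))  # [False, True, True, True, True]
```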

output of pd.show_versions()

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 1.5.6
setuptools: 12.2
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.0.3
sphinx: None
patsy: 0.4.0
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.7.6
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: 0.6.6.None
psycopg2: None
jinja2: 2.8
boto: None

Metadata

    Labels

    Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); duplicated (duplicated, drop_duplicates)
