BUG: DataFrame.where with category dtype #16979

Closed
@rhaps0dy

Code Sample (it is copy-pastable)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2*3).reshape(2,3), columns=list('abc'))
# Fixed mask that reproduces the outputs shown below
mask = np.array([[False, False, True], [True, False, False]])
df.where(mask)
# Output is correct:
#      a   b    c
# 0  NaN NaN  2.0
# 1  3.0 NaN  NaN

df.a = df.a.astype('category')
df.b = df.b.astype('category')
df.c = df.c.astype('category')
df.where(mask)
# ValueError: Wrong number of items passed 2, placement implies 1
# Expected output: the same as before, but now with dtype `category`.

df.a.where(mask[:,0])
# 0    NaN
# 1    3.0
# Name: a, dtype: float64
# should stay in dtype category

df.a.where(mask[:,0], other=None)
# 0    None
# 1    3
# Name: a, dtype: object
# Expected output: should stay in dtype category

Problem description

df.where should work with all dtypes; the documentation does not say that it is limited to certain dtypes. NaNs are already handled correctly as missing data in a pd.Series of dtype 'category', so it should be possible to assign NaNs to them. The same goes for converting the dtype.
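
To illustrate the point about missing data, here is a minimal sketch showing that a categorical Series accepts NaN without leaving the category dtype. The Series `s` is a made-up example and is not part of the original report; behaviour on very old pandas versions may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical two-element categorical Series with categories 0 and 3
s = pd.Series([0, 3], dtype="category")

# NaN is treated as a missing value, not as a new category,
# so the assignment is allowed and the dtype is preserved
s.iloc[0] = np.nan

print(s.dtype)                 # still a categorical dtype
print(list(s.cat.categories))  # categories are unchanged
print(s.isna().tolist())       # first element is now missing
```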

While writing this report I found that doing it column-by-column works correctly, so I'll use that as a workaround.
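
The column-by-column workaround could look roughly like the sketch below. It assumes the same `df` and a fixed boolean mask as in the code sample. Re-wrapping each result in `pd.Categorical` with the original categories is an extra step added here so that the dtype stays `category` even on versions where `Series.where` falls back to float64; the name `result` is chosen purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 3).reshape(2, 3), columns=list("abc"))
mask = np.array([[False, False, True], [True, False, False]])

# Convert every column to category, one column at a time
for col in df.columns:
    df[col] = df[col].astype("category")

# Apply Series.where per column, then restore the original categories
result = pd.DataFrame(
    {
        col: pd.Categorical(
            df[col].where(mask[:, i]),
            categories=df[col].cat.categories,
        )
        for i, col in enumerate(df.columns)
    },
    index=df.index,
)

print(result)
print(result.dtypes)
```

Masked-out entries become NaN, while the entries kept by the mask retain their original values and categories.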

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-81-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Ubuntu lsb_release -a:

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
