Description
Code Sample (it is copy-pastable)
import pandas as pd, numpy as np
df = pd.DataFrame(np.arange(2*3).reshape(2,3), columns=list('abc'))
mask = np.random.rand(*df.shape) < 0.5
df.where(mask)
# Output is correct:
#      a   b    c
# 0  NaN NaN  2.0
# 1  3.0 NaN  NaN
df.a = df.a.astype('category')
df.b = df.b.astype('category')
df.c = df.c.astype('category')
df.where(mask)
# ValueError: Wrong number of items passed 2, placement implies 1
# Expected output: the same as before, but now with dtype `category`.
df.a.where(mask[:,0])
# 0    NaN
# 1    3.0
# Name: a, dtype: float64
# Expected output: should stay in dtype `category`
df.a.where(mask[:,0], other=None)
# 0    None
# 1       3
# Name: a, dtype: object
# Expected output: should stay in dtype category
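Because the mask above is drawn with `np.random.rand`, the printed output changes from run to run. Below is a deterministic sketch of the same reproduction, assuming an arbitrary fixed mask in place of the random one. On pandas 0.20.2 the `df.where(mask)` line raises the `ValueError` quoted above; the closing comment describes the behavior this report expects, not observed output.

```python
import numpy as np
import pandas as pd

# Same frame as in the report, but with a fixed mask (an arbitrary choice)
# instead of a random one, so every run behaves the same way.
df = pd.DataFrame(np.arange(2 * 3).reshape(2, 3), columns=list('abc'))
mask = np.array([[False, False, True],
                 [True, False, False]])

# Convert every column to categorical, as in the report.
for col in df.columns:
    df[col] = df[col].astype('category')

# On pandas 0.20.2 this line raises
# "ValueError: Wrong number of items passed 2, placement implies 1".
result = df.where(mask)
series_result = df['a'].where(mask[:, 0])

# Expected per this report: NaN wherever mask is False, and every column
# (and the single Series) keeps dtype 'category'.
print(result.dtypes)
print(series_result.dtype)
```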
Problem description
`df.where` should work with all dtypes; the documentation doesn't say it only works for some dtypes. Also, NaNs are already handled correctly as missing data in a `pd.Series` of dtype `category`, so one should be able to assign NaNs to such a series. Likewise, the result should not be converted to another dtype.
While writing this report I found that doing it column by column avoids the error (although, as shown above, the dtype is not preserved), so I'll use that as a workaround.
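A column-by-column workaround that also keeps the `category` dtype could rebuild each categorical column from its integer codes, using code `-1` for the masked-out cells, which a pandas `Categorical` treats as missing. This is a minimal sketch: `where_categorical` is a hypothetical helper (not a pandas API), and it assumes unique column labels and a boolean mask with the same shape as the frame.

```python
import numpy as np
import pandas as pd


def where_categorical(df, mask):
    """Hypothetical helper: column-by-column equivalent of ``df.where(mask)``
    that rebuilds categorical columns from their codes so they stay categorical.
    Assumes unique column labels and a boolean mask shaped like ``df``."""
    mask = np.asarray(mask, dtype=bool)
    columns = {}
    for i, col in enumerate(df.columns):
        s = df[col]
        keep = mask[:, i]
        if s.dtype.name == 'category':
            # Code -1 marks a missing value in a Categorical; keep the
            # original categories and ordering unchanged.
            codes = np.where(keep, s.cat.codes, -1)
            columns[col] = pd.Series(
                pd.Categorical.from_codes(codes,
                                          categories=s.cat.categories,
                                          ordered=s.cat.ordered),
                index=s.index)
        else:
            # Non-categorical columns: plain Series.where, masked cells become NaN.
            columns[col] = s.where(keep)
    return pd.DataFrame(columns, columns=df.columns)


# Usage with the frame and random mask from the report.
df = pd.DataFrame(np.arange(2 * 3).reshape(2, 3), columns=list('abc'))
for col in df.columns:
    df[col] = df[col].astype('category')
mask = np.random.rand(*df.shape) < 0.5
print(where_categorical(df, mask).dtypes)
```

Working on the codes rather than the values sidesteps any cast to `float64` or `object`, which is what changes the dtype in the `Series.where` calls shown earlier.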
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-81-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Ubuntu `lsb_release -a`:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial